- You have no idea what an "average" or "exemplar" instance from each class looks like
  o It does not build models explicitly

SPSS Procedure for KNN
- Analyze > Classify > Nearest Neighbor
  o Specify which variables, the number of neighbors, and the distance computation; save the predicted value, probability, and partition variable

Method 7: Ensemble Methods

Ensemble Methods
- Combining predictions from competing models often gives better predictive accuracy than individual models
- Shown to be empirically successful in a wide variety of applications
  o Several types of anomaly detection, in which you need strong predictions: fraud detection, malware detection, etc.
- Steps (see the sketch at the end of this section):
  o 1) Train multiple, separate models using the training data
  o 2) Predict the outcome for a previously unseen sample by aggregating the predictions made by the multiple models
- Useful for classification and regression
  o Classification: aggregate predictions by voting
  o Regression: aggregate predictions by averaging
- Model types:
  o Heterogeneous: ex. a neural net combined with an SVM combined with a decision tree combined with ...
  o Homogeneous: the most common practice
    Individual models are referred to as base classifiers (or regressors)
    Ex. an ensemble of 1000 decision trees is more robust than a single decision tree
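A minimal sketch of steps 1) and 2) for classification, using scikit-learn's VotingClassifier over a heterogeneous set of models; the synthetic dataset and the particular model choices are illustrative assumptions, not part of the notes:

```python
# Ensemble sketch: train several separate models, then aggregate
# their predictions by majority vote for an unseen sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: multiple, separate models trained on the training data
# Step 2: aggregate predictions by voting (classification)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # hard voting = majority vote over class predictions
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```

For regression, the analogous VotingRegressor aggregates by averaging the individual predictions instead of voting.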
Why do ensembles work?
- The argument assumes the classifiers are independent; in practice, base classifiers always have some significant degree of correlation in their predictions
- Expected performance is guaranteed to be no worse than the average of the individual classifiers
- The more uncorrelated the individual classifiers are, the better the ensemble

Base classifiers - important properties
- Diversity - lack of correlation
  o Predictions vary significantly between classifiers
  o Ways to create diverse base classifiers: ex. random sampling of points and/or features
- Accuracy
  o Better than random
- Computationally fast
  o Usually need to train a large number of classifiers

Ensemble Methods - Increase the Accuracy
- Use a combination of models to increase accuracy
- Combine a series of k learned models, M1, M2, ..., Mk, with the aim of creating an improved model M
- Popular ensemble methods (each is sketched below):
  o Bagging - averaging the prediction over a collection of classifiers
  o Boosting - weighted vote with a collection of classifiers
  o Random Forest - combining a set of decision tree classifiers
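A hedged side-by-side sketch of the three popular methods as they exist in scikit-learn; the synthetic dataset is an assumption, and AdaBoost is used here as one representative boosting algorithm:

```python
# The three popular ensemble methods, as implemented in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

# Illustrative synthetic dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, random_state=0)

models = {
    "bagging": BaggingClassifier(random_state=0),        # vote over bootstrapped classifiers
    "boosting": AdaBoostClassifier(random_state=0),      # weighted vote over sequential classifiers
    "random forest": RandomForestClassifier(random_state=0),  # a set of decision tree classifiers
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())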
Bagging - Bootstrap Aggregation
- Analogy: a diagnosis based on multiple doctors' majority vote
- Training
  o Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (a bootstrap sample)
  o A classifier model Mi is learned for each training set Di
- Classification: classify an unknown sample X
  o Each classifier Mi returns its class prediction
  o The bagged classifier M counts the votes and assigns the class with the most votes to X
- Prediction
  o Can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple
- Accuracy
  o Often significantly better than a single classifier derived from D
  o Proven improved accuracy in prediction (a from-scratch sketch follows below)

Boosting
- Analogy
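A from-scratch sketch of the bagging procedure described above; the synthetic dataset, the number of iterations, and the decision-tree base classifiers are illustrative assumptions:

```python
# Bagging from scratch: bootstrap-sample Di from D, learn Mi on each,
# then classify by majority vote over the Mi.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(50):
    # Sample d tuples with replacement from D (bootstrap sample Di)
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # Learn a classifier model Mi for each training set Di
    models.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

# Classification: each Mi votes; the bagged classifier M takes the majority
votes = np.stack([m.predict(X_te) for m in models])
y_hat = (votes.mean(axis=0) >= 0.5).astype(int)
print("bagged accuracy:", (y_hat == y_te).mean())
```

For continuous targets, the same procedure applies with the majority vote replaced by an average of the individual predictions.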
