Performance and Generalization
PR, ANN, & ML

Classifier Performance
- Intuitively, the performance of a classifier (learning algorithm) depends on:
  - the complexity of the classifier (e.g., how many layers and how many neurons per layer)
  - the training samples (generally, more is better)
  - the training procedure (e.g., how many search iterations/epochs are allowed)
  - etc.

Generalization Performance
- You can make a classifier perform arbitrarily well on any training data set:
  - given enough structural complexity
  - given enough training cycles
- But how does it do on a validation (unseen) data set? That is, what is its generalization performance?

Generalization Performance (cont.)
- First, trying to do better on unseen data by doing better on training data might not work, because overfitting can be a problem:
  - you can fit the training data arbitrarily well, but that says nothing about what the classifier will do on data it has not seen.
- Example: curve fitting. A large network or complicated classifier almost always fits the training data well, but that does not necessarily lead to good generalization.

Generalization Performance (cont.)
- In fact, some relations must exist in any data set, even one made of random numbers.
- Example: given n people, each of whom
  - has a credit card
  - has a phone
  - The credit-card/phone-number association is captured exactly by an (n-1)-degree polynomial.
  - But can you extrapolate (predict the credit-card/phone-number association for other people)?
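The curve-fitting point above can be sketched numerically: an (n-1)-degree polynomial passes exactly through n arbitrary points, even purely random ones, yet its extrapolations carry no information. The data and degree below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
x_train = np.arange(n, dtype=float)        # n "people", indexed 0..n-1
y_train = rng.uniform(0, 100, size=n)      # random "phone number" values

# An (n-1)-degree polynomial through n points: exact interpolation.
coeffs = np.polyfit(x_train, y_train, deg=n - 1)
fit = np.polyval(coeffs, x_train)

print(np.max(np.abs(fit - y_train)))       # ~0: training fit is essentially perfect
print(np.polyval(coeffs, 10.0))            # extrapolation: an arbitrary value
```

The training error is at machine-precision level regardless of how the labels were generated, which is exactly why a perfect training fit predicts nothing about unseen data.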
  - No: this is a problem of overfitting.

Intuitively
- Meaningful associations usually imply:
  - Simplicity (capacity): the association function should be simple. More generally, capacity characterizes how much representational capability a classifier possesses.
  - Repeatability (stability): the association function should not change drastically when different training data sets are used to derive it, i.e., E(f) = 0 over different data sets.
  - Example: the average salary of Ph.D. holders is higher than that of high-school dropouts: a simple and repeatable relation (not sensitive to the particular training data set).

Generalization Performance (cont.)
- So does that mean we should always prefer simplicity?
- Occam's Razor: nature prefers simplicity; explanations should not be multiplied beyond necessity.
- Sometimes this amounts to a bias, or preference, over the forms and parameters of a classifier.

No Free Lunch Theorem
- Under very general assumptions, one should not prefer one classifier (or learning algorithm) over another on the basis of generalization performance.
- Why? Because given certain training data, there is no telling (in general) how unseen data will behave.

Example
- The training data might provide no information about F(x) on unseen patterns.
- There are multiple (2^5) target functions consistent with the n = 3 patterns in the training set (with 3-bit inputs, 5 patterns remain unseen).
- Inverting F's value on an unseen pattern makes one hypothesis good and the other bad, so averaged over all consistent target functions neither is better.
- [Table of the consistent target functions truncated in this preview.]
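The no-free-lunch example above can be checked by brute force: with 3-bit inputs and n = 3 training patterns, enumerate all boolean target functions and keep the ones consistent with the training set; exactly 2^5 remain, and any two hypotheses tie on average over them. The training labels and the two hypotheses below are made up for illustration.

```python
from itertools import product

inputs = list(product([0, 1], repeat=3))             # all 8 three-bit patterns
train = {(0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 1}  # n = 3 labeled patterns

# Every assignment of labels to the 8 inputs is a possible target function F.
all_functions = [dict(zip(inputs, labels))
                 for labels in product([0, 1], repeat=len(inputs))]
consistent = [F for F in all_functions
              if all(F[x] == y for x, y in train.items())]
print(len(consistent))        # 2**5 = 32: one per labeling of the 5 unseen patterns

# Two arbitrary hypotheses: averaged over all consistent F, their
# off-training-set accuracy is identical (2.5 correct out of 5).
unseen = [x for x in inputs if x not in train]
h1 = {x: 1 for x in inputs}   # hypothesis "always 1"
h2 = {x: 0 for x in inputs}   # hypothesis "always 0"

def avg_hits(h):
    return sum(sum(F[x] == h[x] for x in unseen)
               for F in consistent) / len(consistent)

print(avg_hits(h1), avg_hits(h2))
```

On each unseen pattern, exactly half of the consistent target functions agree with any fixed hypothesis, which is the averaging argument behind the theorem.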
This note was uploaded on 08/06/2008 for the course CS 290I taught by Professor Wang during the Spring '07 term at UCSB.