
Fit training data very closely

Able to predict very well on training data

Doesn’t generalize well to the test set

High error on test data
Always easier to fit (and overfit) with more features
Evaluation: determining how well a model fits the data
In the standard (holdout) framework we can only do this once
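A minimal sketch of this train/test gap, assuming scikit-learn; the synthetic dataset and the 1-nearest-neighbor model are illustrative choices made here to make the gap visible:

```python
# Hypothetical illustration (assuming scikit-learn): a 1-nearest-neighbor model
# fits noisy training data almost perfectly but scores noticeably lower on the
# held-out test set, which the holdout framework lets us evaluate only once.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # very high: fits training data closely
print("test accuracy:", model.score(X_test, y_test))     # lower: poorer generalization
```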
Hyperparameter

A variable part of the model that is not set during learning on the training data

Needs to be tuned on a validation set
Model Parameter

Required by the model when making predictions

Values define the ‘skill’ of the model on your problem

Are learned from data

Often not set manually by the practitioner

Often saved as part of the learned model
Example: the coefficients (betas) in a linear or logistic regression
Model Hyperparameter

Often used in processes to help estimate model parameters.

Often specified by the practitioner.

Can be set using heuristics.

Often tuned for a given predictive modeling problem.
Example: the K in K-nearest neighbors
If you have to specify a model parameter manually then it is probably a model hyperparameter.
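A rough illustration of the distinction, assuming scikit-learn; the regularization strength C is used here only as an example of a hyperparameter:

```python
# Sketch: C is a hyperparameter, specified by the practitioner before fitting;
# the coefficients (betas) are model parameters, learned from the training data
# and saved as part of the fitted model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(C=1.0)   # hyperparameter: set manually, tuned on validation data
clf.fit(X, y)                     # model parameters are estimated here
print(clf.coef_, clf.intercept_)  # learned parameters that define the model's predictions
```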
How can we control fit in KNN?
By changing K

Smaller K – closer fit to the training data (risk of overfitting)

Larger K – smoother, less close fit (risk of underfitting)
KNN hyperparameters

K

Neighbor weights

Distance metric
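The three hyperparameters listed above map onto arguments of scikit-learn's KNeighborsClassifier; a small sketch with illustrative values:

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(
    n_neighbors=5,       # K: smaller K fits the training data more closely
    weights="distance",  # neighbor weights: "uniform" or "distance"
    metric="euclidean",  # distance metric, e.g. "euclidean" or "manhattan"
)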
Grid search: hyperparameter optimization
Systematic search for best hyperparameter settings

Manually choose values to test for each hyperparameter

Check validation accuracy/error for all combinations

The number of hyperparameter combinations grows quickly, but the hyperparameters often behave relatively independently of one another

Lots of computation!
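A minimal grid-search sketch, assuming scikit-learn's GridSearchCV (which scores combinations with cross-validation rather than a single validation set); the candidate values below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)

param_grid = {                           # manually chosen values to test
    "n_neighbors": [1, 3, 5, 11],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)                         # 4 x 2 x 2 = 16 combinations, each evaluated 5 times
print(search.best_params_, search.best_score_)
```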
Evaluating model performance
Metrics for a Regression Task:

Coefficient of determination (R2)

Mean Absolute Error

Root Mean Squared Error
Metrics for Classification Task:

Confusion Matrix

Accuracy, Precision, Recall, F1 score

Area Under ROC Curve
R2: coefficient of determination

How well model f predicts targets relative to mean

Equivalent to proportion of variance explained by f

The mean is not always a suitable baseline
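In formula form, with ȳ the mean of the targets acting as the baseline model, R² compares the model's squared errors with those of the mean:

R² = 1 − Σᵢ (yᵢ − f(xᵢ))² / Σᵢ (yᵢ − ȳ)²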
Error: difference between the predicted value f(x) and the true value y
Err = f(x) – y

Errors may be positive or negative

Summing raw errors can be misleading (positive and negative errors cancel out)

Take squares or absolute values before summing
Root Mean Squared Error (RMSE) or Mean Absolute Error (MAE)
Conclusion: MAE is easier to interpret, so use it when presenting results, but RMSE has some nice mathematical properties (RMSE is often used when training your model).
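A small sketch of the three regression metrics above, assuming scikit-learn and NumPy; the example values are made up for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # illustrative targets
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # illustrative predictions

mae = mean_absolute_error(y_true, y_pred)           # average |error|, easy to interpret
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors more heavily
r2 = r2_score(y_true, y_pred)                       # improvement over predicting the mean
print(mae, rmse, r2)
```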
Confusion Matrix
Assignment to a class: y ∈ {0, 1}
0 = “negative class” (N), e.g. legitimate, benign (the “dominant class”)
1 = “positive class” (P), e.g. fraudulent, malignant (the “rare class”)
Accuracy: (TP + TN) / all predictions
Precision: TP / (TP + FP), the fraction of positive predictions that are correct
Recall: true positive rate, TP / (TP + FN)
F (or F1) score: harmonic mean of precision and recall
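A sketch of these metrics for a binary task, assuming scikit-learn; the label vectors are illustrative:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # 0 = negative class, 1 = positive class
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]

print(confusion_matrix(y_true, y_pred))               # rows: true class, columns: predicted class
print("accuracy: ", accuracy_score(y_true, y_pred))   # (TP + TN) / all predictions
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN), true positive rate
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```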
ROC curves (area under the curve)

Widely used measure for classification

Measure between 0.5 (chance level) and 1.0 (perfect)

Can inspect outcomes at different decision thresholds (criteria)
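A rough sketch, assuming scikit-learn: the AUC is computed from predicted scores rather than hard labels, which is what allows inspecting different thresholds; the labels and scores below are illustrative:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                    # illustrative labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]  # e.g. predicted probabilities for class 1

fpr, tpr, thresholds = roc_curve(y_true, y_score)    # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_true, y_score))        # 0.5 = chance, 1.0 = perfect ranking
```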
Cross Validation
To estimate the performance of a model learned from the available data with one algorithm

Often motivated by small datasets
To compare the performance of two or more different algorithms and find the best algorithm for the available data

Often motivated by the search for an optimal solution
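A minimal sketch of both uses, assuming scikit-learn; the dataset and the two algorithms compared are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Estimate the performance of one algorithm with 5-fold cross-validation
knn_scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)
print("KNN mean accuracy:", knn_scores.mean())

# Compare against a second algorithm on the same available data
log_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("Logistic regression mean accuracy:", log_scores.mean())
```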