Model Comparison
3. Principle of Model validation - continued
Apr 3, 2020
89

A good regression model predicts values of the response variable that are very
close to the observed response values. The difference between the predicted
value and the observed value of the response variable is called the prediction
error.
In an Ordinary Least Squares (OLS) regression model, four statistics are used to
evaluate the fit of the model:
1. R squared and Adjusted R squared
2. F test
3. Root Mean Square Error (RMSE)
4. Mean Absolute Percentage Error (MAPE)
4. Prediction Error
Regression problems

The above statistics are based on Total Sum of Squares (SST) and
Error Sum of Squares (SSE).
SST measures how far the data are from the mean while SSE
measures how far the data are from the model's predicted values.
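The two sums of squares described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original material: the data values are hypothetical (responses observed at x = 1, 2, 3, 4, with fitted values from the least-squares line 0.5 + 1.6x), and the function names are chosen here for readability.

```python
# Hypothetical data, for illustration only: observed responses and the
# fitted values of the least-squares line 0.5 + 1.6 * x at x = 1, 2, 3, 4.
observed = [2.0, 4.0, 5.0, 7.0]
predicted = [2.1, 3.7, 5.3, 6.9]


def total_sum_of_squares(y):
    """SST: squared distances of the observations from their mean."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def error_sum_of_squares(y, y_hat):
    """SSE: squared distances of the observations from the predictions."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


sst = total_sum_of_squares(observed)
sse = error_sum_of_squares(observed, predicted)
print(f"SST = {sst:.3f}, SSE = {sse:.3f}")
```

A model that fits well leaves an SSE that is small relative to SST, since most of the spread around the mean is then accounted for by the predictions.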
i. R squared is obtained by dividing the difference (between SST and
SSE) by SST. R squared is interpreted as the proportion of total
variance that is explained by the model. Adjusted R squared also
incorporates the model's degrees of freedom, so it penalizes
predictors that do not improve the fit.
ii. The F test evaluates the null hypothesis that all regression coefficients
are equal to zero versus the alternative that at least one is not. The F
test determines whether the proposed relationship between the
dependent variable and the set of independent variables is
statistically reliable.

iii. RMSE is the square root of the variance of the residuals. It is an
absolute measure of how well the model fits the data: it can be interpreted
as the standard deviation of the unexplained variance, and it has the same
unit of measure as the response variable.
iv. RMSE is a good measure of how accurately the model predicts the
response. Lower values of RMSE indicate a better fit.
v. Mean Absolute Percentage Error (MAPE) is a measure of the prediction
accuracy of a forecasting method, for example in trend estimation.
Refer to
vi. Mean Absolute Error (MAE) is the mean of the absolute errors. Because it
is expressed in the units of the response, it is difficult to tell from MAE
alone whether an error is big or small. MAPE calculates the mean
absolute error in percentage terms, thus allowing us to compare forecasts
of data with different units of measure using different models.
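The statistics in this list can be computed directly from the observed and predicted values. The sketch below is illustrative only and makes several assumptions that are not in the original material: the data are hypothetical (fitted values from the least-squares line 0.5 + 1.6x at x = 1, 2, 3, 4, so there is one predictor), adjusted R squared and the F statistic use the usual n - p - 1 error degrees of freedom, and RMSE divides SSE by n (some texts divide by n - p - 1 instead). MAPE is undefined when an observed value is zero.

```python
import math

# Hypothetical data, for illustration only: observed responses and the
# fitted values of the least-squares line 0.5 + 1.6 * x at x = 1, 2, 3, 4.
observed = [2.0, 4.0, 5.0, 7.0]
predicted = [2.1, 3.7, 5.3, 6.9]
n_predictors = 1  # assumed: one independent variable plus an intercept

n = len(observed)
mean_observed = sum(observed) / n
residuals = [y - f for y, f in zip(observed, predicted)]

# Sums of squares on which the fit statistics are based.
sst = sum((y - mean_observed) ** 2 for y in observed)
sse = sum(e ** 2 for e in residuals)
error_df = n - n_predictors - 1  # error degrees of freedom

# R squared: share of the total variation explained by the model.
r_squared = (sst - sse) / sst

# Adjusted R squared: R squared corrected for the degrees of freedom.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / error_df

# F statistic: explained mean square divided by error mean square.
f_statistic = ((sst - sse) / n_predictors) / (sse / error_df)

# RMSE: square root of the mean squared residual (assumed divisor n).
rmse = math.sqrt(sse / n)

# MAE: mean of the absolute errors, in the units of the response.
mae = sum(abs(e) for e in residuals) / n

# MAPE: mean absolute error relative to each observed value, in percent.
mape = 100 * sum(abs(e / y) for e, y in zip(residuals, observed)) / n

print(f"R squared = {r_squared:.4f}, adjusted = {adj_r_squared:.4f}")
print(f"F = {f_statistic:.1f}, RMSE = {rmse:.4f}")
print(f"MAE = {mae:.4f}, MAPE = {mape:.2f}%")
```

For these hypothetical values R squared is about 0.985, the F statistic is 128, and MAPE is about 4.98%. RMSE and MAE are in the units of the response, while MAPE is unit-free, which is what makes it comparable across data sets.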
Classification problems
In classification problems, we measure how well the model classifies the
sample data into the correct categories.
Here, the objective is to find a classifier, such as Logistic Regression, CART,
or Random Forest, that performs well in predicting classes for new data for
which the response is not known. In classification models, the error (or
residual) is the count of misclassified observations.
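Counting misclassified observations can be sketched as follows. The class labels and predictions are hypothetical, and the error rate (the count divided by the number of observations) is a common normalization added here for illustration; the text above defines only the count.

```python
# Hypothetical class labels and classifier predictions, for illustration only.
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "yes", "yes", "no", "no"]

# Error as defined above: the number of misclassified observations.
misclassified = sum(1 for a, p in zip(actual, predicted) if a != p)

# Assumed normalization: share of observations that were misclassified.
error_rate = misclassified / len(actual)

print(f"misclassified = {misclassified}, error rate = {error_rate:.2f}")
```

The same count can be produced for any classifier, which is what allows Logistic Regression, CART and Random Forest to be compared on the same data.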
