Model Comparison 6. a. Example Regression problem (Apr 3, 2020)

Instead of splitting the data into training and testing sets, we shall use 10-fold cross validation for each of the models under consideration. Let us compare the Ordinary Least Squares (OLS) model with regularization techniques such as the Ridge and Lasso models. We shall compare the RMSE and R squared values to identify the best model.
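The comparison described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the slides: the data are synthetic, there is a single predictor, the penalty strengths are arbitrary, and the helper names (make_folds, fit_line, cross_validate) are hypothetical. With one centered predictor the OLS, Ridge and Lasso slopes all have closed forms, which keeps the sketch self-contained.

```python
import random
import statistics

def make_folds(n, k, seed):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys, method="ols", lam=0.0):
    """Fit y = a + b*x; only the slope b is penalized, never the intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if method == "ols":
        b = sxy / sxx
    elif method == "ridge":    # minimizes SSE + lam * b**2
        b = sxy / (sxx + lam)
    elif method == "lasso":    # minimizes SSE / 2 + lam * |b| (soft threshold)
        shrunk = max(abs(sxy) - lam, 0.0)
        b = (shrunk if sxy > 0 else -shrunk) / sxx
    else:
        raise ValueError(f"unknown method: {method}")
    return my - b * mx, b

def cross_validate(xs, ys, method, lam=0.0, k=10, seed=7):
    """Return per-fold RMSE and R squared measured on the held-out fold."""
    rmses, r2s = [], []
    for test_idx in make_folds(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx],
                        [ys[i] for i in train_idx], method, lam)
        y_true = [ys[i] for i in test_idx]
        y_pred = [a + b * xs[i] for i in test_idx]
        sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        mean_true = statistics.mean(y_true)
        sst = sum((t - mean_true) ** 2 for t in y_true)
        rmses.append((sse / len(y_true)) ** 0.5)
        r2s.append(1.0 - sse / sst)
    return rmses, r2s

# Synthetic data (an assumption; the slides use a different data set).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 2) for x in xs]

# Penalty strengths are arbitrary illustrative values, not tuned.
models = {"OLS": ("ols", 0.0), "Ridge": ("ridge", 50.0), "Lasso": ("lasso", 50.0)}
results = {name: cross_validate(xs, ys, m, lam) for name, (m, lam) in models.items()}

for name, (rmses, r2s) in results.items():
    print(f"{name:5s} median RMSE={statistics.median(rmses):.3f} "
          f"mean R2={statistics.mean(r2s):.3f} "
          f"median R2={statistics.median(r2s):.3f}")
```

Because every model is scored on the same ten folds (same seed), differences in median RMSE or mean R squared reflect the models rather than the partitioning.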
Inference

We observe that the OLS model
a. has the lowest median RMSE (10.94) across the 10 folds when compared to the regularization models Ridge and Lasso.
b. has the highest mean R squared (57.10%) and median R squared (57.25%) when compared to the Lasso model. The Ridge model gives the same mean and median R squared as OLS.
6. b. Example Classification problem

We shall use the Pima Indians diabetes data set*, a summary of a collection of medical reports that indicates whether each patient showed the onset of diabetes within five years. The four models constructed and tuned are CART, Random Forest, Naïve Bayes and K Nearest Neighbor (KNN). Each model is automatically tuned and evaluated using three repeats of 10-fold cross validation. To ensure that each algorithm gets the same data partitions and repeats, the same random number seed is set before running each algorithm. The models are trained and an optimal parameter configuration is found for each. We collect the accuracy results from each of the best models, summarize their distributions in terms of percentiles, and draw boxplots of the distributions.
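The evaluation protocol above (three repeats of 10-fold cross validation, the same seed for every algorithm, accuracy summarized by percentiles) can be sketched as follows. This is an illustration under stated assumptions, not the slides' code: the data are two synthetic Gaussian clusters rather than the Pima data, no tuning is performed, and two small stand-in classifiers (a nearest-centroid rule and a 5-nearest-neighbor vote) replace the models named above. All function names are hypothetical.

```python
import random
import statistics

def repeated_kfold(n, k=10, repeats=3, seed=7):
    """Yield (train_idx, test_idx) pairs for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)  # fixed seed: every call yields the same partitions
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test_idx = idx[fold::k]
            held_out = set(test_idx)
            yield [i for i in idx if i not in held_out], test_idx

def sq_dist(u, v):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean feature vector lies closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))

def knn(train_X, train_y, x, k=5):
    """Predict by majority vote among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: sq_dist(p[0], x))[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)

def cv_accuracies(predict, X, y, seed=7):
    """Accuracy on each held-out fold of 3 x 10-fold cross validation."""
    accs = []
    for train_idx, test_idx in repeated_kfold(len(X), seed=seed):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        accs.append(hits / len(test_idx))
    return accs

# Synthetic two-class data (an assumption, standing in for the real data set).
rng = random.Random(0)
X = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
     + [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(100)])
y = [0] * 100 + [1] * 100

# Both models see identical partitions because the seed is the same.
results = {"centroid": cv_accuracies(nearest_centroid, X, y),
           "knn": cv_accuracies(knn, X, y)}

for name, accs in results.items():
    q1, med, q3 = statistics.quantiles(accs, n=4)
    print(f"{name:8s} min={min(accs):.2f} Q1={q1:.2f} median={med:.2f} "
          f"Q3={q3:.2f} max={max(accs):.2f}")
```

Each model yields 30 accuracy values (3 repeats x 10 folds); the five-number summary printed here is the same information a boxplot of those values would display.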
We observe that Logistic Regression gives the best sensitivity and specificity when compared to CART, Naive Bayes and KNN. With regard to AUROC, the Naive Bayes model gives the best prediction.
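Sensitivity and specificity depend on the cut-off used to turn scores into class labels, whereas AUROC summarizes how well the scores rank positives above negatives across all cut-offs, which is why the two criteria can favor different models. A minimal sketch of how the three measures are computed, using small made-up labels and scores (not results from the slides) and hypothetical function names:

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auroc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 4 positives, 6 negatives, with predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

# Class labels at a 0.5 cut-off (the cut-off itself is an assumption).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUROC={auc:.3f}")
```

Moving the cut-off changes the sensitivity and specificity but leaves the AUROC unchanged, since AUROC uses only the ordering of the scores.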
Visual comparison
7. Reference

1. Business Statistics: A First Course, David M. Levine, Kathryn A. Szabat, David F. Stephan and Dr P K Viswanathan, Chapter 12
2. Business Analytics: The Science of Data Driven Decision Making, U Dinesh Kumar