The other decision trees are called weak because they have lesser ability than the full model and use a simpler model. Each weak decision tree is trained to address the error of the previous tree to finally come up with a robust model. Category Boosting (CatBoost) CatBoost is a fast, scalable, high performance algorithm for gradient boosting on decision trees. It can work with diverse data types to help solve a wide range of problems that businesses face today. Catboost achieves the best results on the benchmark.
Catboost is built with a similar approach and attributes as with Gradient Boost Decision Tree models. The feature that separates CatBoost algorithm from rest is its unbiased boosting with categorical variables. Its power lies in its categorical features preprocessing, prediction time and model analysis. Catboost introduces two critical algorithmic advances - the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. CatBoost handles data very efficiently, few tweaks can be made to increase efficiency like choosing the mode according to data. However, Catboost’s training and optimization times is considerably high. Stacked Generalization (Stacking) Stacking is an ensemble method where a new model is trained to combine the predictions from two or more models already trained on a dataset. It is based on a simple idea: instead of using trivial ensemble functions to aggregate the predictions of all predictors in an ensemble, stacking would train a model to perform this aggregation. The idea is that you can attack a learning problem with different types of models which are capable to learn some part of the problem, but not the whole space of the problem. The procedure starts with splitting the training set into two disjoint sets. Following this we would train several base learners on the first part and test the base learners on the second part. Using these predictions as the inputs, and the correct responses as the outputs, we’ll train a higher-level learner. For example, for a classification problem, we can choose as weak learners a KNN classifier, a logistic regression and an SVM, and decide to learn a neural network as meta-model. Then, the neural network will take as inputs the outputs of our three weak learners and will learn to return final predictions based on it. It is typical to use a simple linear method to combine the predictions for sub models such as simple averaging or voting, to a weighted sum using linear regression or logistic regression. It is important that sub-models produce different predictions, so-called uncorrelated predictions. Stacking is one of the most efficient techniques used in winning data science competitions. Clustering In supervised learning, we know the labels of the data points and their distribution. However, the labels may not always be known. Clustering is the practice of assigning labels to unlabeled data using the patterns that exist in it. Clustering can either be semi parametric or probabilistic. (14) K-Means Clustering
- Fall '19
- Regression Analysis