The individual decision trees are called weak learners because each one is a simpler model with less predictive ability than the full ensemble. Each weak tree is trained to correct the errors of the trees before it, so that together they build up a robust model.
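To make this residual-fitting idea concrete, the following sketch builds a small boosted ensemble by hand, using shallow scikit-learn regression trees as the weak learners; the synthetic data and hyperparameters are illustrative only, not any particular library's exact procedure.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

learning_rate = 0.1
trees = []
prediction = np.full_like(y, y.mean())  # start from the mean prediction

for _ in range(100):
    residuals = y - prediction                 # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2)  # a deliberately weak tree
    tree.fit(X, residuals)                     # each tree fits the previous errors
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

def boosted_predict(X_new):
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out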
Category Boosting (CatBoost)
CatBoost is a fast, scalable, high-performance algorithm for gradient boosting on decision trees. It can work with diverse data types to help solve a wide range of problems that businesses face today, and it reports strong results on standard gradient-boosting benchmarks.

CatBoost is built with a similar approach and attributes to other gradient-boosted decision tree models. The feature that separates CatBoost from the rest is its unbiased boosting with categorical variables: its power lies in its categorical feature preprocessing, fast prediction time, and model analysis tools.
CatBoost introduces two critical algorithmic advances: ordered boosting, a permutation-driven alternative to the classic boosting procedure, and an innovative algorithm for processing categorical features.
CatBoost handles data very efficiently, and a few tweaks, such as choosing the training mode that matches the data, can increase efficiency further. However, CatBoost's training and optimization times are considerably high.
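A brief usage sketch, assuming the open-source catboost Python package; the toy dataset and the column index passed to cat_features are illustrative.

from catboost import CatBoostClassifier

# A toy dataset with one categorical column and one numeric column.
X = [["red", 1.0], ["blue", 2.5], ["red", 0.5], ["green", 3.1]]
y = [1, 0, 1, 0]

model = CatBoostClassifier(iterations=50, depth=3, verbose=False)
# cat_features tells CatBoost which columns to treat as categorical,
# so no manual one-hot or label encoding is needed.
model.fit(X, y, cat_features=[0])
print(model.predict([["blue", 1.2]]))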
Stacked Generalization (Stacking)
Stacking is an ensemble method in which a new model is trained to combine the predictions of two or more models already trained on a dataset. It is based on a simple idea: instead of using trivial ensemble functions to aggregate the predictions of all predictors in an ensemble, stacking trains a model to perform this aggregation. The motivation is that a learning problem can be attacked with different types of models, each capable of learning some part of the problem, but none capable of covering the whole problem space.
The procedure starts by splitting the training set into two disjoint sets. Several base learners are trained on the first part and then used to generate predictions on the second part. Using these predictions as inputs and the correct responses as outputs, a higher-level learner is trained.
For example, for a classification problem, we can choose a KNN classifier, a logistic regression, and an SVM as weak learners, and train a neural network as the meta-model. The neural network then takes the outputs of the three weak learners as inputs and learns to return final predictions based on them, as sketched below.
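The example above translates directly into code. The sketch below assumes scikit-learn's StackingClassifier and synthetic data; all hyperparameters are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

base_learners = [
    ("knn", KNeighborsClassifier()),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True)),
]

# The meta-model (an MLP here) is trained on out-of-fold predictions
# of the base learners, mirroring the train/holdout split described above.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(max_iter=2000, random_state=0),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))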
Methods for combining the sub-models' predictions range from simple linear schemes, such as averaging or voting, to weighted sums learned by linear or logistic regression. It is important that the sub-models produce diverse, so-called uncorrelated, predictions. Stacking is one of the most effective techniques used in winning data science competitions.
Clustering
In supervised learning, we know the labels of the data points and their distribution. However, labels may not always be available. Clustering is the practice of assigning labels to unlabeled data using the patterns that exist within it. Clustering methods can be either semi-parametric or probabilistic. (14)
K-Means Clustering
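As a minimal illustration, the sketch below assumes scikit-learn's KMeans estimator on synthetic data; the choice of k = 3 is illustrative.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled data drawn from three blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster label assigned to each point
print(kmeans.cluster_centers_)   # learned centroids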
