Because this model assumes that each independent variable
contributes separately.
The per-variable probabilities are then combined to predict the final target class.
Attribute values are assumed to be conditionally independent given the target class.
Hence Naïve Bayes combines the prior probability with the conditional
probabilities to obtain the posterior probability and predict the class label.
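As a sketch of how these probabilities are combined, the rule can be written as follows. The symbols are assumptions introduced here for illustration: C is the target class and x_1, ..., x_n are the attribute values.

```latex
P(C \mid x_1, \dots, x_n)
  = \frac{P(C) \prod_{i=1}^{n} P(x_i \mid C)}{P(x_1, \dots, x_n)}
  \propto P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The product over i is valid only under the conditional independence assumption stated above.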
Before deciding whether multicollinearity affects Naïve Bayes,
let's discuss multicollinearity itself.
What is multicollinearity?
Multicollinearity occurs when independent variables in a regression model are
correlated with one another. This correlation is a problem because independent
variables should be independent. If the degree of correlation between variables
is high enough, it can cause problems when you fit the model and interpret the results.
Multicollinearity can occur for several reasons:
It can be caused by incorrect use of dummy variables (for example, including a
dummy variable for every category).
It can be caused by including a variable that is computed from other
variables in the data set.
It can also result from including the same kind of variable more than once.

Assignment – Machine Learning Basics
In general, it occurs when the variables are highly correlated with each other.
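Result:
Therefore, if we use Naïve Bayes on a multicollinear dataset, its independence
assumption is violated: correlated variables are effectively counted more than
once, so the predicted probabilities become unreliable and the algorithm can
perform poorly.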
Let's understand this with a simple example:
Example:
Imagine a person visits an eyeglass store to buy eyeglasses based on an
eyesight prescription. The glassmaker makes the eyeglasses according to the
prescription the customer provides. The prescription consists of many variables,
such as the x-axis, the y-axis and other eye-related measures. If any one of
these observations is missing or corrupted by human error, how can the
glassmaker deliver the desired eyeglasses to the customer?
Generally, if a value is missing, or is mistakenly increased or decreased,
the result is a drastic change in the vision the eyeglasses provide.
Hence, if multicollinearity exists in the data, it will affect Naïve Bayes,
because the algorithm assumes that all the variables in the data are independent.
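To make the effect concrete, here is a minimal sketch of what happens when a perfectly correlated (duplicated) variable is fed to a Naïve Bayes style calculation. The class names, the probabilities and the helper `naive_bayes_posterior` are hypothetical values chosen for illustration, not taken from any dataset.

```python
def naive_bayes_posterior(prior, likelihoods):
    """Combine a prior with per-variable likelihoods, assuming independence.

    prior:       dict mapping class -> P(class)
    likelihoods: list of dicts, each mapping class -> P(x_i | class)
    """
    scores = {}
    for cls, p_cls in prior.items():
        score = p_cls
        for lk in likelihoods:
            score *= lk[cls]  # independence assumption: multiply likelihoods
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical two-class problem with equal priors.
prior = {"yes": 0.5, "no": 0.5}
# Hypothetical likelihood of one observed variable under each class.
evidence = {"yes": 0.8, "no": 0.2}

# The variable used once.
once = naive_bayes_posterior(prior, [evidence])
# The same variable entered twice (a perfectly correlated duplicate).
twice = naive_bayes_posterior(prior, [evidence, evidence])

print(round(once["yes"], 3), round(twice["yes"], 3))
```

The duplicate carries no new information, yet the posterior for "yes" moves further from the prior, because the same evidence is multiplied in twice.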
Note: Multicollinearity can also be detected with the help of tolerance and its
reciprocal, the variance inflation factor (VIF). If the tolerance is less
than 0.2 or 0.1 and, simultaneously, the VIF is 10 or above (a tolerance of 0.1
corresponds to a VIF of 10), then the multicollinearity is problematic.
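The tolerance and VIF check can be sketched as follows. This sketch assumes the simplest case of only two predictors, where the R-squared from regressing one predictor on the other equals their squared Pearson correlation; with more predictors, each one would have to be regressed on all the others. The helper names and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def tolerance_and_vif(xs, ys):
    """Tolerance (1 - R^2) and VIF (1 / tolerance) for two predictors."""
    r_squared = pearson_r(xs, ys) ** 2
    tolerance = 1.0 - r_squared
    vif = math.inf if tolerance == 0 else 1.0 / tolerance
    return tolerance, vif


# Hypothetical predictors: one moderately and one highly correlated with x1.
x1 = [1, 2, 3, 4, 5]
x2_moderate = [2, 1, 4, 3, 5]
x2_high = [1.1, 1.9, 3.2, 3.9, 5.1]

tol_mod, vif_mod = tolerance_and_vif(x1, x2_moderate)
tol_high, vif_high = tolerance_and_vif(x1, x2_high)

print(f"moderate: tolerance={tol_mod:.3f}, VIF={vif_mod:.2f}")
print(f"high:     tolerance={tol_high:.3f}, VIF={vif_high:.2f}")
```

By the thresholds in the note, only the highly correlated pair would be flagged as problematic.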
5. If we do not define the number of trees to be built in a random
forest, how many trees does random forest create internally?
Solution:
Random forest is an ensemble (group) of many decision trees.
It is a supervised learning algorithm, commonly used for classification.
Basically, random forest combines many weak learners to produce a strong
learner.
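The intuition behind combining many trees can be sketched with a simplified calculation. It assumes, purely for illustration, that every tree is correct independently with the same probability and that the forest predicts by majority vote; trees in a real random forest are not fully independent, so this is an idealised upper-level picture, and the function name and numbers are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_correct):
    """Probability that a strict majority of n_trees independent voters is correct.

    Assumes an odd n_trees so that ties cannot occur.
    """
    needed = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


# A single tree that is right 60% of the time, versus growing ensembles.
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the vote becomes more accurate as trees are added, which is why the number of trees matters.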


- Spring '19
- Thej
- Linear Regression, Regression Analysis, Type I and type II errors