For example, suppose we have a dataset of cars from different manufacturers, with properties such as mpg, year of manufacture, number of cylinders, etc. In Naive Bayes, each property contributes to the model independently.

Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. In Naive Bayes, running a correlation analysis serves no purpose: even if two variables are highly correlated, the model makes no use of that correlation, because each feature contributes its probability separately. Later on, we combine all these probabilities to predict the final target class.

So we conclude that the data given to Naive Bayes may well contain multicollinearity, but because of the independence assumption the model does not take it into account. Since the model already assumes that there is no relationship between the variables, correlation among them does not enter the calculation, even when the dataset shows multicollinearity.

Mathematically,

    P(A | B) = P(B | A) * P(A) / P(B)

where

    P(A | B) = probability of event A given event B
    P(B | A) = probability of event B given event A
    P(A) = probability of event A
    P(B) = probability of event B

So the conclusion is: no, multicollinearity does not affect the Naive Bayes model.

5. If we do not define the number of trees to be built in a random forest, how many trees does random forest create internally?

Answer :

Random Forest: The random forest combines hundreds or thousands of decision trees, trains each one on a slightly different set of the observations, and splits nodes in each tree considering a limited number of the features. The final predictions of the random forest are made by averaging the predictions of each individual tree (for classification, by majority vote). It is an ensemble model made of many decision trees that uses bootstrapping, random subsets of features, and averaging or voting to make predictions. This is an example of a bagging ensemble.

Because it is a bagging ensemble, each tree is grown independently on its own bootstrap sample. A new tree is not built from the error observed in the previous tree (that describes boosting), and the algorithm does not stop creating trees when the error stops changing. The number of trees is therefore fixed by a parameter, and when the parameter is not supplied the implementation's default is used. In R, the default value of ntree in the randomForest function is 500. Other implementations use different defaults: in scikit-learn, for example, n_estimators defaulted to 10 before version 0.22 and defaults to 100 from version 0.22 onward.

In R, the code to fit a random forest with 100 randomly built decision trees is:

    randomForest(targetVariable ~ ., data = trainData, importance = TRUE, ntree = 100)

According to the question, if we do not provide the number of trees (ntree) to be built, the code becomes:

    randomForest(targetVariable ~ ., data = trainData, importance = TRUE)

Here, the default number of trees (500) is generated. To confirm the default number of trees that was generated, see the attached screenshot.
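The default can also be checked directly from a fitted model. The following is a minimal sketch, assuming the randomForest package is installed; the built-in iris data stands in for trainData and Species for targetVariable, both of which are placeholders chosen for illustration.

```r
# Assumes the randomForest package is available: install.packages("randomForest")
library(randomForest)

set.seed(42)  # make the bootstrap samples reproducible

# iris stands in for trainData, Species for targetVariable (illustrative only)
fit_default <- randomForest(Species ~ ., data = iris, importance = TRUE)
fit_100     <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 100)

# The fitted object records how many trees were grown
print(fit_default$ntree)  # 500
print(fit_100$ntree)      # 100

# The out-of-bag error matrix has one row per tree, so every requested tree
# was grown and none was skipped
print(nrow(fit_default$err.rate))

# The default is also visible in the signature of the default method
print(formals(getS3method("randomForest", "default"))$ntree)
```

Printing the fitted object (print(fit_default)) reports the same count on its "Number of trees" line.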

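For the Naive Bayes answer earlier in this section, the calculation can be sketched by hand to show where correlation between features would appear, and that it never does. This is a minimal illustration in base R, assuming Gaussian likelihoods for numeric features; the function gnb_posterior is a hypothetical helper written for this example, and the built-in mtcars data (with am as the class and mpg and disp as features) is an arbitrary choice.

```r
# mtcars ships with base R; am (0 = automatic, 1 = manual) is used as the class
data(mtcars)
features <- c("mpg", "disp")

# The two features are strongly (negatively) correlated in the data
print(cor(mtcars$mpg, mtcars$disp))

# Gaussian Naive Bayes by hand (hypothetical helper): one mean and one sd per
# feature per class, and no covariance term anywhere
gnb_posterior <- function(new_row, data, class_col, features) {
  classes <- sort(unique(data[[class_col]]))
  scores <- sapply(classes, function(k) {
    subset_k <- data[data[[class_col]] == k, ]
    prior <- nrow(subset_k) / nrow(data)            # P(A)
    likelihoods <- sapply(features, function(f) {   # P(B_i | A), one per feature
      dnorm(new_row[[f]], mean = mean(subset_k[[f]]), sd = sd(subset_k[[f]]))
    })
    prior * prod(likelihoods)                       # numerator of Bayes' theorem
  })
  names(scores) <- classes
  scores / sum(scores)                              # normalising plays the role of P(B)
}

# Posterior class probabilities for the first car in the dataset
posterior <- gnb_posterior(mtcars[1, ], mtcars, "am", features)
print(posterior)
```

Each feature enters only through its own per-class likelihood, and the likelihoods are simply multiplied, which is the independence assumption expressed in code.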