This equation can be used to predict the value of

  • No School
  • AA 1
  • 30

This preview shows page 15 - 18 out of 30 pages.

the line and e is error term. This equation can be used to predict the value of target variable based on given predictor variable(s). The difference between simple linear regression and multiple linear regression is that, multiple linear regression has (>1) independent variables, whereas simple linear regression has only 1 independent variable Points to Remember: There must be linear relationship between independent and dependent variables Multiple regression suffers from multicollinearity, autocorrelation, heteroskedasticity . Linear Regression is very sensitive to Outliers . It can terribly affect the regression line and eventually the forecasted values. Multicollinearity can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable
Image of page 15

Subscribe to view the full document.

Analytics Handbook Business Technology Club | Indian School of Business 15 In case of multiple independent variables, we can go with forward selection , backward elimination and step wise approach for selection of most significant independent variables. Logistic Regression : Logistic regression is used to find the probability of event=Success and event=Failure. We should use logistic regression when the dependent variable is binary (0/ 1, True/ False, Yes/ No) in nature. Here the value of Y ranges from 0 to 1 and it can represented by following equation. odds= p/ (1-p) = probability of event occurrence / probability of not event occurrence ln(odds) = ln(p/(1-p)) logit(p) = ln(p/(1-p)) = b0+b1X1+b2X2+b3X3 .... +bkXk Point to Remember: It is widely used for classification problems Logistic regression doesn’t require linear relationship between dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio To avoid over fitting and under fitting, we should include all significant variables. A good approach to ensure this practice is to use a step wise method to estimate the logistic regression It requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least square The independent variables should not be correlated with each other i.e. no multi collinearity . However, we have the options to include interaction effects of categorical variables in the analysis and in the model. If the values of dependent variable is ordinal, then it is called as Ordinal logistic regression If dependent variable is multi class then it is known as Multinomial Logistic regression .
Image of page 16