the line and e is error term. This equation can be used to predict the value of target variable based on
given predictor variable(s).
The difference between simple linear regression and multiple linear regression is that, multiple linear
regression has (>1) independent variables, whereas simple linear regression has only 1 independent
variable
Points to Remember:
•
There must be
linear relationship
between independent and dependent variables
•
Multiple regression suffers from
multicollinearity, autocorrelation, heteroskedasticity
.
•
Linear Regression is very sensitive to
Outliers
. It can terribly affect the regression line
and eventually the forecasted values.
•
Multicollinearity can increase the variance of the coefficient estimates and make the estimates
very sensitive to minor changes in the model. The result is that the coefficient estimates are
unstable

** Subscribe** to view the full document.

Analytics Handbook
Business Technology Club | Indian School of Business
15
•
In case of multiple independent variables, we can go with
forward selection
,
backward
elimination
and
step wise approach
for selection of most significant independent variables.
Logistic Regression
: Logistic regression is used to find the probability of event=Success and
event=Failure. We should use logistic regression when the dependent variable is binary (0/ 1, True/
False, Yes/ No) in nature. Here the value of Y ranges from 0 to 1 and it can represented by following
equation.
odds= p/ (1-p) = probability of event occurrence / probability of not event occurrence ln(odds) =
ln(p/(1-p)) logit(p) = ln(p/(1-p)) = b0+b1X1+b2X2+b3X3
....
+bkXk
Point to Remember:
•
It is widely used for
classification problems
•
Logistic
regression doesn’t require
linear relationship between dependent and independent
variables. It can handle various types of relationships because it applies a non-linear log
transformation to the predicted odds ratio
•
To avoid over fitting and under fitting, we should include all significant variables. A good
approach to ensure this practice is to use a step wise method to estimate the logistic regression
•
It requires
large sample sizes
because maximum likelihood estimates are less powerful at low
sample sizes than ordinary least square
•
The independent variables should not be correlated with each other i.e.
no multi
collinearity
. However, we have the options to include interaction effects of categorical
variables in the analysis and in the model.
•
If the values of dependent variable is ordinal, then it is called as
Ordinal logistic regression
•
If dependent variable is multi class then it is known as
Multinomial Logistic regression
.