A distance value greater than the chi-square critical value (with degrees of freedom equal to the number of explanatory variables) is classified as an outlier.
Business Analytics – The Science of Data Driven Decision Making

Cook’s Distance

Cook’s distance measures how much the predicted values of the dependent variable change, across all observations in the sample, when a particular observation is excluded from the sample used to estimate the regression parameters. Cook’s distance for simple linear regression is given by

$$D_i = \frac{\sum_{j} \left( \hat{Y}_j - \hat{Y}_{j(i)} \right)^2}{\text{MSE}}$$

where $D_i$ is the Cook’s distance measure for the $i$th observation, $\hat{Y}_j$ is the predicted value of the $j$th observation including the $i$th observation, $\hat{Y}_{j(i)}$ is the predicted value of the $j$th observation after excluding the $i$th observation from the sample, and MSE is the mean squared error. The commonly used form of Cook’s distance also divides by the number of estimated parameters, which is 2 for simple linear regression.
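The leave-one-out definition of Cook’s distance can be sketched in plain Python as follows. This is a minimal illustration, not the slides’ own code: the function names, the `n_params` argument and the five data points are made up for the example, and MSE is assumed to be the residual sum of squares divided by n - 2. With `n_params=2` the sketch uses the common scaling by the number of estimated parameters; with `n_params=1` it divides by MSE only.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def cooks_distance(x, y, n_params=2):
    """Cook's distance for every observation, by refitting without it."""
    n = len(x)
    b0, b1 = fit_slr(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    # Assumption: MSE = SSE / (n - 2), the usual SLR residual variance.
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 2)
    distances = []
    for i in range(n):
        # Refit the line with observation i left out.
        c0, c1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Squared change in the predictions for ALL n observations.
        shift = sum((fj - (c0 + c1 * xj)) ** 2 for xj, fj in zip(x, fitted))
        distances.append(shift / (n_params * mse))
    return distances


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
d = cooks_distance(x, y)
for i, di in enumerate(d, start=1):
    print(f"observation {i}: D = {di:.4f}")
```

On this sample the first observation has the largest distance (1.5), because removing it changes the fitted slope from 0.6 to 0.2.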
Leverage Value

The leverage value of an observation measures the influence of that observation on the overall fit of the regression function. The leverage value for an observation in simple linear regression (SLR) is given by

$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

An observation with a leverage value of more than 2/n or 3/n is treated as highly influential. In the equation above, the first term (1/n) tends to zero for large values of n.
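As a small worked illustration of the leverage formula, the sketch below computes h_i for a made-up set of predictor values. The function name and the data are assumptions for the example only.

```python
def leverage(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / sum of squared deviations."""
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / ss_x for xi in x]


# Made-up predictor values, for illustration only.
x = [1, 2, 3, 4, 5]
h = leverage(x)
for xi, hi in zip(x, h):
    print(f"x = {xi}: h = {hi:.2f}")
```

For these values the leverages are 0.60, 0.30, 0.20, 0.30 and 0.60: points far from the mean of x have the highest leverage, and the point at the mean has the minimum value 1/n.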
DFFit and DFBeta

DFFit is the change in the predicted value of $Y_i$ when case $i$ is removed from the data set. DFBeta is the change in the regression coefficient values when observation $i$ is removed from the data.
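Both quantities can be obtained by refitting the line without each case in turn. The sketch below is one way to do that, under stated assumptions: it reports raw (unstandardized) changes, it takes the change as "full-sample value minus value with case i removed" (the sign convention is a choice made here), and the function names and data are made up. Statistical packages often report standardized variants instead.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / ss_x
    return y_bar - b1 * x_bar, b1


def deletion_diagnostics(x, y):
    """Raw DFFit and DFBeta for each case, via leave-one-out refits."""
    b0, b1 = fit_slr(x, y)
    rows = []
    for i in range(len(x)):
        # Coefficients estimated without case i.
        c0, c1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        rows.append({
            # Change in the prediction for case i itself.
            "dffit": (b0 + b1 * x[i]) - (c0 + c1 * x[i]),
            # Change in the intercept and in the slope.
            "dfbeta0": b0 - c0,
            "dfbeta1": b1 - c1,
        })
    return rows


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
rows = deletion_diagnostics(x, y)
for i, row in enumerate(rows, start=1):
    print(i, {name: round(value, 3) for name, value in row.items()})
```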
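Confidence Interval for Regression Coefficients $\beta_0$ and $\beta_1$

The standard errors of the estimates of $\beta_0$ and $\beta_1$ are given by

$$S_e(\hat{\beta}_0) = S_e \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n \, SS_X}}, \qquad S_e(\hat{\beta}_1) = \frac{S_e}{\sqrt{SS_X}}$$

where $S_e$ is the standard error of the residuals,

$$S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}, \qquad SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2$$

The interval estimate, or $(1 - \alpha)100\%$ confidence interval, for $\beta_0$ and $\beta_1$ is given by

$$\hat{\beta}_0 \pm t_{\alpha/2,\, n-2} \, S_e(\hat{\beta}_0), \qquad \hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \, S_e(\hat{\beta}_1)$$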
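The coefficient intervals can be worked through numerically with a short sketch. The function name, the returned dictionary keys and the data are assumptions made for this example. To stay within the standard library, the t critical value is passed in by the caller; the value 3.182 used below is the tabulated two-sided 95% value for n - 2 = 3 degrees of freedom.

```python
import math


def coefficient_intervals(x, y, t_crit):
    """Standard errors and confidence intervals for intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    # Standard error of the residuals, with n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    se_b0 = s_e * math.sqrt(sum(xi ** 2 for xi in x) / (n * ss_x))
    se_b1 = s_e / math.sqrt(ss_x)
    return {
        "se_beta0": se_b0,
        "se_beta1": se_b1,
        "beta0": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "beta1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # t table value for alpha/2 = 0.025 and 3 degrees of freedom
result = coefficient_intervals(x, y, T_CRIT)
print("95% CI for beta0:", result["beta0"])
print("95% CI for beta1:", result["beta1"])
```

On this sample the slope estimate is 0.6 with a standard error of about 0.283, so the 95% interval for the slope includes zero.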
Confidence Interval for the Expected Value of Y for a Given X

Since point estimates are subject to error, owing to uncertainty in the estimated parameters and natural variation in the data around the predicted line, the user would like to know the interval estimate, or confidence interval, for the conditional expected value. The confidence interval of the expected value of $Y_i$ for a given value of $X_i$ is given by

$$\hat{Y}_i \pm t_{\alpha/2,\, n-2} \, S_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

where the term $S_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ is the standard error of $E(Y \mid X)$.
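A minimal sketch of this interval, assuming the caller supplies the t critical value from a table. The function name, the query point `x0` and the data are made up for the example.

```python
import math


def mean_response_interval(x, y, x0, t_crit):
    """Confidence interval for E(Y | X = x0) in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # Standard error of the estimated conditional mean at x0.
    se_mean = s_e * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / ss_x)
    y_hat = b0 + b1 * x0
    return y_hat - t_crit * se_mean, y_hat + t_crit * se_mean


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # t table value for alpha/2 = 0.025 and 3 degrees of freedom
at_mean = mean_response_interval(x, y, 3, T_CRIT)
at_edge = mean_response_interval(x, y, 5, T_CRIT)
print("95% CI for E(Y | X = 3):", at_mean)
print("95% CI for E(Y | X = 5):", at_edge)
```

The interval is narrowest at the mean of X and widens as the query point moves away from it, because of the squared deviation term under the square root.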
Prediction Interval for the Value of Y for a Given X

The prediction interval of $Y_i$ for a given value of $X_i$ is given by

$$\hat{Y}_i \pm t_{\alpha/2,\, n-2} \, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

where the term $S_e \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ is the standard error of $Y_i$ for a given value of $X_i$.
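The prediction interval differs from the interval for the expected value only by the extra 1 under the square root. The sketch below makes that explicit with a hypothetical `for_new_value` flag; the function name, flag, data and the tabulated t value are all assumptions for the example.

```python
import math


def interval_at(x, y, x0, t_crit, for_new_value=True):
    """Interval at x0: prediction interval if for_new_value, else CI of the mean."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # The extra 1 accounts for the variation of a single new observation.
    extra = 1.0 if for_new_value else 0.0
    se = s_e * math.sqrt(extra + 1.0 / n + (x0 - x_bar) ** 2 / ss_x)
    y_hat = b0 + b1 * x0
    return y_hat - t_crit * se, y_hat + t_crit * se


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # t table value for alpha/2 = 0.025 and 3 degrees of freedom
pred = interval_at(x, y, 3, T_CRIT, for_new_value=True)
mean = interval_at(x, y, 3, T_CRIT, for_new_value=False)
print("95% prediction interval at X = 3:", pred)
print("95% CI for E(Y | X = 3):         ", mean)
```

At the same X the prediction interval is always wider than the confidence interval for the expected value.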
For large n, the confidence interval of E(Y|X