homework 9 key4 - To find the prediction interval you need...


To find the prediction interval you need to calculate the standard error for predicting an individual response, $SE_{\hat y}$:

- $n$ = sample size = 20
- $s$ = standard error of the estimate = 412.75928395...
- $\bar x$ = mean of the explanatory variable = 2,286
- $SSX$ = sum of squares of the explanatory variable = $\sum (x_i - \bar x)^2 = (n - 1)s_x^2$ = 14,145,680
- $x$ = point of interest = 2,500

$$SE_{\hat y} = s\sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{SSX}} = 412.75928395\ldots \times \sqrt{1 + \frac{1}{20} + \frac{(2{,}500 - 2{,}286)^2}{14{,}145{,}680}} = 423.60395191\ldots$$

Therefore, the prediction interval is given by:

- $\hat y$ = predicted value at $x$ = 2,500 = 1,975.49261683...
- $t_{\alpha/2}$ = critical value in the t distribution with 18 degrees of freedom ($\alpha$ = 0.05) = 2.101

$$\hat y \pm t_{\alpha/2} \times SE_{\hat y} = 1{,}975.49261683\ldots \pm 2.101 \times 423.60395191\ldots$$

$$1{,}085.50071386\ldots \le y \le 2{,}865.48451\ldots$$

$$1{,}086 \le y \le 2{,}865 \quad \text{(rounded as the last step)}$$

Feedback [1 out of 2]
You are partly correct.
- SSX: this option should have been selected.
- $\hat y_i$: you are correct.
- $\bar x$: this option should have been selected.
- $s$: this option should have been selected.
- $x_i$: this option should have been selected.
- The sample size: you are correct.
- The level of significance used: you are correct.
- SST: this is not correct.

Discussion

Suppose the confidence interval for the mean value of y is to be constructed at the value $x = x_i$, and the size of the sample drawn is $n$. The $100 \times (1 - \alpha)\%$ confidence interval for $E(y)$ is:

$$\hat y_i \pm t_{\alpha/2}\, s\sqrt{h_i}$$

where:
- $\hat y_i$ = the predicted value for y when $x = x_i$
- $t_{\alpha/2}$ = the $\alpha/2$ critical value of the t distribution with $n - 2$ degrees of freedom
- $s$ = the standard error of the estimate
- $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar x)^2}{SSX}$

So there are several factors involved in constructing the confidence interval.

$x_i$, $\bar x$ and SSX
One of the factors involved is how far away the level of x is from the sample mean value for x. The further away the level of x is, the wider the confidence interval. There is a specific measure of how "far away" the level of x is from $\bar x$. It is the squared difference between $x_i$ and $\bar x$ as a fraction of SSX (the total sum of squared differences between all x data points and their mean). This is why the mean of all x values and SSX are important, while the mean of all y values and SST are not.

The standard error of the estimate, s
The confidence interval will be wider if the sample shows a lot of variation between the values for y and the values predicted for y by the prediction line. Such variation suggests that the regression is not very accurate, so the interval needs to cover more values in order to cover the expected value of y with confidence. The measure of variation of the values for y about their predicted values is the standard error of the estimate, s.

The predicted value, $\hat y_i$
At a given level of x, say $x_i$, y is a random variable with expected value $\beta_0 + \beta_1 x_i$. Here $\beta_0$ and $\beta_1$ are parameters estimated by the statistics $b_0$ and $b_1$. You use the prediction line $\hat y_i = b_0 + b_1 x_i$ to predict the value for y, and so $\hat y_i$ is an estimator for the expected value of y at $x = x_i$. As such, $\hat y_i$ is always at the center of the confidence interval for $E(y)$.

Sample size and level of significance
As usual, the sample size (n) and level of significance ($\alpha$) affect the construction of the confidence interval. An increase in sample size decreases the width of the confidence interval. An increase in the level of significance (a decrease in the level of confidence) also decreases the width. In particular, as n increases, 1/n decreases, which reduces the width of the interval. The critical value also decreases slightly, because the degrees of freedom $n - 2$ increase. As $\alpha$ increases, the critical value $t_{\alpha/2}$ decreases, which also reduces the width of the interval.

3 of 3 ID: MST.SLR.AV.03.0010

A regression model has been developed to analyze the relationship between a dependent variable y and an independent variable x.
A prediction interval is to be constructed for an individual value of y for a given value of x. Select whether each change would increase, decrease, or not affect the width of the prediction interval (answer choices for each: Increase, Decrease, Not affect).

a) Decreasing the level of confidence
b) Using a larger sample size
c) Choosing a value of x further away from $\bar x$
d) Having less variation in the values of the dependent variable about the prediction line

Feedback [3 out of 4]
a) You are correct.
b) You are correct.
c) You are correct.
d) This is not correct. Having less variation in the values of the dependent variable about the prediction line would decrease the width of the prediction interval.

Discussion

The effects these changes would have on the prediction interval for an individual value of y at a given level of x can be investigated in two ways. One way is to look at the formula for the prediction interval: by observing how factors occur in the formula, you can determine how changes in those factors will affect the interval. Alternatively, you can argue "why" the changes should affect the interval in the way that they do.

Suppose the prediction interval is to be constructed at the value $x = x_i$, and the size of the sample drawn is n. The $100 \times (1 - \alpha)\%$ prediction interval for y is:

$$\hat y_i \pm t_{\alpha/2}\, s\sqrt{1 + h_i}$$

where:
- $\hat y_i$ = the predicted value for y when $x = x_i$
- $t_{\alpha/2}$ = the $\alpha/2$ critical value of the t distribution with $n - 2$ degrees of freedom
- $s$ = the standard error of the estimate
- $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar x)^2}{SSX}$

So the width of the interval is determined by several factors.

Confidence
By decreasing the level of confidence ($100 \times (1 - \alpha)\%$), you are increasing the level of significance $\alpha$. The critical value in any t distribution is decreased if the level of significance is increased. Therefore decreasing the level of confidence will decrease the width of the prediction interval.
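The following sketch computes $t_{\alpha/2}$ at several confidence levels for 18 degrees of freedom, the value used in the worked example, so you can see how the critical value changes with the confidence level. It is a minimal, standard-library-only sketch. The Student t CDF is approximated here by Simpson's rule and inverted by bisection. That numerical approach is an illustrative choice, not something prescribed by this key; in practice a t table or a statistics library would normally be used. The function names `t_cdf` and `t_critical` are hypothetical.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule; steps must be even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)   # symmetry of the t density
    # Normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), via log-gamma.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3               # P(T <= 0) = 0.5, plus the area from 0 to x

def t_critical(alpha, df):
    """Two-sided critical value t_{alpha/2}, i.e. P(T > t) = alpha/2, found by bisection."""
    target = 1.0 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:            # widen the bracket until it contains the root
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for confidence in (0.99, 0.95, 0.90, 0.80):
    alpha = 1 - confidence
    print(f"{confidence:.0%} confidence, df = 18: t = {t_critical(alpha, 18):.3f}")
```

The 95% row should reproduce the 2.101 used in the worked example. The other rows show the critical value, and with it the interval width, falling as the confidence level is lowered.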
This effect of the confidence level can also be explained intuitively: any prediction interval should decrease in width if you decrease the level of confidence you want in that interval. In other words, if you can accept being less sure that your interval does indeed contain the value of y, you can have a shorter interval.

Sample size
The larger your sample size (n) is, the smaller 1/n is. The prediction interval uses $\sqrt{1 + h_i}$, and $h_i$ contains the term 1/n, so decreasing 1/n decreases $h_i$. Therefore increasing the sample size will decrease the width of the prediction interval. The critical value $t_{\alpha/2}$ also decreases slightly as the degrees of freedom, $n - 2$, increase. This also makes sense if you consider that a larger sample size gives you more information about the population being considered (the dependent variable y). It is a general fact about prediction intervals that larger sample sizes will typically shorten the width of the interval.

Changing the value of the independent variable
The above formula for the prediction interval for y is defined for the fixed value $x_i$ of the independent variable x. The value that is assumed for the independent variable will definitely change the position of the interval: in particular, the center of the interval, $\hat y_i$, is determined by $x_i$. But changing $x_i$ will also affect the width of the interval. In particular, the difference between $x_i$ and the mean value of x in the sample, $\bar x$, will be a factor in the width of the interval. The term $(x_i - \bar x)^2$ appears in the numerator of $h_i$ in the formula for the width of the interval. Therefore having a value $x_i$ further away from $\bar x$ will increase the width of the prediction interval.
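The effects of $x_i$ and n on the width can be checked by evaluating the standard error for predicting an individual response, $s\sqrt{1 + h_i}$, directly. The sketch below reuses the summary statistics from the worked example at the start of this key (n = 20, s ≈ 412.759, $\bar x$ = 2,286, SSX = 14,145,680). The helper name `prediction_se` is hypothetical. When varying n, the sketch holds s and SSX fixed purely for illustration, which a real, larger sample would not do.

```python
import math

def prediction_se(s, n, x_i, x_bar, ssx):
    """Standard error for predicting an individual response: s * sqrt(1 + h_i)."""
    h_i = 1 / n + (x_i - x_bar) ** 2 / ssx
    return s * math.sqrt(1 + h_i)

# Summary statistics taken from the worked prediction-interval example in this key.
S, N, X_BAR, SSX = 412.75928395, 20, 2286, 14_145_680

# Moving x_i away from x-bar increases (x_i - x_bar)^2 and so widens the interval (n fixed).
for x_i in (2286, 2500, 3000, 4000):
    print(f"x_i = {x_i}: SE = {prediction_se(S, N, x_i, X_BAR, SSX):.2f}")

# A larger n shrinks the 1/n term (s and SSX held fixed here only for illustration).
for n in (20, 40, 80):
    print(f"n = {n}: SE = {prediction_se(S, n, 2500, X_BAR, SSX):.2f}")
```

At $x_i$ = 2,500 and n = 20 this gives about 423.60, the standard error used in the worked example. The multiplier $t_{\alpha/2}$ is left out because it does not depend on $x_i$. It would also shrink slightly as n grows, narrowing the interval further.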
The qualitative reasoning behind why values closer to the mean of x give more precise estimates is not as simple as for the other factors. The reason has to do with the fact that prediction lines are straight lines. Suppose for the moment that you actually have two prediction lines coming from two different samples that have the same sample mean $\bar x$ for the independent variable and the same sample mean $\bar y$ for the dependent variable. It is a property of the least-squares regression coefficients that both prediction lines pass through the point $(\bar x, \bar y)$. And since they are both straight lines, unless their slopes are equal they move further and further apart as they move away from this point. So, for a value of x that is far away from $\bar x$, the two prediction lines will give very different values for y. In other words, there is more variation in the predicted values of y for values of x that are far from the mean.

Variation in the dependent variable
The standard error of the estimate, s, is a measure of the difference between the values of the dependent variable that occur in the sample ($y_i$) and the values predicted by the prediction line ($\hat y_i$). If this variation increases, then s increases. Therefore a decrease in the variation of the values of y about their predicted values will decrease the width of the prediction interval.

The reason for the decrease in the width of the prediction interval is as follows. If the values of the dependent variable do not vary much from the values predicted for them, your regression model is accurate in the sense that it predicts the dependent variable well from the independent variable. So in trying to predict a value for y at some level of x, your prediction will likely be more accurate. That is, the prediction interval is narrower.

1 of 3 ID: MST.SLR.AV.05.0010

Filby develops a regression model to analyze the relationship between the amount of time he studies for an exam (x) and the mark he gets in the exam (y). He would like to construct a confidence interval for the average mark he would get if he studied for 8 hours. He would also like to construct a prediction interval for the mark he would get if he studied for 3 hours. The levels of confidence for these intervals are the same, and the same sample is used for both intervals. Select the correct statement regarding these two intervals:
- The two intervals have the same width
- The confidence interval for the average mark will be wider than the prediction interval for the mark
- The prediction interval for the mark will be wider than the confidence interval for the average mark
- There is not enough information to tell which interval will be wider

Feedback [0 out of 1]
This is not correct. The correct statement regarding these two intervals is: The prediction interval for the mark will be wider than the confidence interval for the average mark.

Discussion

Estimating means and predicting values
If you develop a prediction line for a relationship, $\hat y = b_0 + b_1 x$, then you can use this line to predict values of the dependent variable. You can also use this line to estimate the expected value of y. This distinction can be subtle. For a fixed value of x (say $x_i$), y is a random variable. This should not be too surprising, since a fundamental rule in statistical regression is that your predictor variable (x) will not give you an exact value for your dependent variable. Now, given that y is a random variable for $x = x_i$, you can calculate its expected value. This is your calculated mean value for y at that level of x. But since the prediction line involves the statistics $b_0$ and $b_1$, this is only a point estimate. The fact that y is a random variable for $x = x_i$ also means that you can predict a value that y will take.
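A small simulation makes the difference between estimating the mean and predicting an individual value concrete. The sketch below assumes a made-up "true" model, $y = 10 + 2x + \varepsilon$, with normal errors ($\sigma = 3$), n = 20 fixed x values and $x_i = 15$; all of these are hypothetical. It refits the line on many simulated samples and counts how often each interval captures its target. It reuses $t_{0.025} = 2.101$ for 18 degrees of freedom from the worked example at the start of this key. The function names are illustrative only.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares intercept b0, slope b1, standard error of the estimate s, x-bar and SSX."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, math.sqrt(sse / (n - 2)), x_bar, ssx

def simulate_coverage(reps=2000, seed=1):
    # Hypothetical true model: y = 10 + 2x + e, e ~ N(0, 3^2). All values are made up.
    beta0, beta1, sigma = 10.0, 2.0, 3.0
    xs = [float(k) for k in range(1, 21)]   # n = 20, so n - 2 = 18 degrees of freedom
    t_crit = 2.101                          # t_{0.025} with 18 df, from the worked example
    x_i = 15.0
    mean_y = beta0 + beta1 * x_i            # E(y | x = x_i): a fixed (normally unknown) number
    rng = random.Random(seed)
    ci_mean = pi_new = ci_new = 0
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
        b0, b1, s, x_bar, ssx = fit_line(xs, ys)
        y_hat = b0 + b1 * x_i               # the same point estimate serves both intervals
        h_i = 1 / len(xs) + (x_i - x_bar) ** 2 / ssx
        ci_half = t_crit * s * math.sqrt(h_i)
        pi_half = t_crit * s * math.sqrt(1 + h_i)
        new_y = mean_y + rng.gauss(0, sigma)   # one individual response at x_i
        ci_mean += int(abs(mean_y - y_hat) <= ci_half)
        pi_new += int(abs(new_y - y_hat) <= pi_half)
        ci_new += int(abs(new_y - y_hat) <= ci_half)
    return ci_mean / reps, pi_new / reps, ci_new / reps

ci_mean, pi_new, ci_new = simulate_coverage()
print(f"CI covers E(y):    {ci_mean:.3f}")
print(f"PI covers a new y: {pi_new:.3f}")
print(f"CI covers a new y: {ci_new:.3f}")
```

Both the confidence interval (for $E(y)$) and the prediction interval (for a new y) should capture their own targets close to 95% of the time. The narrower confidence interval, however, captures a fresh individual value far less often, which is why an interval for an individual value has to be wider.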
Predicting an individual value is not the same as trying to estimate the mean: here you are trying to actually predict the value that the random variable will assume. Having said that, it should not be too surprising that the point estimate for the expected value and the prediction for an individual value are the same: $b_0 + b_1 x_i$. The difference between the estimate of the mean and the prediction of an individual value becomes clearer when you construct intervals around this value.

Intuitively, the "prediction" interval for an individual value of y (at a fixed value of x) will be rather wide. This is because you are saying: "I have a random variable and I want an interval in which I am fairly confident the variable will take a value." On the other hand, the confidence interval for the mean should be narrower, since you are only trying to estimate the mean of all values that y might assume (at that level of x). This is indeed the case: the prediction interval for an individual value will always be wider than the confidence interval for an average value.

Mathematically, this can be seen from the widths of the two intervals at $x = x_i$, for a level of significance $\alpha$ and a sample of size n. The prediction interval is for y and the confidence interval is for $E(y)$:

$$\text{Width of prediction interval for } y = 2\, t_{\alpha/2}\, s\sqrt{1 + h_i}$$

$$\text{Width of confidence interval for } E(y) = 2\, t_{\alpha/2}\, s\sqrt{h_i}$$

where:
- $t_{\alpha/2}$ is the $\alpha/2$ critical value of the t distribution with $n - 2$ degrees of freedom
- $s$ is the standard error of the estimate
- $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar x)^2}{SSX}$

Since $1 + h_i > h_i$, the prediction interval is always the wider of the two.

1 of 3 ID: MST.SLR.AV.05.0020

Both the confidence interval and the prediction interval are vital in analyzing simple regression, and each has its own purpose.

a) The confidence interval is always ____ the prediction interval at a given level of significance (other than 0 and 1).
b) The point estimate of the mean value is always ____ the point estimate of the predicted individual value.

Feedback [1 out of 2]
a) You are correct.
b) This is not correct.
The point estimate of the mean value is always equal to the point estimate of the predicted individual value.

Calculation
a) The prediction interval looks at a prediction of an individual value rather than the mean of the variable. It is therefore wider than a confidence interval, because the mean of a group of values is usually less extreme than the values themselves: an individual prediction can take on a much more extreme value than a mean of these possible values. Therefore, the confidence interval, which is concerned with the mean, is always narrower than the prediction interval at a given level of significance (other than 0 and 1). At a level of significance of 0 (100% confidence), both intervals are the entire real line, from negative infinity to positive infinity. At a level of significance of 1 (0% confidence), both intervals are simply the point estimate.

b) One of the interesting things about the simple linear regression model is that the point estimate of the mean of the response at a given $x_i$ is always equal to the point estimate of an individual value. This follows directly from the structure of the simple linear regression model. Under this model, the value of the response (y) is the sum of three parts: a constant value ($\beta_0$), a value induced by its relationship to the independent (or explanatory) variable ($\beta_1 x$), and an error term ($\varepsilon$):

$$y = \beta_0 + \beta_1 x + \varepsilon$$

Taking the expected value of both sides (for a particular value of x) and using $E(\varepsilon) = 0$ gives the expression for the mean:

$$E(y \mid x = x_i) = E(\beta_0 + \beta_1 x_i + \varepsilon) = E(\beta_0) + E(\beta_1 x_i) + E(\varepsilon) = \beta_0 + \beta_1 x_i$$

The point estimates of $\beta_0$ and $\beta_1$ are $b_0$ and $b_1$ respectively. Therefore the point estimate of the mean is equal to the point estimate of $\beta_0 + \beta_1 x_i$, which is $b_0 + b_1 x_i$. But this expression is the equation of the regression line, which gives the point estimate of an individual predicted value. Hence the point estimate of the mean value is always equal to the point estimate of the predicted individual value.

3 of 3 ID: MST.SLR.AV.04.0019

You have calculated the lower bound of a 99% prediction interval at x = 5 to be 6.91 and the corresponding upper bound to be 18.29, from simple linear regression analysis.
Select all the following conclusions that may be drawn from this interval:
- One can be at least 99% confident that the prediction interval includes the mean of the response variable at x = 5 ($\mu_{y|x=5}$)
- One can be at most 99% confident that the prediction interval includes the mean of the response variable at x = 5 ($\mu_{y|x=5}$)
- One can be 99% confident that the prediction interval includes the mean of the response variable at x = 5 ($\mu_{y|x=5}$)
- One can be 99% confident that the prediction interval includes the individual value of the response variable given x = 5 ($y_{x=5}$)

Feedback [1 out of 2]
You are partly correct.
- One can be at least 99% confident that the prediction interval includes the mean of the response variable at x = 5 ($\mu_{y|x=5}$): this option should have been selected.
- One can be 99% confident that the prediction interval includes the individual value of the response variable given x = 5 ($y_{x=5}$): you are correct.
- One can be at most 99% confident that the prediction interval includes the mean of the response variable at x = 5 ($\mu_{y|x=5}$): this is not correct.

Discussion

The prediction interval is given by the following formula:

$$\hat y_i \pm t_{\alpha/2}\, s\sqrt{1 + h_i}$$

where:

$$h_i = \frac{1}{n} + \frac{(x_i - \bar x)^2}{SSX}$$

In comparison, the confidence interval is given by the following formula:

$$\hat y_i \pm t_{\alpha/2}\, s\sqrt{h_i}$$

A prediction interval is used to describe the values that a predicted individual value can take. A 99% prediction interval from 6.91 to 18.29 states that one can be 99% confident that this interval includes the individual value. If it were a 99% confidence interval instead, it would state that one can be 99% confident that it includes the mean of the response variable at x = 5. Furthermore, the prediction interval is always wider than the confidence interval for the same variable: its lower bound is lower and its upper bound is higher.
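The sketch below checks the nesting numerically. It builds both 95% intervals at x = 2,500 from the worked example at the start of this key: n = 20, s ≈ 412.759, $\bar x$ = 2,286, SSX = 14,145,680, $\hat y$ ≈ 1,975.49 and $t_{0.025}$ = 2.101. It then rewrites the prediction interval's half-width as $t' s\sqrt{h_i}$, with $t' = t_{\alpha/2}\sqrt{(1 + h_i)/h_i}$. That reformulation is offered here as one way to see the "at least c%" conclusion; it is not taken from the key. The variable names are illustrative.

```python
import math

# Summary statistics from the worked prediction-interval example in this key (95%, df = 18).
S, N, X_BAR, SSX = 412.75928395, 20, 2286, 14_145_680
X_I, Y_HAT, T_CRIT = 2500, 1975.49261683, 2.101

h_i = 1 / N + (X_I - X_BAR) ** 2 / SSX
ci_half = T_CRIT * S * math.sqrt(h_i)        # confidence interval for E(y)
pi_half = T_CRIT * S * math.sqrt(1 + h_i)    # prediction interval for an individual y

ci = (Y_HAT - ci_half, Y_HAT + ci_half)
pi = (Y_HAT - pi_half, Y_HAT + pi_half)
print(f"95% CI for E(y): {ci[0]:,.2f} to {ci[1]:,.2f}")
print(f"95% PI for y:    {pi[0]:,.2f} to {pi[1]:,.2f}")

# Read as an interval for the mean, the PI half-width equals t' * s * sqrt(h_i),
# where t' = t * sqrt((1 + h_i) / h_i) exceeds t, i.e. a larger critical value.
t_equiv = T_CRIT * math.sqrt((1 + h_i) / h_i)
print(f"PI contains CI: {pi[0] < ci[0] and ci[1] < pi[1]}; equivalent multiplier t' = {t_equiv:.2f}")
```

The printed prediction interval should match the key's 1,085.50 to 2,865.48 before rounding. Because $t' > t_{\alpha/2}$ whenever $h_i > 0$, the prediction interval, read as an interval for the mean, corresponds to a larger critical value and therefore to a confidence level above c%.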
As a result, with a c% prediction interval one would be at least c% confident that it also contains the mean of the response variable. This is because a c% prediction interval is always wider than a c% confidence interval. The exceptions are the 100% interval, where both span the whole real line, and the 0% interval, where both reduce to the point estimate. So for a 99% prediction interval, one would be at least 99% confident that the interval also includes the mean, since it is wider than the 99% confidence interval.

2 of 3 ID: MST.SLR.AV.04.0010

You have calculated the lower bound of a 95% prediction interval at x = 2 to be 1.45 and the corresponding upper bound to be 11.71, from simple linear regression analysis. Select all the following conclusions that may be drawn from this interval:
- One can be at least 95% confident that the prediction interval includes the mean of the response variable at x = 2 ($\mu_{y|x=2}$)
- One can be at most 95% confident that the prediction interval includes the mean of the response va...

  • Fall '13
  • ChristaLSorola
