Homework 9 key 2


Question 3 of 3

Two variables (A and B) are hypothesised to have a linear relationship with one another, as represented by the following equation:

    A = β0 + β1B + ε

Data was gathered for these two variables and the correlation coefficient (r) was calculated to be −0.96. Select all from the following statements that are true:

    Variable A is known as the explanatory variable, and variable B is known as the response variable.
✓   The linear regression line provides a strong fit to the observed data.
    Causation between A and B cannot be implied from the correlation that exists between them.
    Variable B causes variable A since B is the independent variable.
✓   β0 is known as the sample intercept.
✓   The proportion of variability in A that is explained by the regression model is equal to 92.16%.
    The graph of the linear relationship between A and B slopes up.

Feedback [1 out of 2]

You are partly correct.
- Causation between A and B cannot be implied from the correlation that exists between them: this option should have been selected.
- The proportion of variability in A that is explained by the regression model is equal to 92.16%: you are correct.
- The linear regression line provides a strong fit to the observed data: you are correct.
- β0 is known as the sample intercept: this is not correct.

Discussion

Variable A is the response variable since it 'responds' to changes in variable B, which is the explanatory variable. Despite this, one cannot claim that the explanatory variable 'causes' the response variable or vice versa.
The fact that the two are correlated does not automatically imply that they are causally linked. Therefore the statement that causation between A and B cannot be implied from the correlation that exists between them is true.

The correlation coefficient (r) reflects the strength of the linear relationship between two variables. If r is positive, the relationship is positive: that is, if B increases, A tends to increase, and vice versa. If r is equal to zero, there is no linear correlation at all. If r is negative, the relationship is negative and A tends to decrease if B increases. Since r has been calculated as −0.96, the graph of the linear relationship slopes down.

The coefficient of determination (R²) is similar to the correlation coefficient in that it also measures the strength of the linear relationship. While it is not concerned with the direction of the slope, it measures the proportion of variability in the observed responses that is explained by the regression model. The statistic R² can be calculated from r: R² = (−0.96)² = 0.9216. A value for R² of 0.9216 means that 92.16% of the variability can be explained by the simple linear relation. This is also evidence in favour of the claim that the regression line is a strong fit to the data. Therefore, the statements that the proportion of variability in A that is explained by the regression model is equal to 92.16%, and that the linear regression line provides a strong fit to the observed data, are true.

Question 3 of 3, ID: MST.SLR.REE.03.0040

A biologist is studying the levels of heavy metal contaminants among a population of the South Nakaratuan Chubby Bat. The biologist is interested in constructing a simple linear regression model to investigate the relationship between the weight of an animal and the level of heavy metal contamination. In the proposed regression model the level of contaminant is the response variable and weight is the explanatory variable.
The contaminant level is measured in parts per billion (ppb) and weight in grams. A random sample of 20 individuals is selected and measurements are taken.

Contaminant study in the South Nakaratuan Chubby Bat

Contaminant level (ppb)    Weight (g)
196                        117
166                        106
209                        133
204                        128
175                        110
178                        112
216                        143
207                        131
236                        149
191                        118
189                        128
154                        105
176                        107
223                        141
241                        145
183                        123
151                        110
203                        122
220                        137
256                        106

Plotting the data, the researcher notices an obvious outlier. They decide to do the regression analysis with and without the outlier and compare the results.

Calculate the slope (b1) and intercept (b0) of the simple regression equation using the data provided. Give your answers to 2 decimal places.

a) Slope = b1 = 0.34
b) Intercept = b0 = 56.57

Find the proportion of variation in the values of contaminant level that is explained by the regression model. Give your answer as a decimal to 2 decimal places.

c) R² = 0.42

Repeat this process omitting the outlier:

d) Slope = b1 = 0.53
e) Intercept = b0 = 21.42

Find the proportion of variation in the values of contaminant level that is explained by the regression model. Give your answer as a decimal to 2 decimal places.

f) R² = 0.86

Feedback [2 out of 6]

a) This is not correct. Slope = b1 = 1.24
b) This is not correct. Intercept = b0 = 45.72
c) You are correct.
d) This is not correct. Slope = b1 = 1.62
e) This is not correct. Intercept = b0 = −6.57
f) You are correct.

Calculation

The outlier is the sample data point y = 256, x = 106. Using level of contaminant as the response variable and weight as the explanatory variable (given in the question), and entering all of the data (including the outlier) into a suitable software package, you should obtain the following results:

Regression coefficients    Estimate
Intercept                  45.71595145...
Slope                      1.23573545...

Regression model
R²                         0.4181427...
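The estimates reported for this question come from an ordinary least-squares fit. As a rough illustration of what a software package computes, the sketch below uses the standard closed-form formulas for the slope, intercept and R². The function name `fit_line` and the small data set are made up for the example; they are not the bat data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    r_squared = sxy ** 2 / (sxx * syy)  # proportion of variation explained
    return b0, b1, r_squared


# Hypothetical data lying exactly on y = 2x, then the same data plus one outlier at (1, 10).
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
clean = fit_line(xs, ys)
with_outlier = fit_line(xs + [1], ys + [10])
print(clean)
print(with_outlier)
```

In this toy example a single outlier is enough to flatten the slope and sharply lower R², which is the kind of effect the with-and-without comparison in the question is designed to expose.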
Omitting the outlier, you should obtain the following results:

Regression coefficients    Estimate
Intercept                  −6.57382963...
Slope                      1.62147796...

Regression model
R²                         0.85616524

Therefore, you can find the required values to 2 decimal places to be:

a) b1 = 1.24
b) b0 = 45.72
c) R² = 0.42
d) b1 = 1.62
e) b0 = −6.57
f) R² = 0.86

Question 1 of 3, ID: MST.SLR.REE.04.0020b

The regression equation:

    ŷ = 5 + 1.2x

was calculated from a sample. It is part of a regression model that has been developed in order to predict the score in an end-of-year exam based on the score in a mid-year exam for a particular university course. In the sample, mid-year scores ranged from 50 to 80.

Select whether or not each of the following conclusions is correct from the regression analysis:

a) For an increase by one in the mid-year score, the predicted increase in end-of-year score is 1.2.
b) If a student achieves a score of 45 in the mid-year exam then we know that the student will achieve a score of 59 in the end-of-year exam.
c) If a student achieves a score of 60 in the mid-year exam then we know that the student will achieve a score of 77 in the end-of-year exam.

Feedback [1 out of 3]

a) This is not correct. "For an increase by one in the mid-year score, the predicted increase in end-of-year score is 1.2" is correct.
b) You are correct.
c) This is not correct. "If a student achieves a score of 60 in the mid-year exam then we know that the student will achieve a score of 77 in the end-of-year exam" is not correct.
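The arithmetic behind the three conclusions can be checked against the equation ŷ = 5 + 1.2x directly. The sketch below is only an illustration: the function name `predict_end_of_year` and the choice to return `None` for a mid-year score outside the sampled range of 50 to 80 are assumptions made for the example.

```python
B0, B1 = 5.0, 1.2        # intercept and slope from the question
X_MIN, X_MAX = 50, 80    # range of mid-year scores in the sample


def predict_end_of_year(mid_year):
    """Predicted end-of-year score, or None when mid_year is outside the sampled range."""
    if not X_MIN <= mid_year <= X_MAX:
        return None  # extrapolation: the fitted line is not supported here
    return B0 + B1 * mid_year


print(predict_end_of_year(60))  # a prediction only, not a guaranteed score
print(predict_end_of_year(45))  # 45 lies outside 50..80
print(predict_end_of_year(61) - predict_end_of_year(60))  # change per unit of x
```

Even inside the sampled range the returned value is a prediction for the end-of-year score, not a statement of what a particular student will achieve.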
Discussion

Interpreting a regression equation

In general, you use simple linear regression in order to predict the value that a dependent variable (y) will assume based on the value assumed by an independent variable (x). A simple linear regression equation is always of the form:

    ŷ = b0 + b1x

The values b0 and b1 are constants. The value ŷ is the predicted value for y at a given value of x. So this is the equation you use to predict values of the dependent variable. The value b0 is the value predicted for y if x assumes the value 0, provided that x = 0 is within the range of values that were originally sampled. The value b1 is the 'slope' of the equation: it is the amount that y is predicted to increase by if x is increased by 1.

Note that the regression equation can only provide predicted values for y. At a given value of x, there is no guarantee that y will assume the value predicted by the regression equation.

a) This conclusion is correct. As explained in general above, the coefficient in front of x in the regression equation is known as the slope of the equation. In this question, the slope is 1.2. This is interpreted as: for each unit increase in x, the predicted value of y will increase by 1.2. So an increase by 1 in the mid-year score corresponds to an increase by 1.2 in the predicted end-of-year score.

b) This conclusion is not correct. There are two problems with this conclusion. Firstly, the regression equation is only used as a prediction for the end-of-year score: if you are given a mid-year score, there is no guarantee that the end-of-year score will be equal to the one given by the regression equation. Secondly, you cannot use the regression equation to predict anything if the mid-year score is 45.
This is because the sample of mid-year scores that allowed the regression equation to be calculated ranged from 50 to 80. You can only use a regression equation to predict the dependent variable (end-of-year score) if the value of the independent variable (mid-year score) is within the range of the original sample.

c) This conclusion is not correct. The reason for this is the same as one of the reasons that the conclusion in b) was not correct. In particular, with a mid-year score of 60, the regression equation would give a predicted value of 77 for the end-of-year score. But there is no guarantee that the end-of-year score really will be 77.

Question 2 of 3

A simple linear regression model is developed in order to predict cholesterol level (y) based on the amount of exercise a person does in a week (x). A random sample of 128 people is collected. For each person the amount of exercise they do and their cholesterol level are recorded. The following regression equation was developed:

    ŷ = 553.57 − 48.90x

Along with this, the following values were calculated:

    SSM = 3,227.19
    SSE = 1,156.83

Calculate the sample coefficient of correlation (r) between cholesterol level and amount of exercise. Give your answer to 2 decimal places.

r = 0.74

Feedback [0 out of 1]

This is not correct. r = −0.86
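The value of r for this question can be reproduced from the two sums of squares given. A minimal sketch, assuming the usual decomposition SST = SSM + SSE and taking the sign of r from the sign of the slope; the variable names are chosen for the example.

```python
import math

ssm = 3227.19    # variation explained by the model
sse = 1156.83    # residual (unexplained) variation
slope = -48.90   # coefficient of x in the fitted equation

sst = ssm + sse                # total variation
r_squared = ssm / sst          # coefficient of determination
# r has the same sign as the slope of the regression line.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r_squared, 4), round(r, 2))
```

Rounding is left to the final step, so the intermediate R² keeps its full precision before the square root is taken.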
Calculation

The coefficient of correlation measures the amount of correlation between two variables. It is closely related to the coefficient of determination from simple linear regression, which measures the proportion of variation in the dependent variable that is predicted by the independent variable. In fact, the coefficient of correlation is denoted r while the coefficient of determination is denoted R². And this is consistent: the coefficient of correlation is equal to the square root of the coefficient of determination. You do have to decide whether to take the positive or negative square root of R² in order to get the correct coefficient of correlation.

If two variables are positively correlated (that is, if one increasing correlates with the other increasing) then the coefficient of correlation is positive. If the two variables are negatively correlated (that is, if one increasing correlates with the other decreasing) then the coefficient of correlation is negative. If you are given a regression equation, you can determine the qualitative nature of how the variables are correlated by looking at the coefficient of x.
In this question, the coefficient of x is −48.90. It is negative, so to calculate the coefficient of correlation you take the negative square root of the coefficient of determination. The coefficient of determination can be calculated using the following formula, where SST = SSM + SSE:

    R² = SSM / SST
       = 3,227.19 / 4,384.02
       = 0.73612575...

Therefore, you calculate the coefficient of correlation (r) as the negative square root of this value:

    r = −√(R²)
      = −√(0.73612575...)
      = −0.85797771...
      = −0.86            (rounded as last step)

Two variables, x and y, are investigated for an association and are found to be strongly correlated with positive correlation. A simple linear regression model is constructed. The regression model can explain 92.3% of variation in the values of y based upon the values of x. Select all the statements that can be reasonably drawn from this analysis:

✓ ✓   x and y have a strong association. Furthermore, larger values of x tend to occur with larger values of y.
      x and y are causally related. Furthermore, due to the positive correlation, larger values in x cause larger values of y to occur.
      The value of y tends to be greater than the value of x.
✓ ✓   Small values of x tend to occur with small values of y.
      Small values of x tend to occur with large values of y.
      The value of x tends to be greater than the value of y.
✓ ✗   If x is small then it is impossible for y to be very large.

Feedback [2 out of 2]

Legend: ✓ You are correct. ✗ This is not correct.

Discussion

A simple linear regression model can be constructed for two variables that are linearly related. Such a model can be used to estimate values of the response variable (y) based upon values of the explanatory variable (x). The usefulness and accuracy of the regression model will largely depend upon the strength of the relationship between the two variables, which is measured by the correlation between them.
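The correlation referred to in this discussion is the sample (Pearson) correlation coefficient. As an illustration, the sketch below computes r from paired data and then recovers the r implied by the 92.3% figure in the question. The function name `pearson_r` and the data set are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient between paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data (not from the question).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
r = pearson_r(xs, ys)
print(round(r, 3), round(r ** 2, 3))

# For the question: R^2 = 0.923 with positive correlation gives r = +sqrt(R^2).
r_question = math.sqrt(0.923)
print(round(r_question, 2))
```

In simple linear regression r² equals the proportion of variation explained, so a model explaining 92.3% of the variation with a positive slope corresponds to r of roughly 0.96.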
A positive correlation means that the regression line has positive slope and that larger values of the explanatory variable tend to be related to larger values of the response variable.

Strong correlation, however, is not a definitive indicator of a causal link between two variables. That is, even if two variables are highly correlated it may not follow that they are causally linked. There may be one or more lurking variables that are causally related to both variables in the regression, which is known as common response. It can also be the case that, while the explanatory variable may be causally related to the response variable to some extent, there are one or more lurking variables that are also causally related to the response variable and it is difficult to separate the causation. This situation is known as confounding. So it is not reasonable to conclude, based upon strong correlation, that values of the explanatory variable in any way cause values of the response variable. Causality may only be established through a carefully measured and well designed experimental process.

Question 2 of 3

The following table shows the average petrol price and the number of online shopping orders over a given month:

Petrol Price and Online Shopping

Average petrol price per gallon over a month ($)    Number of online shopping orders
4.9325                                              5,682
4.1775                                              4,139
2.24                                                3,267
1.95                                                2,548
4.26                                                4,322
1.5675                                              2,749
4.365                                               4,711
4.635                                               4,533
3.805                                               3,959
2.0725                                              2,442

The relationship between the average petrol price and the number of online shopping orders in a given month is proposed to follow the simple regression equation below:

    ŷ = b0 + b1x

Calculate the proportion of variability in the number of online shopping orders that is not explained by the average petrol price. Give your answer as a percentage to 1 decimal place.

Proportion = 90.1%

Feedback [0 out of 1]

This is not correct.
Proportion = 9.9%

Calculation

The statistic R² represents the proportion of variability in the observed response variable (y) that is explained by the explanatory variable (x) in the regression model. If R² is very high, the relationship between the response variable and the explanatory variable is very strong, and vice versa. Therefore, the proportion of variability that is not explained by the price of petrol in the model is the complement of R² and can be calculated by subtracting R² from 1 (or 100%).

Using a software package, R² = 0.90065662...

Therefore:

    Proportion = 1 − R²
               = 1 − 0.90065662...
               = 0.09934338...
               = 9.934338...%     (multiply by 100 to convert to a percentage)
               = 9.9%             (rounded as a last step)

Question 1 of 3, ID: MST.SLR.REE.01.0010

The human resources department of the Mean Corporation would like to estimate the size of the annual salary that they should offer to university graduates. The CEO of the Mean Corporation has suggested that the salary offered (W) should be calculated based on the Grade Point Average (G) of a student. Based on a random sample of university graduates, the human resources department has calculated the mean salary offered to university graduates to be W̄ and the mean Grade Point Average of university graduates to be Ḡ. The Mean Corporation will carry out a regression analysis to investigate the relationship between salary and Grade Point Average.

Select the dependent variable in the regression analysis that will be conducted.

Feedback [0 out of 1]

This is not correct. The dependent variable in the regression analysis that will be conducted is W.

Discussion

Regression analysis involves the development of a model that will predict values of a variable based on the values of other variables. Simple linear regression involves predicting the value of a variable based on the value of one other variable.
The regression analysis to be conducted by the Mean Corporation is a simple linear regression analysis because the human resources department is interested in predicting the value of W based only on the corresponding value of G.

In a simple linear regression model, the variable that you wish to predict is referred to as the dependent variable because its predicted value depends on the observed value of another variable. The variable that is used to make the prediction is referred to as the independent variable because its value is observed and does not depend on the regression model that is being developed. In the regression analysis that will be conducted by the Mean Corporation, W is the dependent ...