3 of 3

Two variables (A and B) are hypothesised to have a linear relationship with one another, as represented by the following equation:

$A = \beta_0 + \beta_1 B + \varepsilon$

Data was gathered for these two variables and the correlation coefficient (r) was calculated to be -0.96. Select all from the following statements that are true:

[ ] Variable A is known as the explanatory variable, and variable B is known as the response variable.
[x] The linear regression line provides a strong fit to the observed data.
[ ] Causation between A and B cannot be implied from the correlation that exists between them.
[ ] Variable B causes variable A since B is the independent variable.
[x] $\beta_0$ is known as the sample intercept.
[x] The proportion of variability in A that is explained by the regression model is equal to 92.16%.
[ ] The graph of the linear relationship between A and B slopes up.

Feedback [1 out of 2]
You are partly correct.
- Causation between A and B cannot be implied from the correlation that exists between them: this option should have been selected.
- The proportion of variability in A that is explained by the regression model is equal to 92.16%: you are correct.
- The linear regression line provides a strong fit to the observed data: you are correct.
- $\beta_0$ is known as the sample intercept: this is not correct.

Discussion
Variable A is the response variable since it 'responds' to changes in variable B, which is the explanatory variable. Despite this, one cannot claim that the explanatory variable 'causes' the response variable or vice versa. The fact that the two are correlated does not automatically imply that they are causally linked. Therefore the statement that causation between A and B cannot be implied from the correlation that exists between them is true.

$\beta_0$ is the intercept of the hypothesised population model; the sample intercept, estimated from the data, is written $b_0$. The statement that $\beta_0$ is the sample intercept is therefore false.

The correlation coefficient (r) reflects the strength of the relationship between two variables. If r is positive, the relationship is positive: that is, if B increases, A increases and vice versa. If r is equal to zero, there is no correlation at all. If r is negative, the relationship is negative and A will decrease if B increases. Since r has been calculated as being equal to -0.96, the graph of the linear relationship slopes down.

The coefficient of determination ($R^2$) is similar to the correlation coefficient in that it also measures the strength of the linear relationship. While it is not concerned with the direction of the slope, it provides a measurement of the proportion of variability in the observed responses that is explained by the regression model. The statistic $R^2$ can be calculated from r: $R^2 = r^2 = (-0.96)^2 = 0.9216$. A value for $R^2$ of 0.9216 means that 92.16% of the variability can be explained by the simple linear relation. This is also evidence in favour of the claim that the regression line is a strong fit to the data.

Therefore, the statements that the proportion of variability in A that is explained by the regression model is equal to 92.16% and that the linear regression line provides a strong fit to the observed data are true.

3 of 3 ID: MST.SLR.REE.03.0040

A biologist is studying the levels of heavy metal contaminants among a population of the South Nakaratuan Chubby Bat. The biologist is interested in constructing a simple linear regression model to investigate the relationship between weight of an animal and the level of heavy metal contamination. In the proposed regression model the level of contaminant is the response variable and weight is the explanatory variable. The contaminant level is measured in parts per billion (ppb) and
weight in grams. A random sample of 20 individuals is selected and measurements are taken.

Contaminant study in the South Nakaratuan Chubby Bat

Contaminant level (ppb)    Weight (g)
196                        117
166                        106
209                        133
204                        128
175                        110
178                        112
216                        143
207                        131
236                        149
191                        118
189                        128
154                        105
176                        107
223                        141
241                        145
183                        123
151                        110
203                        122
220                        137
256                        106

Plotting the data, the researcher notices an obvious outlier. They decide to do the regression analysis with and without the outlier and compare the results.

Calculate the slope ($b_1$) and intercept ($b_0$) of the simple regression equation using the data provided. Give your answers to 2 decimal places.
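One way to carry out this calculation without a statistics package is sketched below in plain Python, fitting the line by least squares with and without the outlier (weight 106 g, level 256 ppb). The function name `fit_line` is illustrative only. One assumption needs flagging: the weight paired with 216 ppb is entered as 148 g rather than the 143 g shown in the table, because only that value reproduces the estimates quoted in the feedback for this question; treat it as a guess about the original data.

```python
# Least-squares fit of contaminant level (ppb) on weight (g).
# ASSUMPTION: the 7th weight is entered as 148, not the 143 printed in the
# table; 148 is the value consistent with the quoted regression estimates.
weights = [117, 106, 133, 128, 110, 112, 148, 131, 149, 118,
           128, 105, 107, 141, 145, 123, 110, 122, 137, 106]
levels = [196, 166, 209, 204, 175, 178, 216, 207, 236, 191,
          189, 154, 176, 223, 241, 183, 151, 203, 220, 256]


def fit_line(x, y):
    """Return (slope, intercept, r_squared) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# All 20 observations, including the outlier (the last row).
with_outlier = fit_line(weights, levels)
# Drop the last row (weight 106 g, level 256 ppb) and refit.
without_outlier = fit_line(weights[:-1], levels[:-1])

for label, (b1, b0, r2) in [("with outlier", with_outlier),
                            ("without outlier", without_outlier)]:
    print(f"{label}: b1 = {b1:.2f}, b0 = {b0:.2f}, R^2 = {r2:.2f}")
```

Removing a single point changes the slope, the intercept and the share of variation explained, which is why the researcher compares both fits.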
a) Slope = $b_1$ = 0.34
b) Intercept = $b_0$ = 56.57

Find the proportion of variation in the values of contaminant level that is explained by the regression model. Give your answer as a decimal to 2 decimal places.
c) $R^2$ = 0.42

Repeat this process omitting the outlier:
d) Slope = $b_1$ = 0.53
e) Intercept = $b_0$ = 21.42

Find the proportion of variation in the values of contaminant level that is explained by the regression model. Give your answer as a decimal to 2 decimal places.
f) $R^2$ = 0.86

Feedback [2 out of 6]
a) This is not correct. Slope = $b_1$ = 1.24
b) This is not correct. Intercept = $b_0$ = 45.72
c) You are correct.
d) This is not correct. Slope = $b_1$ = 1.62
e) This is not correct. Intercept = $b_0$ = -6.57
f) You are correct.

Calculation
The outlier is the sample data point y = 256, x = 106.

Using level of contaminant as the response variable and weight as the explanatory variable (given in the question), entering all of the data (including the outlier) into a suitable software package, you should obtain the following results:

Regression coefficients    Estimate
Intercept                  45.71595145...
Slope                      1.23573545...

Regression model
$R^2$                      0.4181427...

Omitting the outlier, you should obtain the following results:

Regression coefficients    Estimate
Intercept                  -6.57382963...
Slope                      1.62147796...

Regression model
$R^2$                      0.85616524

Therefore, you can find the required values to 2 decimal places to be:
a) $b_1$ = 1.24
b) $b_0$ = 45.72
c) $R^2$ = 0.42
d) $b_1$ = 1.62
e) $b_0$ = -6.57
f) $R^2$ = 0.86

1 of 3 ID: MST.SLR.REE.04.0020b

The regression equation:
$\hat{y} = 5 + 1.2x$

was calculated from a sample. It is part of a regression model that has been developed in order to predict the score in an end-of-year exam based on the score in a mid-year exam for a particular university course. In the sample, mid-year scores ranged from 50 to 80.

Select whether or not each of the following conclusions is correct from the regression analysis:
a) For an increase by one in the mid-year score, the predicted increase in end-of-year score is 1.2.
b) If a student achieves a score of 45 in the mid-year exam then we know that the student will achieve a score of 59 in the end-of-year exam.
c) If a student achieves a score of 60 in the mid-year exam then we know that the student will achieve a score of 77 in the end-of-year exam.

Feedback [1 out of 3]
a) This is not correct. "For an increase by one in the mid-year score, the predicted increase in end-of-year score is 1.2" is correct.
b) You are correct.
c) This is not correct. "If a student achieves a score of 60 in the mid-year exam then we know that the student will achieve a score of 77 in the end-of-year exam" is not correct.
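The three conclusions can be checked against a small sketch of how this regression equation is meant to be used. The names `predict_end_of_year` and `SAMPLED_RANGE` are hypothetical, chosen for illustration; the coefficients and the 50 to 80 range come from the question. Returning `None` outside the sampled range is just one way of signalling that no prediction should be made there.

```python
# Coefficients of the fitted equation y-hat = 5 + 1.2x (from the question).
B0 = 5.0
B1 = 1.2
# Mid-year scores in the sample ranged from 50 to 80.
SAMPLED_RANGE = (50, 80)


def predict_end_of_year(mid_year_score):
    """Return the predicted end-of-year score, or None when the mid-year
    score lies outside the sampled range (prediction there is extrapolation)."""
    low, high = SAMPLED_RANGE
    if not low <= mid_year_score <= high:
        return None
    return B0 + B1 * mid_year_score


# a) A one-point increase in x raises the prediction by the slope, 1.2.
slope_effect = predict_end_of_year(61) - predict_end_of_year(60)

# b) 45 is below the sampled range, so no prediction is offered.
prediction_45 = predict_end_of_year(45)

# c) 60 is in range; the result is a prediction, not a guaranteed score.
prediction_60 = predict_end_of_year(60)

print(slope_effect, prediction_45, prediction_60)
```

Even for an in-range score such as 60, the returned value is only the model's estimate; an individual student's actual result can differ from it.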
Discussion

Interpreting a regression equation

In general, you use simple linear regression in order to predict the value that a dependent variable (y) will assume based on the value assumed by an independent variable (x). A simple linear regression equation is always of the form:

$\hat{y} = b_0 + b_1 x$

The values $b_0$ and $b_1$ are constants. The value $\hat{y}$ is the predicted value for y at a given value of x. So this is the equation you use to predict values of the dependent variable.

The value $b_0$ is the value predicted for y if x assumes the value 0, provided that x = 0 is within the range of values that are originally sampled. The value $b_1$ is the 'slope' of the equation: it is the amount that y is predicted to increase by if x is increased by 1.

Note that the regression equation can only provide predicted values for y. At a given value of x, there is no guarantee that y will assume the value predicted by the regression equation.

a) This conclusion is correct. As is explained in generality above, the coefficient in front of x in the regression equation is known as the slope of the equation. In this question, the slope is 1.2. This is interpreted as: for each unit increase in x, the predicted value of y will increase by 1.2. So an increase by 1 in the mid-year score corresponds to an increase by 1.2 in the predicted end-of-year score.

b) This conclusion is not correct. There are two problems with this conclusion. Firstly, the regression equation is only used as a prediction for the end-of-year score. If you are given a mid-year score, there is no guarantee that the end-of-year score will be equal to the one given by the regression equation. Secondly, you cannot use the regression equation to predict anything if the mid-year score is 45. This is because the sample of mid-year scores that allowed the regression equation to be calculated ranged from 50 to 80. You can only use a regression equation to predict the dependent variable (end-of-year score) if the value of the independent variable (mid-year score) is within the range of the original sample.

c) This conclusion is not correct. The reason for this is the same as one of the reasons that the conclusion in b) was not correct. In particular, with a mid-year score of 60, the regression equation would give a predicted value of 77 for the end-of-year score. But there is no guarantee that the end-of-year score really will be 77.

2 of 3

A simple linear regression model is developed in order to predict cholesterol level (y) based on the amount of exercise a person does in a week (x). A random sample
of 128 people is collected. For each person the amount of exercise they do and their cholesterol level are recorded. The following regression equation was developed:

$\hat{y} = 553.57 - 48.90x$

Along with this, the following values were calculated:
SSM = 3,227.19
SSE = 1,156.83

Calculate the sample coefficient of correlation (r) between cholesterol level and amount of exercise. Give your answer to 2 decimal places.

r = 0.74

Feedback [0 out of 1]
This is not correct.
r = -0.86

Calculation
The coefficient of correlation measures the amount of correlation between two variables. It is closely related to the coefficient of determination from simple linear regression, which measures the proportion of variation in the dependent variable that is predicted by the independent variable.

In fact, the coefficient of correlation is denoted r while the coefficient of determination is denoted $R^2$. And this is consistent: the coefficient of correlation is equal to the square root of the coefficient of determination. You do have to decide whether to take the positive or negative square root of $R^2$ in order to get the correct coefficient of correlation.

If two variables are positively correlated (that is, if one increasing correlates with the other increasing) then the coefficient of correlation is positive. If the two variables are negatively correlated (that is, if one increasing correlates with the other decreasing) then the coefficient of correlation is negative.

If you are given a regression equation, you can determine the qualitative nature of how the variables are correlated by looking at the coefficient of x. In this question, the coefficient of x is -48.9. It is negative, so to calculate the coefficient of correlation you take the negative square root of the coefficient of determination.

The coefficient of determination can be calculated using the following formula:

$R^2 = \frac{SSM}{SST}$
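The calculation described here can be sketched in a few lines of Python. The variable names are illustrative; the numbers are those given in the question. The sketch assumes the usual decomposition SST = SSM + SSE and takes the sign of r from the sign of the slope.

```python
import math

# Values given in the question.
ssm = 3227.19      # sum of squares explained by the model
sse = 1156.83      # sum of squared errors (residuals)
slope = -48.90     # coefficient of x in the fitted equation

# Total sum of squares, assuming SST = SSM + SSE.
sst = ssm + sse

# Coefficient of determination.
r_squared = ssm / sst

# r has the same sign as the slope, so copy the slope's sign onto sqrt(R^2).
r = math.copysign(math.sqrt(r_squared), slope)

print(f"SST = {sst:.2f}, R^2 = {r_squared:.8f}, r = {r:.2f}")
```

Using `math.copysign` keeps the sign decision explicit: a positive slope would give the positive root instead.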
Substituting the values given, with SST = SSM + SSE = 3,227.19 + 1,156.83 = 4,384.02:

$R^2 = \frac{SSM}{SST} = \frac{3{,}227.19}{4{,}384.02} = 0.73612575\ldots$

Therefore, you calculate the coefficient of correlation (r) as the negative square root of this value:

$r = -\sqrt{R^2} = -\sqrt{0.73612575\ldots} = -0.85797771\ldots = -0.86$ (rounded as last step)

Two variables, x and y, are investigated for an association and are found to be strongly correlated with positive correlation. A simple linear regression model is
constructed. The regression model can explain 92.3% of variation in the values of y based upon the values of x. Select all the statements that can be reasonably drawn from this analysis:

[x] ✓ x and y have a strong association. Furthermore, larger values of x tend to occur with larger values of y.
[ ] x and y are causally related. Furthermore, due to the positive correlation, larger values in x cause larger values of y to occur.
[ ] The value of y tends to be greater than the value of x.
[x] ✓ Small values of x tend to occur with small values of y.
[ ] Small values of x tend to occur with large values of y.
[ ] The value of x tends to be greater than the value of y.
[x] ✗ If x is small then it is impossible for y to be very large.

Feedback [2 out of 2]
Legend
✓ You are correct.
✗ This is not correct.

Discussion
A simple linear regression model can be constructed for two variables that are linearly related. Such a model can be used to estimate values of the response variable (y) based upon values of the explanatory variable (x).

The usefulness and accuracy of the regression model will largely depend upon the strength of the relationship between the two variables, which is measured by the correlation between them.

A positive correlation means that the regression line has positive slope and that larger values of the explanatory variable tend to be related to larger values of the response variable.

Strong correlation, however, is not a definitive indicator of a causal link between two variables. That is, even if two variables are highly correlated it may not follow that they are causally linked. There may be one or more lurking variables that are causally related to both variables in the regression, which is known as common response. It can also be the case that while the explanatory variable may be causally related to the response variable to some extent, there are one or more lurking variables that are also causally related to the response variable and it is difficult to separate the causation; this situation is known as confounding.

So it is not reasonable to conclude that, based upon strong correlation, values of the explanatory variable in any way cause values of the response variable.
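Common response can be illustrated with a small simulation. In the sketch below, all data are simulated and every name and parameter is hypothetical: a lurking variable drives both x and y, and neither x nor y is ever computed from the other, yet the two end up strongly correlated.

```python
import math
import random


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# A lurking variable that influences both observed variables.
lurking = [rng.gauss(0, 10) for _ in range(500)]

# x and y each depend on the lurking variable plus independent noise.
# Neither is computed from the other, so neither causes the other.
x = [z + rng.gauss(0, 2) for z in lurking]
y = [z + rng.gauss(0, 2) for z in lurking]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A regression of y on x fitted to such data would explain most of the variation in y, even though changing x would have no effect on y.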
Causality may only be established through a carefully measured and well-designed experimental process.

2 of 3

The following table shows the average petrol price and the number of online shopping orders over a given month:

Petrol Price and Online Shopping

Average Petrol Price per gallon over a month ($)    Number of Online Shopping Orders
4.9325                                              5,682
4.1775                                              4,13'9
2.24                                                3,267
1.95                                                2,548
4.26                                                4,322
1.5675                                              2,749
4.365                                               4,711
4.635                                               4,533
3.805                                               3,959
2.0725                                              2,442

The relationship between the average petrol price and the number of online shopping orders in a given month is proposed to follow the simple regression equation below:

$\hat{y} = b_0 + b_1 x$

Calculate the proportion of variability in the number of online shopping orders that is not explained by the average petrol price. Give your answer as a percentage to 1 decimal place.

Proportion = 90.1 %

Feedback [0 out of 1]
This is not correct.
Proportion = 9.9%

Calculation
The statistic $R^2$ represents the proportion of variability in the observed response variable (y) that is explained by the explanatory variable (x) in the regression model. If $R^2$ is very high, the relationship between the response variable and the explanatory variable is very strong, and vice versa.

Therefore, the proportion of variability that is not explained by the price of petrol in the model is the complement of $R^2$ and can be calculated by subtracting $R^2$ from 1 (or 100%).

Using a software package to calculate the coefficient of determination:
$R^2$ = 0.90065662...

Therefore:
Proportion = 1 - $R^2$
= 1 - 0.90065662...
= 0.09934338... (multiply by 100 to convert to a percentage)
= 9.934338...%
= 9.9% (rounded as a last step)

1 of 3 ID: MST.SLR.REE.01.0010

The human resources department of the Mean Corporation would like to estimate the size of the annual salary that they should offer to university graduates. The CEO of the Mean Corporation has suggested that the salary offered (W) should be calculated based on the Grade Point Average (G) of a student. Based on a random sample of university graduates, the human resources department has calculated the mean salary offered to university graduates to be $\bar{W}$ and the mean Grade Point Average of university graduates to be $\bar{G}$. The Mean Corporation will carry out a regression analysis to investigate the relationship between salary and Grade Point Average.

Select the dependent variable in the regression analysis that will be conducted:

Feedback [0 out of 1]
This is not correct.
The dependent variable in the regression analysis that will be conducted is W.

Discussion
Regression analysis involves the development of a model that will predict values of a variable based on the values of other variables. Simple linear regression
involves predicting the value of a variable based on the value of one other variable. The regression analysis to be conducted by the Mean Corporation is a simple linear regression analysis because the human resources department is interested in predicting the value of W based only on the corresponding value of G.

In a simple linear regression model, the variable that you wish to predict is referred to as the dependent variable because its predicted value depends on the observed value of another variable. The variable that is used to make the prediction is referred to as the independent variable because its value is observed and does not depend on the regression model that is being developed. In the regression analysis that will be conducted by the Mean Corporation, W is the dependent ...


- Fall '13
- ChristaLSorola
- Linear Regression, Regression Analysis, Errors and residuals in statistics