2021/10/4 9:58 PM — Multiple Regression | Boundless Statistics

Correlation and Regression: Multiple Regression

Multiple Regression Models
Multiple regression is used to find an equation that best predicts the Y variable as a linear function of multiple X variables.

LEARNING OBJECTIVES

Describe how multiple regression can be used to predict an unknown Y value based on a corresponding set of X values, or to understand functional relationships between the dependent and independent variables.

KEY TAKEAWAYS

Key Points

One use of multiple regression is prediction or estimation of an unknown Y value corresponding to a set of X values.

A second use of multiple regression is to try to understand the functional relationships between the dependent and independent variables, that is, to see what might be causing the variation in the dependent variable.
The main null hypothesis of a multiple regression is that there is no relationship between the X variables and the Y variable; i.e., that the fit of the observed Y values to those predicted by the multiple regression equation is no better than what you would expect by chance.
Key Terms

multiple regression: a regression model used to find an equation that best predicts the Y variable as a linear function of multiple X variables

null hypothesis: a hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise

When To Use Multiple Regression
You use multiple regression when you have three or more measurement
variables. One of the measurement variables is the dependent (Y) variable. The rest of the variables are the independent (X) variables. The
purpose of a multiple regression is to find an equation that best predicts
the Y variable as a linear function of the X variables.

Multiple Regression For Prediction
One use of multiple regression is prediction or estimation of an unknown
Y value corresponding to a set of X values. For example, let’s say you’re interested in finding a suitable habitat to reintroduce the rare beach tiger
beetle, Cicindela dorsalis dorsalis, which lives on sandy beaches on the
Atlantic coast of North America. You’ve gone to a number of beaches
that already have the beetles and measured the density of tiger beetles
(the dependent variable) and several biotic and abiotic factors, such as wave exposure, sand particle size, beach steepness, density of amphipods and other prey organisms, etc. Multiple regression would give
you an equation that would relate the tiger beetle density to a function of
all the other variables. Then, if you went to a beach that didn’t have tiger
beetles and measured all the independent variables (wave exposure,
sand particle size, etc.), you could use the multiple regression equation
to predict the density of tiger beetles that could live there if you introduced them.

Multiple Regression For Understanding Causes
A second use of multiple regression is to try to understand the
functional relationships between
the dependent and independent
variables, to try to see what might
be causing the variation in the
dependent variable.

Atlantic Beach Tiger Beetle: This is the Atlantic beach tiger beetle (Cicindela dorsalis dorsalis), which is the subject of the multiple regression study in this atom.

For example, if you did a regression of tiger beetle density on sand particle size by itself, you would probably see a significant relationship. If you did
a regression of tiger beetle density on wave exposure by itself, you
would probably see a significant relationship. However, sand particle size
and wave exposure are correlated; beaches with bigger waves tend to
have bigger sand particles. Maybe sand particle size is really important,
and the correlation between it and wave exposure is the only reason for
a significant regression between wave exposure and beetle density. Multiple regression is a statistical way to try to control for this; it can answer
questions like, “If sand particle size (and every other measured variable)
were the same, would the regression of beetle density on wave exposure be significant?”

Null Hypothesis
The main null hypothesis of a multiple regression is that there is no relationship between the X variables and the Y variable; in other words,
that the fit of the observed Y values to those predicted by the multiple
regression equation is no better than what you would expect by chance.

As you are doing a multiple regression, there is also a null hypothesis for
each X variable, meaning that adding that X variable to the multiple regression does not improve the fit of the multiple regression equation any
more than expected by chance.

Estimating and Making Inferences About the Slope
The purpose of a multiple regression is to find an equation that best predicts the Y variable as a linear function of the X variables.

LEARNING OBJECTIVES

Discuss how partial regression coefficients (slopes) allow us to predict the value of Y given measured X values.

KEY TAKEAWAYS

Key Points

Partial regression coefficients (the slopes) and the intercept of the regression equation are found so that they minimize the squared deviations between the expected and observed values of Y.
If you had the partial regression coefficients and measured the X variables, you could plug them into the
equation and predict the corresponding value of Y.
The standard partial regression coefficient is the number of standard deviations that Y would change for every one standard deviation change in X1, if all the other X variables could be kept constant.

Key Terms

standard partial regression coefficient: the number of standard deviations that Y would change for every one standard deviation change in X1, if all the other X variables could be kept constant
partial regression coefficient: a value indicating the effect of each independent variable on the dependent
variable with the influence of all the remaining variables
held constant. Each coefficient is the slope between the
dependent variable and each of the independent
variables
p-value: The probability of obtaining a test statistic at least as extreme as the one that was actually observed,
assuming that the null hypothesis is true.

You use multiple regression when you have three or more measurement
variables. One of the measurement variables is the dependent (Y) variable. The rest of the variables are the independent (X) variables. The
purpose of a multiple regression is to find an equation that best predicts
the Y variable as a linear function of the X variables.

How It Works
The basic idea is that an equation is found like this:

Yexp = a + b1X1 + b2X2 + b3X3 + ⋯

Yexp is the expected value of Y for a given set of X values. b1 is the estimated slope of a regression of Y on X1, if all of the other X variables could be kept constant; the same applies for b2, b3, et cetera. a is the intercept. The values of b1, b2, et cetera (the “partial regression coefficients”) and the intercept are found so that they minimize the squared deviations between the expected and observed values of Y.
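This fitting step can be sketched in code. The example below uses hypothetical data (not from the text) and ordinary least squares, which finds the intercept and partial regression coefficients that minimize the squared deviations between observed and expected Y:

```python
import numpy as np

# Hypothetical data: 3 X variables, 20 observations (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(scale=0.1, size=20)

# Add a column of ones so the intercept a is estimated along with b1, b2, b3.
design = np.column_stack([np.ones(len(X)), X])

# Least squares minimizes the squared deviations between observed and expected Y.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a, b = coef[0], coef[1:]

y_exp = design @ coef  # the expected Y for each set of X values
print(a, b)
```

With the coefficients in hand, predicting Y for a new set of X values is just a matter of plugging them into the equation, exactly as the text describes.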
How well the equation fits the data is expressed by R2, the “coefficient of multiple determination.” This can range from 0 (for no relationship between the X and Y variables) to 1 (for a perfect fit, i.e. no difference between the observed and expected Y values). The p-value is a function of the R2, the number of observations, and the number of X variables.

Importance of Slope (Partial Regression Coefficients)
When the purpose of multiple regression is prediction, the important result is an equation containing partial regression coefficients (slopes). If
you had the partial regression coefficients and measured the X variables, you could plug them into the equation and predict the corresponding value of Y. The magnitude of the partial regression coefficient
depends on the unit used for each variable. It does not tell you anything
about the relative importance of each variable.
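This unit-dependence is easy to demonstrate. In the hypothetical sketch below, re-expressing one X variable in different units (say, kilograms versus grams) rescales its partial regression coefficient by the same factor while the fit itself is unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=30)            # imagine this measured in kilograms
x2 = rng.normal(size=30)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=30)

def fit(columns, y):
    # Ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

coef_kg = fit([x1, x2], y)
coef_g = fit([x1 * 1000, x2], y)    # the same variable expressed in grams

# The slope for x1 shrinks by a factor of 1000; predictions are identical,
# so raw coefficient magnitudes say nothing about relative importance.
print(coef_kg[1], coef_g[1] * 1000)
```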
When the purpose of multiple regression is understanding functional relationships, the important result is an equation containing standard partial regression coefficients, like this:
y′exp = a′ + b′1x′1 + b′2x′2 + b′3x′3 + ⋯

where b′1 is the standard partial regression coefficient of y on X1. It is the number of standard deviations that Y would change for every one standard deviation change in X1, if all the other X variables could be kept constant. The magnitude of the standard partial regression coefficients tells you something about the relative importance of different variables; X variables with bigger standard partial regression coefficients have a stronger relationship with the Y variable.

Linear Regression: A graphical representation of a best fit line for simple linear regression.

Evaluating Model Utility
The results of multiple regression should be viewed with caution.

LEARNING OBJECTIVES

Evaluate the potential drawbacks of multiple regression.

KEY TAKEAWAYS

Key Points

You should examine the linear regression of the dependent variable on each independent variable, one at a time, examine the linear regressions between each pair of independent variables, and consider what you know about the subject matter.

You should probably treat multiple regression as a way
of suggesting patterns in your data, rather than rigorous
hypothesis testing.
If independent variables A and B are both correlated
with Y, and A and B are highly correlated with each
other, only one may contribute significantly to the model, but it would be incorrect to blindly conclude that the
variable that was dropped from the model has no
significance.
Key Terms

independent variable: in an equation, any variable whose value is not dependent on any other in the
equation
dependent variable: in an equation, the variable whose value depends on one or more variables in the equation
multiple regression: regression model used to find an equation that best predicts the Y variable as a linear
function of multiple X variables

Multiple regression is beneficial in some respects, since it can show the
relationships between more than just two variables; however, it should
not always be taken at face value.
It is easy to throw a big data set at a multiple regression and get an impressive-looking output. But many people are skeptical of the usefulness
of multiple regression, especially for variable selection, and you should
view the results with caution. You should examine the linear regression
of the dependent variable on each independent variable, one at a time,
examine the linear regressions between each pair of independent variables, and consider what you know about the subject matter. You should
probably treat multiple regression as a way of suggesting patterns in
your data, rather than rigorous hypothesis testing.
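One simple version of that pairwise check, sketched here on hypothetical data, is to inspect the correlation matrix of the independent variables before trusting the multiple regression output:

```python
import numpy as np

rng = np.random.default_rng(2)
wave_exposure = rng.normal(size=50)
# Sand particle size is deliberately made strongly correlated with
# wave exposure, mimicking the beach example in the text.
particle_size = 0.9 * wave_exposure + 0.1 * rng.normal(size=50)
steepness = rng.normal(size=50)

predictors = np.column_stack([wave_exposure, particle_size, steepness])
corr = np.corrcoef(predictors, rowvar=False)

# A large off-diagonal entry warns that two predictors overlap heavily,
# so their individual coefficients should not be over-interpreted.
print(corr[0, 1])
```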
If independent variables A and B are both correlated with Y, and A and
B are highly correlated with each other, only one may contribute significantly to the model, but it would be incorrect to blindly conclude that the
variable that was dropped from the model has no biological importance.
For example, let’s say you did a multiple regression on vertical leap in
children five to twelve years old, with height, weight, age, and score on a
reading test as independent variables. All four independent variables are
highly correlated in children, since older children are taller, heavier, and
more literate, so it’s possible that once you’ve added weight and age to
the model, there is so little variation left that the effect of height is not
significant. It would be biologically silly to conclude that height had no
influence on vertical leap. Because reading ability is correlated with age,
it’s possible that it would contribute significantly to the model; this might
suggest some interesting followup experiments on children all of the
same age, but it would be unwise to conclude that there was a real effect
of reading ability on vertical leap based solely on the multiple
regression.

Linear Regression: Random data points and their linear regression.

Using the Model for Estimation and Prediction
Standard multiple regression involves several independent variables predicting the dependent variable.

LEARNING OBJECTIVES

Analyze the predictive value of multiple regression in terms of
the overall model and how well each independent variable
predicts the dependent variable.

KEY TAKEAWAYS

Key Points

In addition to telling us the predictive value of the overall model, standard multiple regression tells us how well
each independent variable predicts the dependent variable, controlling for each of the other independent
variables.
Significance levels of 0.05 or lower are typically considered significant, and significance levels between 0.05
and 0.10 would be considered marginal.
An independent variable that is a significant predictor of
a dependent variable in simple linear regression may
not be significant in multiple regression.
Key Terms

significance level: a measure of how likely it is to draw a false conclusion in a statistical test, when the results
are really just random variations.
multiple regression: regression model used to find an equation that best predicts the Y variable as a linear
function of multiple X variables

Using Multiple Regression for Prediction

Standard multiple regression is the same idea as simple linear regression, except now we have several independent variables predicting the
dependent variable. Imagine that we wanted to predict a person’s height
from the person’s gender and weight. We would use standard multiple regression in which gender and weight would be the independent variables and height would be the dependent variable. The resulting output would tell us a number of things. First, it would tell us how
much of the variance in height is accounted for by the joint predictive
power of knowing a person’s weight and gender. This value is denoted
by R2. The output would also tell us if the model allows the prediction of a person’s height at a rate better than chance. This is denoted by the significance level of the model. Within the social sciences, a significance
level of 0.05 is often considered the standard for what is acceptable.
Therefore, in our example, if the statistic is 0.05 (or less), then the model
is considered significant. In other words, there is only a 5 in 100 chance
(or less) that there really is not a relationship between height, weight and
gender. If the significance level is between 0.05 and 0.10, then the model
is considered marginal. In other words, the model is fairly good at predicting a person’s height, but there is a 5-10% probability that
there really is not a relationship between height, weight and gender.
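The overall significance level described here comes from an F-test on R2. The sketch below uses hypothetical height/weight/gender data; to stay self-contained it compares the F statistic against the tabled 0.05 critical value (about 3.25 for 2 and 37 degrees of freedom) rather than computing an exact p-value:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 2                         # 40 people, 2 predictors (weight, gender)
weight = rng.normal(70, 10, size=n)
gender = rng.integers(0, 2, size=n)  # dummy-coded gender (hypothetical data)
height = 100 + 0.8 * weight + 10 * gender + rng.normal(scale=5, size=n)

design = np.column_stack([np.ones(n), weight, gender])
coef, *_ = np.linalg.lstsq(design, height, rcond=None)
resid = height - design @ coef

ss_res = resid @ resid
ss_tot = ((height - height.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot             # coefficient of multiple determination

# F statistic for H0: no relationship between the X variables and Y.
# From an F table, the 0.05 critical value for (2, 37) df is about 3.25.
F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(r2, F, F > 3.25)
```

If F exceeds the critical value, the model as a whole predicts height better than chance at the 0.05 level.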
In addition to telling us the predictive value of the overall model, standard multiple regression tells us how well each independent variable
predicts the dependent variable, controlling for each of the other independent variables. In our example, the regression analysis would tell us
how well weight predicts a person’s height, controlling for gender, as
well as how well gender predicts a person’s height, controlling for
weight.
To see if weight is a “significant” predictor of height, we would look at the
significance level associated with weight. Again, significance levels of
0.05 or lower would be considered significant, and significance levels
between 0.05 and 0.10 would be considered marginal. Once we have
determined that weight is a significant predictor of height, we would
want to more closely examine the relationship between the two variables. In other words, is the relationship positive or negative? In this example, we would expect that there would be a positive relationship. In
other words, we would expect that the greater a person’s weight, the
greater the height. (A negative relationship is present in the case in which the greater a person’s weight, the shorter the height.) We can determine the direction of the relationship between weight and height by
looking at the regression coefficient associated with weight.
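That check is just a matter of reading the sign of the fitted coefficient. A minimal sketch on hypothetical data, with gender dummy-coded so that the weight coefficient is estimated while controlling for gender:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
weight = rng.normal(70, 10, size=n)
gender = rng.integers(0, 2, size=n)   # dummy-coded 0/1 (hypothetical data)
height = 100 + 0.8 * weight + 10 * gender + rng.normal(scale=5, size=n)

design = np.column_stack([np.ones(n), weight, gender])
coef, *_ = np.linalg.lstsq(design, height, rcond=None)

# coef[1] is the slope for weight, controlling for gender; its sign gives
# the direction of the weight-height relationship.
direction = "positive" if coef[1] > 0 else "negative"
print(coef[1], direction)
```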
A similar procedure shows us how well gender predicts height. As with
weight, we would check to see if gender is a significant predictor of
height, controlling for weight. The difference comes when determining
the exact nature of the relationship between gender and height. That is,
it does not make sense to talk about the effect on height as gender increases or decreases, since gender is not a continuous variable.

Conclusion
As mentioned, the significance levels given for each independent variable indicate whether that particular independent variable is a significant
predictor of the dependent variable, over and above the other independent variables. Because of this, an independent variable that is a significant predictor of a dependent variable in simple linear regression may
not be significant in multiple regression (i.e., when other independent
variables are added into the equation). This could happen because the
covariance that the first independent variable shares with the dependent
variable could overlap with the covariance that is shared between the
second independent variable and the dependent variable. Consequently,
the first independent variable is no longer uniquely predictive and would
not be considered significant in multiple regression. Because of this, it is
possible to get a highly significant R2, but have none of the independent variables be significant.

Multiple Regression: This image shows data points and their linear regression. Multiple regression is the same idea as single regression, except we deal with more than one independent variable predicting the dependent variable.

Interaction Models
In regression analysis, an interaction may arise when considering the relationship among three or more variables.

LEARNING OBJECTIVES

Outline the problems that can arise when the simultaneous influence of two variables on a third is not additive.

KEY TAKEAWAYS

Key Points

If two variables of interest interact, the relationship between each of the interacting variables and a third “dependent variable” depends on the value of the other interacting variable.
In practice, the presence of interacting variables makes
it more difficult to predict the consequences of changing the value of a variable, particularly if the variables it
interacts with are hard to measure or difficult to control.
The interaction between an explanatory variable and an
environmental variable suggests that the effect of the
explanatory variable has been moderated or modified
by the environmental variable.
Key Terms

interaction variable: a variable constructed from an original set of variables to try to represent either all of
the interaction present or some part of it. In statistics, an interaction may arise when considering the relationship
among three or more variables, ...
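A common way to represent such an interaction, sketched here on hypothetical data, is to construct an interaction variable as the product of two original variables and include it in the regression:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# The effect of x1 on y depends on the value of x2: a genuine interaction.
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 2.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

# The interaction variable is constructed from the original set of variables.
design = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# coef[3] estimates the interaction; the slope of y on x1 is now
# coef[1] + coef[3] * x2, i.e. it changes with the value of x2.
print(coef)
```

A large interaction coefficient means the two variables' joint influence on y is not additive, which is exactly the situation the section describes.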
