The Mathematical Theory of Regression
Gregory Carey, © 1998

The basic formulation for multiple regression is diagrammed in Figure 1. Here, there is a series of predictor variables, termed $x_1, x_2, \ldots, x_p$. There are $p$ of these predictor variables. Many statisticians refer to the predictor variables as independent variables or IVs. There is a single predicted variable, $y$, also known as the dependent variable or DV. Variable $u$ is a residual, sometimes called an error variable or a disturbance. The $b_i$s are partial regression coefficients; $b_i$ gives the predictive influence of the $i$th IV on the DV, controlling for all the other IVs. The term $r_{ij}$ is the correlation between the $i$th and $j$th IVs, and $a$ is a constant reflecting the scale of measurement for $y$.

Now consider the $k$th individual in a sample. Let $Y_k$ denote the $k$th individual's score on the DV and let $X_{ki}$ denote the $k$th individual's score on the $i$th IV. The multiple regression model writes the $k$th individual's DV as a weighted linear combination of the $k$th individual's scores on the $p$ IVs plus the $k$th individual's score on the residual. In algebra, we may write

$$Y_k = a + b_1 X_{k1} + b_2 X_{k2} + \cdots + b_p X_{kp} + U_k \tag{1}$$

[Figure 1. Pictorial or path diagram of the multiple regression model: the IVs $X_1, X_2, \ldots, X_p$, intercorrelated through $r_{12}, r_{1p}, r_{2p}, \ldots$, each point to $Y$ with weights $b_1, b_2, \ldots, b_p$; the residual $U$ also points to $Y$.]
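To make Equation (1) concrete, here is a minimal simulation sketch in Python/NumPy. It is not part of Carey's notes: the sample size, the coefficient values, and the correlation matrix below are all invented for illustration. It draws $p = 3$ correlated IVs, fixes $a$ and the $b_i$s, and builds each $Y_k$ exactly as the equation specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 100, 3                      # sample size and number of IVs (illustrative)
a = 2.0                            # intercept: scale constant for y
b = np.array([0.5, -1.0, 0.25])    # partial regression coefficients b_1..b_p

# Correlated predictors: the off-diagonal entries play the role of the r_ij.
R = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(p), R, size=N)   # N x p matrix of the X_ki

U = rng.normal(0.0, 1.0, size=N)   # residuals (disturbances) U_k

# Equation (1) for every individual k at once:
# Y_k = a + b_1*X_k1 + b_2*X_k2 + b_3*X_k3 + U_k
Y = a + X @ b + U
```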
It is convenient to express the multiple regression model in matrix form. Let us write out the structural equations for all $N$ individuals in a sample:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{N1} & X_{N2} & \cdots & X_{Np} \end{bmatrix} \begin{bmatrix} a \\ b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix} + \begin{bmatrix} U_1 \\ U_2 \\ \vdots \\ U_N \end{bmatrix} \tag{2}$$

Let $\mathbf{y}$ denote the column vector of the $Y_i$s; let $\mathbf{X}$ denote the ($N$ by $p+1$) matrix of the $X_{ij}$s, with a leading column of 1s; let $\mathbf{b}$ denote the column vector consisting of $a$ and the partial regression coefficients $b_1$ through $b_p$; and let $\mathbf{u}$ denote the column vector of residuals. Equation (2) may be parsimoniously written in matrix form as

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{u} \tag{3}$$

Another way to write Equation (3) is to note that the observed $y$ score for an individual may be written as the sum of the predicted $y$ plus error ($u$). For the $k$th individual we may write

$$Y_k = \hat{Y}_k + U_k \tag{4}$$

Let $\hat{\mathbf{y}}$ denote the ($N$ by 1) column vector of predicted scores. Then another way to write Equation (3) is

$$\mathbf{y} = \hat{\mathbf{y}} + \mathbf{u} \tag{5}$$

and with a little algebraic manipulation, we can prove that

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}. \tag{6}$$

Estimation of the Parameters

We now want to find a solution for the vector $\mathbf{b}$. To do this we require some criterion for picking reasonable values of the $b_i$s (and $a$) from all the possible values that the $b_i$s can take. (Mathematically, when there are more observations than IVs, there are more structural equations than unknowns and the system is overdetermined.) The traditional criterion for selecting the $b_i$s is that we want those that minimize the squared prediction errors. That is, we want to minimize the sum of the squared residuals, $\mathbf{u}'\mathbf{u} = \sum_{k=1}^{N} U_k^2$.
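Continuing the simulation sketch above (again an illustration, not part of the original notes), the least-squares criterion can be applied directly: build the ($N$ by $p+1$) design matrix of Equation (2) and find the $\mathbf{b}$ that minimizes $\mathbf{u}'\mathbf{u}$. NumPy's `lstsq` computes exactly this minimizer.

```python
# Design matrix of Equation (2): a leading column of 1s (for the
# intercept a) followed by the N x p matrix of predictors.
X_design = np.column_stack([np.ones(N), X])

# Least squares: the vector b = (a, b_1, ..., b_p)' that minimizes
# u'u = (y - Xb)'(y - Xb).
b_hat, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

y_hat = X_design @ b_hat           # Equation (6): predicted scores
u_hat = Y - y_hat                  # Equation (5): residuals

print("estimated [a, b_1, b_2, b_3]:", b_hat.round(3))

# Equivalently, via the normal equations b = (X'X)^{-1} X'y,
# solved here without forming the explicit inverse:
b_normal = np.linalg.solve(X_design.T @ X_design, X_design.T @ Y)
```

With the invented values above ($N = 100$, true $a = 2$, $\mathbf{b} = (0.5, -1.0, 0.25)'$), the estimates should land near the true parameters, with sampling error shrinking as $N$ grows.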