Part I Simple Linear Regression

Chapter 1 Linear Regression with One Predictor Variable
Regression analysis is a statistical methodology that utilizes the relation between two or more quantitative variables so that a response or outcome variable can be predicted from the other, or others. This methodology is widely used in business, the social and behavioral sciences, the biological sciences, and many other disciplines. A few examples of applications are:

1. Sales of a product can be predicted by utilizing the relationship between sales and amount of advertising expenditures.
2. The performance of an employee on a job can be predicted by utilizing the relationship between performance and a battery of aptitude tests.
3. The size of the vocabulary of a child can be predicted by utilizing the relationship between size of vocabulary and age of the child and amount of education of the parents.
4. The length of hospital stay of a surgical patient can be predicted by utilizing the relationship between the time in the hospital and the severity of the operation.

In Part I we take up regression analysis when a single predictor variable is used for predicting the response or outcome variable of interest. In Parts II and III, we consider regression analysis when two or more variables are used for making predictions. In this chapter, we consider the basic ideas of regression analysis and discuss the estimation of the parameters of regression models containing a single predictor variable.
1.1 Relations between Variables
The concept of a relation between two variables, such as between family income and family expenditures for housing, is a familiar one. We distinguish between a functional relation and a statistical relation, and consider each of these in turn.
Functional Relation between Two Variables
A functional relation between two variables is expressed by a mathematical formula. If X denotes the independent variable and Y the dependent variable, a functional relation is of the form:

Y = f(X)

Given a particular value of X, the function f indicates the corresponding value of Y.

Example
Consider the relation between dollar sales (Y) of a product sold at a fixed price and number of units sold (X). If the selling price is $2 per unit, the relation is expressed by the equation:

Y = 2X

This functional relation is shown in Figure 1.1. Number of units sold and dollar sales during three recent periods (while the unit price remained constant at $2) were as follows:

Period    Number of Units Sold    Dollar Sales
1         75                      $150
2         25                      $50
3         130                     $260

FIGURE 1.1 Example of Functional Relation. [Figure: the line Y = 2X with the three observations; X axis: Units Sold; Y axis: Dollar Sales]
These observations are plotted also in Figure 1.1. Note that all fall directly on the line of functional relationship. This is characteristic of all functional relations.
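The defining property of a functional relation can be checked directly. The short Python sketch below is an illustration only and is not part of the original text; the function name `dollar_sales` is hypothetical. It encodes Y = 2X and computes how far each of the three observed periods lies from the line.

```python
# Functional relation Y = 2X: dollar sales at a fixed price of $2 per unit.
def dollar_sales(units_sold, unit_price=2):
    """Return dollar sales for a given number of units sold."""
    return unit_price * units_sold

# (units sold, dollar sales) for periods 1, 2, and 3, taken from the table above.
observations = [(75, 150), (25, 50), (130, 260)]

# Deviation of each observed Y from the value given by the function.
deviations = [y - dollar_sales(x) for x, y in observations]
print(deviations)
```

Every deviation is zero, which is what "all fall directly on the line" means in practice. For a statistical relation the same calculation would produce nonzero deviations.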
Statistical Relation between Two Variables
A statistical relation, unlike a functional relation, is not a perfect one. In general, the observations for a statistical relation do not fall directly on the curve of relationship.
Example 1
Performance evaluations for 10 employees were obtained at midyear and at year-end. These data are plotted in Figure 1.2a. Year-end evaluations are taken as the dependent or response variable Y, and midyear evaluations as the independent, explanatory, or predictor variable X. The plotting is done as before. For instance, the midyear and year-end performance evaluations for the first employee are plotted at X = 90, Y = 94.

FIGURE 1.2 Statistical Relation between Midyear Performance Evaluation and Year-End Evaluation. [Figure panels: (a) Scatter Plot; (b) Scatter Plot and Line of Statistical Relationship; X axis: Midyear Evaluation; Y axis: Year-End Evaluation]

Figure 1.2a clearly suggests that there is a relation between midyear and year-end evaluations, in the sense that the higher the midyear evaluation, the higher tends to be the year-end evaluation. However, the relation is not a perfect one. There is a scattering of points, suggesting that some of the variation in year-end evaluations is not accounted for by midyear performance assessments. For instance, two employees had midyear evaluations of X = 80, yet they received somewhat different year-end evaluations. Because of the scattering of points in a statistical relation, Figure 1.2a is called a scatter diagram or scatter plot. In statistical terminology, each point in the scatter diagram represents a trial or a case.

In Figure 1.2b, we have plotted a line of relationship that describes the statistical relation between midyear and year-end evaluations. It indicates the general tendency by which year-end evaluations vary with the level of midyear performance evaluation. Note that most of the points do not fall directly on the line of statistical relationship. This scattering of points around the line represents variation in year-end evaluations that is not associated with midyear performance evaluation and that is usually considered to be of a random nature. Statistical relations can be highly useful, even though they do not have the exactitude of a functional relation.
Example 2
Figure 1.3 presents data on age and level of a steroid in plasma for 27 healthy females between 8 and 25 years old. The data strongly suggest that the statistical relationship is curvilinear (not linear). The curve of relationship has also been drawn in Figure 1.3. It implies that, as age increases, steroid level increases up to a point and then begins to level off. Note again the scattering of points around the curve of statistical relationship, typical of all statistical relations.
FIGURE 1.3 Curvilinear Statistical Relation between Age and Steroid Level in Healthy Females Aged 8 to 25. [Figure: scatter plot with fitted curve; X axis: Age (years); Y axis: Steroid Level]
1.2 Regression Models and Their Uses

Historical Origins
Regression analysis was first developed by Sir Francis Galton in the latter part of the 19th century. Galton had studied the relation between heights of parents and children and noted that the heights of children of both tall and short parents appeared to "revert" or "regress" to the mean of the group. He considered this tendency to be a regression to "mediocrity." Galton developed a mathematical description of this regression tendency, the precursor of today's regression models. The term regression persists to this day to describe statistical relations between variables.
Basic Concepts
A regression model is a formal means of expressing the two essential ingredients of a statistical relation:

1. A tendency of the response variable Y to vary with the predictor variable X in a systematic fashion.
2. A scattering of points around the curve of statistical relationship.

These two characteristics are embodied in a regression model by postulating that:

1. There is a probability distribution of Y for each level of X.
2. The means of these probability distributions vary in some systematic fashion with X.
Example
Consider again the performance evaluation example in Figure 1.2. The year-end evaluation Y is treated in a regression model as a random variable. For each level of midyear performance evaluation, there is postulated a probability distribution of Y . Figure 1.4 shows such a probability distribution for X = 90, which is the midyear evaluation for the first employee.
FIGURE 1.4 Pictorial Representation of Regression Model. [Figure: regression curve with probability distributions of Y at X = 50, 70, and 90; X axis: Midyear Evaluation; Y axis: Year-End Evaluation]
The actual year-end evaluation of this employee, Y = 94, is then viewed as a random selection from this probability distribution. Figure 1.4 also shows probability distributions of Y for midyear evaluation levels X = 50 and X = 70. Note that the means of the probability distributions have a systematic relation to the level of X . This systematic relationship is called the regression function of Y on X . The graph of the regression function is called the regression curve. Note that in Figure 1.4 the regression function is slightly curvilinear. This would imply for our example that the increase in the expected (mean) year-end evaluation with an increase in midyear performance evaluation is retarded at higher levels of midyear performance. Regression models may differ in the form of the regression function (linear, curvilinear), in the shape of the probability distributions of Y (symmetrical, skewed), and in other ways. Whatever the variation, the concept of a probability distribution of Y for any given X is the formal counterpart to the empirical scatter in a statistical relation. Similarly, the regression curve, which describes the relation between the means of the probability distributions of Y and the level of X , is the counterpart to the general tendency of Y to vary with X systematically in a statistical relation. Regression Models with More than One Predictor Variable. Regression models may contain more than one predictor variable. Three examples follow. 1. In an efficiency study of 67 branch offices of a consumer finance chain, the response variable was direct operating cost for the year just ended. There were four predictor variables: average size of loan outstanding during the year, average number of loans outstanding, total number of new loan applications processed, and an index of office salaries. 2. In a tractor purchase study, the response variable was volume (in horsepower) of tractor purchases in a sales territory of a farm equipment firm. 
There were nine predictor variables, including average age of tractors on farms in the territory, number of farms in the territory, and a quantity index of crop production in the territory. 3. In a medical study of short children, the response variable was the peak plasma growth hormone level. There were 14 predictor variables, including age, gender, height, weight, and 10 skinfold measurements.

The model features represented in Figure 1.4 must be extended into further dimensions when there is more than one predictor variable. With two predictor variables \(X_1\) and \(X_2\), for instance, a probability distribution of Y for each \((X_1, X_2)\) combination is assumed by the regression model. The systematic relation between the means of these probability distributions and the predictor variables \(X_1\) and \(X_2\) is then given by a regression surface.
Construction of Regression Models
Selection of Predictor Variables. Since reality must be reduced to manageable proportions whenever we construct models, only a limited number of explanatory or predictor variables can--or should--be included in a regression model for any situation of interest. A central problem in many exploratory studies is therefore that of choosing, for a regression model, a set of predictor variables that is "good" in some sense for the purposes of the analysis. A major consideration in making this choice is the extent to which a chosen variable contributes to reducing the remaining variation in Y after allowance is made for the contributions of other predictor variables that have tentatively been included in the regression model. Other considerations include the importance of the variable as a causal agent in the process under analysis; the degree to which observations on the variable can be obtained more accurately, or quickly, or economically than on competing variables; and the degree to which the variable can be controlled. In Chapter 9, we will discuss procedures and problems in choosing the predictor variables to be included in the regression model. Functional Form of Regression Relation. The choice of the functional form of the regression relation is tied to the choice of the predictor variables. Sometimes, relevant theory may indicate the appropriate functional form. Learning theory, for instance, may indicate that the regression function relating unit production cost to the number of previous times the item has been produced should have a specified shape with particular asymptotic properties. More frequently, however, the functional form of the regression relation is not known in advance and must be decided upon empirically once the data have been collected. Linear or quadratic regression functions are often used as satisfactory first approximations to regression functions of unknown nature. 
Indeed, these simple types of regression functions may be used even when theory provides the relevant functional form, notably when the known form is highly complex but can be reasonably approximated by a linear or quadratic regression function. Figure 1.5a illustrates a case where the complex regression function may be reasonably approximated by a linear regression function. Figure 1.5b provides an example where two linear regression functions may be used "piecewise" to approximate a complex regression function.

FIGURE 1.5 Uses of Linear Regression Functions to Approximate Complex Regression Functions. Bold Line Is the True Regression Function and Dotted Line Is the Regression Approximation. [Figure panels: (a) Linear Approximation; (b) Piecewise Linear Approximation; axes X and Y]

Scope of Model. In formulating a regression model, we usually need to restrict the coverage of the model to some interval or region of values of the predictor variable(s). The scope is determined either by the design of the investigation or by the range of data at hand. For instance, a company studying the effect of price on sales volume investigated six price levels, ranging from $4.95 to $6.95. Here, the scope of the model is limited to price levels ranging from near $5 to near $7. The shape of the regression function substantially outside this range would be in serious doubt because the investigation provided no evidence as to the nature of the statistical relation below $4.95 or above $6.95.
Uses of Regression Analysis
Regression analysis serves three major purposes: (1) description, (2) control, and (3) prediction. These purposes are illustrated by the three examples cited earlier. The tractor purchase study served a descriptive purpose. In the study of branch office operating costs, the main purpose was administrative control; by developing a usable statistical relation between cost and the predictor variables, management was able to set cost standards for each branch office in the company chain. In the medical study of short children, the purpose was prediction. Clinicians were able to use the statistical relation to predict growth hormone deficiencies in short children by using simple measurements of the children. The several purposes of regression analysis frequently overlap in practice. The branch office example is a case in point. Knowledge of the relation between operating cost and characteristics of the branch office not only enabled management to set cost standards for each office but management could also predict costs, and at the end of the fiscal year it could compare the actual branch cost against the expected cost.
Regression and Causality
The existence of a statistical relation between the response variable Y and the explanatory or predictor variable X does not imply in any way that Y depends causally on X. No matter how strong is the statistical relation between X and Y, no cause-and-effect pattern is necessarily implied by the regression model. For example, data on size of vocabulary (X) and writing speed (Y) for a sample of young children aged 5 to 10 will show a positive regression relation. This relation does not imply, however, that an increase in vocabulary causes a faster writing speed. Here, other explanatory variables, such as age of the child and amount of education, affect both the vocabulary (X) and the writing speed (Y). Older children have a larger vocabulary and a faster writing speed.

Even when a strong statistical relationship reflects causal conditions, the causal conditions may act in the opposite direction, from Y to X. Consider, for instance, the calibration of a thermometer. Here, readings of the thermometer are taken at different known temperatures, and the regression relation is studied so that the accuracy of predictions made by using the thermometer readings can be assessed. For this purpose, the thermometer reading is the predictor variable X, and the actual temperature is the response variable Y to be predicted. However, the causal pattern here does not go from X to Y, but in the opposite direction: the actual temperature (Y) affects the thermometer reading (X).
These examples demonstrate the need for care in drawing conclusions about causal relations from regression analysis. Regression analysis by itself provides no information about causal patterns and must be supplemented by additional analyses to obtain insights about causal relations.
Use of Computers
Because regression analysis often entails lengthy and tedious calculations, computers are usually utilized to perform the necessary calculations. Almost every statistics package for computers contains a regression component. While packages differ in many details, their basic regression output tends to be quite similar. After an initial explanation of required regression calculations, we shall rely on computer calculations for all subsequent examples. We illustrate computer output by presenting output and graphics from BMDP (Ref. 1.1), MINITAB (Ref. 1.2), SAS (Ref. 1.3), SPSS (Ref. 1.4), SYSTAT (Ref. 1.5), JMP (Ref. 1.6), S-Plus (Ref. 1.7), and MATLAB (Ref. 1.8).
1.3 Simple Linear Regression Model with Distribution of Error Terms Unspecified

Formal Statement of Model
In Part I we consider a basic regression model where there is only one predictor variable and the regression function is linear. The model can be stated as follows:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{1.1} \]

where:
\(Y_i\) is the value of the response variable in the ith trial
\(\beta_0\) and \(\beta_1\) are parameters
\(X_i\) is a known constant, namely, the value of the predictor variable in the ith trial
\(\varepsilon_i\) is a random error term with mean \(E\{\varepsilon_i\} = 0\) and variance \(\sigma^2\{\varepsilon_i\} = \sigma^2\); \(\varepsilon_i\) and \(\varepsilon_j\) are uncorrelated so that their covariance is zero (i.e., \(\sigma\{\varepsilon_i, \varepsilon_j\} = 0\) for all i, j; \(i \neq j\))
i = 1, ..., n

Regression model (1.1) is said to be simple, linear in the parameters, and linear in the predictor variable. It is "simple" in that there is only one predictor variable, "linear in the parameters," because no parameter appears as an exponent or is multiplied or divided by another parameter, and "linear in the predictor variable," because this variable appears only in the first power. A model that is linear in the parameters and in the predictor variable is also called a first-order model.

Important Features of Model
1. The response \(Y_i\) in the ith trial is the sum of two components: (1) the constant term \(\beta_0 + \beta_1 X_i\) and (2) the random term \(\varepsilon_i\). Hence, \(Y_i\) is a random variable.

2. Since \(E\{\varepsilon_i\} = 0\), it follows from (A.13c) in Appendix A that:

\[ E\{Y_i\} = E\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \beta_0 + \beta_1 X_i + E\{\varepsilon_i\} = \beta_0 + \beta_1 X_i \]

Note that \(\beta_0 + \beta_1 X_i\) plays the role of the constant a in (A.13c). Thus, the response \(Y_i\), when the level of X in the ith trial is \(X_i\), comes from a probability distribution whose mean is:

\[ E\{Y_i\} = \beta_0 + \beta_1 X_i \tag{1.2} \]

We therefore know that the regression function for model (1.1) is:

\[ E\{Y\} = \beta_0 + \beta_1 X \tag{1.3} \]

since the regression function relates the means of the probability distributions of Y for given X to the level of X.

3. The response \(Y_i\) in the ith trial exceeds or falls short of the value of the regression function by the error term amount \(\varepsilon_i\).

4. The error terms \(\varepsilon_i\) are assumed to have constant variance \(\sigma^2\). It therefore follows that the responses \(Y_i\) have the same constant variance:

\[ \sigma^2\{Y_i\} = \sigma^2 \tag{1.4} \]

since, using (A.16a), we have:

\[ \sigma^2\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \sigma^2\{\varepsilon_i\} = \sigma^2 \]

Thus, regression model (1.1) assumes that the probability distributions of Y have the same variance \(\sigma^2\), regardless of the level of the predictor variable X.

5. The error terms are assumed to be uncorrelated. Since the error terms \(\varepsilon_i\) and \(\varepsilon_j\) are uncorrelated, so are the responses \(Y_i\) and \(Y_j\).

6. In summary, regression model (1.1) implies that the responses \(Y_i\) come from probability distributions whose means are \(E\{Y_i\} = \beta_0 + \beta_1 X_i\) and whose variances are \(\sigma^2\), the same for all levels of X. Further, any two responses \(Y_i\) and \(Y_j\) are uncorrelated.
Example
A consultant for an electrical distributor is studying the relationship between the number of bids requested by construction contractors for basic lighting equipment during a week and the time required to prepare the bids. Suppose that regression model (1.1) is applicable and is as follows:

\[ Y_i = 9.5 + 2.1 X_i + \varepsilon_i \]

where X is the number of bids prepared in a week and Y is the number of hours required to prepare the bids. Figure 1.6 contains a presentation of the regression function:

\[ E\{Y\} = 9.5 + 2.1X \]

Suppose that in the ith week, \(X_i = 45\) bids are prepared and the actual number of hours required is \(Y_i = 108\). In that case, the error term value is \(\varepsilon_i = 4\), for we have

\[ E\{Y_i\} = 9.5 + 2.1(45) = 104 \]

and

\[ Y_i = 108 = 104 + 4 \]

Figure 1.6 displays the probability distribution of Y when X = 45 and indicates from where in this distribution the observation \(Y_i = 108\) came. Note again that the error term \(\varepsilon_i\) is simply the deviation of \(Y_i\) from its mean value \(E\{Y_i\}\).
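The arithmetic in this example can be reproduced in a few lines of Python. This is a minimal sketch rather than part of the original text: the parameter values 9.5 and 2.1 and the observation (45, 108) come from the example, while the names `BETA0`, `BETA1`, and `mean_response` are hypothetical.

```python
# Parameter values stated in the electrical distributor example.
BETA0, BETA1 = 9.5, 2.1

def mean_response(x):
    """Regression function E{Y} = beta0 + beta1 * X for the bids example."""
    return BETA0 + BETA1 * x

# Week i: 45 bids prepared, 108 hours actually required.
x_i, y_i = 45, 108

expected_hours = mean_response(x_i)   # mean of the distribution of Y at X = 45
error_term = y_i - expected_hours     # deviation of Y_i from its mean

print(round(expected_hours, 2), round(error_term, 2))
```

The error term is computed here only because the example treats the parameters as known. With real data the parameters must be estimated, and only estimated deviations are available.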
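FIGURE 1.6 Illustration of Simple Linear Regression Model (1.1). [Figure: regression line E{Y} = 9.5 + 2.1X with probability distributions of Y at X = 25 and X = 45; the observation \(Y_i = 108\), its mean \(E\{Y_i\} = 104\), and the error term \(\varepsilon_i = 4\) are marked; X axis: Number of Bids Prepared; Y axis: Hours]

FIGURE 1.7 Meaning of Parameters of Simple Linear Regression Model (1.1). [Figure: regression line E{Y} = 9.5 + 2.1X showing the intercept \(\beta_0 = 9.5\) and the slope \(\beta_1 = 2.1\) per unit increase in X; X axis: Number of Bids Prepared; Y axis: Hours]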
Figure 1.6 also shows the probability distribution of Y when X = 25. Note that this distribution exhibits the same variability as the probability distribution when X = 45, in conformance with the requirements of regression model (1.1).
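A small simulation can make the constant-variance requirement concrete. The sketch below is illustrative only and rests on two assumptions that are not in the text: the error terms are drawn from a normal distribution (model (1.1) leaves their distribution unspecified), and the standard deviation `SIGMA = 5.0` is a made-up value. The names `simulate_responses` and `summary` are likewise hypothetical.

```python
import random
import statistics

BETA0, BETA1 = 9.5, 2.1   # parameter values from the bids example
SIGMA = 5.0               # hypothetical; the text does not state sigma

def simulate_responses(x, n, rng):
    """Draw n responses Y = beta0 + beta1 * x + error at a fixed level x.

    Normal errors are an assumption made only for this illustration.
    """
    return [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for _ in range(n)]

rng = random.Random(0)    # fixed seed so the run is repeatable
summary = {}
for x in (25, 45):
    ys = simulate_responses(x, 10_000, rng)
    summary[x] = (statistics.mean(ys), statistics.stdev(ys))

for x, (mean_y, sd_y) in summary.items():
    print(f"X = {x}: sample mean {mean_y:.1f}, sample sd {sd_y:.1f}")
```

Under these assumptions the sample means should sit near 62 and 104, the values of the regression function at X = 25 and X = 45, while the two sample standard deviations should be close to each other.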
Meaning of Regression Parameters
The parameters \(\beta_0\) and \(\beta_1\) in regression model (1.1) are called regression coefficients. \(\beta_1\) is the slope of the regression line. It indicates the change in the mean of the probability distribution of Y per unit increase in X. The parameter \(\beta_0\) is the Y intercept of the regression line. When the scope of the model includes X = 0, \(\beta_0\) gives the mean of the probability distribution of Y at X = 0. When the scope of the model does not cover X = 0, \(\beta_0\) does not have any particular meaning as a separate term in the regression model.
Example
Figure 1.7 shows the regression function:

\[ E\{Y\} = 9.5 + 2.1X \]

for the electrical distributor example. The slope \(\beta_1 = 2.1\) indicates that the preparation of one additional bid in a week leads to an increase in the mean of the probability distribution of Y of 2.1 hours. The intercept \(\beta_0 = 9.5\) indicates the value of the regression function at X = 0. However, since the linear regression model was formulated to apply to weeks where the number of bids prepared ranges from 20 to 80, \(\beta_0\) does not have any intrinsic meaning of its own here. If the scope of the model were to be extended to X levels near zero, a model with a curvilinear regression function and some value of \(\beta_0\) different from that for the linear regression function might well be required.
Alternative Versions of Regression Model
Sometimes it is convenient to write the simple linear regression model (1.1) in somewhat different, though equivalent, forms. Let \(X_0\) be a constant identically equal to 1. Then, we can write (1.1) as follows:

\[ Y_i = \beta_0 X_0 + \beta_1 X_i + \varepsilon_i \quad \text{where } X_0 \equiv 1 \tag{1.5} \]

This version of the model associates an X variable with each regression coefficient. An alternative modification is to use for the predictor variable the deviation \(X_i - \bar{X}\) rather than \(X_i\). To leave model (1.1) unchanged, we need to write:

\[ Y_i = \beta_0 + \beta_1 (X_i - \bar{X}) + \beta_1 \bar{X} + \varepsilon_i = (\beta_0 + \beta_1 \bar{X}) + \beta_1 (X_i - \bar{X}) + \varepsilon_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i \]

Thus, this alternative model version is:

\[ Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i \tag{1.6} \]

where:

\[ \beta_0^* = \beta_0 + \beta_1 \bar{X} \tag{1.6a} \]

We use models (1.1), (1.5), and (1.6) interchangeably as convenience dictates.
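The equivalence of models (1.1) and (1.6) can be checked numerically. The sketch below is an illustration under stated assumptions: it borrows the parameter values 9.5 and 2.1 from the bids example, and the three X levels are hypothetical values chosen only to have something to center. It compares the mean response computed from each form.

```python
BETA0, BETA1 = 9.5, 2.1        # values borrowed from the bids example
xs = [20, 45, 80]              # hypothetical X levels, chosen for illustration
x_bar = sum(xs) / len(xs)      # sample mean of X

# Relation (1.6a): intercept of the centered model.
beta0_star = BETA0 + BETA1 * x_bar

# Mean response under the original form (1.1) and the centered form (1.6).
means_original = [BETA0 + BETA1 * x for x in xs]
means_centered = [beta0_star + BETA1 * (x - x_bar) for x in xs]

for x, m1, m2 in zip(xs, means_original, means_centered):
    print(x, round(m1, 4), round(m2, 4))
```

The two columns agree for every X, and the slope is the same in both forms. Only the intercept changes: in the centered form it is the mean response at the mean of X rather than at X = 0.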
1.4 Data for Regression Analysis
Ordinarily, we do not know the values of the regression parameters \(\beta_0\) and \(\beta_1\) in regression model (1.1), and we need to estimate them from relevant data. Indeed, as we noted earlier, we frequently do not have adequate a priori knowledge of the appropriate predictor variables and of the functional form of the regression relation (e.g., linear or curvilinear), and we need to rely on an analysis of the data for developing a suitable regression model. Data for regression analysis may be obtained from nonexperimental or experimental studies. We consider each of these in turn.
Observational Data
Observational data are data obtained from nonexperimental studies. Such studies do not control the explanatory or predictor variable(s) of interest. For example, company officials wished to study the relation between age of employee (X ) and number of days of illness last year (Y ). The needed data for use in the regression analysis were obtained from personnel records. Such data are observational data since the explanatory variable, age, is not controlled. Regression analyses are frequently based on observational data, since often it is not feasible to conduct controlled experimentation. In the company personnel example just mentioned, for instance, it would not be possible to control age by assigning ages to persons.
A major limitation of observational data is that they often do not provide adequate information about cause-and-effect relationships. For example, a positive relation between age of employee and number of days of illness in the company personnel example may not imply that number of days of illness is the direct result of age. It might be that younger employees of the company primarily work indoors while older employees usually work outdoors, and that work location is more directly responsible for the number of days of illness than age. Whenever a regression analysis is undertaken for purposes of description based on observational data, one should investigate whether explanatory variables other than those considered in the regression model might more directly explain cause-and-effect relationships.
Experimental Data
Frequently, it is possible to conduct a controlled experiment to provide data from which the regression parameters can be estimated. Consider, for instance, an insurance company that wishes to study the relation between productivity of its analysts in processing claims and length of training. Nine analysts are to be used in the study. Three of them will be selected at random and trained for two weeks, three for three weeks, and three for five weeks. The productivity of the analysts during the next 10 weeks will then be observed. The data so obtained will be experimental data because control is exercised over the explanatory variable, length of training. When control over the explanatory variable(s) is exercised through random assignments, as in the productivity study example, the resulting experimental data provide much stronger information about cause-and-effect relationships than do observational data. The reason is that randomization tends to balance out the effects of any other variables that might affect the response variable, such as the effect of aptitude of the employee on productivity. In the terminology of experimental design, the length of training assigned to an analyst in the productivity study example is called a treatment. The analysts to be included in the study are called the experimental units. Control over the explanatory variable(s) then consists of assigning a treatment to each of the experimental units by means of randomization.
Completely Randomized Design
The most basic type of statistical design for making randomized assignments of treatments to experimental units (or vice versa) is the completely randomized design. With this design, the assignments are made completely at random. This complete randomization provides that all combinations of experimental units assigned to the different treatments are equally likely, which implies that every experimental unit has an equal chance to receive any one of the treatments. A completely randomized design is particularly useful when the experimental units are quite homogeneous. This design is very flexible; it accommodates any number of treatments and permits different sample sizes for different treatments. Its chief disadvantage is that, when the experimental units are heterogeneous, this design is not as efficient as some other statistical designs.
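As an illustration, the random assignment in the insurance productivity study (nine analysts; three each trained for two, three, and five weeks) could be carried out as follows. This is a sketch, not a prescribed procedure: the function name `completely_randomized_design`, the analyst labels, and the seed are all hypothetical.

```python
import random

def completely_randomized_design(units, treatments, sizes, rng):
    """Assign experimental units to treatments completely at random.

    A uniform shuffle makes every ordering of the units equally likely,
    so every split into groups of the requested sizes is equally likely.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment, start = {}, 0
    for treatment, size in zip(treatments, sizes):
        assignment[treatment] = sorted(shuffled[start:start + size])
        start += size
    return assignment

# Hypothetical labels for the nine analysts in the productivity study.
analysts = [f"analyst_{k}" for k in range(1, 10)]
weeks_of_training = [2, 3, 5]   # the three treatments named in the text

plan = completely_randomized_design(
    analysts, weeks_of_training, [3, 3, 3], random.Random(42)
)
for weeks, group in plan.items():
    print(weeks, group)
```

The `sizes` argument need not be equal across treatments, which reflects the flexibility of the design noted above.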
1.5 Overview of Steps in Regression Analysis
The regression models considered in this and subsequent chapters can be utilized either for observational data or for experimental data from a completely randomized design. (Regression analysis can also utilize data from other types of experimental designs, but the regression models presented here will need to be modified.) Whether the data are observational or experimental, it is essential that the conditions of the regression model be appropriate for the data at hand for the model to be applicable.

We begin our discussion of regression analysis by considering inferences about the regression parameters for the simple linear regression model (1.1). For the rare occasion where prior knowledge or theory alone enables us to determine the appropriate regression model, inferences based on the regression model are the first step in the regression analysis. In the usual situation, however, where we do not have adequate knowledge to specify the appropriate regression model in advance, the first step is an exploratory study of the data, as shown in the flowchart in Figure 1.8. On the basis of this initial exploratory analysis, one or more preliminary regression models are developed. These regression models are then examined for their appropriateness for the data at hand and revised, or new models are developed, until the investigator is satisfied with the suitability of a particular regression model. Only then are inferences made on the basis of this regression model, such as inferences about the regression parameters of the model or predictions of new observations.

FIGURE 1.8 Typical Strategy for Regression Analysis. [Flowchart: Start; exploratory data analysis; develop one or more tentative regression models; is one or more of the regression models suitable for the data at hand? If NO, revise regression models and/or develop new ones and check again; if YES, identify most suitable model; make inferences on basis of regression model; Stop]

We begin, for pedagogic reasons, with inferences based on the regression model that is finally considered to be appropriate. One must have an understanding of regression models and how they can be utilized before the issues involved in the development of an appropriate regression model can be fully explained.
1.6 Estimation of Regression Function
The observational or experimental data to be used for estimating the parameters of the regression function consist of observations on the explanatory or predictor variable X and the corresponding observations on the response variable Y. For each trial, there is an X observation and a Y observation. We denote the (X, Y) observations for the first trial as \((X_1, Y_1)\), for the second trial as \((X_2, Y_2)\), and in general for the ith trial as \((X_i, Y_i)\), where i = 1, ..., n.
Example
In a small-scale study of persistence, an experimenter gave three subjects a very difficult task. Data on the age of the subject ($X$) and on the number of attempts to accomplish the task before giving up ($Y$) follow:

Subject i:                    1     2     3
Age X_i:                     20    55    30
Number of attempts Y_i:       5    12    10

In terms of the notation to be employed, there were $n = 3$ subjects in this study, the observations for the first subject were $(X_1, Y_1) = (20, 5)$, and similarly for the other subjects.
Method of Least Squares
To find "good" estimators of the regression parameters $\beta_0$ and $\beta_1$, we employ the method of least squares. For the observations $(X_i, Y_i)$ for each case, the method of least squares considers the deviation of $Y_i$ from its expected value:
$$Y_i - (\beta_0 + \beta_1 X_i) \tag{1.7}$$
In particular, the method of least squares requires that we consider the sum of the $n$ squared deviations. This criterion is denoted by $Q$:
$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \tag{1.8}$$
According to the method of least squares, the estimators of $\beta_0$ and $\beta_1$ are those values $b_0$ and $b_1$, respectively, that minimize the criterion $Q$ for the given sample observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.
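As a minimal sketch of criterion (1.8), the Python function below evaluates Q for any candidate pair of estimates, using the three persistence-study observations given above. The function name `least_squares_criterion` and the variable names are illustrative choices, not part of the text; the two candidate lines tried are the ones examined in the example that follows.

```python
# Persistence study data (three subjects): age X and number of attempts Y.
X = [20, 55, 30]
Y = [5, 12, 10]


def least_squares_criterion(b0, b1, xs, ys):
    """Return Q, the sum of squared deviations (Y_i - b0 - b1*X_i)^2, as in (1.8)."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Candidate 1: horizontal line at the mean response, ignoring X.
q_flat = least_squares_criterion(9.0, 0.0, X, Y)
# Candidate 2: sloped line with coefficients rounded as in the text.
q_sloped = least_squares_criterion(2.81, 0.177, X, Y)

print(f"Q for b0 = 9.0,  b1 = 0:    {q_flat:.1f}")
print(f"Q for b0 = 2.81, b1 = .177: {q_sloped:.1f}")
```

A smaller Q indicates a closer fit, so comparing the two printed values is a small-scale version of the numerical search idea.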
FIGURE 1.9 Illustration of Least Squares Criterion Q for Fit of a Regression Line--Persistence Study Example.
[Two scatter plots of Attempts (Y) against Age (X): (a) with the line $\hat{Y} = 9.0 + 0(X)$, for which Q = 26.0; (b) with the line $\hat{Y} = 2.81 + .177X$, for which Q = 5.7.]
Example
Figure 1.9a presents the scatter plot of the data for the persistence study example and the regression line that results when we use the mean of the responses (9.0) as the predictor and ignore $X$:
$$\hat{Y} = 9.0 + 0(X)$$
Note that this regression line uses estimates $b_0 = 9.0$ and $b_1 = 0$, and that $\hat{Y}$ denotes the ordinate of the estimated regression line. Clearly, this regression line is not a good fit, as evidenced by the large vertical deviations of two of the $Y$ observations from the corresponding ordinates $\hat{Y}$ of the regression line. The deviation for the first subject, for which $(X_1, Y_1) = (20, 5)$, is:
$$Y_1 - (b_0 + b_1 X_1) = 5 - [9.0 + 0(20)] = 5 - 9.0 = -4$$
The sum of the squared deviations for the three cases is:
$$Q = (5 - 9.0)^2 + (12 - 9.0)^2 + (10 - 9.0)^2 = 26.0$$
Figure 1.9b shows the same data with the regression line:
$$\hat{Y} = 2.81 + .177X$$
The fit of this regression line is clearly much better. The vertical deviation for the first case now is:
$$Y_1 - (b_0 + b_1 X_1) = 5 - [2.81 + .177(20)] = 5 - 6.35 = -1.35$$
and the criterion $Q$ is much reduced:
$$Q = (5 - 6.35)^2 + (12 - 12.55)^2 + (10 - 8.12)^2 = 5.7$$
Thus, a better fit of the regression line to the data corresponds to a smaller sum $Q$. The objective of the method of least squares is to find estimates $b_0$ and $b_1$ for $\beta_0$ and $\beta_1$, respectively, for which $Q$ is a minimum. In a certain sense, to be discussed shortly, these
estimates will provide a "good" fit of the linear regression function. The regression line in Figure 1.9b is, in fact, the least squares regression line.

Least Squares Estimators. The estimators $b_0$ and $b_1$ that satisfy the least squares criterion can be found in two basic ways:
1. Numerical search procedures can be used that evaluate in a systematic fashion the least squares criterion $Q$ for different estimates $b_0$ and $b_1$ until the ones that minimize $Q$ are found. This approach was illustrated in Figure 1.9 for the persistence study example.
2. Analytical procedures can often be used to find the values of $b_0$ and $b_1$ that minimize $Q$. The analytical approach is feasible when the regression model is not mathematically complex.

Using the analytical approach, it can be shown for regression model (1.1) that the values $b_0$ and $b_1$ that minimize $Q$ for any particular set of sample data are given by the following simultaneous equations:
$$\sum Y_i = n b_0 + b_1 \sum X_i \tag{1.9a}$$
$$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 \tag{1.9b}$$
Equations (1.9a) and (1.9b) are called normal equations; $b_0$ and $b_1$ are called point estimators of $\beta_0$ and $\beta_1$, respectively. The normal equations (1.9) can be solved simultaneously for $b_0$ and $b_1$:
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \tag{1.10a}$$
$$b_0 = \frac{1}{n}\left(\sum Y_i - b_1 \sum X_i\right) = \bar{Y} - b_1 \bar{X} \tag{1.10b}$$
where $\bar{X}$ and $\bar{Y}$ are the means of the $X_i$ and the $Y_i$ observations, respectively. Computer calculations generally are based on many digits to obtain accurate values for $b_0$ and $b_1$.

Comment
The normal equations (1.9) can be derived by calculus. For given sample observations $(X_i, Y_i)$, the quantity $Q$ in (1.8) is a function of $\beta_0$ and $\beta_1$. The values of $\beta_0$ and $\beta_1$ that minimize $Q$ can be derived by differentiating (1.8) with respect to $\beta_0$ and $\beta_1$. We obtain:
$$\frac{\partial Q}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i)$$
$$\frac{\partial Q}{\partial \beta_1} = -2 \sum X_i (Y_i - \beta_0 - \beta_1 X_i)$$
We then set these partial derivatives equal to zero, using $b_0$ and $b_1$ to denote the particular values of $\beta_0$ and $\beta_1$ that minimize $Q$:
$$-2 \sum (Y_i - b_0 - b_1 X_i) = 0$$
$$-2 \sum X_i (Y_i - b_0 - b_1 X_i) = 0$$
Simplifying, we obtain:
$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$$
$$\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$$
Expanding, we have:
$$\sum Y_i - n b_0 - b_1 \sum X_i = 0$$
$$\sum X_i Y_i - b_0 \sum X_i - b_1 \sum X_i^2 = 0$$
from which the normal equations (1.9) are obtained by rearranging terms. A test of the second partial derivatives will show that a minimum is obtained with the least squares estimators $b_0$ and $b_1$.
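The derivation can be checked numerically. The sketch below treats the normal equations (1.9) as a 2x2 linear system, solves it by Cramer's rule for the three persistence-study observations, and then confirms that both partial derivatives of Q vanish at the solution. Cramer's rule is simply a convenient choice for this illustration, and all variable names are hypothetical.

```python
# Persistence study data: age X and number of attempts Y.
X = [20, 55, 30]
Y = [5, 12, 10]
n = len(X)

sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Normal equations (1.9) written as a linear system in (b0, b1):
#   n * b0     + sum_x  * b1 = sum_y      (1.9a)
#   sum_x * b0 + sum_x2 * b1 = sum_xy     (1.9b)
det = n * sum_x2 - sum_x * sum_x
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
b1 = (n * sum_xy - sum_x * sum_y) / det

# Partial derivatives of Q evaluated at (b0, b1); both should be ~0.
dq_db0 = -2 * sum(y - b0 - b1 * x for x, y in zip(X, Y))
dq_db1 = -2 * sum(x * (y - b0 - b1 * x) for x, y in zip(X, Y))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"dQ/db0 = {dq_db0:.2e}, dQ/db1 = {dq_db1:.2e}")
```

The solution agrees, after rounding, with the line $\hat{Y} = 2.81 + .177X$ used in the persistence study example.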
Properties of Least Squares Estimators. An important theorem, called the Gauss-Markov theorem, states:

Under the conditions of regression model (1.1), the least squares estimators $b_0$ and $b_1$ in (1.10) are unbiased and have minimum variance among all unbiased linear estimators. (1.11)
This theorem, proven in the next chapter, states first that $b_0$ and $b_1$ are unbiased estimators. Hence:
$$E\{b_0\} = \beta_0 \qquad E\{b_1\} = \beta_1$$
so that neither estimator tends to overestimate or underestimate systematically.

Second, the theorem states that the estimators $b_0$ and $b_1$ are more precise (i.e., their sampling distributions are less variable) than any other estimators belonging to the class of unbiased estimators that are linear functions of the observations $Y_1, \ldots, Y_n$. The estimators $b_0$ and $b_1$ are such linear functions of the $Y_i$. Consider, for instance, $b_1$. We have from (1.10a):
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
It will be shown in Chapter 2 that this expression is equal to:
$$b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \sum k_i Y_i$$
where:
$$k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$$
Since the $k_i$ are known constants (because the $X_i$ are known constants), $b_1$ is a linear combination of the $Y_i$ and hence is a linear estimator.
In the same fashion, it can be shown that b0 is a linear estimator. Among all linear estimators that are unbiased then, b0 and b1 have the smallest variability in repeated samples in which the X levels remain unchanged.
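The linear-estimator form of $b_1$ can be illustrated with the three persistence-study observations. The sketch below computes the weights $k_i$ and confirms that $\sum k_i Y_i$ reproduces the value from (1.10a). The two forms agree because the $k_i$ sum to zero, so the term involving $\bar{Y}$ drops out. Variable names are illustrative only.

```python
# Persistence study data: age X and number of attempts Y.
X = [20, 55, 30]
Y = [5, 12, 10]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)
sxx = sum((x - x_bar) ** 2 for x in X)  # sum of squared X deviations

# Weights k_i depend only on the X levels, which are treated as constants.
k = [(x - x_bar) / sxx for x in X]

# b1 as a linear combination of the Y_i ...
b1_linear = sum(ki * yi for ki, yi in zip(k, Y))
# ... and b1 computed directly from (1.10a).
b1_direct = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx

print("weights k_i:", [round(ki, 5) for ki in k])
print(f"sum of k_i = {sum(k):.2e}")
print(f"b1 (linear form) = {b1_linear:.4f}, b1 (1.10a) = {b1_direct:.4f}")
```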
Example
The Toluca Company manufactures refrigeration equipment as well as many replacement parts. In the past, one of the replacement parts has been produced periodically in lots of varying sizes. When a cost improvement program was undertaken, company officials wished to determine the optimum lot size for producing this part. The production of this part involves setting up the production process (which must be done no matter what is the lot size) and machining and assembly operations. One key input for the model to ascertain the optimum lot size was the relationship between lot size and labor hours required to produce the lot. To determine this relationship, data on lot size and work hours for 25 recent production runs were utilized. The production conditions were stable during the six-month period in which the 25 runs were made and were expected to continue to be the same during the next three years, the planning period for which the cost improvement program was being conducted.

Table 1.1 contains a portion of the data on lot size and work hours in columns 1 and 2. Note that all lot sizes are multiples of 10, a result of company policy to facilitate the administration of the parts production. Figure 1.10a shows a SYSTAT scatter plot of the data. We see that the lot sizes ranged from 20 to 120 units and that none of the production runs was outlying in the sense of being either unusually small or large. The scatter plot also indicates that the relationship between lot size and work hours is reasonably linear. We also see that no observations on work hours are unusually small or large, with reference to the relationship between lot size and work hours.

To calculate the least squares estimates $b_0$ and $b_1$ in (1.10), we require the deviations $X_i - \bar{X}$ and $Y_i - \bar{Y}$. These are given in columns 3 and 4 of Table 1.1. We also require the cross-product terms $(X_i - \bar{X})(Y_i - \bar{Y})$ and the squared deviations $(X_i - \bar{X})^2$; these are shown in columns 5 and 6. The squared deviations $(Y_i - \bar{Y})^2$ in column 7 are for later use.
TABLE 1.1 Data on Lot Size and Work Hours and Needed Calculations for Least Squares Estimates--Toluca Company Example.

Columns: (1) Lot Size $X_i$; (2) Work Hours $Y_i$; (3) $X_i - \bar{X}$; (4) $Y_i - \bar{Y}$; (5) $(X_i - \bar{X})(Y_i - \bar{Y})$; (6) $(X_i - \bar{X})^2$; (7) $(Y_i - \bar{Y})^2$

Run i      (1)       (2)      (3)       (4)        (5)       (6)        (7)
1           80       399       10      86.72      867.2      100     7,520.4
2           30       121      -40    -191.28    7,651.2    1,600    36,588.0
3           50       221      -20     -91.28    1,825.6      400     8,332.0
...        ...       ...      ...        ...        ...      ...         ...
23          40       244      -30     -68.28    2,048.4      900     4,662.2
24          80       342       10      29.72      297.2      100       883.3
25          70       323        0      10.72        0.0        0       114.9
Total    1,750     7,807        0          0     70,690   19,800     307,203
Mean      70.0    312.28
FIGURE 1.10 SYSTAT Scatter Plot and Fitted Regression Line--Toluca Company Example.
[Two panels plotting Hours against Lot Size: (a) Scatter Plot; (b) Fitted Regression Line.]
FIGURE 1.11 Portion of MINITAB Regression Output--Toluca Company Example.

The regression equation is
Y = 62.4 + 3.57 X

Predictor      Coef     Stdev   t-ratio       p
Constant      62.37     26.18      2.38   0.026
X            3.5702    0.3470     10.29   0.000

s = 48.82     R-sq = 82.2%     R-sq(adj) = 81.4%
We see from Table 1.1 that the basic quantities needed to calculate the least squares estimates are as follows:
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 70{,}690 \qquad \sum (X_i - \bar{X})^2 = 19{,}800 \qquad \bar{X} = 70.0 \qquad \bar{Y} = 312.28$$
Using (1.10) we obtain:
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{70{,}690}{19{,}800} = 3.5702$$
$$b_0 = \bar{Y} - b_1 \bar{X} = 312.28 - 3.5702(70.0) = 62.37$$
Thus, we estimate that the mean number of work hours increases by 3.57 hours for each additional unit produced in the lot. This estimate applies to the range of lot sizes in the data from which the estimates were derived, namely to lot sizes ranging from about 20 to about 120.

Figure 1.11 contains a portion of the MINITAB regression output for the Toluca Company example. The estimates $b_0$ and $b_1$ are shown in the column labeled Coef, corresponding to
the lines Constant and X , respectively. The additional information shown in Figure 1.11 will be explained later.
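The Toluca calculation can be reproduced from the summary quantities alone. The following sketch applies (1.10a) and (1.10b) to the column totals and means reported in Table 1.1; only those four published numbers are used, since the full 25-run data set is not listed here, and the variable names are illustrative.

```python
# Summary quantities from Table 1.1 (Toluca Company example).
sum_cross_products = 70690.0  # sum of (X_i - X_bar)(Y_i - Y_bar)
sum_sq_dev_x = 19800.0        # sum of (X_i - X_bar)^2
x_bar = 70.0                  # mean lot size
y_bar = 312.28                # mean work hours

# Least squares estimates from (1.10a) and (1.10b).
b1 = sum_cross_products / sum_sq_dev_x
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.4f}")  # slope: estimated change in mean hours per unit
print(f"b0 = {b0:.2f}")  # intercept
```

Carrying full precision in `b1` before computing `b0` mirrors the remark that computer calculations are generally based on many digits.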
Point Estimation of Mean Response
Estimated Regression Function. Given sample estimators $b_0$ and $b_1$ of the parameters in the regression function (1.3):
$$E\{Y\} = \beta_0 + \beta_1 X$$
we estimate the regression function as follows:
$$\hat{Y} = b_0 + b_1 X \tag{1.12}$$
where $\hat{Y}$ (read Y hat) is the value of the estimated regression function at the level $X$ of the predictor variable.

We call a value of the response variable a response and $E\{Y\}$ the mean response. Thus, the mean response stands for the mean of the probability distribution of $Y$ corresponding to the level $X$ of the predictor variable. $\hat{Y}$ then is a point estimator of the mean response when the level of the predictor variable is $X$. It can be shown as an extension of the Gauss-Markov theorem (1.11) that $\hat{Y}$ is an unbiased estimator of $E\{Y\}$, with minimum variance in the class of unbiased linear estimators.

For the cases in the study, we will call $\hat{Y}_i$:
$$\hat{Y}_i = b_0 + b_1 X_i \qquad i = 1, \ldots, n \tag{1.13}$$
the fitted value for the $i$th case. Thus, the fitted value $\hat{Y}_i$ is to be viewed in distinction to the observed value $Y_i$.
Example
For the Toluca Company example, we found that the least squares estimates of the regression coefficients are:
$$b_0 = 62.37 \qquad b_1 = 3.5702$$
Hence, the estimated regression function is:
$$\hat{Y} = 62.37 + 3.5702X$$
This estimated regression function is plotted in Figure 1.10b. It appears to be a good description of the statistical relationship between lot size and work hours.

To estimate the mean response for any level $X$ of the predictor variable, we simply substitute that value of $X$ in the estimated regression function. Suppose that we are interested in the mean number of work hours required when the lot size is $X = 65$; our point estimate is:
$$\hat{Y} = 62.37 + 3.5702(65) = 294.4$$
Thus, we estimate that the mean number of work hours required for production runs of $X = 65$ units is 294.4 hours. We interpret this to mean that if many lots of 65 units are produced under the conditions of the 25 runs on which the estimated regression function is based, the mean labor time for these lots is about 294 hours. Of course, the labor time for any one lot of size 65 is likely to fall above or below the mean response because of inherent variability in the production system, as represented by the error term in the model.
TABLE 1.2 Fitted Values, Residuals, and Squared Residuals--Toluca Company Example.

Columns: (1) Lot Size $X_i$; (2) Work Hours $Y_i$; (3) Estimated Mean Response $\hat{Y}_i$; (4) Residual $Y_i - \hat{Y}_i = e_i$; (5) Squared Residual $(Y_i - \hat{Y}_i)^2 = e_i^2$

Run i      (1)       (2)       (3)       (4)        (5)
1           80       399    347.98     51.02    2,603.0
2           30       121    169.47    -48.47    2,349.3
3           50       221    240.88    -19.88      395.2
...        ...       ...       ...       ...        ...
23          40       244    205.17     38.83    1,507.8
24          80       342    347.98     -5.98       35.8
25          70       323    312.28     10.72      114.9
Total    1,750     7,807     7,807         0     54,825
Fitted values for the sample cases are obtained by substituting the appropriate $X$ values into the estimated regression function. For the first sample case, we have $X_1 = 80$. Hence, the fitted value for the first case is:
$$\hat{Y}_1 = 62.37 + 3.5702(80) = 347.98$$
This compares with the observed work hours of $Y_1 = 399$. Table 1.2 contains the observed and fitted values for a portion of the Toluca Company data in columns 2 and 3, respectively.

Alternative Model (1.6). When the alternative regression model (1.6):
$$Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i$$
is to be utilized, the least squares estimator $b_1$ of $\beta_1$ remains the same as before. The least squares estimator of $\beta_0^* = \beta_0 + \beta_1 \bar{X}$ becomes, from (1.10b):
$$b_0^* = b_0 + b_1 \bar{X} = (\bar{Y} - b_1 \bar{X}) + b_1 \bar{X} = \bar{Y} \tag{1.14}$$
Hence, the estimated regression function for alternative model (1.6) is:
$$\hat{Y} = \bar{Y} + b_1 (X - \bar{X}) \tag{1.15}$$
In the Toluca Company example, $\bar{Y} = 312.28$ and $\bar{X} = 70.0$ (Table 1.1). Hence, the estimated regression function in alternative form is:
$$\hat{Y} = 312.28 + 3.5702(X - 70.0)$$
For the first lot in our example, $X_1 = 80$; hence, we estimate the mean response to be:
$$\hat{Y}_1 = 312.28 + 3.5702(80 - 70.0) = 347.98$$
which, of course, is identical to our earlier result.
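A short sketch can confirm that the original form (1.12) and the alternative form (1.15) of the estimated regression function give the same fitted values. The function names below are hypothetical, and the coefficients are kept at full precision (computed from the Table 1.1 totals) rather than rounded to 62.37 and 3.5702, which is why the output matches the tabled 347.98 to two decimals.

```python
# Summary quantities from Table 1.1 (Toluca Company example).
x_bar, y_bar = 70.0, 312.28
b1 = 70690 / 19800       # (1.10a)
b0 = y_bar - b1 * x_bar  # (1.10b)


def fitted_original(x):
    """Estimated mean response from Y-hat = b0 + b1*X, form (1.12)."""
    return b0 + b1 * x


def fitted_alternative(x):
    """Estimated mean response from Y-hat = Y_bar + b1*(X - X_bar), form (1.15)."""
    return y_bar + b1 * (x - x_bar)


for lot_size in (65, 80):
    print(lot_size,
          round(fitted_original(lot_size), 2),
          round(fitted_alternative(lot_size), 2))
```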
Residuals
The $i$th residual is the difference between the observed value $Y_i$ and the corresponding fitted value $\hat{Y}_i$. This residual is denoted by $e_i$ and is defined in general as follows:
$$e_i = Y_i - \hat{Y}_i \tag{1.16}$$
FIGURE 1.12 Illustration of Residuals--Toluca Company Example (not drawn to scale).
[Plot of Hours (Y) against Lot Size (X). At X = 30: $Y_2 = 121$, $\hat{Y}_2 = 169.47$, $e_2 = -48.47$. At X = 80: $Y_1 = 399$, $\hat{Y}_1 = 347.98$, $e_1 = 51.02$.]
For regression model (1.1), the residual $e_i$ becomes:
$$e_i = Y_i - (b_0 + b_1 X_i) = Y_i - b_0 - b_1 X_i \tag{1.16a}$$
The calculation of the residuals for the Toluca Company example is shown for a portion of the data in Table 1.2. We see that the residual for the first case is:
$$e_1 = Y_1 - \hat{Y}_1 = 399 - 347.98 = 51.02$$
The residuals for the first two cases are illustrated graphically in Figure 1.12. Note in this figure that the magnitude of a residual is represented by the vertical deviation of the $Y_i$ observation from the corresponding point on the estimated regression function (i.e., from the corresponding fitted value $\hat{Y}_i$).

We need to distinguish between the model error term value $\varepsilon_i = Y_i - E\{Y_i\}$ and the residual $e_i = Y_i - \hat{Y}_i$. The former involves the vertical deviation of $Y_i$ from the unknown true regression line and hence is unknown. On the other hand, the residual is the vertical deviation of $Y_i$ from the fitted value $\hat{Y}_i$ on the estimated regression line, and it is known.

Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand. We discuss this use in Chapter 3.
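The residual calculation in Table 1.2 can be sketched for the six runs that the table displays. Only those six runs are used because the remaining 19 are not listed, so the six residuals below are not expected to sum to zero on their own. The coefficients are computed at full precision from the Table 1.1 totals, and the variable names are illustrative.

```python
# The six production runs displayed in Table 1.2 (Toluca Company example).
runs = [1, 2, 3, 23, 24, 25]
lot_size = [80, 30, 50, 40, 80, 70]
work_hours = [399, 121, 221, 244, 342, 323]

# Least squares estimates from the Table 1.1 summary quantities.
x_bar, y_bar = 70.0, 312.28
b1 = 70690 / 19800

# Fitted values via form (1.15) and residuals e_i = Y_i - Y_hat_i, as in (1.16).
fitted = [y_bar + b1 * (x - x_bar) for x in lot_size]
residuals = [y - y_hat for y, y_hat in zip(work_hours, fitted)]

for run, y, y_hat, e in zip(runs, work_hours, fitted, residuals):
    print(f"run {run:2d}: Y = {y}, Y_hat = {y_hat:.2f}, e = {e:.2f}")
```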
Properties of Fitted Regression Line
The estimated regression line (1.12) fitted by the method of least squares has a number of properties worth noting. These properties of the least squares estimated regression function do not apply to all regression models, as we shall see in Chapter 4.

1. The sum of the residuals is zero:
$$\sum_{i=1}^{n} e_i = 0 \tag{1.17}$$
Table 1.2, column 4, illustrates this property for the Toluca Company example. Rounding errors may, of course, be present in any particular case, resulting in a sum of the residuals that does not equal zero exactly.

2. The sum of the squared residuals, $\sum e_i^2$, is a minimum. This was the requirement to be satisfied in deriving the least squares estimators of the regression parameters since the
criterion $Q$ in (1.8) to be minimized equals $\sum e_i^2$ when the least squares estimators $b_0$ and $b_1$ are used for estimating $\beta_0$ and $\beta_1$.

3. The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:
$$\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i \tag{1.18}$$
This property is illustrated in Table 1.2, columns 2 and 3, for the Toluca Company example. It follows that the mean of the fitted values $\hat{Y}_i$ is the same as the mean of the observed values $Y_i$, namely, $\bar{Y}$.

4. The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the level of the predictor variable in the $i$th trial:
$$\sum_{i=1}^{n} X_i e_i = 0 \tag{1.19}$$

5. A consequence of properties (1.17) and (1.19) is that the sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the fitted value of the response variable for the $i$th trial:
$$\sum_{i=1}^{n} \hat{Y}_i e_i = 0 \tag{1.20}$$

6. The regression line always goes through the point $(\bar{X}, \bar{Y})$.

Comment
The six properties of the fitted regression line follow directly from the least squares normal equations (1.9). For example, property 1 in (1.17) is proven as follows:
$$\sum e_i = \sum (Y_i - b_0 - b_1 X_i) = \sum Y_i - n b_0 - b_1 \sum X_i = 0$$
by the first normal equation (1.9a).

Property 6, that the regression line always goes through the point $(\bar{X}, \bar{Y})$, can be demonstrated easily from the alternative form (1.15) of the estimated regression line. When $X = \bar{X}$, we have:
$$\hat{Y} = \bar{Y} + b_1 (X - \bar{X}) = \bar{Y} + b_1 (\bar{X} - \bar{X}) = \bar{Y}$$
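The properties can be verified numerically on a complete data set. The sketch below uses the three persistence-study observations from Section 1.6, since all of their values are available, fits the least squares line by (1.10), and checks properties (1.17) through (1.20) and property 6 up to floating-point rounding. Variable names are illustrative.

```python
# Persistence study data: age X and number of attempts Y.
X = [20, 55, 30]
Y = [5, 12, 10]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))  # (1.10a)
b0 = y_bar - b1 * x_bar                    # (1.10b)

fitted = [b0 + b1 * x for x in X]
e = [y - y_hat for y, y_hat in zip(Y, fitted)]

sum_e = sum(e)                                         # property 1, (1.17)
sum_y_minus_sum_fitted = sum(Y) - sum(fitted)          # property 3, (1.18)
sum_xe = sum(x * ei for x, ei in zip(X, e))            # property 4, (1.19)
sum_fitted_e = sum(yh * ei for yh, ei in zip(fitted, e))  # property 5, (1.20)
line_at_x_bar = b0 + b1 * x_bar                        # property 6

print(f"sum e_i         = {sum_e:.2e}")
print(f"sum Y - sum Yhat = {sum_y_minus_sum_fitted:.2e}")
print(f"sum X_i e_i     = {sum_xe:.2e}")
print(f"sum Yhat_i e_i  = {sum_fitted_e:.2e}")
print(f"line at X_bar   = {line_at_x_bar:.4f} (Y_bar = {y_bar:.4f})")
```

Each printed sum should be zero apart from rounding error of the kind mentioned under property 1.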
1.7
Estimation of Error Terms Variance $\sigma^2$
The variance $\sigma^2$ of the error terms $\varepsilon_i$ in regression model (1.1) needs to be estimated to obtain an indication of the variability of the probability distributions of $Y$. In addition, as we shall see in the next chapter, a variety of inferences concerning the regression function and the prediction of $Y$ require an estimate of $\sigma^2$.
Point Estimator of $\sigma^2$
To lay the basis for developing an estimator of $\sigma^2$ for regression model (1.1), we first consider the simpler problem of sampling from a single population.

Single Population. We know that the variance $\sigma^2$ of a single population is estimated by the sample variance $s^2$. In obtaining the sample variance $s^2$, we consider the deviation of
an observation $Y_i$ from the estimated mean $\bar{Y}$, square it, and then sum all such squared deviations:
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
Such a sum is called a sum of squares. The sum of squares is then divided by the degrees of freedom associated with it. This number is $n - 1$ here, because one degree of freedom is lost by using $\bar{Y}$ as an estimate of the unknown population mean $\mu$. The resulting estimator is the usual sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$$
which is an unbiased estimator of the variance $\sigma^2$ of an infinite population. The sample variance is often called a mean square, because a sum of squares has been divided by the appropriate number of degrees of freedom.

Regression Model. The logic of developing an estimator of $\sigma^2$ for the regression model is the same as for sampling from a single population. Recall in this connection from (1.4) that the variance of each observation $Y_i$ for regression model (1.1) is $\sigma^2$, the same as that of each error term $\varepsilon_i$. We again need to calculate a sum of squared deviations, but must recognize that the $Y_i$ now come from different probability distributions with different means that depend upon the level $X_i$. Thus, the deviation of an observation $Y_i$ must be calculated around its own estimated mean $\hat{Y}_i$. Hence, the deviations are the residuals:
$$Y_i - \hat{Y}_i = e_i$$
and the appropriate sum of squares, denoted by SSE, is:
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2 \tag{1.21}$$
where SSE stands for error sum of squares or residual sum of squares.

The sum of squares SSE has $n - 2$ degrees of freedom associated with it. Two degrees of freedom are lost because both $\beta_0$ and $\beta_1$ had to be estimated in obtaining the estimated means $\hat{Y}_i$. Hence, the appropriate mean square, denoted by MSE or $s^2$, is:
$$s^2 = MSE = \frac{SSE}{n - 2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum e_i^2}{n - 2} \tag{1.22}$$
where MSE stands for error mean square or residual mean square.

It can be shown that MSE is an unbiased estimator of $\sigma^2$ for regression model (1.1):
$$E\{MSE\} = \sigma^2 \tag{1.23}$$
An estimator of the standard deviation $\sigma$ is simply $s = \sqrt{MSE}$, the positive square root of MSE.
Example
We will calculate SSE for the Toluca Company example by (1.21). The residuals were obtained earlier in Table 1.2, column 4. This table also shows the squared residuals in column 5. From these results, we obtain: SSE = 54,825
Since 25 - 2 = 23 degrees of freedom are associated with SSE, we find:
$$s^2 = MSE = \frac{54{,}825}{23} = 2{,}384$$
Finally, a point estimate of $\sigma$, the standard deviation of the probability distribution of $Y$ for any $X$, is $s = \sqrt{2{,}384} = 48.8$ hours.

Consider again the case where the lot size is $X = 65$ units. We found earlier that the mean of the probability distribution of $Y$ for this lot size is estimated to be 294.4 hours. Now, we have the additional information that the standard deviation of this distribution is estimated to be 48.8 hours. This estimate is shown in the MINITAB output in Figure 1.11, labeled as s. We see that the variation in work hours from lot to lot for lots of 65 units is quite substantial (49 hours) compared to the mean of the distribution (294 hours).
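The arithmetic of (1.22) for the Toluca example can be sketched in a few lines. The inputs are the published SSE and the number of runs; the variable names are illustrative.

```python
import math

# Toluca Company example: error sum of squares and number of production runs.
sse = 54825.0
n = 25

df_error = n - 2       # two degrees of freedom lost estimating beta0 and beta1
mse = sse / df_error   # s^2 = MSE, as in (1.22)
s = math.sqrt(mse)     # point estimate of sigma

print(f"degrees of freedom = {df_error}")
print(f"MSE = {mse:.0f}")
print(f"s   = {s:.2f} hours")
```

The two-decimal value of `s` corresponds to the entry labeled s in the MINITAB output of Figure 1.11.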
1.8
Normal Error Regression Model
No matter what may be the form of the distribution of the error terms $\varepsilon_i$ (and hence of the $Y_i$), the least squares method provides unbiased point estimators of $\beta_0$ and $\beta_1$ that have minimum variance among all unbiased linear estimators. To set up interval estimates and make tests, however, we need to make an assumption about the form of the distribution of the $\varepsilon_i$. The standard assumption is that the error terms $\varepsilon_i$ are normally distributed, and we will adopt it here. A normal error term greatly simplifies the theory of regression analysis and, as we shall explain shortly, is justifiable in many real-world situations where regression analysis is applied.
Model
The normal error regression model is as follows:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{1.24}$$
where:
$Y_i$ is the observed response in the $i$th trial
$X_i$ is a known constant, the level of the predictor variable in the $i$th trial
$\beta_0$ and $\beta_1$ are parameters
$\varepsilon_i$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, n$

Comments
1. The symbol $N(0, \sigma^2)$ stands for normally distributed, with mean 0 and variance $\sigma^2$.
2. The normal error model (1.24) is the same as regression model (1.1) with unspecified error distribution, except that model (1.24) assumes that the errors $\varepsilon_i$ are normally distributed.
3. Because regression model (1.24) assumes that the errors are normally distributed, the assumption of uncorrelatedness of the $\varepsilon_i$ in regression model (1.1) becomes one of independence in the normal error model. Hence, the outcome in any one trial has no effect on the error term for any other trial--as to whether it is positive or negative, small or large.
4. Regression model (1.24) implies that the $Y_i$ are independent normal random variables, with mean $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and variance $\sigma^2$. Figure 1.6 pictures this normal error model. Each of the probability distributions of $Y$ in Figure 1.6 is normally distributed, with constant variability, and the regression function is linear.
5. The normality assumption for the error terms is justifiable in many situations because the error terms frequently represent the effects of factors omitted from the model that affect the response to some extent and that vary at random without reference to the variable $X$. For instance, in the Toluca Company example, the effects of such factors as time lapse since the last production run, particular machines used, season of the year, and personnel employed could vary more or less at random from run to run, independent of lot size. Also, there might be random measurement errors in the recording of $Y$, the hours required. Insofar as these random effects have a degree of mutual independence, the composite error term $\varepsilon_i$ representing all these factors would tend to comply with the central limit theorem and the error term distribution would approach normality as the number of factor effects becomes large.
A second reason why the normality assumption of the error terms is frequently justifiable is that the estimation and testing procedures to be discussed in the next chapter are based on the t distribution and are usually only sensitive to large departures from normality. Thus, unless the departures from normality are serious, particularly with respect to skewness, the actual confidence coefficients and risks of errors will be close to the levels for exact normality.
Estimation of Parameters by Method of Maximum Likelihood
When the functional form of the probability distribution of the error terms is specified, estimators of the parameters $\beta_0$, $\beta_1$, and $\sigma^2$ can be obtained by the method of maximum likelihood. Essentially, the method of maximum likelihood chooses as estimates those values of the parameters that are most consistent with the sample data. We explain the method of maximum likelihood first for the simple case when a single population with one parameter is sampled. Then we explain this method for regression models.

Single Population. Consider a normal population whose standard deviation is known to be $\sigma = 10$ and whose mean is unknown. A random sample of $n = 3$ observations is selected from the population and yields the results $Y_1 = 250$, $Y_2 = 265$, $Y_3 = 259$. We now wish to ascertain which value of $\mu$ is most consistent with the sample data. Consider $\mu = 230$. Figure 1.13a shows the normal distribution with $\mu = 230$ and $\sigma = 10$; also shown there are the locations of the three sample observations. Note that the sample observations
FIGURE 1.13 Densities for Sample Observations for Two Possible Values of $\mu$: $Y_1 = 250$, $Y_2 = 265$, $Y_3 = 259$.
[Two normal density curves over the Y axis with the observations $Y_1$, $Y_3$, $Y_2$ marked: (a) $\mu = 230$; (b) $\mu = 259$.]
would be in the right tail of the distribution if $\mu$ were equal to 230. Since these are unlikely occurrences, $\mu = 230$ is not consistent with the sample data. Figure 1.13b shows the population and the locations of the sample data if $\mu$ were equal to 259. Now the observations would be in the center of the distribution and much more likely. Hence, $\mu = 259$ is more consistent with the sample data than $\mu = 230$.

The method of maximum likelihood uses the density of the probability distribution at $Y_i$ (i.e., the height of the curve at $Y_i$) as a measure of consistency for the observation $Y_i$. Consider observation $Y_1$ in our example. If $Y_1$ is in the tail, as in Figure 1.13a, the height of the curve will be small. If $Y_1$ is nearer to the center of the distribution, as in Figure 1.13b, the height will be larger. Using the density function for a normal probability distribution in (A.34) in Appendix A, we find the densities for $Y_1$, denoted by $f_1$, for the two cases of $\mu$ in Figure 1.13 as follows:
$$\mu = 230: \quad f_1 = \frac{1}{\sqrt{2\pi}\,(10)} \exp\left[-\frac{1}{2}\left(\frac{250 - 230}{10}\right)^2\right] = .005399$$
$$\mu = 259: \quad f_1 = \frac{1}{\sqrt{2\pi}\,(10)} \exp\left[-\frac{1}{2}\left(\frac{250 - 259}{10}\right)^2\right] = .026609$$
The densities for all three sample observations for the two cases of $\mu$ are as follows:

          mu = 230     mu = 259
f_1       .005399      .026609
f_2       .000087      .033322
f_3       .000595      .039894

The method of maximum likelihood uses the product of the densities (i.e., here, the product of the three heights) as the measure of consistency of the parameter value with the sample data. The product is called the likelihood value of the parameter value $\mu$ and is denoted by $L(\mu)$. If the value of $\mu$ is consistent with the sample data, the densities will be relatively large and so will be the product (i.e., the likelihood value). If the value of $\mu$ is not consistent with the data, the densities will be small and the product $L(\mu)$ will be small. For our simple example, the likelihood values are as follows for the two cases of $\mu$:
$$L(\mu = 230) = .005399(.000087)(.000595) = .279 \times 10^{-9}$$
$$L(\mu = 259) = .026609(.033322)(.039894) = .0000354$$
Since the likelihood value $L(\mu = 230)$ is a very small number, it is shown in scientific notation, which indicates that there are nine zeros after the decimal place before 279. Note that $L(\mu = 230)$ is much smaller than $L(\mu = 259)$, indicating that $\mu = 259$ is much more consistent with the sample data than $\mu = 230$.

The method of maximum likelihood chooses as the maximum likelihood estimate that value of $\mu$ for which the likelihood value is largest. Just as for the method of least squares,
there are two methods of finding maximum likelihood estimates: by a systematic numerical search and by use of an analytical solution. For some problems, analytical solutions for the maximum likelihood estimators are available. For others, a computerized numerical search must be conducted.

For our example, an analytical solution is available. It can be shown that for a normal population the maximum likelihood estimator of $\mu$ is the sample mean $\bar{Y}$. In our example, $\bar{Y} = 258$ and the maximum likelihood estimate of $\mu$ therefore is 258. The likelihood value of $\mu = 258$ is $L(\mu = 258) = .0000359$, which is slightly larger than the likelihood value of .0000354 for $\mu = 259$ that we had calculated earlier.

The product of the densities viewed as a function of the unknown parameters is called the likelihood function. For our example, where $\sigma = 10$, the likelihood function is:
$$L(\mu) = \left[\frac{1}{\sqrt{2\pi}\,(10)}\right]^3 \exp\left[-\frac{1}{2}\left(\frac{250 - \mu}{10}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{265 - \mu}{10}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{259 - \mu}{10}\right)^2\right]$$
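The likelihood function above can be evaluated directly. The sketch below codes the normal density and the product of the three densities for the sample 250, 265, 259 with $\sigma = 10$, evaluates the likelihood at the values discussed in the text, and runs a coarse search over integer values of $\mu$. The integer grid from 220 to 300 is an arbitrary choice for illustration, not a recommended search procedure, and the function names are hypothetical.

```python
import math

Y = [250, 265, 259]  # sample observations
SIGMA = 10.0         # population standard deviation, assumed known


def normal_density(y, mu, sigma):
    """Height of the normal curve with mean mu and std. dev. sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


def likelihood(mu):
    """Product of the densities of all sample observations for a given mu."""
    value = 1.0
    for y in Y:
        value *= normal_density(y, mu, SIGMA)
    return value


for mu in (230, 258, 259):
    print(f"L(mu = {mu}) = {likelihood(mu):.4e}")

# Coarse numerical search over integer candidates for mu.
best_mu = max(range(220, 301), key=likelihood)
print("grid maximizer:", best_mu, " sample mean:", sum(Y) / len(Y))
```

The value printed for $\mu = 230$ differs slightly in the third significant digit from the hand calculation in the text, because the text multiplies densities that were first rounded to six decimals.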
Figure 1.14 shows a computer plot of the likelihood function for our example. It is based on the calculation of likelihood values $L(\mu)$ for many values of $\mu$. Note that the likelihood values at $\mu = 230$ and $\mu = 259$ correspond to the ones we determined earlier. Also note that the likelihood function reaches a maximum at $\mu = 258$.

The fact that the likelihood function in Figure 1.14 is relatively peaked in the neighborhood of the maximum likelihood estimate $\bar{Y} = 258$ is of particular interest. Note, for instance, that for $\mu = 250$ or $\mu = 266$, the likelihood value is already only a little more than one-third as large as the likelihood value at $\mu = 258$ (the ratio is $\exp[-3(8)^2/200] \approx .38$). This indicates that the maximum likelihood estimate here is relatively precise because values of $\mu$ not near the maximum likelihood estimate $\bar{Y} = 258$ are much less consistent with the sample data. When the likelihood function is relatively flat in a fairly wide region around the maximum likelihood
FIGURE 1.14 Likelihood Function for Estimation of Mean of Normal Population: $Y_1 = 250$, $Y_2 = 265$, $Y_3 = 259$.
[Plot of $L(\mu)$, vertical axis from 0.00000 to 0.00004, against $\mu$, horizontal axis from 220 to 300.]
estimate, many values of the parameter are almost as consistent with the sample data as the maximum likelihood estimate, and the maximum likelihood estimate would therefore be relatively imprecise.

Regression Model. The concepts just presented for maximum likelihood estimation of a population mean carry over directly to the estimation of the parameters of normal error regression model (1.24). For this model, each $Y_i$ observation is normally distributed with mean $\beta_0 + \beta_1 X_i$ and standard deviation $\sigma$. To illustrate the method of maximum likelihood estimation here, consider the earlier persistence study example. For simplicity, let us suppose that we know $\sigma = 2.5$. We wish to determine the likelihood value for the parameter values $\beta_0 = 0$ and $\beta_1 = .5$. For subject 1, $X_1 = 20$ and hence the mean of the probability distribution would be $\beta_0 + \beta_1 X_1 = 0 + .5(20) = 10.0$. Figure 1.15a shows the normal distribution with mean 10.0 and standard deviation 2.5. Note that the observed value $Y_1 = 5$ is in the left tail of the distribution and that the density there is relatively small. For the second subject, $X_2 = 55$ and hence $\beta_0 + \beta_1 X_2 = 27.5$. The normal distribution with mean 27.5 is shown in Figure 1.15b. Note that the observed value $Y_2 = 12$ is most unlikely for this case and that the density there is extremely small. Finally, note that the observed value $Y_3 = 10$ is also in the left tail of its distribution if $\beta_0 = 0$ and $\beta_1 = .5$, as shown in Figure 1.15c, and that the density there is also relatively small.
FIGURE 1.15 Densities for Sample Observations if β₀ = 0 and β₁ = .5: Persistence Study Example.
(a) X₁ = 20, Y₁ = 5, β₀ + β₁X₁ = .5(20) = 10.0
(b) X₂ = 55, Y₂ = 12, β₀ + β₁X₂ = .5(55) = 27.5
(c) X₃ = 30, Y₃ = 10, β₀ + β₁X₃ = .5(30) = 15
(d) Combined Presentation: regression function E{Y} = 0 + .5X with the three sample cases and their normal distributions; horizontal axis Age (X), vertical axis Attempts (Y).
Figure 1.15d combines all of this information, showing the regression function E{Y} = 0 + .5X, the three sample cases, and the three normal distributions. Note how poorly the regression line fits the three sample cases, as was also indicated by the three small density values. Thus, it appears that β₀ = 0 and β₁ = .5 are not consistent with the data.

We calculate the densities (i.e., heights of the curve) in the usual way. For Y₁ = 5, X₁ = 20, the normal density is as follows when β₀ = 0 and β₁ = .5:

\[
f_1 = \frac{1}{\sqrt{2\pi}\,(2.5)} \exp\left[-\frac{1}{2}\left(\frac{5 - 10.0}{2.5}\right)^2\right] = .021596
\]

The other densities are f₂ = .7175 × 10⁻⁹ and f₃ = .021596, and the likelihood value of β₀ = 0 and β₁ = .5 therefore is:

\[
L(\beta_0 = 0, \beta_1 = .5) = .021596\,(.7175 \times 10^{-9})\,(.021596) = .3346 \times 10^{-12}
\]

In general, the density of an observation Yᵢ for the normal error regression model (1.24) is as follows, utilizing the fact that E{Yᵢ} = β₀ + β₁Xᵢ and σ²{Yᵢ} = σ²:

\[
f_i = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)^2\right] \tag{1.25}
\]
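The hand calculation of the three densities and their product can be retraced in a few lines of code. The following is only a sketch: it evaluates density (1.25) for the three persistence study cases at the trial values β₀ = 0, β₁ = .5 with σ = 2.5 taken as known, and the function and variable names (`normal_density`, `cases`) are illustrative choices rather than anything defined in the text.

```python
import math

def normal_density(y, mean, sigma):
    """Height of the normal curve at y, following the form of (1.25)."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# Persistence study cases (X_i, Y_i) as quoted in the text.
cases = [(20, 5), (55, 12), (30, 10)]

# Trial parameter values; sigma is treated as known, as in the example.
beta0, beta1, sigma = 0.0, 0.5, 2.5

# One density per case, each centered at beta0 + beta1 * X_i.
densities = [normal_density(y, beta0 + beta1 * x, sigma) for x, y in cases]

# The likelihood value is the product of the individual densities.
likelihood = math.prod(densities)

for i, f in enumerate(densities, start=1):
    print(f"f{i} = {f:.4e}")
print(f"L(beta0=0, beta1=.5) = {likelihood:.4e}")
```

The printed values should agree with .021596, .7175 × 10⁻⁹, and .3346 × 10⁻¹² up to rounding of the intermediate densities.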
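The likelihood function for n observations Y₁, Y₂, ..., Yₙ is the product of the individual densities in (1.25). Since the variance σ² of the error terms is usually unknown, the likelihood function is a function of three parameters, β₀, β₁, and σ²:

\[
\begin{aligned}
L(\beta_0, \beta_1, \sigma^2) &= \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1 X_i)^2\right] \\
&= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2\right]
\end{aligned}
\tag{1.26}
\]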
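As a rough numerical check on (1.26), the sketch below codes both forms of the likelihood function, confirms that they return the same value, and compares the trial values β₀ = 0, β₁ = .5 with the rounded least squares estimates b₀ = 2.81 and b₁ = .177 reported for the persistence study. Holding σ² fixed at 2.5² follows the simplification used in the example; the function names are hypothetical.

```python
import math

def likelihood_product(x, y, beta0, beta1, sigma2):
    """First form of (1.26): product of the n individual normal densities."""
    value = 1.0
    for xi, yi in zip(x, y):
        dev = yi - beta0 - beta1 * xi
        value *= math.exp(-dev ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
    return value

def likelihood_closed(x, y, beta0, beta1, sigma2):
    """Second form of (1.26): a single exponential of the summed squared deviations."""
    n = len(y)
    q = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-q / (2.0 * sigma2)) / (2.0 * math.pi * sigma2) ** (n / 2.0)

# Persistence study data as quoted in the text.
x = [20, 55, 30]
y = [5, 12, 10]
sigma2 = 2.5 ** 2  # assumed known, as in the example

l_trial = likelihood_closed(x, y, 0.0, 0.5, sigma2)   # trial values from the text
l_ls = likelihood_closed(x, y, 2.81, 0.177, sigma2)   # rounded least squares estimates

print(f"L at (0, .5):       {l_trial:.4e}")
print(f"L at (2.81, .177):  {l_ls:.4e}")
```

The likelihood at the least squares estimates is many orders of magnitude larger than at the trial values, which is the numerical counterpart of the poor fit visible in Figure 1.15d.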
The values of β₀, β₁, and σ² that maximize this likelihood function are the maximum likelihood estimators and are denoted by β̂₀, β̂₁, and σ̂², respectively. These estimators can be found analytically, and they are as follows:

Parameter    Maximum Likelihood Estimator
β₀           β̂₀ = b₀    same as (1.10b)
β₁           β̂₁ = b₁    same as (1.10a)
σ²           σ̂² = Σ(Yᵢ - Ŷᵢ)² / n                    (1.27)
Thus, the maximum likelihood estimators of β₀ and β₁ are the same estimators as those provided by the method of least squares. The maximum likelihood estimator σ̂² is biased, and ordinarily the unbiased estimator MSE as given in (1.22) is used. Note that the unbiased estimator MSE or s² differs but slightly from the maximum likelihood estimator σ̂², especially if n is not small:

\[
s^2 = MSE = \frac{n}{n-2}\,\hat{\sigma}^2 \tag{1.28}
\]
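Relation (1.28) can be checked on the three persistence study cases. The sketch below assumes that (1.10a) and (1.10b) are the usual closed-form least squares formulas, b₁ = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / Σ(Xᵢ - X̄)² and b₀ = Ȳ - b₁X̄; the helper name `least_squares_fit` is illustrative only.

```python
def least_squares_fit(x, y):
    """Closed-form least squares estimates for a straight line (assumed form of (1.10))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Persistence study data as quoted in the text.
x = [20, 55, 30]
y = [5, 12, 10]
n = len(y)

b0, b1 = least_squares_fit(x, y)

# Sum of squared residuals around the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

sigma2_mle = sse / n      # maximum likelihood estimator in (1.27), biased
mse = sse / (n - 2)       # unbiased estimator MSE

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")
print(f"sigma2_mle = {sigma2_mle:.4f}, MSE = {mse:.4f}")
print(f"n / (n - 2) * sigma2_mle = {n / (n - 2) * sigma2_mle:.4f}")
```

With only n = 3 cases the factor n/(n - 2) equals 3, so the two estimators are far apart in this toy example; the remark that they differ only slightly applies when n is not small.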
Example
For the persistence study example, we know now that the maximum likelihood estimates of β₀ and β₁ are b₀ = 2.81 and b₁ = .177, the same as the least squares estimates in Figure 1.9b.

Comments
1. Since the maximum likelihood estimators β̂₀ and β̂₁ are the same as the least squares estimators b₀ and b₁, they have the properties of all least squares estimators:
   a. They are unbiased.
   b. They have minimum variance among all unbiased linear estimators.
In addition, the maximum likelihood estimators b₀ and b₁ for the normal error regression model (1.24) have other desirable properties:
   c. They are consistent, as defined in (A.52).
   d. They are sufficient, as defined in (A.53).
   e. They are minimum variance unbiased; that is, they have minimum variance in the class of all unbiased estimators (linear or otherwise).
Thus, for the normal error model, the estimators b₀ and b₁ have many desirable properties.

2. We find the values of β₀, β₁, and σ² that maximize the likelihood function L in (1.26) by taking partial derivatives of L with respect to β₀, β₁, and σ², equating each of the partials to zero, and solving the system of equations thus obtained. We can work with logₑ L, rather than L, because both L and logₑ L are maximized for the same values of β₀, β₁, and σ²:

\[
\log_e L = -\frac{n}{2}\log_e 2\pi - \frac{n}{2}\log_e \sigma^2 - \frac{1}{2\sigma^2}\sum (Y_i - \beta_0 - \beta_1 X_i)^2 \tag{1.29}
\]

Partial differentiation of the logarithm of the likelihood function is much easier; it yields:

\[
\begin{aligned}
\frac{\partial(\log_e L)}{\partial \beta_0} &= \frac{1}{\sigma^2}\sum (Y_i - \beta_0 - \beta_1 X_i) \\
\frac{\partial(\log_e L)}{\partial \beta_1} &= \frac{1}{\sigma^2}\sum X_i (Y_i - \beta_0 - \beta_1 X_i) \\
\frac{\partial(\log_e L)}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (Y_i - \beta_0 - \beta_1 X_i)^2
\end{aligned}
\]

We now set these partial derivatives equal to zero, replacing β₀, β₁, and σ² by the estimators β̂₀, β̂₁, and σ̂². We obtain, after some simplification:

\[
\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \tag{1.30a}
\]
\[
\sum X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \tag{1.30b}
\]
\[
\frac{\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2}{n} = \hat{\sigma}^2 \tag{1.30c}
\]
Formulas (1.30a) and (1.30b) are identical to the earlier least squares normal equations (1.9), and formula (1.30c) is the biased estimator of σ² given earlier in (1.27).
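The derivation in Comment 2 can be verified numerically: the three partial derivatives of logₑ L should all vanish at the maximum likelihood estimates. The sketch below is one way to check this on the persistence study data; it assumes the standard closed-form least squares formulas for b₀ and b₁, and the function name `score` is a hypothetical label for the vector of partial derivatives.

```python
def score(x, y, beta0, beta1, sigma2):
    """Partial derivatives of log_e L in (1.29) with respect to beta0, beta1, sigma^2."""
    n = len(y)
    dev = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]
    d_beta0 = sum(dev) / sigma2
    d_beta1 = sum(xi * di for xi, di in zip(x, dev)) / sigma2
    d_sigma2 = -n / (2.0 * sigma2) + sum(di * di for di in dev) / (2.0 * sigma2 ** 2)
    return d_beta0, d_beta1, d_sigma2

# Persistence study data as quoted in the text.
x = [20, 55, 30]
y = [5, 12, 10]
n = len(y)

# Closed-form least squares estimates (assumed form of (1.10a) and (1.10b)).
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Maximum likelihood estimator of sigma^2 from (1.27) and (1.30c).
sigma2_hat = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / n

at_mle = score(x, y, b0, b1, sigma2_hat)        # should be zero up to rounding error
at_trial = score(x, y, 0.0, 0.5, 2.5 ** 2)      # trial values from the example

print("partials at the estimates:   ", at_mle)
print("partials at the trial values:", at_trial)
```

The partials at the estimates are zero up to floating-point error, while those at the trial values β₀ = 0, β₁ = .5, σ² = 6.25 are clearly nonzero.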
Problems
1.1. Refer to the sales volume example on page 3. Suppose that the number of units sold is measured accurately, but clerical errors are frequently made in determining the dollar sales. Would the relation between the number of units sold and dollar sales still be a functional one? Discuss.
1.2. The members of a health spa pay annual membership dues of $300 plus a charge of $2 for each visit to the spa. Let Y denote the dollar cost for the year for a member and X the number of visits by the member during the year. Express the relation between X and Y mathematically. Is it a functional relation or a statistical relation?
1.3. Experience with a certain type of plastic indicates that a relation exists between the hardness (measured in Brinell units) of items molded from the plastic (Y) and the elapsed time since termination of the molding process (X). It is proposed to study this relation by means of regression analysis. A participant in the discussion objects, pointing out that the hardening of the plastic "is the result of a natural chemical process that doesn't leave anything to chance, so the relation must be mathematical and regression analysis is not appropriate." Evaluate this objection.
1.4. In Table 1.1, the lot size X is the same in production runs 1 and 24 but the work hours Y differ. What feature of regression model (1.1) is illustrated by this?
1.5. When asked to state the simple linear regression model, a student wrote it as follows: E{Yᵢ} = β₀ + β₁Xᵢ + εᵢ. Do you agree?
1.6. Consider the normal error regression model (1.24). Suppose that the parameter values are β₀ = 200, β₁ = 5.0, and σ = 4.
   a. Plot this normal error regression model in the fashion of Figure 1.6. Show the distributions of Y for X = 10, 20, and 40.
   b. Explain the meaning of the parameters β₀ and β₁. Assume that the scope of the model includes X = 0.
1.7. In a simulation exercise, regression model (1.1) applies with β₀ = 100, β₁ = 20, and σ² = 25. An observation on Y will be made for X = 5.
   a. Can you state the exact probability that Y will fall between 195 and 205? Explain.
   b. If the normal error regression model (1.24) is applicable, can you now state the exact probability that Y will fall between 195 and 205? If so, state it.
1.8. In Figure 1.6, suppose another Y observation is obtained at X = 45. Would E{Y} for this new observation still be 104? Would the Y value for this new case again be 108?
1.9. A student in accounting enthusiastically declared: "Regression is a very powerful tool. We can isolate fixed and variable costs by fitting a linear regression model, even when we have no data for small lots." Discuss.
1.10. An analyst in a large corporation studied the relation between current annual salary (Y) and age (X) for the 46 computer programmers presently employed in the company. The analyst concluded that the relation is curvilinear, reaching a maximum at 47 years. Does this imply that the salary for a programmer increases until age 47 and then decreases? Explain.
1.11. The regression function relating production output by an employee after taking a training program (Y) to the production output before the training program (X) is E{Y} = 20 + .95X, where X ranges from 40 to 100. An observer concludes that the training program does not raise production output on the average because β₁ is not greater than 1.0. Comment.
1.12. In a study of the relationship for senior citizens between physical activity and frequency of colds, participants were asked to monitor their weekly time spent in exercise over a five-year period and the frequency of colds. The study demonstrated that a negative statistical relation exists between time spent in exercise and frequency of colds. The investigator concluded that increasing the time spent in exercise is an effective strategy for reducing the frequency of colds for senior citizens.
   a. Were the data obtained in the study observational or experimental data?
   b. Comment on the validity of the conclusions reached by the investigator.
   c. Identify two or three other explanatory variables that might affect both the time spent in exercise and the frequency of colds for senior citizens simultaneously.
   d. How might the study be changed so that a valid conclusion about causal relationship between amount of exercise and frequency of colds can be reached?
1.13. Computer programmers employed by a software developer were asked to participate in a monthlong training seminar. During the seminar, each employee was asked to record the number of hours spent in class preparation each week. After completing the seminar, the productivity level of each participant was measured. A positive linear statistical relationship between participants' productivity levels and time spent in class preparation was found. The seminar leader concluded that increases in employee productivity are caused by increased class preparation time.
   a. Were the data used by the seminar leader observational or experimental data?
   b. Comment on the validity of the conclusion reached by the seminar leader.
   c. Identify two or three alternative variables that might cause both the employee productivity scores and the employee class participation times to increase (decrease) simultaneously.
   d. How might the study be changed so that a valid conclusion about causal relationship between class preparation time and employee productivity can be reached?
1.14. Refer to Problem 1.3. Four different elapsed times since termination of the molding process (treatments) are to be studied to see how they affect the hardness of a plastic. Sixteen batches (experimental units) are available for the study. Each treatment is to be assigned to four experimental units selected at random. Use a table of random digits or a random number generator to make an appropriate randomization of assignments.
1.15. The effects of five dose levels are to be studied in a completely randomized design, and 20 experimental units are available. Each dose level is to be assigned to four experimental units selected at random. Use a table of random digits or a random number generator to make an appropriate randomization of assignments.
1.16. Evaluate the following statement: "For the least squares method to be fully valid, it is required that the distribution of Y be normal."
1.17. A person states that b₀ and b₁ in the fitted regression function (1.13) can be estimated by the method of least squares. Comment.
1.18. According to (1.17), Σeᵢ = 0 when regression model (1.1) is fitted to a set of n cases by the method of least squares. Is it also true that Σεᵢ = 0? Comment.
1.19. Grade point average. The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year (Y) can be predicted from the ACT test score (X). The results of the study follow. Assume that first-order regression model (1.1) is appropriate.

   i:      1      2      3    ...    118    119    120
   Xᵢ:    21     14     28    ...     28     16     28
   Yᵢ: 3.897  3.885  3.778    ...  3.914  1.860  2.948

   a. Obtain the least squares estimates of β₀ and β₁, and state the estimated regression function.
   b. Plot the estimated regression function and the data. Does the estimated regression function appear to fit the data well?
   c. Obtain a point estimate of the mean freshman GPA for students with ACT test score X = 30.
   d. What is the point estimate of the change in the mean response when the entrance test score increases by one point?
*1.20. Copier maintenance. The Tri-City Office Equipment Corporation sells an imported copier on a franchise basis and performs preventive maintenance and repair service on this copier. The data below have been collected from 45 recent calls on users to perform routine preventive maintenance service; for each call, X is the number of copiers serviced and Y is the total number of minutes spent by the service person. Assume that first-order regression model (1.1) is appropriate.

   i:    1    2    3   ...   43   44   45
   Xᵢ:   2    4    3   ...    2    4    5
   Yᵢ:  20   60   46   ...   27   61   77

   a. Obtain the estimated regression function.
   b. Plot the estimated regression function and the data. How well does the estimated regression function fit the data?
   c. Interpret b₀ in your estimated regression function. Does b₀ provide any relevant information here? Explain.
   d. Obtain a point estimate of the mean service time when X = 5 copiers are serviced.
*1.21. Airfreight breakage. A substance used in biological and medical research is shipped by airfreight to users in cartons of 1,000 ampules. The data below, involving 10 shipments, were collected on the number of times the carton was transferred from one aircraft to another over the shipment route (X) and the number of ampules found to be broken upon arrival (Y). Assume that first-order regression model (1.1) is appropriate.

   i:    1    2    3    4    5    6    7    8    9   10
   Xᵢ:   1    0    2    0    3    1    0    1    2    0
   Yᵢ:  16    9   17   12   22   13    8   15   19   11

   a. Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear regression function appear to give a good fit here?
   b. Obtain a point estimate of the expected number of broken ampules when X = 1 transfer is made.
   c. Estimate the increase in the expected number of ampules broken when there are 2 transfers as compared to 1 transfer.
   d. Verify that your fitted regression line goes through the point (X̄, Ȳ).
1.22. Plastic hardness. Refer to Problems 1.3 and 1.14. Sixteen batches of the plastic were made, and from each batch one test item was molded. Each test item was randomly assigned to one of the four predetermined time levels, and the hardness was measured after the assigned elapsed time. The results are shown below; X is the elapsed time in hours, and Y is hardness in Brinell units. Assume that first-order regression model (1.1) is appropriate.

   i:     1     2     3   ...    14    15    16
   Xᵢ:   16    16    16   ...    40    40    40
   Yᵢ:  199   205   196   ...   248   253   246

   a. Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear regression function appear to give a good fit here?
   b. Obtain a point estimate of the mean hardness when X = 40 hours.
   c. Obtain a point estimate of the change in mean hardness when X increases by 1 hour.
1.23. Refer to Grade point average Problem 1.19.
   a. Obtain the residuals eᵢ. Do they sum to zero in accord with (1.17)?
   b. Estimate σ² and σ. In what units is σ expressed?
*1.24. Refer to Copier maintenance Problem 1.20.
   a. Obtain the residuals eᵢ and the sum of the squared residuals Σeᵢ². What is the relation between the sum of the squared residuals here and the quantity Q in (1.8)?
   b. Obtain point estimates of σ² and σ. In what units is σ expressed?
*1.25. Refer to Airfreight breakage Problem 1.21.
   a. Obtain the residual for the first case. What is its relation to ε₁?
   b. Compute Σeᵢ² and MSE. What is estimated by MSE?
1.26. Refer to Plastic hardness Problem 1.22.
   a. Obtain the residuals eᵢ. Do they sum to zero in accord with (1.17)?
   b. Estimate σ² and σ. In what units is σ expressed?
*1.27. Muscle mass. A person's muscle mass is expected to decrease with age. To explore this relationship in women, a nutritionist randomly selected 15 women from each 10-year age group, beginning with age 40 and ending with age 79. The results follow; X is age, and Y is a measure of muscle mass. Assume that first-order regression model (1.1) is appropriate.

   i:     1     2     3   ...    58    59    60
   Xᵢ:   43    41    47   ...    76    72    76
   Yᵢ:  106   106    97   ...    56    70    74

   a. Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear regression function appear to give a good fit here? Does your plot support the anticipation that muscle mass decreases with age?
   b. Obtain the following: (1) a point estimate of the difference in the mean muscle mass for women differing in age by one year, (2) a point estimate of the mean muscle mass for women aged X = 60 years, (3) the value of the residual for the eighth case, (4) a point estimate of σ².
1.28. Crime rate. A criminologist studying the relationship between level of education and crime rate in medium-sized U.S. counties collected the following data for a random sample of 84 counties; X is the percentage of individuals in the county having at least a high-school diploma, and Y is the crime rate (crimes reported per 100,000 residents) last year. Assume that first-order regression model (1.1) is appropriate.

   i:       1      2      3   ...     82     83     84
   Xᵢ:     74     82     81   ...     88     83     76
   Yᵢ:  8,487  8,179  8,362   ...  8,040  6,981  7,582

   a. Obtain the estimated regression function. Plot the estimated regression function and the data. Does the linear regression function appear to give a good fit here? Discuss.
   b. Obtain point estimates of the following: (1) the difference in the mean crime rate for two counties whose high-school graduation rates differ by one percentage point, (2) the mean crime rate last year in counties with high school graduation percentage X = 80, (3) ε₁₀, (4) σ².
Exercises
1.29. Refer to regression model (1.1). Assume that X = 0 is within the scope of the model. What is the implication for the regression function if β₀ = 0 so that the model is Yᵢ = β₁Xᵢ + εᵢ? How would the regression function plot on a graph?
1.30. Refer to regression model (1.1). What is the implication for the regression function if β₁ = 0 so that the model is Yᵢ = β₀ + εᵢ? How would the regression function plot on a graph?
1.31. Refer to Plastic hardness Problem 1.22. Suppose one test item was molded from a single batch of plastic and the hardness of this one item was measured at 16 different points in time. Would the error term in the regression model for this case still reflect the same effects as for the experiment initially described? Would you expect the error terms for the different points in time to be uncorrelated? Discuss.
1.32. Derive the expression for b₁ in (1.10a) from the normal equations in (1.9).
1.33. (Calculus needed.) Refer to the regression model Yᵢ = β₀ + εᵢ in Exercise 1.30. Derive the least squares estimator of β₀ for this model.
1.34. Prove that the least squares estimator of β₀ obtained in Exercise 1.33 is unbiased.
1.35. Prove the result in (1.18), namely that the sum of the Y observations is the same as the sum of the fitted values.
1.36. Prove the result in (1.20), namely that the sum of the residuals weighted by the fitted values is zero.
1.37. Refer to Table 1.1 for the Toluca Company example. When asked to present a point estimate of the expected work hours for lot sizes of 30 pieces, a person gave the estimate 202 because this is the mean number of work hours in the three runs of size 30 in the study. A critic states that this person's approach "throws away" most of the data in the study because cases with lot sizes other than 30 are ignored. Comment.
1.38. In Airfreight breakage Problem 1.21, the least squares estimates are b₀ = 10.20 and b₁ = 4.00, and Σeᵢ² = 17.60. Evaluate the least squares criterion Q in (1.8) for the estimates (1) b₀ = 9, b₁ = 3; (2) b₀ = 11, b₁ = 5. Is the criterion Q larger for these estimates than for the least squares estimates?
1.39. Two observations on Y were obtained at each of three X levels, namely, at X = 5, X = 10, and X = 15.
   a. Show that the least squares regression line fitted to the three points (5, Ȳ₁), (10, Ȳ₂), and (15, Ȳ₃), where Ȳ₁, Ȳ₂, and Ȳ₃ denote the means of the Y observations at the three X levels, is identical to the least squares regression line fitted to the original six cases.
   b. In this study, could the error term variance σ² be estimated without fitting a regression line? Explain.
1.40. In fitting regression model (1.1), it was found that observation Yᵢ fell directly on the fitted regression line (i.e., Yᵢ = Ŷᵢ). If this case were deleted, would the least squares regression line fitted to the remaining n - 1 cases be changed? [Hint: What is the contribution of case i to the least squares criterion Q in (1.8)?]
1.41. (Calculus needed.) Refer to the regression model Yᵢ = β₁Xᵢ + εᵢ, i = 1, ..., n, in Exercise 1.29.
   a. Find the least squares estimator of β₁.
   b. Assume that the error terms εᵢ are independent N(0, σ²) and that σ² is known. State the likelihood function for the n sample observations on Y and obtain the maximum likelihood estimator of β₁. Is it the same as the least squares estimator?
   c. Show that the maximum likelihood estimator of β₁ is unbiased.
1.42. Typographical errors. Shown below are the number of galleys for a manuscript (X) and the dollar cost of correcting typographical errors (Y) in a random sample of recent orders handled by a firm specializing in technical manuscripts. Assume that the regression model Yᵢ = β₁Xᵢ + εᵢ is appropriate, with normally distributed independent error terms whose variance is σ² = 16.

   i:     1     2     3     4     5     6
   Xᵢ:    7    12     4    14    25    30
   Yᵢ:  128   213    75   250   446   540

   a. State the likelihood function for the six Y observations, for σ² = 16.
   b. Evaluate the likelihood function for β₁ = 17, 18, and 19. For which of these β₁ values is the likelihood function largest?
   c. The maximum likelihood estimator is b₁ = ΣXᵢYᵢ / ΣXᵢ². Find the maximum likelihood estimate. Are your results in part (b) consistent with this estimate?
   d. Using a computer graphics or statistics package, evaluate the likelihood function for values of β₁ between β₁ = 17 and β₁ = 19 and plot the function. Does the point at which the likelihood function is maximized correspond to the maximum likelihood estimate found in part (c)?
Projects
1.43. Refer to the CDI data set in Appendix C.2. The number of active physicians in a CDI (Y) is expected to be related to total population, number of hospital beds, and total personal income. Assume that first-order regression model (1.1) is appropriate for each of the three predictor variables.
   a. Regress the number of active physicians in turn on each of the three predictor variables. State the estimated regression functions.
   b. Plot the three estimated regression functions and data on separate graphs. Does a linear regression relation appear to provide a good fit for each of the three predictor variables?
   c. Calculate MSE for each of the three predictor variables. Which predictor variable leads to the smallest variability around the fitted regression line?
1.44. Refer to the CDI data set in Appendix C.2.
   a. For each geographic region, regress per capita income in a CDI (Y) against the percentage of individuals in a county having at least a bachelor's degree (X). Assume that first-order regression model (1.1) is appropriate for each region. State the estimated regression functions.
   b. Are the estimated regression functions similar for the four regions? Discuss.
   c. Calculate MSE for each region. Is the variability around the fitted regression line approximately the same for the four regions? Discuss.
1.45. Refer to the SENIC data set in Appendix C.1. The average length of stay in a hospital (Y) is anticipated to be related to infection risk, available facilities and services, and routine chest X-ray ratio. Assume that first-order regression model (1.1) is appropriate for each of the three predictor variables.
   a. Regress average length of stay on each of the three predictor variables. State the estimated regression functions.
   b. Plot the three estimated regression functions and data on separate graphs. Does a linear relation appear to provide a good fit for each of the three predictor variables?
   c. Calculate MSE for each of the three predictor variables. Which predictor variable leads to the smallest variability around the fitted regression line?
1.46. Refer to the SENIC data set in Appendix C.1.
   a. For each geographic region, regress average length of stay in hospital (Y) against infection risk (X). Assume that first-order regression model (1.1) is appropriate for each region. State the estimated regression functions.
   b. Are the estimated regression functions similar for the four regions? Discuss.
   c. Calculate MSE for each region. Is the variability around the fitted regression line approximately the same for the four regions? Discuss.
1.47. Refer to Typographical errors Problem 1.42. Assume that first-order regression model (1.1) is appropriate, with normally distributed independent error terms whose variance is σ² = 16.
   a. State the likelihood function for the six observations, for σ² = 16.
   b. Obtain the maximum likelihood estimates of β₀ and β₁, using (1.27).
   c. Using a computer graphics or statistics package, obtain a three-dimensional plot of the likelihood function for values of β₀ between β₀ = -10 and β₀ = 10 and for values of β₁ between β₁ = 17 and β₁ = 19. Does the likelihood appear to be maximized by the maximum likelihood estimates found in part (b)?
Find millions of documents on Course Hero - Study Guides, Lecture Notes, Reference Materials, Practice Exams and more.
Course Hero has millions of course specific materials providing students with the best way to expand
their education.
Below is a small sample set of documents:
Stanford - EE - 278
EE 278Statistical Signal ProcessingOctober 9, 2009Handout #5Homework #2 Solutions1. The cdf of random variable X is given byFX (x) =132+ 3 (x + 1)21 x 0x < 10a. Find the probabilities of the following events.1A = cfw_X > 3 ,B = cfw_|X |
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #1 Due: Wednesday September 30September 23, 2009 Handout #1You may either hand the assignment in class or drop it in the Homework In box in the EE 278 drawer of the class le cabinet on the second oor of the
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #1 SolutionsOctober 2, 2009 Handout #31. Monty Hall. (Bonus) Gold is placed behind one of three curtains. A contestant chooses one of the curtains. Monty Hall, the game host, opens an unselected empty curtai
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #2 Due: Wednesday October 7 1. Probabilities from cdf. The cdf of random variable X is given by FX (x) =1 3 2 + 3 (x + 1)2September 30, 2009 Handout #21 x 0 x < 10a. Find the probabilities of the followin
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #3 Due: Wednesday October 14October 7, 2009 Handout #41. Family planning. Alice and Bob choose a number X at random with equal probability from the set cfw_2, 3, 4. If the outcome is X = x, they decide to ha
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #3 SolutionsOctober 16, 2009 Handout #71. Family planning. Alice and Bob choose a number X at random with equal probability from the set cfw_2, 3, 4. If the outcome is X = x, they decide to have children unt
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #4 Due: Wednesday October 21October 14, 2009 Handout #61. Two envelopes. A xed amount a is placed in one envelope and an amount 5a is placed in the other. One of the envelopes is opened (each envelope is equ
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #4 SolutionsOctober 23, 2009 Handout #91. Two envelopes. A xed amount a is placed in one envelope and an amount 5a is placed in the other. One of the envelopes is opened (each envelope is equally probable),
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #5 Due: Wednesday October 28October 21, 2009 Handout #81. Additive-noise channel with path gain. Consider the additive noise channel shown in the gure below, where X and Z are zero mean and uncorrelated, and
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #5 SolutionsOctober 30, 2009 Handout #121. Additive-noise channel with path gain. Consider the additive noise channel shown in the gure below, where X and Z are zero mean and uncorrelated, and a and b are co
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #6 Due: Wednesday November 4October 28, 2009 Handout #101. Gaussian random vector Suppose X N (, ) is a Gaussian random vector with 1 110 = 5 and = 1 4 0 . 2 009 a. Find the pdf of X1 . b. Find the pdf of X2
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #6 SolutionsNovember 6, 2009 Handout #141. Gaussian random vector Suppose X N (, ) is a Gaussian random vector with 1 110 = 5 and = 1 4 0 . 2 009 a. Find the pdf of X1 . b. Find the pdf of X2 + X3 . c. Find
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #7 Due: Wednesday, November 18November 11, 2009 Handout #171. Convergence examples. Consider the following sequences of random variables dened on the probability space (, F , P), where = cfw_0, 1, . . . , m
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #7 SolutionsNovember 20, 2009 Handout #191. Convergence examples. Consider the following sequences of random variables dened on the probability space (, F , P), where = cfw_0, 1, . . . , m 1, F is the collec
Stanford - EE - 278
EE 278 Statistical Signal Processing Homework #8 Due: Wednesday, December 2November 18, 2009 Handout #181. Discrete-time Wiener process. Let cfw_Zn : n 0 be a discrete-time white Gaussian noise process; that is, Z1 , Z2 , Z3 , . . . are i.i.d. N (0, 1).
Stanford - EE - 278
Lecture Notes 0 Course Intro duction EE 278 in EE Curriculum Statistical Signal Processing Course Goal Topics Lecture NotesEE 278: Course Introduction01EE 278 in EE Curriculum EE at Stanford: ve laboratories in two general areas: Computer Systems Lab
Stanford - EE - 278
Lecture Notes 1 Review of Basic Probability Theory Probability Space and Axioms Basic Laws Conditional Probability and Bayes Rule Indep endenceEE 278: Basic Probability11Probability Theory Probability theory provides the mathematical rules for assigni
Stanford - EE - 278
Lecture Notes 2 Random Variables Denition Probability Mass Function (PMF) Cumulative Distribution Function (CDF) Probability Density Function (PDF) Functions of a Random Variable Application: Generation of Random VariablesEE 278: Random Variables21Rand
Stanford - EE - 278
Lecture Notes 3 Two Random Variables Joint, Marginal, and Conditional PMFs Joint, Marginal, and Conditional CDFs, PDFs One Discrete and one Continuous Random Variables Signal Detection: MAP Rule Functions of Two Random VariablesEE 278: Two Random Variabl
Stanford - EE - 278
Lecture Notes 4 Expectation Denition and Properties Mean and Variance Markov and Chebyshev Inequalities Covariance and Correlation Conditional ExpectationEE 278: Expectation41Expectation Let X X be a discrete r.v. with pmf pX (x) and let g (x) be a fu
Stanford - EE - 278
Lecture Notes 5 Mean Square Error Estimation Minimum MSE Estimation Linear Estimation Jointly Gaussian Random VariablesEE 278: Mean Square Error Estimation51Minimum MSE Estimation Consider the following signal processing problem: X fX (x) Noisy Channe
Stanford - EE - 278
Lecture Notes 6 Random Vectors Joint, Marginal, and Conditional CDF, PDF, PMF Indep endence and Conditional Indep endence Mean and Covariance Matrix Mean and Variance of Sum of RVs Gaussian Random Vectors MSE Estimation: the Vector CaseEE 278: Random Vec
Stanford - EE - 278
Lecture Notes 7: Convergence and Limit Theorems. Motivation; Convergence with Probability 1; Convergence in Mean Square; Convergence in Probability, WLLN; Convergence in Distribution, CLT. Motivation: One of the key ques
Stanford - EE - 278
Lecture Notes 8: Random Processes. Definition and Simple Examples; Discrete Time Random Processes; IID Random Walk Process; Markov Processes; Independent Increment Processes; Gauss-Markov Process; Mean and Autocorrelation Function; Gaussian
Stanford - EE - 278
Lecture Notes 9: Stationary Random Processes. Strict-Sense and Wide-Sense Stationarity; Autocorrelation Function of a Stationary Process; Power Spectral Density; Response of LTI System to WSS Process Input; Linear Estimation: the Random Process Case.
Stanford - EE - 278
EE 278 Statistical Signal Processing Sample Midterm Examination Problems October 28, 2009 Handout #11 The following are old midterm problems. The midterm will cover lecture notes 1-5, pages 1-6 of lecture notes 6, and homeworks 1-5, including the Schwarz and
Stanford - EE - 278
EE 278 Statistical Signal Processing Sample Midterm Examination Problems November 4, 2009 Handout #13 1. Inequalities. Label each of the following statements with =, ≤, ≥, or None. Label a statement with = if equality always holds. Label a statement with ≥ or
Stanford - EE - 278
EE 278 Statistical Signal Processing Wednesday, November 11, 2009 Handout #16 Midterm Examination Solutions 1. Inequalities. a. e^S is a convex function, so by Jensen's inequality E(e^S) ≥ e^{E(S)} = e, since E(S) = 1. b. S is a concave function so by Jensen's ine
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set One Due Wednesday, September 30 1. Some practice with geometric sums and complex exponentials (5 points each) We'll make much use of formulas for the sum of a geometric series, especia
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Solutions to Problem Set One 1. Some practice with geometric sums and complex exponentials (5 points each) We'll make much use of formulas for the sum of a geometric series, especially in combinat
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set Two Due Wednesday, October 7, 2009 1. (10 points) A famous sum. You cannot go through life knowing about Fourier series and not know the application to evaluating a very famous sum. Le
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Solutions to Problem Set Two 1. (10 points) A famous sum. You cannot go through life knowing about Fourier series and not know the application to evaluating a very famous sum. Let S(t) be the saw
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set Three Due Wednesday, October 14, 2009 1. (25 points) Piecewise linear approximations and Fourier transforms. (a) The stretched triangle function is defined by Λ_a(t) = Λ(t/a). Find ℱΛ_a(
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Solutions to Problem Set Three 1. (25 points) Piecewise linear approximations and Fourier transforms. (a) The stretched triangle function is defined by Λ_a(t) = Λ(t/a). Find ℱΛ_a(s). (b) Find the Fo
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set Four Due Wednesday, October 21 1. (10 points) Eva and Rajiv continue their conversation about convolution: Rajiv: You know, convolution really is a remarkable operation, the way it im
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Solutions to Problem Set Four 1. (10 points) Eva and Rajiv continue their conversation about convolution: Rajiv: You know, convolution really is a remarkable operation, the way it imparts propert
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set Five Due Wednesday, October 28 1. (10 points) Expected values of random variables, orthogonality, and approximation. Let X be a random variable with probability distribution function p
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set Six Due Wednesday, November 4 1. (10 points) Downconversion. A common problem in radio engineering is downconversion to baseband. Consider a signal f(t) whose spectrum ℱf(s) satisfies
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Solutions to Problem Set Six 1. (10 points) Downconversion. A common problem in radio engineering is downconversion to baseband. Consider a signal f(t) whose spectrum ℱf(s) satisfies ℱf(s) = 0
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set Seven Due Wednesday, November 11 1. (20 points) Handel's Hallelujah. In this problem we will explore the effects of sampling with or without anti-aliasing filters. As we saw in lecture ther
Stanford - EE - 261
EE 261 The Fourier Transform and its Applications Fall 2009 Problem Set Eight Due Wednesday, November 18 1. (10 points) Different definitions for the DFT. This is an alternate version, in one respect, to Section 6.9 in the notes, on different definitions of the DF
Arizona - EE - 591
EEE 523 Advanced Analog Integrated Circuits Laboratory Lab 1 Design and Analysis of Folded Cascode Amplifier Submitted by Saurabh Naik ASU ID: 1201916850 Date: 11/16/2009 Abstract: The folded cascode amplifier is a special variation of an amplifier where
Arizona - EE - 591
EEE 591 Analog Integrated Circuits Laboratory Lab 5 Design of Automatic Gain Control Circuit Submitted by Saurabh Naik ASU ID: 1201916850 Date: 11/17/2009 Abstract: In the early years of radio circuits, fading (defined as slow variations in the amplitude
Arizona - EE - 591
EEE 591 Analog Integrated Circuits Laboratory Lab 6 Design and Analysis of Common Drain Amplifier Submitted by Saurabh Naik ASU ID: 1201916850 Date: 11/24/2009 Abstract: In electronics, a common-drain amplifier, also known as a source follower, is one of
Northeastern - BUS - 281
Name_ Bus 220 Quiz Chapters Garrison 12 and 13 1. Hayworth Company has just segmented last year's income statements into its ten product lines. The chief executive officer (CEO) is curious as to what effect dropping one of the product lines at the beginni
Northeastern - BUS - 281
BUSI 562 Sample Test Chapter 5, 6, 7 1. When the activity level is expected to decline within the relevant range, what effects would be anticipated with respect to each of the following? Fixed cost per unit Increase Increase No change No
Calhoun Community College - BUSINESS - acct 3211
CHAPTER 14 LONG-TERM LIABILITIES MULTIPLE CHOICE Conceptual Answer a a b d d b a d d c d d d d c b d d d d c d d d c c. c d b b c No. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. *27. *28. *29. *30. *31. De
Calhoun Community College - BUSINESS - acct 3211
CHAPTER 12 INTANGIBLE ASSETS MULTIPLE CHOICE Conceptual Answer c b d d d c d b c a a d a b a d d c b a No. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. Description Accounting for internally-created intangibles. Amortization method
Calhoun Community College - BUSINESS - acct 3211
CHAPTER 13 CURRENT LIABILITIES AND CONTINGENCIES MULTIPLE CHOICE Conceptual Answer d d a a b d d d c d c d d c d d d d a d c d b a c d b c c c a d d d No. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28
Calhoun Community College - BUSINESS - acct 3211
Take Assessment Assignment 1 Name: Instructions: Assignment 1 There are 19 questions in this assignment including multiple choices, multiple answers, fill in the blank, matching, and selection from a dropdown list. You have only one chance to do the assignment and you need to finish it withi
Calhoun Community College - BUSINESS - acct 3211
No. 1. 2.a ACCT3212 (Spring 2007) cash common stock cash note payable b interest expense cash c interest expense interest payable equipment cash b depreciation expense accumulated depreciation expense 4.a prepaid rent cash b rent expense prepaid rent 5.a
Calhoun Community College - BUSINESS - acct 3211
Problem 1 Multiple Choice First Midterm Winter 2005 1. Stockholders' equity represents: a. The amount invested in the corporation by the stockholders. b. The amount earned by the company since incorporation. c. The amount owed the stockholders in a liquida
Calhoun Community College - BUSINESS - acct 3211
Problem 1 Multiple Choice First Midterm Spring 2006 11. On November 30, 2004, Bend Corp. issued $300,000 maturity value, 6% bonds for $300,000 cash. The bonds were dated October 31, 2004. Interest will be paid semiannually. How much cash did they receive?
Calhoun Community College - BUSINESS - acct 3211
To: Dr. Jan From: Lina Xia, LiPing Zhou. Date: May 22, 2007 Subject: Analysis of Martek Biosciences Corp.'s asset depreciation. The kind of asset that is the subject of the controversy described in the Wall Street Journal article is the idle asset in part of propert
Calhoun Community College - BUSINESS - acct 3211
ACCT3212 (Spring 2007) No. 1. 2.a b c 3.a b 4.a b 5.a b 6.a b 7.a b c 8.a cash common stock cash note payable interest expense cash interest expense interest payable equipment cash depreciation expense accumulated depreciation expense prepaid rent cash re
Calhoun Community College - BUSINESS - acct 3211
As a consulting firm, we will help (company name) improve the characteristics of their website. Below are explanations of what content is on the website and how the website's design contributes to its quality. Content 1. Products 1) having th
Calhoun Community College - BUSINESS - acct 3211
To: Dr. Jan From: Lina Xia, LiPing Zhou. Date: May 22, 2007 Subject: Analysis of Martek Biosciences Corp.'s depreciation. According to David Reilly, the kind of asset that is the subject of the controversy described in this article is the idle asset in part of proper
Calhoun Community College - BUSINESS - acct 3211
Chapter 5(B) Homework Solution and etc. P 5-4: Percentage-of-Completion method. Contract price (i.e. total revenue for this project) is $10,000,000 1. Calculate the amount of gross profit to be recognized in each year: 2006: Total actual costs incurred to