Simple regression (Aug 23, 2010)

Review

Basic equations for simple regression

The purpose of regression is to describe the relationship between a response variable y and a predictor variable X, and to evaluate the strength of that relationship if it exists. In its simplest form, the “linear model,” the relationship between a continuous variable y and a variable X is assumed to be linear (a straight line), and the equation expressing that relationship is written as:

$$y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \text{where } \varepsilon_i \sim NID(0, \sigma^2).$$

In this expression, $\beta_0$ is called the “intercept” (the value of y when X = 0) and $\beta_1$ is the “slope” (the change in y associated with each unit change in X). The Greek letters signify that they are “population parameters,” or characteristics of the population, rather than “sample statistics” used to estimate population parameters.

The equation says that when you consider all the people or communities or organizations in your population, one attribute of the population, y (e.g., the political conservatism of individuals), is linearly related to a second attribute of the population, X (e.g., age), so that $y_i = \beta_0 + \beta_1 X_i$, but not perfectly. The departure of an individual observation from the value predicted by the equation, $\varepsilon_i$, is what we mean by the term “statistical error” or “residual.” Taken together, the two parameters of the model, $\beta_0$ and $\beta_1$, provide a complete description of the straight-line relationship.

One characteristic of population parameters is that they are fixed quantities of the population; that is, if you have N people living in a community at noon on September 1, 2010, then $\beta_1$ expresses the relationship between the variables y and X for all of the people living in the community at that time.
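As a minimal sketch of the population model described above, the following Python code builds a hypothetical population in which y depends linearly on X plus a normally distributed error. The numeric values of `beta0`, `beta1`, `sigma`, and `N`, and the choice of a uniform age range for X, are assumptions made only for illustration; none of them come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population values (illustrative assumptions only)
N = 1000       # population size
beta0 = 2.0    # intercept: value of y when X = 0
beta1 = 0.5    # slope: change in y per unit change in X
sigma = 1.0    # standard deviation of the error term

# X could stand for age; the uniform range is an assumption
X = rng.uniform(18, 80, size=N)

# Errors are normally and independently distributed: NID(0, sigma^2)
eps = rng.normal(0.0, sigma, size=N)

# The model: y_i = beta0 + beta1 * X_i + eps_i
y = beta0 + beta1 * X + eps

# The departure of each observation from the line is its error term
line = beta0 + beta1 * X
departures = y - line

print("mean of departures:", departures.mean())
print("variance of departures:", departures.var())
```

With a population of this size, the mean of the departures should be close to zero and their variance close to `sigma ** 2`, which is what the NID(0, σ²) assumption states.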
Departures of the observations from the prediction line are referred to as statistical error ($\varepsilon_i$), and the error terms are assumed to have particular characteristics: they are said to be “normally and independently distributed with mean zero and variance $\sigma^2$”; hence we write $\varepsilon_i \sim NID(0, \sigma^2)$. The error term for each observation, $\varepsilon_i$, and the population variance, $\sigma^2$, are also population quantities.

Usually, we don’t know the values of the population means ($\mu_y$ and $\mu_X$), variances ($\sigma_y^2$ and $\sigma_X^2$), correlation ($\rho_{yX}$), intercept, or slope, so we estimate them by taking a sample from the population. These estimates are our “sample statistics.” If the sample is random, so that neither the researchers nor the subjects of the study have any influence over who is selected, then the sample statistics are “unbiased” estimates of the population parameters. In the discussion that follows, we will focus on computing sample statistics, and we will assume the data are from a random sample of size n (n < N)....
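The idea of estimating population parameters from a random sample can be sketched in Python as follows. The population here is simulated with hypothetical values (`beta0`, `beta1`, `sigma`, `N`, `n` are all illustrative assumptions), and the slope and intercept are computed with the ordinary least squares formulas, which is an assumption about the estimator, since the preview ends before the notes state which one they use.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

# Hypothetical population (all values are illustrative assumptions)
N, beta0, beta1, sigma = 1000, 2.0, 0.5, 1.0
X = rng.uniform(18, 80, size=N)
y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=N)

# Random sample of size n < N, drawn without replacement so that
# every member of the population is equally likely to be selected
n = 100
idx = rng.choice(N, size=n, replace=False)
Xs, ys = X[idx], y[idx]

# Sample statistics that estimate the population quantities
mean_X, mean_y = Xs.mean(), ys.mean()          # estimate mu_X, mu_y
var_X, var_y = Xs.var(ddof=1), ys.var(ddof=1)  # estimate the variances
r = np.corrcoef(Xs, ys)[0, 1]                  # estimates the correlation

# Ordinary least squares estimates of slope and intercept
# (assumed estimator; not confirmed by the preview text)
Sxy = np.sum((Xs - mean_X) * (ys - mean_y))
Sxx = np.sum((Xs - mean_X) ** 2)
b1 = Sxy / Sxx              # sample slope, estimates beta1
b0 = mean_y - b1 * mean_X   # sample intercept, estimates beta0

print("sample slope:", b1, "sample intercept:", b0)
print("sample correlation:", r)
```

Rerunning with a different seed draws a different sample and gives slightly different values of `b0` and `b1`, while the values used to generate the population stay fixed. That is the distinction the notes draw between sample statistics and population parameters.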
This note was uploaded on 02/18/2012 for the course STAT 404 taught by Professor Staff during the Spring '08 term at Iowa State.