Chapter 10 Regression Analysis

We are often interested in comparisons among several distributions or relationships among several variables. A study of data often leads us to ask whether there is a cause-and-effect relation between two or more variables. Regression analysis is a statistical method for investigating such relationships. The goal is to build a good model: a prediction equation relating the effect to causes.

10.1 Introduction, Scatter Plot and Correlation

Example 10.1 Height – Weight, Overweight – Health

Example 10.2 Does smoking cause lung cancer? The table below summarizes a study carried out by government statisticians in England. The data concern 25 occupational groups and are condensed from data on thousands of individual men. One variable is the smoking ratio, which is a measure of the number of cigarettes smoked per day by men in each occupation relative to the number smoked by all men of the same age. The other variable is the standardized mortality ratio.

Smoking  Mortality    Smoking  Mortality
   77        84         112        96
  137       116         113       144
  117       123         110       139
   94       128         125       113
  116       155         133       146
  102       101         115       128
  111       118         105       115
   93       113          87        79
   88       104          91        85
  102        88         100       120
   91       104          76        60
  104       129          66        51
  107        86

Response variable, Explanatory variable: A response variable measures an outcome of a study. An explanatory variable explains or causes changes in the response variable. By convention, $y$ denotes the response variable, and $x_1$, $x_2$, etc. denote the explanatory variables.

Note: The response variable is also called the dependent variable, and the explanatory variable is also called the independent or predictor variable.

Warning:
• When you observe two variables, there may or may not be a cause-and-effect relationship between them.

Scatter Plots

Always plot the explanatory variable, if there is one, on the horizontal axis. If there is no explanatory-response distinction, either variable can go on the horizontal axis.
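The plotting convention above can be tried out on the data of Example 10.2. The following is a minimal sketch in Python using only the standard library; it treats the smoking ratio as the explanatory variable (horizontal axis) and the mortality ratio as the response (vertical axis). The function name `text_scatter` and the grid size are illustrative choices, not part of the notes.

```python
# Data from Example 10.2: 25 occupational groups.
# smoking = explanatory variable (x), mortality = response variable (y).
smoking = [77, 112, 137, 113, 117, 110, 94, 125, 116, 133, 102, 115, 111,
           105, 93, 87, 88, 91, 102, 100, 91, 76, 104, 66, 107]
mortality = [84, 96, 116, 144, 123, 139, 128, 113, 155, 146, 101, 128, 118,
             115, 113, 79, 104, 85, 88, 120, 104, 60, 129, 51, 86]


def text_scatter(x, y, width=50, height=15):
    """Return a character-grid scatter plot with x on the horizontal axis.

    Hypothetical helper for illustration; assumes x and y each contain
    at least two distinct values.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each point to a column (x) and a row (y) of the grid.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # horizontal axis
    return "\n".join(lines)


print(text_scatter(smoking, mortality))
```

A text grid is used here only to keep the sketch free of plotting dependencies; any plotting library would serve the same purpose, as long as the explanatory variable stays on the horizontal axis.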
Interpreting scatterplots
• Overall pattern
• Direction
  – Positive association: above-average (below-average) values of $x$ tend to accompany above-average (below-average) values of $y$
  – Negative association: above-average (below-average) values of $x$ tend to accompany below-average (above-average) values of $y$
• Form of relationship: linear?
• Strength of the relationship
• Outliers or other deviations from the pattern

The coefficient of correlation: Measures the strength and direction of the linear relationship between two variables $x$ and $y$. Given $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, the Pearson product moment coefficient of correlation $r$ between $x$ and $y$ is:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Properties:
• Always $-1 \le r \le 1$. Extreme values $\pm 1$ occur only when the points lie exactly along a straight line.
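As a worked illustration of the formula for $r$, here is a minimal Python sketch that computes $S_{xy}$, $S_{xx}$ and $S_{yy}$ directly from their definitions for the smoking and mortality data of Example 10.2. The function names `correlation` and `s_xy_shortcut` are illustrative assumptions, not part of the notes.

```python
from math import sqrt

# Data from Example 10.2: smoking ratio (x) and mortality ratio (y).
smoking = [77, 112, 137, 113, 117, 110, 94, 125, 116, 133, 102, 115, 111,
           105, 93, 87, 88, 91, 102, 100, 91, 76, 104, 66, 107]
mortality = [84, 96, 116, 144, 123, 139, 128, 113, 155, 146, 101, 128, 118,
             115, 113, 79, 104, 85, 88, 120, 104, 60, 129, 51, 86]


def correlation(x, y):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


def s_xy_shortcut(x, y):
    """Equivalent form S_xy = sum(x_i * y_i) - n * x_bar * y_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar


r = correlation(smoking, mortality)
print(f"r = {r:.3f}")

# Points lying exactly on a line give the extreme values +1 and -1
# (up to floating-point rounding).
print(correlation([1, 2, 3], [2, 4, 6]))
print(correlation([1, 2, 3], [6, 4, 2]))
```

The two exact-line calls at the end check the stated property that $r = \pm 1$ only for perfectly linear data; for the occupational data, $r$ falls strictly between those extremes.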
 Fall '09
 lisiufeng
 Math, Statistics, Regression Analysis
