M316 Chapter 4 Dr. Berg

Scatterplots and Correlation

Sometimes variables measured on individuals are not independent of each other.

Explanatory and Response Variables

To study a relationship between two variables, we measure both variables on the same individuals. We often think of one variable as explaining or influencing the other.

Definition
A response variable measures the outcome of a study or experiment. An explanatory variable may explain or influence the response variable.

People sometimes call the response variable the dependent variable and the explanatory variable the independent variable. These terms have other meanings in statistics, so we will avoid this usage. Note also that the explanatory variable does not necessarily cause the response.

Example
A state's previous voting record can be used to predict future votes, so it is convenient to classify previous votes as explanatory even though they don't cause the later outcomes.

Example (4.1)
Student volunteers at Ohio State University drank different amounts of beer, and thirty minutes later had their blood alcohol levels measured. The amount of beer consumed is the explanatory variable and the blood alcohol level is the response variable.

As usual, the analysis consists of:
a) plotting the data and looking for overall patterns and deviations; and
b) choosing appropriate numerical summaries of the data.

Example (4.1)
Which (if any) of these would be the explanatory variable, and which would be the response variable?
a) The time spent studying for an exam versus the score on the exam.
b) The weight of a person versus height.
c) Hours of extracurricular activity versus GPA.
d) The score on the SAT math exam versus the score on the verbal exam.

Displaying Relationships: Scatterplots

The most useful graph for displaying the relationship between two quantitative variables is a scatterplot.
Definition
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. Each individual is represented by a dot, with the x-coordinate representing one variable and the y-coordinate representing the other. If one variable is explanatory and the other is a response variable, the explanatory variable goes on the x-axis and the response goes on the y-axis.

Example (4.3)
Some people use average SAT scores to rank state school systems. This is inappropriate, because state average scores depend on more than just school quality. Following the four-step process, we look at one influence on state SAT scores.

State: The percent of high school students who take the SAT varies from state to state. Does this fact help explain differences among the states in average SAT score?

Formulate: Examine the relationship between percent taking and state mean score. Choose the explanatory and response variables (if any). Make a scatterplot to show the relationship and interpret the plot.

Solve (first steps): We suspect that the "percent taking" will help explain the "mean score", so "percent taking" is the explanatory variable and "mean score" is the response variable. Does the scatterplot support our suspicion?

Interpreting Scatterplots

Now we apply our principles of data analysis using a graph.

Examining a Scatterplot
Given any graph of data, look for the overall pattern and for striking deviations from that pattern. You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.

Example
In the previous example, what line would come "closest" to all the data points? What direction would it go (increasing or decreasing)? Can you think of an explanation for this pattern?
Definition
Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values of one tend to accompany below-average values of the other. Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa.

Exercise
Make a scatterplot of these scores of 12 golfers in 2 rounds of a tournament. Is there a relationship? If so, is it positive or negative? Is it linear? Is there an explanatory variable?

Player     1    2    3    4    5    6    7    8    9    10   11   12
Round 1    89   90   87   95   86   81   102  105  83   88   91   79
Round 2    94   85   89   89   81   76   107  89   87   91   88   80

Adding Categorical Variables to Scatterplots

To add a categorical variable to a scatterplot, use a different color or symbol for each category.

Example
We show the scatterplot for mean SAT score and percent of high school graduates who take the test for only the northeastern (+) and midwestern () states.

Measuring Linear Association: Correlation

The relationship between two quantitative variables is often linear. For this we have a numerical measure called correlation.

Definition
Suppose quantitative variables $x$ and $y$ for $n$ individuals take values $x_i$ and $y_i$ respectively, for $1 \le i \le n$. The correlation between $x$ and $y$ is

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

where $\bar{x}$ and $\bar{y}$ are the means and $s_x$ and $s_y$ are the standard deviations of the two variables.

Notice that the data values for each variable are standardized by subtracting the mean and dividing by the standard deviation. This should be familiar from chapter 3, where we converted nonstandard normal distributions to the standard normal distribution.

Exercise (4.8)
Find the correlation of these 5 years of data for coffee prices paid to growers in Indonesia and the percent of forest lost in a national park in a coffee-producing region.

Price (cents/pound)      29    40    54    55    72
Forest lost (percent)    0.49  1.59  1.69  1.82  3.10

Facts About Correlation

1) Correlation does not distinguish between an explanatory and a response variable.
2) Because r uses standardized data values, it does not change when units of measurement are changed.
3) Positive r indicates a positive association and negative r indicates a negative association.
4) The correlation r is always a number between -1 and 1.

Example (4.6)
Here are scatterplots for six different correlation values.

Cautions About Correlation

1) Both variables must be quantitative.
2) Correlation measures the strength of association only for a linear relation.
3) Correlation is not resistant: a few outlying observations can change r substantially.
4) Correlation is not a complete summary of two-variable data.

Exercise
Suppose the speed (in miles per hour) and mileage (in miles per gallon) for some car fit the following data.

Speed (mph)      20   30   40   50   60
Mileage (mpg)    24   28   30   28   24

Make a scatterplot and find the correlation.
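The definition of r can be checked numerically. What follows is only a minimal sketch in Python, not part of the original notes: the function name correlation and the variable names are chosen here for illustration. It standardizes each value exactly as the formula for r describes, then applies it to the coffee data of Exercise (4.8) and to the speed and mileage data above. It assumes neither variable is constant, since a zero standard deviation would make the standardized values undefined.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, divided by n - 1.

    Illustrative helper (not from the notes). Assumes neither variable is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations use n - 1 in the denominator.
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


# Data from Exercise (4.8): coffee price and percent of forest lost.
price = [29, 40, 54, 55, 72]
forest_lost = [0.49, 1.59, 1.69, 1.82, 3.10]

# Data from the speed and mileage exercise.
speed = [20, 30, 40, 50, 60]
mileage = [24, 28, 30, 28, 24]

r_coffee = correlation(price, forest_lost)
# Fact 1: swapping the roles of the two variables leaves r unchanged.
r_swapped = correlation(forest_lost, price)
# Fact 2: changing units (cents to dollars) leaves r unchanged.
r_dollars = correlation([p / 100 for p in price], forest_lost)
# Caution 2: a strong but curved relationship can have r near 0.
r_car = correlation(speed, mileage)

print(f"coffee vs. forest lost: r = {r_coffee:.3f}")
print(f"variables swapped:      r = {r_swapped:.3f}")
print(f"price in dollars:       r = {r_dollars:.3f}")
print(f"speed vs. mileage:      r = {r_car:.3f}")
```

For the coffee data, r comes out to about 0.955, a strong positive linear association, and it is the same whether the variables are swapped or the price is expressed in dollars. For the car data, r is 0 up to rounding error even though mileage clearly depends on speed: the relationship is curved, not linear, which is why a scatterplot should accompany any correlation.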
- Fall '08