Is there a rating category tending toward lower box office revenues than expected? Explain.

(h) Repeat (g) using the movie genre (e.g., comedy) as the categorical variable.

(i) In analyzing these movies, the researchers also looked at the data set after removing the 6 movies that earned more than \$200 million. Subset the data (e.g., Investigation 2.1) and then examine the scatterplot, correlation coefficient, and least-squares regression line for predicting the movie revenue from the critics’ rating score for these data.

New correlation coefficient:

New least-squares regression line:

(j) Describe the effect of removing these 6 movies from the analysis.

Definition: An observation or set of observations is considered influential if removing it from the data set substantially changes the value of the correlation coefficient and/or the least-squares regression equation. Typically, observations that have extreme explanatory variable outcomes (far below or far above $\bar{x}$) are potentially influential. To measure the influence of an observation, it is removed, and measures are calculated for how much the summary results change. It is not always the case that the points with the largest residuals are the most influential.

In this example you should have seen that removing those six movies actually makes the relationship between box office income and critic scores much weaker (r drops from 0.42 to about 0.3).

Study Conclusions

There does appear to be a weak positive relationship between the composite critics’ scores and the amount of money the movie makes at the box office, with higher-rated movies tending to make more money. If the composite critics’ score is 10 points higher, we predict the movie will make about \$18.57 million more. It is interesting to note that this regression line will tend to overestimate the revenue of R-rated movies and underestimate the revenue of action movies. If the top 6 grossing movies of 2003 (Bruce Almighty, Finding Nemo, Pirates of the Caribbean, The Lord of the Rings III, The Matrix Reloaded, and X2: X-Men United) are removed, the relationship is not as strong, but still shows a weak positive linear association (r = 0.3).
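The idea of measuring influence by removing an observation and recomputing the summaries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six data points are made up (they are not the 2003 movie data), and the function names `correlation`, `least_squares_line`, and `influence_of` are hypothetical helpers written for this sketch, not part of any applet or package used in the text.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def influence_of(x, y, index):
    """Change in r and in the slope when the observation at `index` is removed."""
    x_rest = x[:index] + x[index + 1:]
    y_rest = y[:index] + y[index + 1:]
    r_change = correlation(x_rest, y_rest) - correlation(x, y)
    slope_change = least_squares_line(x_rest, y_rest)[1] - least_squares_line(x, y)[1]
    return r_change, slope_change


# Made-up illustration data (an assumption, NOT the movie data):
# five points on the line y = x, plus one point with an extreme x value.
x = [1, 2, 3, 4, 5, 15]
y = [1, 2, 3, 4, 5, 5]

intercept, slope = least_squares_line(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Report each point's residual alongside the effect of removing that point.
for i, (a, b) in enumerate(zip(x, y)):
    r_change, slope_change = influence_of(x, y, i)
    print(f"({a}, {b}): residual = {residuals[i]:.2f}, "
          f"change in r = {r_change:.2f}, change in slope = {slope_change:.2f}")
```

With these made-up numbers, the point (5, 5) has the largest residual but sits exactly at the mean of the x values, so removing it leaves the slope essentially unchanged. The point (15, 5) has a smaller residual but an extreme x value, and removing it moves the slope from about 0.23 to 1.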

Chance/Rossman, 2015 ISCAM III Investigation 5.9 381

Practice Problem 5.9

(a) Which do you think will be more resistant to outliers, the regression line that minimizes the sum of squared errors or the regression line that minimizes the sum of the absolute errors? Explain.

(b) Investigate your conjecture using the Analyzing Two Quantitative Variables applet. Write a short paragraph summarizing your analysis and your observations.

Section 5.3 Summary

In this section you considered the association between two quantitative variables. Because this was your first encounter with this type of analysis scenario, we returned to several themes from Chapter 1, including starting with graphical displays and numerical summaries. You learned that the relevant graphical display for describing the association between two quantitative variables is a scatterplot, and that some features to look for in a scatterplot are the direction, strength, and linearity of the association.
