101 Pages

IPS6e.PPT.Ch02

Course: STAT 1601, Spring 2012
School: Minnesota
Chapter 2: Looking at Data: Relationships
Scatterplots
IPS 2.1
2009 W. H. Freeman and Company

Objectives (IPS Chapter 2.1)
Scatterplots: explanatory and response variables; interpreting scatterplots; outliers; categorical variables in scatterplots; scatterplot smoothers.

Examining relationships
Most statistical studies involve more than one variable. Questions to ask:
What individuals do the data describe?
What variables are present, and how are they measured?
Are all of the variables quantitative?
Do some of the variables explain or even cause changes in other variables?

Here, we have two quantitative variables for each of 16 students: (1) how many beers they drank and (2) their blood alcohol content (BAC). We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Student   Beers   BAC
1         5       0.10
2         2       0.03
3         9       0.19
4         8       0.12
5         3       0.04
6         7       0.095
7         3       0.07
8         5       0.06
9         3       0.02
10        5       0.05
11        4       0.07
12        6       0.10
13        5       0.085
14        7       0.09
15        1       0.01
16        4       0.05

Looking at relationships
Start with a graph. Look for an overall pattern and for deviations from the pattern. Use numerical descriptions of the data and of the overall pattern (if appropriate).

Scatterplots
In a scatterplot, each axis represents one of the variables, and each individual is plotted as a point on the graph. (Scatterplot of beers vs. BAC for the 16 students above.)

Explanatory and response variables
A response variable measures or records an outcome of a study. An explanatory variable explains changes in the response variable. Typically, the explanatory or independent variable is plotted on the x axis, and the response or dependent variable is plotted on the y axis. In the beer example, the response (dependent) variable y is blood alcohol content, and the explanatory (independent) variable x is the number of beers.

Some plots don't have clear explanatory and response variables. Do calories explain sodium amounts?
Does percent return on Treasury bills explain percent return on common stocks?

Interpreting scatterplots
After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern:
Direction: positive, negative, or no direction.
Form: linear, curved, clusters, or no pattern.
Strength: how closely the points fit the form.
We also look for deviations from that pattern: outliers.

Form and direction of an association
(Figure panels: linear, nonlinear, no relationship.)
Positive association: High values of one variable tend to occur together with high values of the other variable.
Negative association: High values of one variable tend to occur together with low values of the other variable.
No relationship: X and Y vary independently. Knowing X tells you nothing about Y.

Strength of the association
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form. With a strong relationship, you can get a pretty good estimate of y if you know x. With a weak relationship, for any x you might get a wide range of y values.
This is a weak relationship: for a particular state median household income, you can't predict the state per capita income very well.
This is a very strong relationship: the daily amount of gas consumed can be predicted quite accurately for a given temperature value.

How to scale a scatterplot
(Same data in all four plots.) Using an inappropriate scale for a scatterplot can give an incorrect impression. Both variables should be given a similar amount of space: the plot should be roughly square, and the points should occupy all the plot space (no blank space).

Outliers
An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.
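To judge form, direction, and strength you first need the picture. As an illustration that is not part of the original slides, the Python sketch below draws a rough character-based scatterplot of the 16-student beers/BAC table, with the explanatory variable (beers) on the x axis and the response (BAC) on the y axis. The function name `text_scatterplot` and the 40 by 12 grid are arbitrary choices made for this sketch; real statistical software would draw a proper graph.

```python
# Beers drunk and blood alcohol content (BAC) for the 16 students,
# in student order 1..16 (table in Section 2.1).
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]


def text_scatterplot(x, y, width=40, height=12, mark="*"):
    """Return a rough character scatterplot as a string.

    x (explanatory) increases left to right, y (response) increases bottom
    to top. Assumes neither x nor y is constant. Points that round to the
    same character cell are drawn only once.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = mark  # grid row 0 is the top edge
    lines = ["|" + "".join(cells) for cells in grid]  # "|" is the y axis
    lines.append("+" + "-" * width)                   # x axis
    return "\n".join(lines)


print(text_scatterplot(beers, bac))
```

The points rise from the lower left to the upper right in a fairly narrow band: a positive, roughly linear, fairly strong association. At this coarse resolution the two 7-beer students (BAC 0.095 and 0.09) land in the same cell, which is one reason real plots are drawn at much finer resolution.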
Outliers (examples)
Not an outlier: The upper right-hand point here is not an outlier of the relationship. It is what you would expect for this many beers, given the linear relationship between beers/weight and blood alcohol.
Outlier: This point is not in line with the others, so it is an outlier of the relationship.

IQ score and grade point average
a) Describe in words what this plot shows.
b) Describe the direction, shape, and strength. Are there outliers?
c) What is the deal with these people?

Categorical variables in scatterplots
Often, things are not simple and one-dimensional. We need to group the data into categories to reveal trends. What may look like a positive linear relationship is in fact a series of negative linear associations. Plotting different habitats in different colors allows us to make that important distinction.
Comparison of men's and women's racing records over time: each group shows a very strong negative linear relationship that would not be apparent without the gender categorization.
Relationship between lean body mass and metabolic rate in men and women: both men and women follow the same positive linear trend, but women show a stronger association. As a group, males typically have larger values for both variables.

Categorical explanatory variables
When the explanatory variable is categorical, you cannot make a scatterplot, but you can compare the different categories side by side on the same graph (boxplots, or mean ± standard deviation).
Comparison of income (quantitative response variable) for different education levels (five categories). But be careful in your interpretation: this is NOT a positive association, because education is not quantitative.

Example: Beetles trapped on boards of different colors
Beetles were trapped on sticky boards scattered throughout a field. The sticky boards were of four different colors (categorical explanatory variable). The number of beetles trapped (response variable) is shown on the graph below.
What association? What relationship?
(Figure: beetle counts by board color, shown twice with the colors in different orders on the axis: blue, white, green, yellow and blue, green, white, yellow.)
Describe one category at a time. When both variables are quantitative, the order of the data points is defined entirely by their values. This is not true for categorical data: the order of the categories on the axis is arbitrary, so an apparent "trend" across them means nothing.

Scatterplot smoothers
When an association is more complex than linear, we can still describe the overall pattern by smoothing the scatterplot. You can simply average the y values separately for each x value. When a data set does not have many y values for a given x, software smoothers form an overall pattern by looking at the y values for points in the neighborhood of each x value. Smoothers are resistant to outliers.
Time plot of the acceleration of the head of a crash test dummy as a motorcycle hits a wall. The overall pattern was calculated by a software scatterplot smoother.

Looking at Data: Relationships
Correlation
IPS Chapter 2.2
2009 W. H. Freeman and Company

Objectives (IPS Chapter 2.2)
Correlation: the correlation coefficient r; r does not distinguish between x and y; r has no units of measurement; r ranges from -1 to +1; influential points.

The correlation coefficient "r"
The correlation coefficient is a measure of the direction and strength of a linear relationship. It is calculated using the mean and the standard deviation of both the x and y variables:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

The correlation coefficient "r" (example)
Time to swim: x̄ = 35, s_x = 0.7. Pulse rate: ȳ = 140, s_y = 9.5.
Part of the calculation involves finding z, the standardized score we used when working with the normal distribution. You DON'T want to do this by hand. Make sure you learn how to use your calculator or software.
Standardization allows us to compare correlations between data sets where variables are measured in different units or when the variables are different.
For instance, we might want to compare the correlation between [swim time and pulse] with the correlation between [swim time and breathing rate].

r does not distinguish x and y
The correlation coefficient, r, treats x and y symmetrically. "Time to swim" is the explanatory variable here and belongs on the x axis. However, in either plot r is the same (r = -0.75).

"r" has no unit
Changing the units of the variables does not change the correlation coefficient r, because we get rid of all our units when we standardize (get z-scores). The z-score plot is the same for both plots (r = -0.75).

"r" ranges from -1 to +1
r quantifies the strength and direction of a linear relationship between two quantitative variables.
Strength: how closely the points follow a straight line.
Direction: positive when individuals with higher X values tend to have higher values of Y, and negative when they tend to have lower values of Y.
As the scatter of the points around a straight line decreases, the correlation coefficient gets stronger (closer to +1 or -1).

Correlation only describes linear relationships
No matter how strong the association, r does not describe curved relationships. Note: you can sometimes transform a nonlinear association to a linear form, for instance by taking the logarithm. You can then calculate a correlation using the transformed data.

Influential points
Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers. Just moving one point away from the general trend here weakens the correlation from -0.91 to -0.75. Try it out for yourself on the companion book website: http://www.whfreeman.com/ips6e
Adding two outliers decreases r from 0.95 to 0.61.

Review examples
1) What is the explanatory variable? Describe the form, direction, and strength of the relationship. Estimate r. (Axis values in 1000s.)
2) If women always marry men 2 years older than themselves, what is the correlation of the ages between husband and wife? Hint: age_man = age_woman + 2 is the equation for a straight line.

Thought quiz on correlation
1. Why is there no distinction between explanatory and response variables in correlation?
2. Why do both variables have to be quantitative?
3. How does changing the units of measurement affect correlation?
4. What is the effect of outliers on correlations?
5. Why doesn't a tight fit to a horizontal line imply a strong correlation?

Looking at Data: Relationships
Least-Squares Regression
IPS Chapter 2.3
2009 W. H. Freeman and Company

Objectives (IPS Chapter 2.3)
Least-squares regression: regression lines; prediction and extrapolation; correlation and r²; transforming relationships.

Correlation tells us about the strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to have a numerical description of how both variables vary together. For instance, is one variable increasing faster than the other one? And we would like to make predictions based on that numerical description. But which line best describes our data?

The regression line
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. In regression, the distinction between explanatory and response variables is important.

The least-squares regression line is the unique line such that the sum of the squared vertical (y) distances between the data points and the line is as small as possible. Distances between the points and the line are squared so that all are positive values. This is done so that distances can be properly added (Pythagoras).

Properties
The least-squares regression line can be shown to have this equation:
$$\hat{y} = b_0 + b_1 x$$
where ŷ ("y hat") is the predicted y value, b1 is the slope, and b0 is the y-intercept.

How to: First we calculate the slope of the line, b1. From statistics we already know:
$$b_1 = r\,\frac{s_y}{s_x}$$
where r is the correlation, s_y is the standard deviation of the response variable y, and s_x is the standard deviation of the explanatory variable x.
Once we know b1, the slope, we can calculate b0, the y-intercept:
$$b_0 = \bar{y} - b_1\bar{x}$$
where x̄ and ȳ are the sample means of the x and y variables.

Typically, we use a 2-var stats calculator or stats software.
BEWARE!!! Not all calculators and software use the same convention. Some use ŷ = a + bx, and some use ŷ = ax + b. Make sure you know what YOUR calculator gives you for a and b before you answer homework or exam questions.

Software output
(Figure: output from two software packages, with the intercept, slope, r, and R² labeled.)

The equation completely describes the regression line. To plot the regression line, you only need to plug two x values into the equation, get ŷ, and draw the line that goes through those points. Hint: the regression line always passes through the point of means (x̄, ȳ). The points you use for drawing the regression line are derived from the equation. They are NOT points from your sample data (except by pure coincidence).

The distinction between explanatory and response variables is crucial in regression. If you exchange y for x in calculating the regression line, you will get the wrong line. Regression examines the distance of all points from the line in the y direction only.
Hubble telescope data about galaxies moving away from Earth: these two lines are the two regression lines, calculated either correctly (x = distance, y = velocity, solid line) or incorrectly (x = velocity, y = distance, dotted line).

Correlation versus regression
The correlation is a measure of spread (scatter) in both the x and y directions in the linear relationship. In regression, we examine the variation in the response variable (y) given a change in the explanatory variable (x).

Making predictions
The equation of the least-squares regression allows you to predict y for any x within the range studied.
(Figure: beers vs. blood alcohol, with the regression line ŷ = 0.0144x + 0.0008.)
Nobody in the study drank 6.5 beers, but by finding the value of ŷ from the regression line for x = 6.5, we would expect a blood alcohol content of 0.094 mg/ml:
$$\hat{y} = 0.0144 \times 6.5 + 0.0008 = 0.0936 + 0.0008 = 0.0944\ \text{mg/ml}$$

Year   Powerboats (1000s)   Dead manatees
1977   447                  13
1978   460                  21
1979   481                  24
1980   498                  16
1981   513                  24
1982   512                  20
1983   526                  15
1984   559                  34
1985   585                  33
1986   614                  33
1987   645                  39
1988   675                  43
1989   711                  50
1990   719                  47

There is a positive linear relationship between the number of powerboats registered and the number of manatee deaths. The least-squares regression line has the equation:
$$\hat{y} = 0.125x - 41.4$$
Thus, if we were to limit the number of powerboat registrations to 500,000, what could we expect for the number of manatee deaths?
$$\hat{y} = 0.125(500) - 41.4 = 62.5 - 41.4 = 21.1$$
Roughly 21 manatees.

Extrapolation
This can be a very stupid thing to do, as seen here. (Figure: a prediction of height in inches extrapolated far beyond the data.)
Extrapolation is the use of a regression line for predictions outside the range of x values used to obtain the line.

The y intercept
Sometimes the y-intercept is not biologically possible. Here we have negative blood alcohol content, which makes no sense. But the negative value is appropriate for the equation of the regression line. There is a lot of scatter in the data, and the line is just an estimate. (Figure: the y-intercept shows negative blood alcohol.)

Coefficient of determination, r²
r² represents the percentage of the variance in y (vertical scatter from the regression line) that can be explained by changes in x. Recall that
$$b_1 = r\,\frac{s_y}{s_x}$$
r = -1, r² = 1: changes in x explain 100% of the variations in y. Y can be entirely predicted for any given value of x.
r = 0, r² = 0: changes in x explain 0% of the variations in y. The value(s) y takes is (are) entirely independent of what value x takes.
r = 0.87, r² = 0.76: here the change in x only explains 76% of the change in y.
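The formulas in this section can be checked numerically. The Python sketch below (standard library only; the helper names `correlation`, `least_squares`, and `sse` are made up for this illustration) computes r as the sum of products of z-scores divided by n - 1, then b1 = r·s_y/s_x, b0 = ȳ - b1·x̄, and r². It assumes the 16-student beers/BAC table from Section 2.1 and the manatee table above. For the student table it gives r ≈ 0.894, b1 ≈ 0.0180 and b0 ≈ -0.0127, a slightly negative intercept like the one discussed under "The y intercept"; the line shown on the prediction slide (ŷ = 0.0144x + 0.0008) may come from a different plot, so the two need not agree. For the manatee data it reproduces ŷ ≈ 0.125x - 41.4.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Pearson r: the sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (len(x) - 1)


def least_squares(x, y):
    """Return (b0, b1) for y-hat = b0 + b1 * x, with b1 = r * s_y / s_x."""
    b1 = correlation(x, y) * stdev(y) / stdev(x)
    b0 = mean(y) - b1 * mean(x)  # forces the line through (x-bar, y-bar)
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared vertical distances from the points to a line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# 16 students (Section 2.1 table), in student order: beers (x) and BAC (y)
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

r = correlation(beers, bac)
b0, b1 = least_squares(beers, bac)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"BAC-hat = {b0:.4f} + {b1:.4f} * beers")

# r ignores which variable is x, and is unchanged by rescaling units
print(round(correlation(bac, beers), 3),
      round(correlation(beers, [v * 1000 for v in bac]), 3))

# Least squares: changing the slope makes the sum of squares larger
print(sse(beers, bac, b0, b1) < sse(beers, bac, b0, b1 + 0.001))  # True

# Manatee table: powerboat registrations (1000s) and manatee deaths, 1977-1990
boats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
deaths = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]
a0, a1 = least_squares(boats, deaths)
print(f"deaths-hat = {a1:.3f} * boats + ({a0:.1f})")

x_new = 500  # 500,000 registrations
if not min(boats) <= x_new <= max(boats):
    print("warning: x_new is outside the observed range (extrapolation)")
print(f"predicted deaths at {x_new}: {a0 + a1 * x_new:.1f}")
```

With unrounded coefficients the manatee prediction at 500 comes out near 21.0 rather than the slide's 21.1, which uses the rounded coefficients. Because 500 lies inside the observed range (447 to 719), this is interpolation and the warning branch does not fire.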
In the r = 0.87 example, the rest of the change in y (the vertical scatter, shown as red arrows) must be explained by something other than x.

r = 0.7, r² = 0.49: There is quite some variation in BAC for the same number of beers drunk. A person's blood volume is a factor in the equation that was overlooked here.
We changed number of beers to number of beers/weight of person in lb: r = 0.9, r² = 0.81.
In the first plot, number of beers only explains 49% of the variation in blood alcohol content. But number of beers/weight explains 81% of the variation in blood alcohol content. Additional factors contribute to variations in BAC among individuals (like maybe some genetic ability to process alcohol).

Grade performance
If class attendance explains 16% of the variation in grades, what is the correlation between percent of classes attended and grade?
1. We need to make an assumption: attendance and grades are positively correlated. So r will be positive too.
2. r² = 0.16, so r = +√0.16 = +0.4. A weak correlation.

Transforming relationships
A scatterplot might show a clear relationship between two quantitative variables, but issues of influential points or nonlinearity prevent us from using correlation and regression tools. Transforming the data, that is, changing the scale in which one or both of the variables are expressed, can make the shape of the relationship linear in some cases.
Example: Patterns of growth are often exponential, at least in their initial phase. Changing the response variable y into log(y) or ln(y) will transform the pattern from an upward-curved exponential to a straight line.

Exponential bacterial growth
In ideal environments, bacteria multiply through binary fission. The number of bacteria can double every 20 minutes in that way.
(Figure: bacterial count (0 to 5000) and log of bacterial count (0 to 4) plotted against time (0 to 240 minutes); the counts run 1, 2, 4, 8, 16, 32, 64, and so on.)
Exponential growth 2^n is not suitable for regression. Taking the log changes the growth pattern into a straight line:
$$\log(2^n) = n\log(2) \approx 0.3n$$

Body weight and brain weight in 96 mammal species
r = 0.86, but this is misleading. The elephant is an influential point. Most mammals are very small in comparison. Without this point, r = 0.50 only.
Now we plot the log of brain weight against the log of body weight. The pattern is linear, with r = 0.96. The vertical scatter is homogeneous, which is good for predictions of brain weight from body weight (in the log scale).

Looking at Data: Relationships
Cautions about Correlation and Regression
IPS Chapter 2.4
2009 W. H. Freeman and Company

Objectives (IPS Chapter 2.4)
Cautions about correlation and regression: residuals; outliers and influential points; lurking variables; correlation/regression using averages; the restricted range problem.

Correlation/regression using averages
Many regression or correlation studies use average data. While this is appropriate, you should know that correlations based on averages are usually much higher than those made on the raw data. The correlation is a measure of spread (scatter) in a linear relationship. Using averages greatly reduces the scatter. Therefore, r and r² are typically greatly increased when averages are used.

(Figure: growth chart for boys by age.) Each dot represents an average. The variation among boys in each age class is not shown.
These histograms illustrate that each mean represents a distribution of boys of a particular age.
Should parents be worried if their son does not match the point for his age? If the raw values were used in the correlation instead of the means, there would be a lot of spread in the y direction, and thus the correlation would be smaller. That's why growth charts typically show a range of values (here from the 5th to the 95th percentile).
A percentile range like this is a more comprehensive way of displaying the same information than the averages alone.

Residuals
The distances from each point to the least-squares regression line give us potentially useful information about the contribution of individual data points to the overall pattern of scatter. These distances are called residuals. Points above the line have a positive residual, and points below the line have a negative residual. The sum of these residuals is always 0.
$$\text{residual} = \text{observed } y - \text{predicted } \hat{y} = y - \hat{y}$$

Residual plots
Residuals are the distances between y-observed and y-predicted. We plot them in a residual plot. If residuals are scattered randomly around 0, chances are your data fit a linear model, were normally distributed, and you didn't have outliers. The x axis in a residual plot is the same as on the scatterplot. Only the y axis is different.
Residuals are randomly scattered: good!
A curved pattern means the relationship you are looking at is not linear.
A change in variability across a plot is a warning sign. You need to find out why, and remember that predictions made in areas of larger variability will not be as good.

Outliers and influential points
Outlier: an observation that lies outside the overall pattern of observations.
Influential individual: an observation that markedly changes the regression if removed. This is often an outlier on the x axis.
Child 19 is an outlier in the y direction: it is an outlier of the relationship. Child 18 is only an outlier in the x direction and thus might be an influential point.
Are these points influential? (Figure panels: all data; without child 18; without child 19. One point is labeled as an outlier in the y direction and one as influential.)

Always plot your data
A correlation coefficient and a regression line can be calculated for any relationship between two quantitative variables. However, outliers greatly influence the results, and running a linear regression on a nonlinear association is not only meaningless but misleading.
Make sure to always plot your data before you run a correlation or regression analysis.

Always plot your data!
The correlations all give r ≈ 0.816, and the regression lines are all approximately ŷ = 3 + 0.5x. For all four sets, we would predict ŷ = 8 when x = 10. However, making the scatterplots shows us that the correlation/regression analysis is not appropriate for all data sets:
Moderate linear association; regression OK.
Obvious nonlinear relationship; regression not OK.
One point deviates from the highly linear pattern; this outlier must be examined closely before proceeding.
Just one very influential point; all other points have the same x value; a redesign is due here.

Lurking variables
A lurking variable is a variable not included in the study design that does have an effect on the variables studied. Lurking variables can falsely suggest a relationship.
What is the lurking variable in these examples? How could you answer if you didn't know anything about the topic?
Strong positive association between the number of firefighters at a fire site and the amount of damage a fire does.
Negative association between moderate amounts of wine drinking and death rates from heart disease in developed nations.

There is quite some variation in BAC for the same number of beers drunk. A person's blood volume is a factor in the equation that we have overlooked. Now we change number of beers to number of beers/weight of person in lb. The scatter is much smaller now. One's weight was indeed influencing the response variable, blood alcohol content.

Vocabulary: lurking vs. confounding
A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.
Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
Association is not causation.
Even if an association is very strong, this is not by itself good evidence that a change in x will cause a change in y.

Caution before rushing into a correlation or a regression analysis
Do not use a regression on inappropriate data: a pattern in the residuals, the presence of large outliers, or clumped data falsely appearing linear. Use residual plots for help.
Beware of lurking variables.
Avoid extrapolating (going beyond interpolation).
Recognize when the correlation/regression is performed on averages.
A relationship, however strong it is, does not itself imply causation.

Looking at Data: Relationships
Data Analysis for Two-Way Tables
IPS Chapter 2.5
2009 W. H. Freeman and Company

Objectives (IPS Chapter 2.5)
Data analysis for two-way tables: two-way tables; joint distributions; marginal distributions; relationships between categorical variables; conditional distributions; Simpson's paradox.

Two-way tables
An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor. Two-way tables organize data about two categorical variables obtained from a two-way, or block, design. (There are now two ways to group the data.)
First factor: age (group by age). Second factor: education (record education).

Two-way tables (continued)
We call education the row variable and age group the column variable. Each combination of values for these two variables is called a cell. For each cell, we can compute a proportion by dividing the cell entry by the total sample size. The collection of these proportions is the joint distribution of the two variables.

Marginal distributions
We can look at each categorical variable separately in a two-way table by studying the row totals and the column totals. They represent the marginal distributions, expressed in counts or percentages. (They are written as if in a margin.) Source: 2000 U.S. census.
The marginal distributions can then be displayed on separate bar graphs, typically expressed as percents instead of raw counts. Each graph represents only one of the two variables, completely ignoring the second one.

Parental smoking
Does parental smoking influence the smoking habits of their high school children?
Summary two-way table: High school students were asked whether they smoke and whether their parents smoke.
Marginal distribution for the categorical variable "parental smoking": the row totals are used and re-expressed as a percent of the grand total. The percents are then displayed in a bar graph.

Relationships between categorical variables
The marginal distributions summarize each categorical variable independently. But the two-way table actually describes the relationship between both categorical variables. The cells of a two-way table represent the intersection of a given level of one categorical factor and a given level of the other categorical factor.

Conditional distribution
In the table below, the 25 to 34 age group occupies the first column. To find the complete distribution of education in this age group, look only at that column. Compute each count as a percent of the column total. These percents should add up to 100% because all persons in this age group fall into one of the education categories. These four percents together are the conditional distribution of education, given the 25 to 34 age group. Source: 2000 U.S. census.

Conditional distributions
The percents within the table represent the conditional distributions. Comparing the conditional distributions allows you to describe the relationship between both categorical variables. Here the percents are calculated by age range (columns):
$$29.30\% = \frac{11071}{37785} = \frac{\text{cell total}}{\text{column total}}$$
The conditional distributions can be graphically compared using side-by-side bar graphs of one variable for each value of the other variable. Here, the percents are calculated by age range (columns).
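Joint, marginal, and conditional distributions are each a division by the right total. The Python sketch below illustrates this on a small two-way table taken from the hospital example in the Simpson's paradox slides (all patients combined): rows are the outcome (died, survived) and columns are the hospital. The variable names are illustrative. Column percents give the conditional distribution of outcome for each hospital, computed the same way as 29.30% = 11071/37785 in the census table.

```python
# Two-way table of counts: outcome (rows) by hospital (columns),
# all patients combined, from the hospital example in this section.
rows = ["Died", "Survived"]
cols = ["Hospital A", "Hospital B"]
counts = [[63, 16],
          [2037, 784]]

grand_total = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Joint distribution: each cell as a proportion of the grand total
joint = [[cell / grand_total for cell in row] for row in counts]

# Marginal distributions: row and column totals as proportions of the grand total
row_marginal = [t / grand_total for t in row_totals]
col_marginal = [t / grand_total for t in col_totals]

# Conditional distribution of outcome given hospital: cell total / column total
conditional = {cols[j]: [counts[i][j] / col_totals[j] for i in range(len(rows))]
               for j in range(len(cols))}

for hospital, dist in conditional.items():
    print(hospital, ", ".join(f"{label}: {p:.1%}" for label, p in zip(rows, dist)))
```

Each conditional distribution adds up to 100%, just like the four education percents for the 25 to 34 age group.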
Music and wine purchase decision
What is the relationship between type of music played in supermarkets and type of wine purchased?
We want to compare the conditional distributions of the response variable (wine purchased) for each value of the explanatory variable (music played). Therefore, we calculate column percents.
Calculations: When no music was played, there were 84 bottles of wine sold. Of these, 30 were French wine. 30/84 = 0.357, so 35.7% of the wine sold was French when no music was played.
We calculate the column conditional percents similarly for each of the nine cells in the table:
$$35.7\% = \frac{30}{84} = \frac{\text{cell total}}{\text{column total}}$$
For every two-way table, there are two sets of possible conditional distributions. Does background music in supermarkets influence customer purchasing decisions?
Wine purchased for each kind of music played (column percents).
Music played for each kind of wine purchased (row percents).

Simpson's paradox
An association or comparison that holds for all of several groups can reverse direction when the data are combined (aggregated) to form a single group. This reversal is called Simpson's paradox.

Example: Hospital death rates

All patients      Hospital A   Hospital B
Died                      63           16
Survived                2037          784
Total                   2100          800
% survived             97.0%        98.0%

On the surface, Hospital B would seem to have a better record.

Patients in good condition   Hospital A   Hospital B
Died                                  6            8
Survived                            594          592
Total                               600          600
% survived                        99.0%        98.7%

Patients in poor condition   Hospital A   Hospital B
Died                                 57            8
Survived                           1443          192
Total                              1500          200
% survived                        96.2%        96.0%

But once patient condition is taken into account, we see that Hospital A has in fact a better record for both patient conditions (good and poor). Here, patient condition was the lurking variable.

Looking at Data: Relationships
The Question of Causation
IPS Chapter 2.6
2009 W. H. Freeman and Company

Objectives (IPS Chapter 2.6)
The question of causation: causation; common response; confounding; establishing causation.

Explaining association: causation
Association, however strong, does NOT imply causation.
Example 1: A daughter's body mass index depends on her mother's body mass index. This is an example of direct causation.
Example 2: Married men earn more than single men. Can a man raise his income by getting married?
Only careful experimentation can show causation.

Association and causation
(Figure: strong positive linear relationship between children's reading skills (reading index, 0 to 1) and shoe size (0 to 7).)
Not all examples are so obvious.

Explaining association: common response
Students who have high SAT scores in high school have high GPAs in their first year of college. This positive correlation can be explained as a common response to students' ability and knowledge.
The observed association between two variables x and y could be explained by a third lurking variable z. Both x and y change in response to changes in z. This creates an association even though there is no direct causal link.

Explaining association: confounding
Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
Example: Studies have found that religious people live longer than nonreligious people. Religious people also take better care of themselves and are less likely to smoke or be overweight.

Figure 2.28 (Introduction to the Practice of Statistics, Sixth Edition, 2009 W. H. Freeman and Company): some possible explanations for an observed association. The dashed lines show an association. The solid arrows show a cause-and-effect link. x is explanatory, y is response, and z is a lurking variable.

Establishing causation
It appears that lung cancer is associated with smoking.
How do we know that both of these variables are not being affected by an unobserved third (lurking) variable? For instance, what if there is a genetic predisposition that causes people to both get lung cancer and become addicted to smoking, but the smoking itself doesn't CAUSE lung cancer?
We can evaluate the association using the following criteria:
1) The association is strong.
2) The association is consistent.
3) Higher doses are associated with stronger responses.
4) The alleged cause precedes the effect.
5) The alleged cause is plausible.
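The reversal in the Simpson's paradox hospital example (Section 2.5) can be re-derived from its counts. This Python sketch (the names `counts`, `survival_rate`, and `combined` are illustrative) computes each hospital's survival rate within each patient condition and then for the pooled data. Hospital A does better within both conditions, yet Hospital B looks better after pooling, because Hospital A treated far more patients in poor condition (1500 of its 2100, versus 200 of Hospital B's 800): the lurking variable changes the mix of patients behind each overall rate.

```python
# (died, survived) counts by patient condition and hospital, from the example
counts = {
    "good": {"A": (6, 594), "B": (8, 592)},
    "poor": {"A": (57, 1443), "B": (8, 192)},
}


def survival_rate(died, survived):
    """Proportion of patients who survived."""
    return survived / (died + survived)


# Survival rates within each patient condition
for condition, by_hospital in counts.items():
    rates = {h: survival_rate(*c) for h, c in by_hospital.items()}
    print(condition, {h: f"{rate:.1%}" for h, rate in rates.items()})

# Aggregate the counts over conditions, then compare again
combined = {h: tuple(sum(counts[c][h][k] for c in counts) for k in range(2))
            for h in ("A", "B")}
print("combined", combined,
      {h: f"{survival_rate(*c):.1%}" for h, c in combined.items()})
```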