# corelation2 - Raw Frequency Distribution for Average Number...

[Figure: Raw Frequency Distribution for Average Number of Cigarettes Smoked/Day (frequency f vs. average # of cigarettes smoked/day, in intervals of 7 from 0 to 69)]

There was one outlier at 95 cigarettes/day. Although this is theoretically possible, it skews the data, and it came from only one subject.

Summary Statistics with Outlier (average # of cigarettes smoked/day):

| Statistic | Value |
|---|---|
| Mean | 22.4398682 |
| Standard Error | 0.400742955 |
| Median | 20 |
| Mode | 20 |
| Standard Deviation | 9.873252462 |
| Sample Variance | 97.48111417 |
| Kurtosis | 5.633613481 |
| Skewness | 1.306280989 |
| Range | 95 |
| Minimum | 0 |
| Maximum | 95 |
| Sum | 13621 |
| Count | 607 |

The summary statistics without the outlier (n = 606) are listed below, alongside the regression output.

| Average # of Cigarettes Smoked/Day | Frequency (f) |
|---|---|
| 0 to 6 | 16 |
| 7 to 13 | 74 |
| 14 to 20 | 281 |
| 21 to 27 | 73 |
| 28 to 34 | 97 |
| 35 to 41 | 49 |
| 42 to 48 | 4 |
| 49 to 55 | 9 |
| 56 to 62 | 3 |
| 63 to 69 | 0 |
| Total n | 607 |

The interval frequencies sum to 606. The remaining subject is the outlier at 95 cigarettes/day, which lies outside the last interval.
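The frequency table and summary statistics above can be reproduced with a short script. The sketch below is a minimal illustration, not the method used to produce the report. It assumes integer-valued responses and the same width-7 intervals (0 to 6, 7 to 13, ...). The function names `frequency_distribution` and `summary` are hypothetical, and the sample values are made up, not the study's raw data. It also shows why a value of 95 does not land in any of the listed intervals.

```python
import math
import statistics


def frequency_distribution(values, width=7, n_bins=10):
    """Count values into the intervals 0 to 6, 7 to 13, ..., 63 to 69.

    Values outside the last interval (such as an outlier at 95) are
    returned separately instead of being silently dropped.
    """
    counts = [0] * n_bins
    overflow = []
    for v in values:
        idx = int(v // width)  # interval index, e.g. 20 // 7 == 2 -> "14 to 20"
        if 0 <= idx < n_bins:
            counts[idx] += 1
        else:
            overflow.append(v)
    labels = [f"{i * width} to {i * width + width - 1}" for i in range(n_bins)]
    return dict(zip(labels, counts)), overflow


def summary(values):
    """A subset of the summary statistics listed in the table."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "standard_deviation": sd,
        "sample_variance": sd ** 2,
        "standard_error": sd / math.sqrt(n),  # standard error of the mean
        "range": max(values) - min(values),
        "count": n,
    }


# Hypothetical data, not the study's responses.
sample = [0, 5, 10, 20, 20, 25, 95]
dist, overflow = frequency_distribution(sample)
print(dist["14 to 20"], overflow)  # 2 [95]
print(summary(sample)["mean"])     # 25

# The reported interval counts sum to 606, one short of n = 607.
reported = [16, 74, 281, 73, 97, 49, 4, 9, 3, 0]
print(sum(reported))  # 606
```

Returning the out-of-range values separately, rather than discarding them, makes a mismatch between the interval total and n visible immediately.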

As the skewness values show (1.31 with the outlier versus 0.78 without it), the statistics without the outlier are much less skewed than those with it.

Summary Statistics without Outlier (average # of cigarettes smoked/day):

| Statistic | Value |
|---|---|
| Mean | 22.32013201 |
| Standard Error | 0.383068739 |
| Median | 20 |
| Mode | 20 |
| Standard Deviation | 9.430028912 |
| Sample Variance | 88.92544527 |
| Kurtosis | 1.58247016 |
| Skewness | 0.784469337 |
| Range | 60 |
| Minimum | 0 |
| Maximum | 60 |
| Sum | 13526 |
| Count | 606 |

The next step is to remove these two outliers from both variables. Once this is done, we can make a graph showing the regression line and whether these two variables are related to one another.

Regression Graph for “Age When Started Smoking vs # of Cigarettes Smoked/Day”:

[Figure: Age When Started Smoking Daily vs # of Cigarettes Smoked/Day. X axis: Age When Started Smoking Daily (yrs); Y axis: # of Cigarettes Smoked/Day; data points with line of best fit]

| Term | Coefficient |
|---|---|
| Intercept | 25.22237964 |
| X Variable 1 | -0.210680649 |

Equation for Line of Best Fit: Y = -0.21X + 25.22

| Regression Statistics | Value |
|---|---|
| Multiple R | 0.083640864 |
| R Square | 0.006995794 |
| Adjusted R Square | 0.005351747 |
| Standard Error | 9.404761494 |
| Observations | 606 |
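The effect of dropping the outlier, and the use of the line of best fit, can be checked from the reported figures alone. This is a minimal sketch under stated assumptions: it uses only the Sum, Count and Standard Deviation values printed in the summary statistics, and the helper names `mean_after_removal`, `standard_error`, `least_squares` and `predict` are hypothetical. `least_squares` is the ordinary least-squares formula for y = mx + b. The raw data are not available here, so it is demonstrated on made-up points rather than used to re-derive the reported coefficients.

```python
import math


def mean_after_removal(total, count, removed):
    """Mean after deleting the given values from a sample with known sum and count."""
    return (total - sum(removed)) / (count - len(removed))


def standard_error(sd, n):
    """Standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


def least_squares(xs, ys):
    """Ordinary least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope=-0.210680649, intercept=25.22237964):
    """Predicted cigarettes/day for starting age x, using the reported coefficients."""
    return slope * x + intercept


# Removing the 95 cigarettes/day value: Sum 13621, Count 607 -> Sum 13526, Count 606.
print(round(mean_after_removal(13621, 607, [95]), 8))  # 22.32013201
print(round(standard_error(9.430028912, 606), 6))      # 0.383069
print(round(predict(15), 2))                            # 22.06
```

The first two results match the Mean and Standard Error in the without-outlier summary table, which confirms that exactly one value (95) separates the two tables.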
The R² value is very low in this case, approximately 0.007 (0.7%). This means that only about 0.7% of the variance in Y is associated with the variance in X. In other words, Y shows almost no systematic change as X changes: the number of cigarettes smoked per day does not depend on (or depends very little on) the age at which someone started smoking, and this conclusion follows from how low the R² value is.

The standard error of the estimate is about 9.4 cigarettes/day, which is large. It measures the typical distance between the observed Y values and the corresponding points on the line of best fit: the closer the Y values are to the line, the smaller the standard error would be. In this case the points are not close to the line, and the standard error (9.40) is almost as large as the standard deviation of Y itself (9.43), so the line barely improves on simply predicting the mean.

The equation for the line of best fit comes directly from the coefficients table: the slope m is the “X Variable 1” coefficient (-0.21) and the y-intercept b is the “Intercept” coefficient (25.22). Substituting them into the slope-intercept form y = mx + b gives Y = -0.21X + 25.22.

Inferences and Interpretation (Smoking Study):

In order to interpret the data, we must first make inferences about the strength of the linear relationship. We could make inferences about the slope instead, but the strength of the relationship gives us more information. For convenience, I have chosen to use r as the test statistic instead of t. In simple linear regression the two tests are equivalent and lead to the same conclusion, because the t statistic for the slope equals r√(n − 2)/√(1 − r²).
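The regression statistics discussed here are linked by standard formulas for simple linear regression (one predictor), so they can be cross-checked against each other. The sketch below is illustrative, and its function names are hypothetical. It relies on the fact that “Multiple R” in a one-predictor regression is the absolute value of the correlation r, so r takes the sign of the slope (negative here, about -0.084). It computes R² = r², the adjusted R², the standard error of the estimate from the standard deviation of Y (9.430028912, from the summary statistics without the outlier), and the t statistic for testing r, which would then be compared with a critical value for n − 2 = 604 degrees of freedom.

```python
import math


def adjusted_r_squared(r2, n, k=1):
    """Adjusted R^2 for k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def std_error_of_estimate(sd_y, r2, n, k=1):
    """Typical residual size: s_y * sqrt((1 - R^2) * (n - 1) / (n - k - 1))."""
    return sd_y * math.sqrt((1 - r2) * (n - 1) / (n - k - 1))


def t_from_r(r, n):
    """t statistic for H0: rho = 0 (same value as the slope t test), df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


n = 606
r = -0.083640864  # Multiple R with the sign of the (negative) slope
r2 = r * r

print(round(r2, 6))                                          # 0.006996
print(round(adjusted_r_squared(r2, n), 6))                   # 0.005352
print(round(std_error_of_estimate(9.430028912, r2, n), 2))   # 9.4
print(round(t_from_r(r, n), 2))                              # -2.06
```

The first three values reproduce the reported R Square, Adjusted R Square and Standard Error, which is a quick way to confirm that the regression table and the summary statistics describe the same 606 observations.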