correlation1

Smoking Study

For the smoking study data, I looked at how the age at which someone started smoking daily affected how many cigarettes they smoked per day. The first step in determining whether these two variables were correlated was to examine the raw frequency distribution of each variable and check for outliers that might skew it.

Raw Frequency Distribution for “Age When Started Smoking”:

[Figure: Age When First Started Smoking Daily vs Frequency (f). x-axis: Age When Started Smoking (yrs), in intervals from 0 to 3 through 32 to 35; y-axis: Frequency (f), 0 to 350.]

There was one outlier in this data: a subject who started smoking daily at 45 years old. I decided to exclude it since it was just one subject and because it skewed the data. As the summary statistics show, the outlier did skew the data (skewness of about 1.26 with the outlier versus about 0.56 without it), although the mean and median are still very close in both cases (about 13.8 versus 14), meaning that the distribution would have been close to normal either way.

Frequency Distribution (outlier excluded):

Interval      f
0 to 3        5
4 to 7        19
8 to 11       98
12 to 15      324
16 to 19      127
20 to 23      26
24 to 27      5
28 to 31      0
32 to 35      3
Total n       607

Summary Statistics with and without Outlier:

Statistic             With Outlier    Without Outlier
Mean                  13.83059211     13.77924217
Standard Error        0.160082261     0.151872934
Median                14              14
Mode                  15              15
Standard Deviation    3.947253333     3.741749659
Sample Variance       15.58080887     14.00069051
Kurtosis              9.139883479     4.068313133
Skewness              1.256875327     0.562276515
Range                 45              34
Minimum               0               0
Maximum               45              34
Sum                   8409            8364
Count                 608             607

...
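The effect of dropping the single outlier can be checked from the reported sums, counts, and standard deviations alone. The following is a minimal Python sketch of that arithmetic, not the method used to produce the original tables. The variable and function names are illustrative, and the raw per-subject data is not available here, so only quantities derivable from the summary values are recomputed.

```python
import math

# Values copied from the reported summary statistics.
n_with, sum_with, sd_with = 608, 8409, 3.947253333           # with the outlier
n_without, sum_without, sd_without = 607, 8364, 3.741749659  # outlier excluded


def mean(total, n):
    """Arithmetic mean from a sum and a count."""
    return total / n


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


mean_with = mean(sum_with, n_with)
mean_without = mean(sum_without, n_without)
se_with = standard_error(sd_with, n_with)
se_without = standard_error(sd_without, n_without)

# The difference between the two sums is the value of the excluded subject.
removed_value = sum_with - sum_without

# Interval frequencies that carry a count in the frequency table.
# The 28 to 31 interval contributes nothing: the rest already sum to n = 607.
freqs = {
    "0 to 3": 5, "4 to 7": 19, "8 to 11": 98, "12 to 15": 324,
    "16 to 19": 127, "20 to 23": 26, "24 to 27": 5, "32 to 35": 3,
}
total_f = sum(freqs.values())

print(f"mean with outlier:    {mean_with:.4f}")
print(f"mean without outlier: {mean_without:.4f}")
print(f"shift in mean:        {mean_with - mean_without:.4f}")
print(f"removed value:        {removed_value}")
print(f"total frequency:      {total_f}")
```

Removing one subject out of 608 moves the mean by only about 0.05 years, which is why the mean stays close to the median of 14 in both cases, even though skewness and kurtosis change noticeably.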
