# Assessing Normality Notes Spring


Anthony Tanbakuchi, MAT167

## 1.2 Method of assessment

### Importance of assessing normality

Many statistical tests require that the sample data come from a population with a normal distribution. If this requirement is not met, the results of the test may not be accurate, so it is important to check the normality assumption before relying on such a test. Below are recommendations for an initial assessment of normality.¹

### How to assess normality

1. Make a histogram. Reject normality if it departs dramatically from a bell shape or if more than one outlier exists.
2. Make a normal quantile plot. Reject normality if the plot does not closely follow a straight line.

¹ Further quantitative methods exist, such as the Kolmogorov-Smirnov test and the Shapiro-Wilk normality test. However, these tests must be used with caution because they have very low power when used with small sample sizes and can therefore have a high risk of type II error.

### Quantiles & percentiles

If a student who scored 1100 on the SAT was in the 75th percentile, then:

- percentile = 0.75
- quantile = 1100
- $P_{75} = 1100$

The quantile is the data value that has an associated percentile.

### Normal quantile plots

**Definition 1.1** (Normal quantile plot, or Q-Q plot). A graph used to assess normality. It plots the sample quantile (vertical axis) against the theoretical quantile (horizontal axis).

- **sample quantile**: the original data value $x_i$ (or its z-score).
- **theoretical quantile**: the expected z-score for the data point $x_i$ when we assume it comes from a normal distribution.

If the sample quantiles match their theoretical quantiles, the graph will be a straight line, indicating that the data have a normal distribution. Normally distributed sample data will still show minor deviations from a straight line due to sampling error.

#### Finding theoretical quantiles

1. Sort the data so that the $x_i$ are increasing.
2. Find the percentile $k_i$ (on a 0 to 1 scale) for each $x_i$, where $i$ is the position of $x_i$ in the sorted data and $n$ is the sample size:

   $$k_i = \frac{i - 0.5}{n}$$
3.
   Find $z_i$, the theoretical quantile z-score corresponding to the percentile $k_i$,² assuming the data come from a normal distribution:

   $$z_i = \texttt{qnorm}(k_i)$$

Once you have found $z_i$ for each data point $x_i$, plot the points $(z_i, x_i)$.

² Percentiles are a type of probability.

#### Example

The following sorted data points $x_i$ are the heights of the mothers of 10 students in our class. Also listed are the corresponding z-scores:

x = {60, 62, 62, 63, 64, 64, 65, 65, 67, 68}

z = {-1.66, -0.83, -0.83, -0.42, 0, 0, 0.42, 0.42, 1.25, 1.66}

Question 2. Find the theoretical quantile corresponding to the third data point.

Below are Normal Q-Q plots for the above 10 mother heights. Note that plotting the raw values $x_i$ against the theoretical quantiles gives the same pattern as plotting their z-scores against the theoretical quantiles; only the vertical scale changes.

[Figure: two Normal Q-Q plots of the 10 mother heights. Horizontal axis: Theoretical Quantiles. Vertical axis: x (left plot) and z (right plot).]

R command for Q-Q plots: `qqnorm(x); qqline(x)`, where `x` is a vector of data.

### Q-Q plot examples

**Normal distribution**

[Figure: normal density f(x) and its Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

Q-Q plot of 50 data points randomly selected from a normal density.

**Light tails**

[Figure: light-tailed density f(x) and its Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

A light-tailed density has less area in the tails, making them appear shorter.
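The three steps under "Finding theoretical quantiles" can be sketched in R using the 10 mother heights from the example. This is a minimal illustration, not the author's code: the variable names (`x_sorted`, `k`, `z_theory`, `z_sample`, `quantile_table`) are assumptions introduced here, and only `qnorm()` comes from the notes.

```r
# Sorted mother heights from the example in the notes.
x <- c(60, 62, 62, 63, 64, 64, 65, 65, 67, 68)

x_sorted <- sort(x)       # step 1: sort the data in increasing order
n <- length(x_sorted)     # sample size
i <- seq_len(n)           # position of each value in the sorted data
k <- (i - 0.5) / n        # step 2: percentile of each value on a 0-1 scale
z_theory <- qnorm(k)      # step 3: theoretical quantile (expected z-score)

# Sample z-scores: each height standardized with the sample mean and SD.
z_sample <- (x_sorted - mean(x_sorted)) / sd(x_sorted)

# Collect everything in one table (names are illustrative assumptions).
quantile_table <- data.frame(i, x = x_sorted, k, z_theory, z_sample)
print(round(quantile_table, 2))

# Theoretical quantile for the third data point (compare with Question 2).
z_theory[3]

# Hand-built Q-Q plot: theoretical quantiles against the sorted data.
plot(z_theory, x_sorted,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Normal Q-Q plot built by hand")
```

The `z_sample` column should reproduce the z-scores listed in the example, while `z_theory` holds the values the notes ask for in Question 2. Note that `qqnorm()` obtains its percentiles from `ppoints()`, which uses a slightly different formula when `n <= 10`, so for a sample this small the built-in plot will not line up exactly with the hand-built one.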
**Heavy tails**

[Figure: heavy-tailed density f(x) and its Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

A heavy-tailed density has more area in the tails, making them appear longer.

**Granularity**

[Figure: binomial probabilities Pbinom(x, n = 25, p = 0.5) and the corresponding Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

Discrete data (such as binomial) will be clumped or granular in a Q-Q plot.

**Positive skew**

[Figure: positively skewed density f(x) and its Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

For positive skew, the Q-Q plot has an increasing slope from left to right.

**Negative skew**

[Figure: negatively skewed density f(x) and its Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

For negative skew, the Q-Q plot has a decreasing slope from left to right.

### Assessing normality of class data

#### Mother heights

- R: `par(mfrow = c(1, 2))`
- R: `hist(height_mother)`
- R: `qqnorm(height_mother)`
- R: `qqline(height_mother)`

[Figure: "Histogram of height_mother" (Frequency against height_mother) beside its Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

Question 3. Do the mother heights appear to be normally distributed?
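The shapes described in the Q-Q plot examples (light tails, heavy tails, granularity, skew) can be reproduced with simulated data. The sketch below is only an illustration: apart from the binomial settings `n = 25`, `p = 0.5` and the sample size of 50 used for the normal example, the specific distributions, parameters, and seed are assumptions chosen here to stand in for each shape, not the ones used to draw the original figures.

```r
set.seed(167)   # arbitrary seed so the sketch is reproducible
n <- 50         # sample size used for the normal example in the notes

# One simulated sample per shape (distribution choices are assumptions).
samples <- list(
  "Normal"        = rnorm(n, mean = 50, sd = 10),
  "Light tails"   = runif(n, min = 9.8, max = 10.6),   # uniform: short tails
  "Heavy tails"   = rt(n, df = 3),                     # t with 3 df: long tails
  "Granular"      = rbinom(n, size = 25, prob = 0.5),  # discrete counts
  "Positive skew" = rexp(n, rate = 1),                 # long right tail
  "Negative skew" = -rexp(n, rate = 1)                 # long left tail
)

# Draw one Q-Q plot per sample in a 2 x 3 grid.
par(mfrow = c(2, 3))
for (name in names(samples)) {
  qqnorm(samples[[name]], main = name)
  qqline(samples[[name]])
}
```

Rerunning the sketch with a different seed gives a feel for how much a Q-Q plot of truly normal data can wobble around the line purely because of sampling error.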
#### Work hours

- R: `par(mfrow = c(1, 2))`
- R: `hist(work_hours)`
- R: `qqnorm(work_hours)`
- R: `qqline(work_hours)`

[Figure: "Histogram of work_hours" (Frequency against work_hours) beside its Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles).]

Question 4. Do the student work hours appear to be normally distributed?

## 1.3 Summary

How to assess normality:

1. Make a histogram. Reject normality if it departs dramatically from a bell shape or if more than one outlier exists.
   R: `hist(x)`
2. Make a normal quantile plot. Reject normality if the plot does not closely follow a straight line.
   R: `qqnorm(x); qqline(x)`

## 1.4 Additional Examples

Question 5. Assess the normality of the men's weight and cholesterol data in the Mhealth table from the book.
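The two-step check from the summary can be wrapped in a small helper so that it is easy to repeat for each variable. This is a sketch under stated assumptions: `assess_normality()` is a hypothetical function introduced here, not part of the notes or of base R, and because the class data and the Mhealth table are not reproduced in these notes, the 10 mother heights from the earlier example are used as stand-in data.

```r
# Hypothetical helper (not from the notes): draws the histogram and the
# normal Q-Q plot side by side for a numeric vector x.
assess_normality <- function(x, label = "x") {
  x <- x[!is.na(x)]                  # drop missing values before plotting
  old_par <- par(mfrow = c(1, 2))    # two plots side by side
  on.exit(par(old_par))              # restore the previous layout on exit
  hist(x, main = paste("Histogram of", label), xlab = label)
  qqnorm(x)                          # sample against theoretical quantiles
  qqline(x)                          # reference line for comparison
  invisible(length(x))               # number of values actually plotted
}

# Stand-in data: the 10 mother heights from the earlier example.
heights <- c(60, 62, 62, 63, 64, 64, 65, 65, 67, 68)
n_used <- assess_normality(heights, label = "mother heights")
```

The helper only draws the plots; the decision itself is still made by eye using the two rules in the summary.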