Sleuth_03 - CHAPTER 3 A Closer Look at Assumptions 1 This is...


CHAPTER 3: A Closer Look at Assumptions

This is an advanced chapter. Please ask questions!

CASE STUDY 1: CLOUD SEEDING
- Cloud seeding to increase rainfall (randomized experiment)
- 52 cumulus clouds
- At random, 26 were seeded and 26 were controls
- Experimenter and pilot were “blind” to the treatment

CLOUD SEEDING DATA

DATA CHARACTERISTICS
- Data were positively skewed, and the group with the higher center also had the higher spread
- Log transform of rainfall was used

CLOUD SEEDING DATA

ANALYSIS OF LOG DATA
- One-sided p-value for the two-sample t-test was 0.007.
- Seeded clouds had exp(1.144) = 3.1 times as much rainfall as the unseeded clouds.
- 95% confidence interval for this ratio is 1.3 to 7.7.
- What is our scope of inference?

CASE STUDY 2: AGENT ORANGE
- Effects of Agent Orange on troops in Vietnam
- Observational study
- 646 Vietnam veterans and 97 other veterans
- Neither sample was randomly selected
- Similar shape and spread
- Scope of inference?

DIOXIN CONCENTRATIONS DATA ANALYSIS
DROP VETERAN 646
ALSO DROP VETERAN 645

Robustness of the Two-Sample t
Meaning of Robustness: A statistical tool is “robust” to departures from a particular assumption if it remains valid even when that assumption is not met.

Assumptions of the Two-Sample t
I. Normal distributions
II. Equal standard deviations (σ1 = σ2)
III. Independence

Robustness against departures from Normality
- The Central Limit Theorem asserts that departures from normality are not serious as long as sample sizes are reasonably large
- Long-tailed distributions (those with outliers) cause more problems than skewed distributions

Robustness against departures from equal standard deviations
- Having unequal population standard deviations causes more problems than lack of normality
- If sample sizes are roughly the same, having unequal population standard deviations is not much of a problem
- The worst situations occur when one sample is much larger than the other and the smaller sample comes from the population with the larger standard deviation

Robustness against departures from independence
- Lack of independence is usually caused by
  - collecting data in subgroups rather than as individuals (cluster effect)
  - collecting more than one observation on a given individual over time (serial effect)
- Special regression or analysis of variance models are used when data are collected in subgroups; time series models are used when observations are collected serially.

Robustness against departures from independence (cont.)
- t-tests are not robust to departures from independence!

Example of Clustering (Grouping)
- We have n subjects in each of five different PE classes
- The response is weight loss over a set period of time
- We weigh each subject twice, but treat the data as if the two measurements came from two different persons. NOT a good idea!

Resistant Statistical Tools
- A statistical procedure is “resistant” if it does not give very different results with or without a few outliers in the data
- Example: The median is very resistant to the presence of outliers, but the mean may be greatly affected
- Since t-tests are based on means, they are not resistant!

Resistant Example
Sample of five numbers: 1, 5, 3, 5, 7
- What is the mean? 4.2
- What is the median? 5
Same sample with an outlier: 1, 5, 3, 5, 700
- What is the mean? 142.8
- What is the median? 5

Strategy for dealing with Outliers
- Investigate outliers for accuracy and appropriateness
- Carry out the statistical procedures with and without the outliers
- If the conclusions are the same for both analyses, leave them in
- If the conclusions are different, consideration should be given to a transformation of the data or to alternative procedures that are more resistant (Chapter 4)
- Sometimes finding outliers is the goal of our analysis!
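The "with and without the outliers" strategy can be tried on the two five-number samples from the resistant example. The following is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative and do not come from the slides.

```python
from statistics import mean, median

# The two samples from the resistant example; they differ only in the
# last value (7 versus the outlier 700).
clean = [1, 5, 3, 5, 7]
with_outlier = [1, 5, 3, 5, 700]

for label, sample in (("without outlier", clean), ("with outlier", with_outlier)):
    # The mean moves with the outlier; the median does not.
    print(f"{label}: mean = {mean(sample):.1f}, median = {median(sample)}")
```

The mean shifts from 4.2 to 142.8 while the median stays at 5, which is why procedures built on means, such as the t-test, are not resistant.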
Data Transformations
Logarithm (log) transformations
- Use if the ratio of the largest to the smallest observation is greater than 10
- Use for positively skewed data for two or more groups when the group with the larger average also has the larger spread (we did this for the rainfall data)

More Transformations
- Square root: if data are counts
- Reciprocal: if data are waiting times or time to failure
- Logit = log(p/(1-p)): if data are proportions or percents
- Authors prefer graphical methods to determine the transformation
  - Normal probability plots
  - Side-by-side box plots
  - Dot plots

The Next Two Slides
- Slide one indicates how much the p-values change when the data are not normal
- Slide two indicates how much the p-values change when the standard deviations are not the same
- Either perform a transformation or GOTO Chapter 4

Log of the Median: μ_log, with M_log = log(median) from the original data

Example: Log Transformations
The means of the (natural) log rainfall data are
- Unseeded: 3.99 (estimates the center of the unseeded clouds on the log scale)
- Seeded: 5.13 (estimates the center of the seeded clouds on the log scale)
- Both distributions (on the log scale) appear to be symmetric; therefore, the center approximately equals both the mean and the median.
- The difference is 5.13 - 3.99 = 1.14

Example: Interpretation
If we take e (the base of the natural logarithms) to the (5.13 - 3.99 =) 1.14 power, we find the ratio of the amount of rainfall of the seeded clouds to the unseeded clouds
- exp(1.14) = 2.718282^1.14 ≈ 3.13 (3.14 if the unrounded difference 1.144 is used)
- Our best estimate is over three times as much rainfall

Log Transformations
Log Relationships
- log(ab) = log a + log b
- log(a/b) = log a - log b
- log(a^n) = n log a

Log Transformations (cont.)
Log transformation on one group of data
- Positively skewed data
- Log transformation makes the distribution more symmetric
- The mean and median are equal for symmetric distributions, i.e., mean[log(Y)] = median[log(Y)]
- Since the log transformation preserves ordering, median[log(Y)] = log[median(Y)] (typo in book)
- Therefore, mean[log(Y)] = log[median(Y)]
- With sample data this holds approximately: mean[log(Y)] ≈ log[median(Y)]

Log Transformations (cont.)
Log transformation on two groups of data
- Positively skewed data
- Log transformation makes the distribution more symmetric
- Let Z1 = mean[log(Y1)] = log[median(Y1)]
- Let Z2 = mean[log(Y2)] = log[median(Y2)]
- Then Z2 - Z1 = log[median(Y2)] - log[median(Y1)] = log[median(Y2) / median(Y1)]
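The back-transformation in the cloud seeding example can be reproduced numerically. This is a minimal sketch, assuming only the two rounded log-scale means quoted in the slides (3.99 and 5.13); the small sample `y` is hypothetical and is there only to illustrate that the log preserves ordering, so the median of the logs equals the log of the median.

```python
import math
from statistics import median

# Log-scale group means quoted in the slides (natural log of rainfall).
mean_log_unseeded = 3.99
mean_log_seeded = 5.13

# Z2 - Z1 on the log scale estimates log[median(Y2) / median(Y1)].
diff = mean_log_seeded - mean_log_unseeded
ratio = math.exp(diff)  # back-transform to a ratio of medians

print(f"difference of log means: {diff:.2f}")
print(f"estimated ratio of medians (seeded / unseeded): {ratio:.2f}")

# Hypothetical positively skewed sample (odd size, so the median is an
# observed value and the identity below holds exactly).
y = [1.0, 2.0, 4.0, 8.0, 100.0]
median_of_logs = median([math.log(v) for v in y])
log_of_median = math.log(median(y))
print("median[log(Y)] == log[median(Y)]:", math.isclose(median_of_logs, log_of_median))
```

Because the inputs are rounded to two decimals, the ratio comes out slightly below the value obtained from the unrounded difference 1.144.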

This note was uploaded on 10/03/2011 for the course STAT 511 taught by Professor Eggett,d during the Winter '08 term at BYU.
