CHAPTER 3
A Closer Look at Assumptions
This is an advanced chapter.
Please ask questions!

CASE STUDY 1: CLOUD SEEDING
Cloud seeding to increase rainfall (randomized
experiment)
52 cumulus clouds
At random 26 were seeded and 26 were controls
Experimenter and pilot were “blind” to the treatment

CLOUD SEEDING DATA

DATA CHARACTERISTICS
Data were positively skewed, and the group with the
higher center also had the higher spread
Log transform of rainfall was used

CLOUD SEEDING DATA

ANALYSIS OF LOG DATA
One-sided p-value for the two-sample t-test was 0.007.
Seeded clouds had exp(1.144) = 3.1 times as much rainfall
as the unseeded.
95% confidence interval for this ratio is 1.3 to 7.7.
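
The analysis summarized above (a two-sample t-test on the log scale, then back-transforming the difference) can be sketched as follows. This is a minimal illustration that assumes the pooled (equal standard deviation) form of the t statistic; the function name `pooled_t_on_logs` and the rainfall values are hypothetical and are NOT the cloud seeding data. Only the t statistic is computed, since the p-value would need a t-distribution routine from outside the standard library.

```python
import math
from statistics import mean, variance


def pooled_t_on_logs(treated, control):
    """Return (difference of mean logs, pooled SE, t statistic, df)."""
    log_t = [math.log(y) for y in treated]
    log_c = [math.log(y) for y in control]
    n1, n2 = len(log_t), len(log_c)
    diff = mean(log_t) - mean(log_c)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(log_t) + (n2 - 1) * variance(log_c)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return diff, se, diff / se, n1 + n2 - 2


# Hypothetical rainfall amounts, for illustration only.
seeded = [120.0, 30.0, 400.0, 95.0, 18.0, 260.0]
unseeded = [40.0, 12.0, 150.0, 25.0, 8.0, 70.0]

diff, se, t_stat, df = pooled_t_on_logs(seeded, unseeded)
ratio = math.exp(diff)  # multiplicative effect on the original scale

print(f"difference of mean logs = {diff:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
print(f"estimated ratio (seeded / unseeded) = {ratio:.2f}")
```

Exponentiating the endpoints of a confidence interval for the log-scale difference gives an interval for the ratio in the same way.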
What is our scope of inference?

CASE STUDY 2: AGENT ORANGE
Effects of Agent Orange on troops in Vietnam
Observational study
646 Vietnam veterans and 97 other veterans
Neither sample was randomly selected
Similar shape and spread
Scope of inference?

DIOXIN CONCENTRATIONS DATA ANALYSIS

DROP VETERAN 646

ALSO DROP VETERAN 645

Robustness of the Two-Sample t
Meaning of Robustness:
A statistical tool is “robust” to departures from a
particular assumption if it is valid even when that
assumption is not met
Assumption True, but it’s

Assumptions of the Two-Sample t
I. Normal distributions
II. Equal standard deviations (σ1 = σ2)
III. Independence

Robustness against departures from Normality
The Central Limit Theorem implies that departures from
normality are not serious as long as sample sizes are
reasonably large
Long-tailed distributions (those with outliers) cause more
problems than skewed distributions

Robustness against departures from equal standard deviations
Having unequal population standard deviations causes
more problems than lack of normality
If sample sizes are roughly the same, having unequal
population standard deviations is not much of a problem
The worst situations occur when one
sample is much larger than the other
and the smaller sample comes from the
population with the larger standard
deviation

Robustness against departures from independence
Lack of independence is usually caused by
collecting data in subgroups rather than as individuals
(cluster effect)
collecting more than one observation on a given individual
over time (serial effect)
Special regression or analysis of variance
models are used when data are collected
in subgroups; time series models are used
when observations are collected serially.

Robustness against departures from independence (cont.)
t-tests are not robust to departures
from independence!

Example of Clustering (Grouping)
We have n subjects in each of five different PE
classes
The response is weight loss over a set period of
time
We weigh each subject twice, but treat the data as
if the two measurements came from two different persons
– NOT a good idea!

Resistant Statistical Tools
A statistical procedure is “resistant” if it does not
give very different results with or without a few
outliers in the data.
Example: The median is very resistant to the presence of
outliers, but the mean may be greatly affected
Since t-tests are based on means, they are not
resistant!

Resistant Example
Sample of five numbers:
1, 5, 3, 5, 7
• What is the mean? 4.2
• What is the median? 5
1, 5, 3, 5, 700
• What is the mean? 142.8
• What is the median? 5

Strategy for dealing with Outliers
Investigate outliers for accuracy and appropriateness
Carry out the statistical procedures with and without the
outliers
If the conclusions are the same for both analyses, leave them in
If the conclusions are different, consider a transformation
of the data or alternative procedures that are more
resistant (Chapter 4)
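
The "with and without the outliers" comparison can be sketched in a few lines. This minimal example reuses the five-number sample from the Resistant Example slide (once with 7 and once with 700 as the last value); the helper name `summarize` is an assumption made for illustration.

```python
from statistics import mean, median


def summarize(data):
    """Return the mean and median of a sample as a small dictionary."""
    return {"mean": mean(data), "median": median(data)}


without_outlier = [1, 5, 3, 5, 7]
with_outlier = [1, 5, 3, 5, 700]

for label, data in (("without outlier", without_outlier),
                    ("with outlier", with_outlier)):
    s = summarize(data)
    print(f"{label}: mean = {s['mean']}, median = {s['median']}")
```

The median is the same in both runs while the mean moves from 4.2 to 142.8, which is the sense in which a mean-based procedure is not resistant.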
Sometimes finding outliers is the goal of our analysis!

Data Transformations
Logarithm (log) transformations
Use if the ratio of the largest to the smallest observation is
greater than 10
Use for positively skewed data for two or more groups when
the group with larger average also has the larger spread
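
The first rule of thumb above (largest-to-smallest ratio greater than 10) is easy to check numerically. The sketch below is only an illustration: the function names `max_min_ratio` and `suggests_log`, the default threshold argument, and the sample values are assumptions, not part of the course material.

```python
import math


def max_min_ratio(values):
    """Ratio of the largest to the smallest observation (all values must be > 0)."""
    if min(values) <= 0:
        raise ValueError("a log transform needs strictly positive data")
    return max(values) / min(values)


def suggests_log(values, threshold=10.0):
    """True when the max/min ratio exceeds the rule-of-thumb threshold."""
    return max_min_ratio(values) > threshold


# Hypothetical positively skewed measurements, for illustration only.
rain = [4.1, 17.5, 26.3, 95.0, 321.2, 1202.6]

print(f"max/min ratio = {max_min_ratio(rain):.1f}")
print(f"log transform suggested: {suggests_log(rain)}")

if suggests_log(rain):
    logged = [math.log(y) for y in rain]  # natural log of each observation
```

The check is a screening aid only; the shape and spread of each group still need to be examined.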
(We did this for the rainfall data)

More Transformations
Square root: if data are counts
Reciprocal: if data are waiting times or time to
failure
Logit = log(p/(1 - p)): if data are proportions or
percents

Authors prefer graphical methods to determine transformation
Normal probability plots
Side-by-side box plots
Dot plots

The Next Two Slides
Slide one indicates how much the p-values change
when the data are not normal
Slide two indicates how much the p-values change
when the standard deviations are not the same

Either perform a transformation or GOTO Chapter 4

Log of the Median: μ_log with M_log = log(median) from
the original data

Example: Log Transformations
The means of the (natural) log rainfall data are
• Unseeded: 3.99 (estimates the center of the unseeded
  clouds on the log scale)
• Seeded: 5.13 (estimates the center of the seeded
  clouds on the log scale)
• Both distributions (on the log scale) appear to be
  symmetric; therefore, the center approximately equals
  both the mean and median.
• The difference: 5.13 – 3.99 = 1.14

Example: Interpretation
If we take e (the base of the natural logarithms) to the
(5.13 – 3.99 =) 1.14 power, we find the ratio of the amount
of rainfall of the seeded clouds to the unseeded clouds
• exp(1.14) = 2.718282^1.14 ≈ 3.13 (3.14 with the unrounded difference 1.144)
• Our best estimate is over three times as much rainfall
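
The back-transformation can be reproduced directly from the two log-scale means quoted in these slides (3.99 and 5.13). This is a minimal sketch; the variable names are chosen here for illustration.

```python
import math

# Means of the natural-log rainfall, as quoted in the slides.
mean_log_unseeded = 3.99
mean_log_seeded = 5.13

diff = mean_log_seeded - mean_log_unseeded  # difference on the log scale
ratio = math.exp(diff)                      # multiplicative effect on the original scale

print(f"difference on log scale = {diff:.2f}")
print(f"estimated ratio (seeded / unseeded) = {ratio:.2f}")

# The same ratio, written as a quotient of back-transformed centers.
ratio_alt = math.exp(mean_log_seeded) / math.exp(mean_log_unseeded)
```

Because the inputs are rounded to two decimals, the second decimal of the ratio depends on that rounding.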

Log Transformations
Log Relationships
• log(ab) = log a + log b
• log(a/b) = log a - log b
• log(a^n) = n log a

Log Transformations (cont.)
Log transformation on one group of data
• Positively skewed data
• Log transformation makes the distribution more symmetric
• The mean and median are equal for symmetric distributions,
  i.e., mean[log(Y)] = median[log(Y)]
Since the log transformation preserves ordering,
median[log(Y)] = log[median(Y)] (typo in book)
Therefore, mean[log(Y)] = log[median(Y)]
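
The order-preservation step can be checked numerically. The sketch below uses a small hypothetical sample with an odd number of observations, so that the sample median is one of the data values; it shows that the median commutes with the log while the mean does not.

```python
import math
from statistics import mean, median

# Hypothetical positively skewed sample (odd n), for illustration only.
y = [2.0, 3.5, 5.0, 9.0, 40.0]
log_y = [math.log(v) for v in y]

median_of_logs = median(log_y)
log_of_median = math.log(median(y))

mean_of_logs = mean(log_y)
log_of_mean = math.log(mean(y))

print(f"median[log(Y)] = {median_of_logs:.4f}, log[median(Y)] = {log_of_median:.4f}")
print(f"mean[log(Y)]   = {mean_of_logs:.4f}, log[mean(Y)]   = {log_of_mean:.4f}")
```

With an even number of observations the sample median averages the two middle values, so the first identity holds only approximately. How close mean[log(Y)] is to log[median(Y)] depends on how symmetric the logged data are.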
Mean(log(Y)) ≈ log(median(Y))

Log Transformations (cont.)
Log transformation on two groups of data
• Positively skewed data
• Log transformation makes the distribution more symmetric
• Let Z1 = mean[log(Y1)] = log[median(Y1)]
• Let Z2 = mean[log(Y2)] = log[median(Y2)]
Then Z2 - Z1 = log[median(Y2)] - log[median(Y1)]
             = log[median(Y2) / median(Y1)]

CHAPTER 3
The End
This note was uploaded on 10/03/2011 for the course STAT 511 taught by Professor Eggett, D. during the Winter '08 term at BYU.