BABS 540: Data Utilization
Lecture 1: ANOVA

What is Statistics?
“The science of collecting, organizing, and interpreting data”
• Moore, McCabe, Duckworth, and Alwan
“...a way of reasoning, along with a collection of tools and
methods, designed to help us understand the world.”
• Sharpe, De Veaux, and Velleman

What are statistics?

Where do we use statistics?
Marketing
Finance
Accounting
Supply Chain
Human Resources
… pretty much every area of business.

Statistics
One of the main things we do in statistics is:
Use a sample to infer something about a population.

Statistics
“...the most important science in the whole
world: for upon it depends the practical application of
every other science and of every art; the one science
essential to all political and social administration, all
education, all organization based upon experience, for it
only gives the results of our experience.”
• Florence Nightingale

Variation or Uncertainty
The world is full of uncertainty. And of variation.
How can we determine when there are real differences?
Or real patterns?
Or when we are simply observing natural variation?

Three main points of BABS 540
1. Statistics is (are) everywhere.
2. Understanding data is vital to making good decisions in
business (and life).
3. Working with data requires being able to match the
right tool to the job.

What does Mean Mean?
Measures of central tendency:
• mean
• median
• mode
Source: economix.blogs.nytimes.com

WWII Bombers

Administrivia
Expectations
Syllabus
Questions

Dead Fish
• Dead fish respond to human emotions (p < 0.001).
Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon:
An argument for multiple comparisons correction
Craig M. Bennett1, Abigail A. Baird2, Michael B. Miller1, and George L. Wolford3
1 Psychology Department, University of California Santa Barbara, Santa Barbara, CA; 2 Department of Psychology, Vassar College, Poughkeepsie, NY;
3 Department of Psychological & Brain Sciences, Dartmouth College, Hanover, NH

INTRODUCTION
With the extreme dimensionality of functional neuroimaging data comes
extreme risk for false positives. Across the 130,000 voxels in a typical fMRI
volume the probability of a false positive is almost certain. Correction for
multiple comparisons should be completed with these datasets, but is often
ignored by investigators. To illustrate the magnitude of the problem we
carried out a real experiment that demonstrates the danger of not correcting
for chance properly.

METHODS
Subject. One mature Atlantic Salmon (Salmo salar) participated in the fMRI study.
The salmon was approximately 18 inches long, weighed 3.8 lbs, and was not alive at
the time of scanning.
Task. The task administered to the salmon involved completing an open-ended
mentalizing task. The salmon was shown a series of photographs depicting human
individuals in social situations with a specified emotional valence. The salmon was
asked to determine what emotion the individual in the photo must have been
experiencing.
Design. Stimuli were presented in a block design with each photo presented for 10
seconds followed by 12 seconds of rest. A total of 15 photos were displayed. Total
scan time was 5.5 minutes.
Preprocessing. Image processing was completed using SPM2. Preprocessing steps
for the functional imaging data included a 6-parameter rigid-body affine realignment
of the fMRI timeseries, coregistration of the data to a T1-weighted anatomical image,
and 8 mm full-width at half-maximum (FWHM) Gaussian smoothing.
Analysis. Voxelwise statistics on the salmon data were calculated through an
ordinary least-squares estimation of the general linear model (GLM). Predictors of
the hemodynamic response were modeled by a boxcar function convolved with a
canonical hemodynamic response. A temporal high pass filter of 128 seconds was [...]

GLM RESULTS
A t-contrast was used to test for regions with significant BOLD signal change
during the photo condition compared to rest. The parameters for this
comparison were t(131) > 3.15, p(uncorrected) < 0.001, 3 voxel extent
threshold.
Several active voxels were discovered in a cluster located within the salmon’s
brain cavity (Figure 1, see above). The size of this cluster was 81 mm³ with a
cluster-level significance of p = 0.001. Due to the coarse resolution of the
echo-planar image acquisition and the relatively small size of the salmon
brain further discrimination between brain regions could not be completed.
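The introduction's claim that a false positive is almost certain can be checked with a little arithmetic. The Python sketch below uses the poster's figures (130,000 voxels, uncorrected threshold p < 0.001) and assumes, purely for illustration, that every voxel is an independent test with no true signal. Real voxels are spatially correlated, especially after smoothing, so the numbers are only indicative. The variable names and the 0.05 familywise target used for the Bonferroni line are assumptions, not taken from the poster.

```python
# Illustrative sketch: how many false positives should we expect when
# testing many voxels at an uncorrected threshold?
# Assumption: tests are independent and no voxel has a true effect.

n_voxels = 130_000   # "typical fMRI volume" figure quoted in the poster
alpha = 0.001        # uncorrected per-voxel threshold quoted in the poster

# Expected number of voxels that pass the threshold by chance alone.
expected_false_positives = n_voxels * alpha

# Probability that at least one voxel passes by chance
# (the familywise error rate under the independence assumption).
familywise_error_rate = 1 - (1 - alpha) ** n_voxels

# Bonferroni correction: per-voxel threshold that keeps the familywise
# error rate at or below an assumed target of 0.05.
target_fwer = 0.05
bonferroni_alpha = target_fwer / n_voxels

print(f"Expected false positives: {expected_false_positives:.0f}")
print(f"P(at least one false positive): {familywise_error_rate:.6f}")
print(f"Bonferroni per-voxel threshold: {bonferroni_alpha:.2e}")
```

Under these assumptions roughly 130 voxels would cross the threshold by chance, which is why an uncorrected map can show "activity" in a dead fish.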
Out of a search volume of 8064 voxels a total of 16 voxels were significant.
Identical t-contrasts controlling the false discovery rate (FDR) and familywise
error rate (FWER) were completed. These contrasts indicated no active
voxels, even at relaxed statistical thresholds (p = 0.25).

Evidence
What is “evidence”? How much evidence is enough
evidence?
What does “inference” mean?
What is a hypothesis test?

Simple Review

Example #1
Bottling Line A: Is the volume of whisky different from the
intended 750 ml?
Questions:
• What hypothesis test should we use?
• What is the null hypothesis?
• After conducting this test, what is the conclusion?

Example #2
Are Lines A and B different?
Questions:
• What hypothesis test should we use?
• What is the null hypothesis?
• After conducting this test, what is the conclusion?

Is There a 3-Sample t-Test?
Are Lines A, B, and C different?
Questions:
• What hypothesis test should we use?
• What is the null hypothesis?
• After conducting this test, what is the conclusion?

ANOVA
ANOVA = ANalysis Of VAriance
• Intended for comparing three or more groups in a single-factor experiment.
• Can also (cautiously) be used for observational data.
• Used to answer the question: Is at least one of the means different?
H0: µ1 = µ2 = … = µk
HA: At least one mean is different.
ANOVA works by calculating Mean Square Treatment divided
by Mean Square Error, then looking this ratio up in the
appropriate F-distribution. (See Excel output)
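The ratio described above can be computed by hand in a few lines. The Python sketch below does so for three small groups. The fill volumes for Lines A, B, and C are made-up illustrative numbers, not course data, and the function name `one_way_anova_f` is hypothetical. It returns the F statistic and its two degrees of freedom; the p-value lookup in the F-distribution is left to a statistics package or the Excel output.

```python
# Illustrative sketch of one-way ANOVA:
# F = Mean Square Treatment / Mean Square Error.
# The data below are hypothetical fill volumes (ml), invented for this example.

def one_way_anova_f(groups):
    """Return (F, df_treatment, df_error) for a list of groups of numbers."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far each group mean is from the grand mean.
    ss_treatment = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: squared distance from each value to its group mean.
    ss_error = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_treatment = k - 1
    df_error = n_total - k
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return ms_treatment / ms_error, df_treatment, df_error


line_a = [749, 750, 751, 750]
line_b = [751, 752, 753, 752]
line_c = [748, 749, 750, 749]

f_stat, df_t, df_e = one_way_anova_f([line_a, line_b, line_c])
print(f"F({df_t}, {df_e}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group scatter would suggest. If SciPy is available, `scipy.stats.f_oneway(line_a, line_b, line_c)` should report the same F along with a p-value.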

ANOVA Assumptions
✓ Independence Assumption: The groups must be independent of each other.
➡ Randomization?
✓ Equal Variance Assumption: The true variances of the groups are equal.
➡ Check the box plots to see if the spreads are close enough.
✓ Normal Population Assumption: The residuals are “nearly Normal”… but, thanks to the CLT, the larger the samples the less this matters.
• Note that the residual for each value is the distance from that value to the group average (+ or –).

Marketing Example
Average dollars spent on schamazon.com under each of
three possible themes.
A. Schamazon.com
B. Schamazon.com | Toys | Tools | Trucks
C. Toys | Tools | Trucks | Schamazon

Healthcare Example
What examples can you think of in healthcare?
Other examples where ANOVA might be used?

ANOVA Beyond BABS 540
Two-way ANOVA.
Data transformations.
Applications. ...
