1 Statistics and samples
1.1 What is statistics?
Biologists study the properties of living things. Measuring these properties is a challenge, though, because no two individuals from the same biological population are ever exactly alike. We can't measure ev
[Figure: deaths by month, beginning April 1854 and continuing into 1855, classified by cause: infectious disease, wounds, other causes]
Displaying data
The human eye is a natural pattern detector, adept at spotting trends
Appendix 3. Statistical tables
This appendix gives numerical values for a few of the most commonly used probability
distributions. More can be found in references such as Rohlf and Sokal, Biostatistical
Tables.
Table 1: χ² distributions
Table 2: Z distribu
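Values like those in a Z table can also be computed directly. The sketch below is a minimal illustration, assuming Python 3.8+ (for `statistics.NormalDist`) and assuming the table lists upper-tail probabilities Pr[Z > z]; the actual layout of Table 2 is not recoverable here, and `upper_tail` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def upper_tail(z: float) -> float:
    """Return Pr[Z > z] for a standard normal random variable Z."""
    return 1.0 - std_normal.cdf(z)

# Print a small slice of a Z table: upper-tail probabilities for z = 0.0, 0.5, ..., 3.0.
for i in range(0, 31, 5):
    z = i / 10
    print(f"z = {z:.1f}   Pr[Z > z] = {upper_tail(z):.4f}")
```

Printed tables remain useful on exams where only a non-programmable calculator is allowed.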
Contingency analysis
Contingency analysis: associations between categorical variables
- Test the independence of two or more categorical variables
- We'll learn one kind: χ² contingency analysis
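A χ² contingency test compares the observed counts with the counts expected if the two categorical variables were independent (row total × column total / grand total). The sketch below is a minimal illustration for a 2 × 2 table only, assuming Python 3.8+; the counts are hypothetical (not the music-and-wine data), and the P-value shortcut it uses is valid only for 1 degree of freedom.

```python
import math
from statistics import NormalDist

def chi_square_2x2(table):
    """Chi-square contingency test of independence for a 2 x 2 table of counts.

    Returns (chi-square statistic, degrees of freedom, P-value).
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch handles 2 x 2 tables only")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = 1  # (rows - 1) * (columns - 1) for a 2 x 2 table
    # With 1 df, a chi-square variable is the square of a standard normal Z,
    # so Pr[chi2 > x] = Pr[|Z| > sqrt(x)] = 2 * (1 - Phi(sqrt(x))).
    p_value = 2.0 * (1.0 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, df, p_value

# Hypothetical counts (rows: two groups; columns: two outcomes).
observed = [[30, 10],
            [20, 20]]
chi2, df, p = chi_square_2x2(observed)
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p:.4f}")
```

For larger tables, the P-value would come from a χ² table with (rows − 1)(columns − 1) degrees of freedom.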
Chapter 9
Music and wine buying
OBSERVED
Bottles of
French wi
Comparing means
Paired vs. 2-sample comparisons
- Tests with one categorical and one numerical variable
- Goal: to compare the mean of a numerical variable for different groups.
Paired comparisons allow us to account for a lot of extraneous variation.
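To see how pairing removes extraneous variation, the sketch below computes a paired t statistic and, for contrast only, a pooled two-sample t statistic on the same numbers (treating paired data as independent samples would be inappropriate in a real analysis). It is a minimal illustration, assuming Python 3.8+; the data and function names are hypothetical, and P-values would still require a t table.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: analyze the differences within each pair."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1

def two_sample_t(x, y):
    """Two-sample t statistic using the pooled variance (assumes equal variances)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(y) - mean(x)) / se, df

# Hypothetical measurements on 5 individuals, each measured before and after a treatment.
# Individuals differ a lot from one another, but each shifts up by about 1 unit.
before = [10.0, 20.0, 30.0, 40.0, 50.0]
after = [11.2, 20.9, 31.1, 40.8, 51.0]
t_paired, df_paired = paired_t(before, after)
t_two, df_two = two_sample_t(before, after)
print(f"paired:     t = {t_paired:.2f}, df = {df_paired}")
print(f"two-sample: t = {t_two:.2f}, df = {df_two}")
```

The paired statistic is large because the differences barely vary, while the two-sample statistic is swamped by variation among individuals.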
2-sa
Sample of size 10 from a normal distribution with μ = 13 and σ² = 16
Estimating with uncertainty
Chapter 4
[Figure: frequency histograms of samples of size 10 from this distribution; one sample has X̄ = 13.5 and s² = 12.1; a third sample of 10 from the same distribution is also shown]
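The point of the figure is that repeated samples from the same population give different estimates. The sketch below simulates this, assuming the population above (μ = 13, σ² = 16, so σ = 4) and a fixed random seed chosen only for reproducibility; the specific numbers printed will not match the figure.

```python
import random
from statistics import mean, variance

random.seed(1)  # fixed seed so the run is reproducible

MU, SIGMA = 13.0, 4.0   # sigma^2 = 16, so sigma = 4
N = 10                  # sample size

def draw_sample(n=N):
    """Draw one random sample of size n from Normal(MU, SIGMA^2)."""
    return [random.gauss(MU, SIGMA) for _ in range(n)]

# Three samples from the same population give different estimates.
for k in range(3):
    s = draw_sample()
    print(f"sample {k + 1}: mean = {mean(s):.1f}, variance = {variance(s):.1f}")

# Averaged over many samples, the sample means and variances center on 13 and 16.
many = [draw_sample() for _ in range(5000)]
avg_mean = mean(mean(s) for s in many)
avg_var = mean(variance(s) for s in many)
print(f"average of 5000 sample means: {avg_mean:.2f}")
print(f"average of 5000 sample variances: {avg_var:.2f}")
```

`statistics.variance` divides by n − 1, which is why its average over many samples centers on σ² rather than below it.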
Another samp
Goals of experiments
- Eliminate bias
- Reduce sampling error (increase precision and power)
Controls
- A group which is identical to the experimental treatment in all respects aside from the treatment itself.
Design features that reduce bias
- Controls
The normal distribution is very
common in nature
Normal distribution
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
[Figure: human body temperature, x-axis: measurement]
A normal distribution is fully described by its mean (μ) and standard deviation (σ).
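The density formula can be checked numerically. The sketch below codes f(x) term by term and compares it with `statistics.NormalDist.pdf` (Python 3.8+); `normal_pdf` is a hypothetical helper name. It also shows that μ and σ are the only inputs the curve needs.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Standard normal example (mu = 0, sigma = 1): the peak at x = 0 is about 0.4.
for x in [-2, -1, 0, 1, 2, 3]:
    print(x, round(normal_pdf(x, 0.0, 1.0), 4))

# Same values as the standard library's implementation.
print(normal_pdf(1.3, 2.0, 0.5), NormalDist(2.0, 0.5).pdf(1.3))
```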
Human bir
Describing data
Two common descriptions of data:
- Location (or central tendency)
- Width (or spread)
Measures of location
Mean
Median
Mode
Mean
$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
where n is the size of the sample.
Median
Y1 = 56, Y2 = 72, Y3 = 18, Y4 = 42
- The median is the middle
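For the four observations above, the mean and median can be computed directly. The sketch below applies the mean formula and the usual convention that, with an even number of observations, the median is the average of the two middle values of the sorted data (that convention is an assumption here, since the slide text is cut off); it cross-checks against `statistics.median`.

```python
from statistics import median

Y = [56, 72, 18, 42]  # the four observations from the example

# Mean: sum of the Y_i divided by the sample size n.
n = len(Y)
y_bar = sum(Y) / n

# Median: sort the data and take the middle value; with an even n,
# average the two middle values.
ordered = sorted(Y)
mid = n // 2
if n % 2 == 1:
    y_median = ordered[mid]
else:
    y_median = (ordered[mid - 1] + ordered[mid]) / 2

print(f"mean = {y_bar}, median = {y_median}")  # mean = 47.0, median = 49.0
assert y_median == median(Y)  # agrees with the standard library
```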
KEY: MID-TERM BIOL 300: October 2010
For all statistical tests, make sure that you clearly state your
hypotheses. Unless otherwise stated, assume α = 0.05. Show your
work. Be as precise as possible about P-values.
1. (10 points) A recent issue of the Globe a
Name:
TA's name:
Student number:
MID-TERM BIOL 300: October 2009
Some questions h
MID-TERM BIOL 300: October 2008
Some questions have a box for the final answer.
MID-TERM BIOL 300: October 2007
BIOL 300: Biostatistics
Course web address: http://www.zoology.ubc.ca/~whitlock/bio300/
Professor: Dr. Michael Whitlock
Professor, Department of Zoology
Office: 216 Biodiversity
Textbook
Office hours: Mon. 1:30-3:00
and af
Probability
The probability of an event is its true relative frequency, the proportion of times the event would occur if we repeated the same process over and over again.
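The "repeat the same process over and over" definition can be made concrete by simulation. The sketch below is a minimal illustration, assuming a fair six-sided die (a hypothetical example, not from the course) and a fixed seed for reproducibility: the relative frequency of rolling a 6 drifts toward its true probability, 1/6, as the number of repetitions grows.

```python
import random

random.seed(2)  # fixed seed so the run is reproducible

def relative_frequency(n_trials: int) -> float:
    """Proportion of n_trials rolls of a fair six-sided die that come up 6."""
    hits = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 6)
    return hits / n_trials

# As the number of repetitions grows, the relative frequency settles near 1/6 (about 0.1667).
for n in [10, 100, 1_000, 100_000]:
    print(f"{n:>7} rolls: proportion of sixes = {relative_frequency(n):.4f}")
```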
A and B are mutually exclusive
Two events are mutually exclusive if
they cannot bot
Regression
Correlation vs. regression
- Predicts Y from X
- Linear regression assumes that the relationship between X and Y can be described by a line
Regression assumes:
- Random sample
- Y is normally distributed with equal variance for all values of X
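A line that "predicts Y from X" is usually fitted by least squares. The sketch below computes the slope and intercept from the standard formulas (slope = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², intercept = Ȳ − slope · X̄); the data and the helper name `least_squares_line` are hypothetical, and the sketch does not check the assumptions listed above.

```python
from statistics import mean

def least_squares_line(x, y):
    """Fit Y = a + b*X by ordinary least squares; return (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x, y)
print(f"Y-hat = {a:.2f} + {b:.2f} X")
print("prediction at X = 6:", round(a + b * 6, 2))
```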
Publication bias
Papers are more likely to be published if P < 0.05.
This causes a bias in the science reported in the literature.
Researcher and statistician error
~8% of biomedical papers have substantial statistical flaws.
Computer-intensive methods
Si
Analysis of variance (ANOVA)
Comparing the means of more
than two groups
Null hypothesis for simple ANOVA
- H0: μ1 = μ2 = μ3
  OR
- H0: Variance among groups = 0
[Figure: frequency distributions of groups X1, X2, X3]
- HA: Not all μ's are equal; at least one population mean is different.
- H0: μ1 = μ2 = μ3 = μ4
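The ANOVA test statistic compares variation among group means with variation within groups. The sketch below computes the F-ratio (mean square among groups divided by mean square error) for hypothetical data; converting F to a P-value would need an F table or a statistics library, which this stdlib-only sketch does not attempt. `one_way_anova_f` is a hypothetical helper name.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_groups, df_error) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)

    # Among-group sum of squares: spread of group means around the grand mean.
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error (within-group) sum of squares: spread of values around their own group mean.
    ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_groups, df_error = k - 1, n_total - k
    f_ratio = (ss_groups / df_groups) / (ss_error / df_error)
    return f_ratio, df_groups, df_error

# Hypothetical measurements from three groups.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
F, df1, df2 = one_way_anova_f(groups)
print(f"F = {F:.2f} with ({df1}, {df2}) df")
```

If H0 is true, F is expected to be near 1; a large F suggests that at least one population mean differs.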
Writing a Lab Report
Format
Include a descriptive title
Times New Roman; 12 point font
Double spaced
Figures need to be legible (don't make them too small)
Don't go over the page limit (which includes figures!)
Write in paragraph form
Label each sect
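Inference about means
μ = 67.4
σ = 3.9
Ȳ is normally distributed, with
$\mu_{\bar{Y}} = \mu = 67.4$
$\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}} = \frac{3.9}{\sqrt{5}} = 1.7$
whenever:
- Y is normally distributed, or
- n is large.
Because Ȳ is normally distributed, we can convert its distribution to a standard normal distribution:
$Z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}}$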
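The arithmetic for the standard error and the conversion to Z can be checked in a few lines. The sketch below uses μ = 67.4, σ = 3.9 and n = 5 from the example; the sample mean of 70.0 is a hypothetical value chosen only to illustrate the conversion, and the upper-tail probability uses `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist

mu, sigma, n = 67.4, 3.9, 5

# Standard error of the mean: sigma_Ybar = sigma / sqrt(n)
se = sigma / math.sqrt(n)
print(f"sigma_Ybar = {se:.2f}")   # about 1.74, i.e. 1.7 to one decimal place

# Hypothetical sample mean, used only to illustrate the conversion to Z.
y_bar = 70.0
z = (y_bar - mu) / se
p_upper = 1.0 - NormalDist().cdf(z)   # Pr[Ybar >= 70.0] if the population mean is 67.4
print(f"Z = {z:.2f}, Pr[Z > z] = {p_upper:.3f}")
```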
Final exam
2.5 hours allotted
Chapters 1-17, 19
Excluding sections 9.2, 9.5, 14.7, 15.3, 15.6, and the confidence interval for r
Bring: Calculator (not programmable),
pen or pencil, UBC ID
You will be given: formula sheet & stats
tables
One variable: Which tes
Assumptions of t-tests
- Random sample(s)
- Populations are normally distributed
- (for 2-sample t) Populations have equal variances
Detecting deviations from normality: by histogram
[Figure: histogram of biomass ratio, y-axis: frequency]
Detecting deviations from normality
- Previous d
Proportions
Example:
2092 adult passengers on the Titanic; 654 survived
Proportion of survivors = 654/2092 ≈ 0.313
A proportion is the fraction of individuals
having a particular attribute.
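The Titanic proportion, and the kind of follow-up probability asked about next, can be computed as below. This is a minimal sketch, assuming Python 3.8+ (for `math.comb`) and assuming each of three chosen passengers survives independently with probability equal to the observed proportion; choosing three distinct passengers without replacement from 2092 would change the answer only slightly.

```python
from math import comb

survived, total = 654, 2092
p_hat = survived / total
print(f"proportion of survivors = {p_hat:.3f}")   # 0.313

# Assumption: three passengers survive independently, each with probability p_hat.
# Binomial formula: Pr[exactly 2 of 3] = C(3, 2) * p^2 * (1 - p).
p_two_of_three = comb(3, 2) * p_hat ** 2 * (1 - p_hat)
print(f"Pr[2 of 3 survive] = {p_two_of_three:.3f}")
```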
Probability that two out of three randomly
chosen passengers surviv
Sir Francis Galton
The history of statistics has its
roots in biology
Pioneer of fingerprint identification;
study of heredity of quantitative traits
Regression & correlation
Karl Pearson
Polymath; studied genetics
Correlation coefficient
χ² test
Standard deviation
Sir
Hypothesis testing
Hypothesis testing in a nutshell
Hypothesis testing asks how unusual it is to get data that differ from the null hypothesis.
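"How unusual" is measured by the P-value: the probability, if H0 were true, of data at least as extreme as those observed. The sketch below computes an exact two-sided binomial P-value for a hypothetical coin-flipping result under H0: p = 0.5, assuming Python 3.8+ and the default α = 0.05 used in the exams; `binomial_two_sided_p` is a hypothetical helper name.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int) -> float:
    """Two-sided P-value for k successes in n trials under H0: p = 0.5.

    Under H0 the binomial distribution is symmetric, so the two-sided P-value
    is twice the probability of a result at least as extreme as k in the
    observed direction (capped at 1).
    """
    def prob(x):
        return comb(n, x) * 0.5 ** n

    if k >= n / 2:
        one_tail = sum(prob(x) for x in range(k, n + 1))
    else:
        one_tail = sum(prob(x) for x in range(0, k + 1))
    return min(1.0, 2.0 * one_tail)

# Hypothetical data: 8 heads in 10 flips. How unusual is this if the coin is fair?
p_value = binomial_two_sided_p(8, 10)
print(f"P = {p_value:.4f}")
print("reject H0" if p_value <= 0.05 else "do not reject H0")
```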
We want to know something
about this population, say, are
If the data would be quite unlikely under H0,
Population
Discrete distribution
Fitting probability models to
frequency data
A probability distribution describing a
discrete numerical random variable
For example:
- Number of heads from 10 flips of a coin
- Number of flowers in a square meter
- Number of disease
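For the first example, the number of heads in 10 flips of a fair coin follows a binomial distribution. The sketch below tabulates that distribution and the expected frequencies one would compare with observed frequency data, assuming Python 3.8+ and a hypothetical 200 repetitions of the 10-flip experiment; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5   # 10 flips of a fair coin
dist = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
for k, prob in dist.items():
    print(f"{k:2d} heads: {prob:.4f}")

# A probability distribution assigns probabilities that sum to 1.
assert abs(sum(dist.values()) - 1.0) < 1e-12

# Fitting to frequency data: expected counts if the 10-flip experiment
# were repeated a hypothetical 200 times.
expected = {k: 200 * prob for k, prob in dist.items()}
```

Comparing these expected counts with observed counts is the job of a χ² goodness-of-fit test.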
MID-TERM BIOL 300: February 2005