Random Variables
A variable that takes different values with certain probabilities
Example: Number of ear infections a child has up to age 2
Discrete random variable X = 0, 1, 2, …, 6
x        P(X=x)
0        .129
1        .264
2        .271
3        .185
4        .095
5        .039
6        .017
Total    1.00
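As a quick numerical check on the table, the probabilities should sum to 1, and the mean and variance follow from the usual definitions E(X) = Σ x P(X=x) and Var(X) = Σ (x − E(X))² P(X=x). This is a minimal Python sketch; the dictionary `pmf` just transcribes the table, and the function names are illustrative.

```python
# P(X = x): number of ear infections up to age 2 (transcribed from the table)
pmf = {0: 0.129, 1: 0.264, 2: 0.271, 3: 0.185, 4: 0.095, 5: 0.039, 6: 0.017}

def check_pmf(pmf, tol=1e-9):
    """True if every probability is in [0, 1] and they sum to 1 (within tol)."""
    return all(0 <= p <= 1 for p in pmf.values()) and abs(sum(pmf.values()) - 1) < tol

def expected_value(pmf):
    """E(X) = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over x of (x - E(X))^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

print(check_pmf(pmf))                 # True
print(round(expected_value(pmf), 3))  # 2.038
print(round(variance(pmf), 3))        # 1.967
```

So a child has, on average, about 2 ear infections by age 2 in this distribution.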
Defn. The expe
SHAPES OF DISTRIBUTIONS
1. RIGHT SKEWED
Mean > median
2. LEFT SKEWED
Mean < median
3. SYMMETRIC
4. BIMODAL
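The mean-versus-median comparison above is a rule of thumb rather than a theorem (it can fail, especially for discrete data), but it is easy to apply. A minimal sketch with hypothetical data; `skew_direction` is an illustrative name:

```python
from statistics import mean, median

def skew_direction(data):
    """Rough shape check based only on comparing the sample mean and median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right skewed"
    if m < med:
        return "left skewed"
    return "mean = median (consistent with symmetry)"

print(skew_direction([1, 1, 2, 2, 3, 4, 10]))         # right skewed
print(skew_direction([-1, -1, -2, -2, -3, -4, -10]))  # left skewed
print(skew_direction([1, 2, 3, 4, 5]))                # mean = median (consistent with symmetry)
```

A single large value (10) pulls the mean up but leaves the median unchanged, which is why mean > median signals a long right tail. Bimodality cannot be detected this way; it needs a histogram or density plot.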
NONPARAMETRIC METHODS
Statistical methods that
don't make parametric assumptions about distributions
insensitive to wild observations
can handle coarse quantitative variables
generally, based on ranks
Disadvantages
often less powerful than parametric methods
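Because most nonparametric methods work with ranks rather than raw values, here is a minimal Python sketch of ranking, assuming the common average-rank convention for ties (other tie conventions exist). It also shows why rank methods are insensitive to wild observations: turning the largest value into an extreme outlier leaves every rank unchanged.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(average_ranks([3.1, 2.0, 5.4, 2.0]))  # [3.0, 1.5, 4.0, 1.5]
print(average_ranks([1, 2, 3, 100]))        # [1.0, 2.0, 3.0, 4.0]
print(average_ranks([1, 2, 3, 4]))          # [1.0, 2.0, 3.0, 4.0]
```

The price of discarding the raw magnitudes is the loss of power mentioned above when parametric assumptions actually hold.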
ALTERNATIVE APPROACH FOR COMPARING 2
PROPORTIONS (in LARGE SAMPLES)
The Chi-Square Statistic χ²
         Success   Failure   Total
X        a         b         n1
Y        c         d         n2
Total    S         F         N
$$\chi^2 = TS = \sum_{\text{4 cells}} \frac{(O - E)^2}{E}$$
O=observed count in each of the 4 cells
E=expected count in each cell if Ho is true
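Under Ho (equal success probabilities in X and Y), the expected count in each cell is (row total × column total) / N. The sketch below computes the statistic that way and, for comparison, with the 2×2 shortcut N(ad − bc)² / (n1 n2 S F). It assumes no continuity correction (the notes do not mention Yates's correction), and the counts are hypothetical. For 1 degree of freedom the p-value is P(Z² > χ²) = erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n1, n2 = a + b, c + d  # row totals (X, Y)
    s, f = a + c, b + d    # column totals (successes, failures)
    n = n1 + n2
    observed = [a, b, c, d]
    # expected counts if Ho is true: row total * column total / N
    expected = [n1 * s / n, n1 * f / n, n2 * s / n, n2 * f / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_shortcut(a, b, c, d):
    """Equivalent 2x2 formula: N (ad - bc)^2 / (n1 n2 S F)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail area of a chi-square with 1 df, using chi2 = Z^2."""
    return math.erfc(math.sqrt(chi2 / 2))

ts = chi_square_2x2(10, 20, 30, 40)
print(round(ts, 3))                                 # 0.794
print(round(chi_square_shortcut(10, 20, 30, 40), 3))  # 0.794
print(p_value_1df(ts))
```

As a large-sample method, this is usually applied only when every expected count is reasonably large (a common rule of thumb is at least 5).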
Biostati
KEY COMPETENCIES TO BE A
SUCCESSFUL PRACTICING
BIOSTATISTICIAN
1. Biostatistical methods and theory
2. Computation & Data Science
3. Collaboration with other researchers
4. Communication
Probability
EXACT CONFIDENCE INTERVAL FOR
BINOMIAL PROBABILITY
X ~ binomial (N, p)
Observe the value x
Find an exact (1 − α) × 100% confidence interval for p
KEY IDEA
EXPLOIT CONNECTION BETWEEN
CONFIDENCE INTERVALS AND HYPOTHESIS
TESTS
The set of all values of parameter wh
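The key idea can be sketched numerically. The version below assumes the usual equal-tailed construction (often called the Clopper-Pearson interval): the lower limit is the p at which the test of Ho: p = p_L against p > p_L just rejects, P(X ≥ x; p_L) = α/2, and the upper limit solves P(X ≤ x; p_U) = α/2. Both tail probabilities are monotone in p, so bisection finds them. Function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, target, tol=1e-12):
    """Bisection on [0, 1] for g(p) = target, assuming g is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, alpha=0.05):
    """Invert two one-sided binomial tests, each at level alpha/2."""
    # lower limit: P(X >= x; p) = alpha/2 (P(X >= x) increases with p)
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # upper limit: P(X <= x; p) = alpha/2, i.e. P(X > x; p) = 1 - alpha/2
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

print(exact_binomial_ci(5, 10))  # approximately (0.1871, 0.8129)
print(exact_binomial_ci(0, 10))  # (0.0, about 0.3085)
```

For x = 0 the upper limit has the closed form 1 − (α/2)^(1/N), which is a handy check on the bisection.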
Biostatistics 200B:
Methods in Biostatistics B
Lecture 22: One-Way Analysis of Variance (ANOVA), continued
One-way ANOVA model
We have k ≥ 2 groups, i = 1, …, k, with n_i observations in group i.
Assume the observations are all independent,
each population has
How do we estimate the population variance σ²?
The sample variance is:
$$S^2 = \frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}$$
S² is an unbiased estimator of σ².
That means E(S²) = σ².
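PROOF
$$E(S^2) = E\left[\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2\right] = \frac{1}{N-1}\,E\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right] = \frac{1}{N-1}\left[\sum_{i=1}^{N}E(x_i-\mu)^2 - N\,E(\bar{x}-\mu)^2\right]$$
The last step uses the identity $\sum_{i=1}^{N}(x_i-\bar{x})^2 = \sum_{i=1}^{N}(x_i-\mu)^2 - N(\bar{x}-\mu)^2$. Since $E(x_i-\mu)^2 = \sigma^2$ and, for independent observations, $E(\bar{x}-\mu)^2 = \operatorname{Var}(\bar{x}) = \sigma^2/N$,
$$E(S^2) = \frac{1}{N-1}\left[N\sigma^2 - N\cdot\frac{\sigma^2}{N}\right] = \frac{(N-1)\sigma^2}{N-1} = \sigma^2.$$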
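The unbiasedness result can be checked exactly, with no simulation, on a tiny discrete population: take N independent draws, each equally likely to be any member of a finite set, enumerate every (equally likely) ordered sample, and average S² over them. This sketch uses exact fractions; the populations {0, 1, 2} and {1, 5, 6, 10} are arbitrary illustrative choices. With divisor N − 1 the average equals σ² exactly, while divisor N gives (N − 1)σ²/N.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, ddof=1):
    """Sum of squared deviations from the sample mean, divided by (N - ddof)."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - ddof)

def expected_sample_variance(population, n, ddof=1):
    """Exact E(S^2) for n i.i.d. draws, each uniform over the members of population."""
    samples = list(product(population, repeat=n))  # all ordered samples, equally likely
    return sum(sample_variance(s, ddof) for s in samples) / len(samples)

def population_variance(population):
    """sigma^2 of the uniform distribution over the members of population."""
    mu = Fraction(sum(population), len(population))
    return sum((x - mu) ** 2 for x in population) / len(population)

population = [0, 1, 2]
print(population_variance(population))                   # 2/3
print(expected_sample_variance(population, 3))           # 2/3
print(expected_sample_variance(population, 3, ddof=0))   # 4/9
```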
Lecture 23: One-way ANOVA, continued
Analysis of variance for one-way ANOVA
Recall that in the linear regression model we partitioned
the total sum of squares into a regression SS and error SS.
$$\sum_{i}(y_i-\bar{y})^2 = \sum_{i}(\hat{y}_i-\bar{y})^2 + \sum_{i}(y_i-\hat{y}_i)^2$$
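In one-way ANOVA the analogous partition is SS_total = SS_between + SS_within, where SS_between = Σ n_i (x̄_i − x̄)² and SS_within = Σ_i Σ_j (x_ij − x̄_i)², and the F statistic is [SS_between/(k − 1)] / [SS_within/(N − k)]. A minimal sketch with hypothetical data; `one_way_anova` is an illustrative name.

```python
def one_way_anova(groups):
    """Return (SS_total, SS_between, SS_within, F) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    # between groups: n_i * (group mean - grand mean)^2, summed over groups
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # within groups (error): deviations from each group's own mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_total, ss_between, ss_within, f_stat

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (60.0, 54.0, 6.0, 27.0)
```

Here 54 + 6 = 60, so the partition holds, and most of the variability is between groups.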
COMPARISON OF TWO GROUPS
1. Paired Design
Before vs after
Twin studies
Matching
2. Independent Samples
Persons exposed vs. persons unexposed
Persons treated vs. persons untreated
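The design determines the analysis: in a paired design each pair is reduced to a within-pair difference, while independent samples are compared group to group. A minimal sketch, assuming the standard paired t statistic and the pooled-variance two-sample t statistic as the respective analyses (the notes do not name the tests here); the before/after numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired design: t = mean(d) / (sd(d) / sqrt(n)) for differences d = after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def pooled_two_sample_t(x, y):
    """Independent samples: pooled-variance t statistic for mean(y) - mean(x)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    return (mean(y) - mean(x)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

before = [10, 12, 14]
after = [11, 14, 17]
print(round(paired_t(before, after), 3))             # 3.464
print(round(pooled_two_sample_t(before, after), 3))  # 0.961
```

The same numbers give a much larger t when analyzed as pairs, because taking differences removes the between-subject variability. Analyzing paired data as if the samples were independent wastes that information.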
Exampl
STATISTICAL INFERENCE OF POPULATION MEAN, μ: CONTINUOUS DATA
One-sample
Hypothesis test for μ, σ² known (Z-test)
If X1, …, Xn is a random sample from a normal population, or a large random sample from any population, with unknown mean μ and known variance σ², then do
Ho
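The test statistic is z = (x̄ − μ0) / (σ/√n), which is standard normal under Ho: μ = μ0. The notes are cut off before stating the alternative, so this sketch assumes a two-sided Ha: μ ≠ μ0; the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n):
    """One-sample Z-test of Ho: mu = mu0 with known sigma; returns (z, two-sided p-value)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test(xbar=105, mu0=100, sigma=15, n=9)
print(z, round(p, 3))  # 1.0 0.317
```

For a one-sided alternative, use the single tail area, e.g. 1 − Φ(z) for Ha: μ > μ0.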
Lab 2: Descriptive Techniques in Summarization of Public Health Data
I.
Analysis:
1. Summary of measures by subgroup for hemoglobin levels.
Acyanotic group
Measures
Median
Mean
Range (Max - Min)
IQR (75th -25th
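The summary measures in this table can be computed per subgroup with Python's `statistics` module. A minimal sketch with hypothetical values (the lab's hemoglobin data are not reproduced here); note that quartile definitions differ across software, so an IQR computed this way may not match another package exactly.

```python
from statistics import mean, median, quantiles

def summarize(values):
    """Median, mean, range (max - min) and IQR (75th - 25th percentile)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return {
        "median": median(values),
        "mean": mean(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

print(summarize([1, 2, 3, 4, 5, 6, 7, 8]))
# {'median': 4.5, 'mean': 4.5, 'range': 7, 'iqr': 4.5}
```

Calling `summarize` once per subgroup (e.g. acyanotic vs. cyanotic) fills one column of the table each time.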
1. The 95% confidence interval for the mean was computed as [25, 50]. This means only that, if many samples of the SAME size n were taken and an interval computed from each, about 95% of those intervals would contain the TRUE mean of the population being sampled. C.I. always esti
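This repeated-sampling interpretation can be simulated: draw many samples of the same size n from a population whose mean is known, build a 95% interval from each, and count how often the intervals cover the true mean. The sketch assumes normal data with known σ, so the z-interval x̄ ± 1.96σ/√n has exactly 95% coverage; the parameter values and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist

def ci_coverage(mu=50.0, sigma=10.0, n=25, reps=10_000, level=0.95, seed=1):
    """Fraction of z-intervals xbar +/- z * sigma / sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())  # close to 0.95
```

Any single interval either contains μ or it does not; the 95% describes the procedure over repeated samples.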
Lab 3
I.
Analysis:
1.
a) Ladder-of-powers transformations for the variables DAGE and CITH that have large P values and small chi² values.
DAGE
Variable              Mean   Median   Range   SD   IQR   Skewness
Raw (1, Identity)
Square Roo
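The ladder-of-powers search can be mimicked by applying each power transformation and checking which one brings the skewness closest to 0. Stata's `ladder` reports a chi-square test of normality for each transformation; this sketch instead uses the simpler moment skewness g1 = m3 / m2^1.5 and negates the negative powers so the ordering of values is preserved. Both choices are assumptions, not the lab's exact procedure. The data must be strictly positive, and the example values are hypothetical.

```python
import math

# Tukey-style ladder of powers; negative powers are negated to preserve order
LADDER = {
    "cube": lambda x: x ** 3,
    "square": lambda x: x ** 2,
    "identity": lambda x: x,
    "square root": math.sqrt,
    "log": math.log,
    "1/square root": lambda x: -1 / math.sqrt(x),
    "inverse": lambda x: -1 / x,
}

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def ladder_skewness(values):
    """Skewness of each transformed version of strictly positive data."""
    return {name: skewness([f(v) for v in values]) for name, f in LADDER.items()}

for name, g1 in ladder_skewness([1, 2, 4, 8, 16]).items():
    print(f"{name:>14}: {g1: .3f}")
```

For these doubling values the log transform gives equally spaced values and hence zero skewness, while powers above 1 make the right skew worse and the inverse over-corrects into left skew.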
Lab 4: Normal Distribution and The Form of the Sampling Distribution of the Mean and Variance of a Random Variable
I.
ANALYSIS:
1. The sample mean and variance for sample sizes of 5, 10, and 50.
Sample Size
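For a random sample from a population with mean μ and variance σ², the sample mean has mean μ and variance σ²/n (and is exactly normal when the population is normal). A minimal simulation sketch, assuming a standard normal population; the number of replicates and the seed are arbitrary.

```python
import random
from statistics import mean, variance

def sampling_distribution_of_mean(n, reps=5000, mu=0.0, sigma=1.0, seed=2):
    """Simulate reps sample means of size n from N(mu, sigma^2); return their mean and variance."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]
    return mean(means), variance(means)

for n in (5, 10, 50):
    m, v = sampling_distribution_of_mean(n)
    print(n, round(m, 3), round(v, 4), "theory:", 1.0 / n)
```

The simulated variances should shrink roughly like 1/n (about 0.2, 0.1, and 0.02), while the mean of the sample means stays near 0.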
Lab 1: Introduction to STATA
I.
Analysis:
1.
The distribution of the decom data is right-skewed: the median is much closer to the 25th percentile than to the 75th percentile (shown in the boxplot below).
Particularly, the distrib
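The quartile argument can be made quantitative with Bowley's quartile skewness, (Q3 + Q1 − 2·Q2) / (Q3 − Q1), which is positive when the median sits closer to Q1 than to Q3. The lab does not name this measure; it is offered as one way to formalize the boxplot reasoning, and the data below are hypothetical.

```python
from statistics import quantiles

def quartile_skewness(values):
    """Bowley skewness (Q3 + Q1 - 2*Q2) / (Q3 - Q1); > 0 suggests right skew."""
    q1, q2, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print(quartile_skewness([1, 2, 2, 3, 3, 4, 10, 20, 30]))  # about 0.846 (right skew)
print(quartile_skewness([1, 2, 3, 4, 5, 6, 7, 8, 9]))     # 0.0 (symmetric)
```

The measure lies between −1 and 1 and, being based on quartiles, is not affected by a few extreme observations in the tails.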
