counting problems.
"let me count the ways"
1. How many car license plates can a state have if each plate consists of 2 letters followed by 4 digits? (Use the alphabet with 26 letters.) 26 * 26 * 10^4 = 6,760,000 How many car license plates can the state h
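The count above follows from the multiplication rule. A minimal sketch that reproduces the arithmetic (the function name `count_plates` is made up for illustration):

```python
# Multiplication rule: the number of ways to fill each position multiplies.
LETTERS = 26  # choices for each letter position
DIGITS = 10   # choices for each digit position (0 through 9)


def count_plates(n_letters, n_digits):
    """Count plates made of n_letters letters followed by n_digits digits."""
    return LETTERS ** n_letters * DIGITS ** n_digits


plates = count_plates(2, 4)
print(f"{plates:,}")  # 6,760,000
```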
outliers and influential points in regression
an outlier is an observation with a large residual
the standardized residuals are often used to look for outliers
because the standardized residuals have
mean 0 and standard deviation 1
and because their dist
two independent sample problems
finding sample sizes needed for required margin of error
confidence interval for the difference of two means
we'll start out assuming the sampling distribution of $\bar{x}_1 - \bar{x}_2$
is approximately a normal curve
as we did for one sample-pr
correlation vs. slope
correlation measures the direction and strength
of the linear relationship between two quantitative variables
Baldi and Moore, box, page 75
slope estimates the difference between the means of the Y values
for two populations whose X
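One way to see how the two quantities are connected: the least-squares slope equals the correlation rescaled by the standard deviations, b = r * s_y / s_x. The sketch below checks this numerically. The data values and helper names are made up for illustration and do not come from Baldi and Moore.

```python
import math

# hypothetical data, made up for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Pearson correlation r: unit-free, always between -1 and 1."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def ls_slope(xs, ys):
    """Least-squares slope: change in mean Y per one-unit change in X."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


r = correlation(x, y)
b = ls_slope(x, y)
# the last two printed values agree: b = r * s_y / s_x
print(r, b, r * sample_sd(y) / sample_sd(x))
```

Rescaling X or Y changes the slope but leaves the correlation unchanged, which is why the two answer different questions.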
predict command calculating fitted values and residuals in Stata
this command is sensitive to the context
refers to the last model we fit
"predict" isn't the best name
the predict command can calculate many useful quantities,
depending on the option we spe
piano lessons and preschoolers' spatio-temporal reasoning
two-sample comparison of means
we discussed the need for a comparison group
when we analyzed the data for the piano lesson group
a "difference in differences" analysis
now we have the tools to do th
testing association between two categorical variables
we need a new distribution to do this:
the $\chi^2$ distribution
there are actually several situations in which this distribution can help us
so when someone says: do a $\chi^2$ test
we need to ask: which one?
Bald
matched-pairs with binary data
measurements on same individuals under 2 conditions
before and after an intervention
under 2 different treatments
2 individuals matched on several characteristics
and then each randomly assigned to one of two treatments
exac
comparing means or proportions from two independent samples
really useful material for both
experiments where individuals are randomly assigned
to one of 2 treatments, or to a treatment or a control
random samples are used to collect observational data
on
relationship between two independent sample z test
and $\chi^2$ test for 2 by 2 table
Baldi and Moore, pages 551 to 552
Reader, pages 13 to 14
binary outcome: Yes or No
two populations or individuals randomly assigned to two treatments
null hypothesis $p_1 = p_2$
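For a 2 by 2 table, the square of the pooled two-sample z statistic equals the Pearson $\chi^2$ statistic (computed without a continuity correction). A small numerical check, using hypothetical counts that are not from the textbook or the Reader:

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the null hypothesis p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square statistic for the same data laid out as a 2 by 2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# hypothetical counts: 30 "Yes" out of 100 in group 1, 45 "Yes" out of 100 in group 2
z = two_proportion_z(30, 100, 45, 100)
chi2 = chi_square_2x2(30, 100, 45, 100)
print(z ** 2, chi2)  # the two values agree up to rounding
```

Because the statistics agree, the two-sided z test and the $\chi^2$ test with 1 degree of freedom give the same P-value.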
regression and correlation
general picture for regression:
continuous outcome variable, denoted by Y
mean of Y varies
as a function of a continuous explanatory variable X
as a function of several explanatory variables
X1 , X2 , X3 , . . . , Xp
which can b
extending "linear" regression to fit curved relationships
these examples provide more practice using plots to check assumptions
both examples constructed for teaching purposes
curve with constant variance of y values?
consider polynomials
curve and varian
introduction to multiple regression
we'll focus on models and interpreting coefficients
continuous outcome Y
mean of Y varies
as a function of a continuous explanatory variable X
as a function of several explanatory variables
X1 , X2 , X3 , . . . , Xp
whic
pedometer steps and BMI in college women
adapted from Baldi and Moore
Reader, pages 47 to 54
data source in the Reader
100 women college students
wore pedometers for a week as they went about usual activities
not an intervention, except they knew they were
after we've checked the assumptions, ready for inference!
when we've checked that the regression model assumptions
are approximately satisfied, we're ready for inference!
what tools do we need?
focus on what sample can tell us about population slope
sampling
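tests of the null hypothesis $p_1 = p_2$ or $p_1 - p_2 = 0$
if we assume that the population proportions are equal,
then we estimate this common proportion
using the data
$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ (with $x_1$, $x_2$ the numbers of successes in the two samples)
use this common $\hat{p}$ to estimate the standard error of $\hat{p}_1 - \hat{p}_2$
$\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$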
large-sample
ph 141 brief course description:
probability and statistics in biology and public health summer 2010 probability descriptive statistics sampling distributions confidence intervals and tests for means and proportions comparing the means of 2 populations co
ph 141 important dates Thursday, July 8 Tuesday, July 13 Friday, July 16 Friday, July 23 Monday, July 26 Friday, July 30 Wednesday, August 4 Monday, August 9 Wednesday, August 11 Friday, August 13 quiz 1 quiz 2 exam 1 quiz 3 quiz 4 exam 2
summer 2010
case
sketch of solutions for hypergeometric distribution practice problems try to do these by reasoning from first principles, using combinations for selecting at random without replacement and the multiplication rule m 1 ways to select items/individuals of on
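A minimal sketch of the first-principles reasoning, using combinations and the multiplication rule. The function name, argument names, and the numbers in the example are hypothetical and are not taken from the practice problems.

```python
from math import comb


def hypergeom_prob(k, n_draw, n_type1, n_total):
    """P(exactly k items of type 1) when drawing n_draw items at random
    without replacement from n_total items, n_type1 of which are type 1."""
    ways_type1 = comb(n_type1, k)                     # choose the type 1 items
    ways_other = comb(n_total - n_type1, n_draw - k)  # choose the remaining items
    ways_all = comb(n_total, n_draw)                  # all equally likely samples
    # multiplication rule in the numerator, divided by the total number of samples
    return ways_type1 * ways_other / ways_all


# hypothetical example: 10 items, 4 of type 1, draw 3, want exactly 2 of type 1
print(hypergeom_prob(2, 3, 4, 10))  # 0.3
```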
ph 141 is an intensive introduction to applied statistical methods. Although the course has been designed specifically for those embarking on a one-year MPH degree, the course meets the needs of a broad range of undergraduate and gradua
outline of solutions for problems from chapter 4 of Moore, McCabe, and Craig, 6th ed depending on the context of the problem, Venn diagrams and trees are really useful but they're not included here because they're hard to make on computer! 4.106, 4.107 an
sketch of solutions to screening problems drawing the corresponding trees will be helpful! Breast cancer and mammograms from Principles of Biostatistics, Marcello Pagano and Kimberlee Gauvreau, Duxbury Suppose that the sensitivity of a mammogram, a screen
ph 141 reminders:
preparation for 1st in-class exam you will need a calculator and your textbook. since this is an open-book exam, you may also bring your notes, the handouts, and any other sort of written summary you have made. the working time for the e
support treatments for those getting off cocaine
Baldi and Moore exercise 22.37, pages 536 and 537
randomized controlled trial, 3 year study
               n    no relapse   relapse   $\hat{p}$ (no relapse)
desipramine    24   14           10        .583
lithium        24   6            18        .250
placebo        24   4            20        .167
let's have S
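One way to test for an association between treatment and relapse in this table is a $\chi^2$ test. The sketch below computes the statistic by hand from the counts above. It is an illustration in Python, not the Stata output the notes go on to use, and the shortcut for the P-value applies only because the table has 2 degrees of freedom.

```python
import math

# observed counts from the table: (no relapse, relapse)
observed = {
    "desipramine": (14, 10),
    "lithium": (6, 18),
    "placebo": (4, 20),
}


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r by c table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / total  # expected count under no association
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square_statistic(observed)
# with df = 2, the upper tail area of the chi-square distribution is exactly exp(-stat / 2)
p_value = math.exp(-stat / 2)
print(stat, df, round(p_value, 4))  # 10.5 2 0.0052
```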
association between variables
two independent sample t test for means
association between a binary variable (2 groups, 2 treatments)
and a continuous outcome
extends to more than two groups:
ANOVA, analysis of variance
association between a categorical va
using fitted values
$\hat{y} = a + b x^*$
is both the estimate of the population mean
and the predicted value for an individual
the standard errors we use to quantify uncertainty are different
when we estimate the mean y for a population,
there is uncertainty be
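The two standard errors can be compared side by side. The sketch below uses the usual simple-regression formulas, s * sqrt(1/n + (x* - xbar)^2 / Sxx) for the population mean and s * sqrt(1 + 1/n + (x* - xbar)^2 / Sxx) for an individual. The data and function names are made up for illustration.

```python
import math

# hypothetical data, made up for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.7]


def fit_line(xs, ys):
    """Least-squares intercept a, slope b, and residual standard deviation s."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((u - x_bar) ** 2 for u in xs)
    b = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    ss_residual = sum((v - (a + b * u)) ** 2 for u, v in zip(xs, ys))
    s = math.sqrt(ss_residual / (n - 2))  # df = n - 2
    return a, b, s, x_bar, sxx, n


def standard_errors(x_star, s, x_bar, sxx, n):
    """Standard errors at x_star: for the mean of Y, and for one individual's Y."""
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(core)
    se_individual = s * math.sqrt(1 + core)  # extra 1: individual scatter about the line
    return se_mean, se_individual


a, b, s, x_bar, sxx, n = fit_line(x, y)
x_star = 4.5
y_hat = a + b * x_star  # same fitted value for both uses
se_mean, se_individual = standard_errors(x_star, s, x_bar, sxx, n)
print(y_hat, se_mean, se_individual)
```

The fitted value is the same in both cases; only the standard error changes, and the one for an individual is always larger.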
sums of squares and $R^2$
in Stata output,
these quantities are actually displayed before the fitted line
Baldi and Moore don't discuss these!
Reader, pages 42 to 45
we've already discussed the Sum of Squares Residual
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
df = $n - 2$
compares observed val
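A minimal sketch of how the sums of squares fit together, using hypothetical data (not from the Reader). The total sum of squares splits into a model part and a residual part, and $R^2$ is the model share of the total.

```python
# hypothetical data, made up for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.7]


def sums_of_squares(xs, ys):
    """Return (total, model, residual) sums of squares for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((u - x_bar) ** 2 for u in xs)
    b = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * u for u in xs]
    ss_total = sum((v - y_bar) ** 2 for v in ys)                  # observed vs overall mean
    ss_model = sum((f - y_bar) ** 2 for f in fitted)              # fitted vs overall mean
    ss_residual = sum((v - f) ** 2 for v, f in zip(ys, fitted))   # observed vs fitted
    return ss_total, ss_model, ss_residual


ss_total, ss_model, ss_residual = sums_of_squares(x, y)
r_squared = ss_model / ss_total  # equivalently 1 - ss_residual / ss_total
print(ss_total, ss_model, ss_residual, r_squared)
```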
"linear" regression model
outcome Y must be continuous
explanatory variables X can be continuous or binary
for now, we're using one continuous X
simplest statistical model asks:
can we use a line to summarize relationship between Y and X?
start out by loo