Probability: With a random sample or a randomized experiment, the
probability an observation has a particular outcome is the proportion of
times that outcome would occur in a very long sequence of obs
Analyzing association between
categorical variables
Recall, there is an association between two variables if the
distribution of the response variable changes in some way as the
value of the explanato
Significance test
for a mean
For quantitative variables, significance tests usually refer to the
population mean
The null hypothesis has the form H0: μ = μ0, where μ0 is a particular
value for the popula
2-way
contingency tables
             Party Identification
Gender       Dem.    Rep.    Total
Females       150      50      200
Males         300     100      400
Total         450     150      600
Some notation:
We will define nij as the number of observations in row i and
Why pair?
Example: The subjects in the previous study were asked to
rate their hunger at the end of each two week period.
Find the P-value for a paired sample t test.
A. P = .013
B. P = .026
C. P = .073
D. P =
Pearson chi-square
test of independence
If the null is true, the expected cell counts μij should be close to
the observed cell counts nij.
The larger the errors, the greater the evidence against the
nu
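The comparison of observed and expected counts can be sketched in a few lines. This is only an illustration, assuming the usual estimate of the expected count under independence (row total times column total, divided by the grand total) and the usual Pearson statistic, the sum of (observed - expected)^2 / expected over all cells. The counts are the gender by party identification counts from these notes; the helper name `pearson_chi_square` is made up for this example.

```python
def pearson_chi_square(observed):
    """Return (expected counts, X^2 statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count in cell (i, j) if rows and columns are independent.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of squared errors, each scaled by its expected count.
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, x2, df


# Rows: Females, Males. Columns: Dem., Rep.
observed = [[150, 50], [300, 100]]
expected, x2, df = pearson_chi_square(observed)
print(expected, x2, df)
```

For this particular table the observed counts equal the expected counts in every cell, so the statistic is 0 and the data give no evidence against independence.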
Comparing
two proportions
As with means, hypothesis tests comparing two proportions generally
test whether the proportions are the same. That is, we generally
have
H0: p1 = p2
This is again equivalent
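A test of H0: p1 = p2 might be sketched as follows. This assumes the standard large-sample z test with a pooled proportion under the null, which these notes do not spell out; the counts and the function name `two_proportion_z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided large-sample z test of H0: p1 = p2 (pooled standard error)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2

    # Under H0 both groups share one proportion, estimated by pooling.
    pooled = (x1 + x2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

    z = (p1_hat - p2_hat) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2.
z_two, p_two = two_proportion_z_test(60, 100, 45, 100)
print(round(z_two, 3), round(p_two, 4))
```

The normal approximation is only reasonable when both samples are large enough; with small counts an exact method would be a safer choice.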
ANOVA
We discussed before how to compare the means of two groups, μ1
and μ2.
We will now discuss comparing the means of several groups.
The mean of a quantitative response variable is compared among
gro
Normal
Error Model
No matter what distribution the error terms εi have (and hence the
Yi), the least squares method provides unbiased point estimates of
β0 and β1.
These estimates have minimum variance a
ANOVA
The F statistic is the ratio of the estimated variance between groups
over the estimated variance within groups.
If the estimated variance between groups is large compared to the
estimated varia
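The ratio can be computed directly from grouped data. The sketch below assumes the usual one-way ANOVA definitions, MS(between) = SS(between) / (g - 1) and MS(within) = SS(within) / (N - g) for g groups and N observations. The data and the function name `one_way_anova_f` are made up for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic: MS(between) / MS(within)."""
    all_values = [y for group in groups for y in group]
    grand_mean = mean(all_values)
    g = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((y - mean(group)) ** 2 for group in groups for y in group)

    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n - g)
    return ms_between / ms_within


# Hypothetical data for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

Here the group means (2, 3, 6) are spread far apart relative to the spread inside each group, so F is well above 1.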
Relations between variables
A functional relationship between two variables is expressed by a
mathematical formula.
For an independent variable X and a dependent variable Y, a
functional relationship
ANOVA
One assumption of the ANOVA models is that the standard
deviation is the same for each group. This common standard
deviation is denoted by σ. Thus we assume that
σ1 = σ2 = ... = σI = σ (homoskedasticity)
ANOVA
Example: Two-week weight gains (lbs) of lambs on three different
diets
Step 1: State H0 and Ha
Diet 1
SS(within) = 210
df(within) = 9
Step 3:
Step 4:
Step 5: MS(within)
MS(wi
SQL
SELECT
DISTINCT
WHERE
AND OR
IN
BETWEEN
LIKE
ORDER BY
COUNT
GROUP BY
HAVING
ALIAS
CONCATENATE
SUBSTRING
TRIM
CREATE TABLE
CONSTRAINT
NOT NULL
UNIQUE
CHECK
CREATE VIEW
CREATE INDEX
ALTER TABLE
DRO
Finding Z values
Many inferential methods use z-values corresponding to certain
probabilities from a normal distribution. This entails the reverse use
of the z-table.
Starting with a lower tail probab
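The reverse lookup can also be done in software instead of with a printed z-table. A minimal sketch using `NormalDist.inv_cdf` from Python's standard library, which takes a lower tail probability and returns the corresponding z-value; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# z-value with lower tail probability 0.975 (used for 95% two-sided intervals).
z_975 = std_normal.inv_cdf(0.975)

# z-value with lower tail probability 0.95.
z_95 = std_normal.inv_cdf(0.95)

print(round(z_975, 3), round(z_95, 3))

# Forward use of the table: the lower tail probability for a given z.
print(round(std_normal.cdf(z_975), 3))
```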
Confidence Intervals
for means
Like a confidence interval for a proportion, the confidence interval for a
mean has the form: point estimate ± margin of error.
The margin of error for μ is z_(1-α/2) * σ_y(bar),
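As a rough sketch, the interval can be computed like this. It assumes a large sample, so that the sample standard deviation can stand in for the unknown population value and a z-value is appropriate; for small samples a t-value would normally be used instead. The data and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample z interval for a mean: point estimate +/- margin of error."""
    n = len(sample)
    y_bar = mean(sample)                 # point estimate
    se = stdev(sample) / sqrt(n)         # estimated standard error of y_bar
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * se
    return y_bar - margin, y_bar + margin


# Hypothetical sample (far too small for the large-sample assumption; illustration only).
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
lower, upper = mean_confidence_interval(sample)
print(round(lower, 2), round(upper, 2))
```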
Comparing
two means
To compare two populations, we estimate the difference between their
parameters.
To compare population means μ1 and μ2, we treat μ1 - μ2 as the
parameter.
The point estimate of this pa
Definitions relating samples to populations
Population
Parameter: a characteristic of a population (e.g. true mean, true SD)
Sample
Statistic: a characteristic of a sample (e.g. sample mean, sample SD)
Example
APD did a study of the relationship between the number of
accidents at intersections and panhandlers.
They found that there were a high number of accidents at
intersections that had a high num
Good graphics boxplots
Boxplot: visual representation of the five-number
summary (Min, Q1, Median, Q3, Max)
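The five numbers behind a boxplot can be computed as below. This is a sketch with made-up data; quartile conventions differ between textbooks and software, and the "inclusive" method of `statistics.quantiles` used here is only one of them. The function name `five_number_summary` is illustrative.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical data values.
values = [60, 62, 65, 66, 68, 70, 71, 75, 80]
summary = five_number_summary(values)
print(summary)
```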
This distribution is skewed to the:
A. Right
B. Left
[Figure: boxplot on an axis running from 60 to 80]
Where would the mean be?
Goo
Binomial Distribution
For categorical data, the following conditions are often true:
1) Each observation falls into one of two categories (success or
failure, heads or tails)
2) The probabilities for
Describing
variability of data
A measure of center alone is not adequate for numerically describing
data for a quantitative variable.
It describes a typical value, but not the spread of the data about
Point Estimation
We want to estimate parameters from our data. A point estimate is
a single number that is used to estimate the parameter.
An interval estimate is an interval of numbers around the poi
Descriptive statistics
The purpose of descriptive statistics is to summarize data in order to
make it easier to assimilate the information.
Tables and graphs are used to show the number of times vario
Confidence intervals: Locating an invisible man!
The man is within 1 SE about two thirds of the time
95% confidence interval
The man is within 2 SEs of the dog about 95% of the time
We know where the dog is, and we'd like to estimate
The Normal Distribution
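If a random variable Y follows a normal distribution with mean μ and standard
deviation σ, then we write:
Y ~ N(μ, σ)
The density curve of the distribution of a normal variable Y is given by:
$$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}}$$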
Note that the mean of y(bar) is the same as the mean
of the true population. That is, the mean of y(bar)
is the same as the mean of an observation.
Note that the standard deviation of y(bar) is
smaller than the
Significance Tests
In statistics, a hypothesis is a statement about a population. It is
usually a statement that a parameter takes a particular numerical
value or falls in a certain range of values.
A
Significance test
for a mean
When the alternative hypothesis is one sided, technically the null
hypothesis is as well.
The null and alternative hypotheses are complements and comprise
all possibilitie
Significance test
for a proportion
When the sample size is large, the sampling distribution of p(hat) is
approximately normal with mean p, the true parameter.
We are interested in testing a null hypot
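A large-sample test for a proportion might be sketched as follows. It assumes the standard z statistic, (p(hat) - p0) / sqrt(p0(1 - p0)/n), where p0 is the null value and the standard error is computed under the null; the counts and the function name `one_proportion_z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided large-sample z test of H0: p = p0."""
    p_hat = successes / n
    # Standard error of p_hat computed under the null value p0.
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z_one, p_one = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z_one, 2), round(p_one, 4))
```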