STATISTICS 211 PROF EMANUEL PARZEN
Chapter 7 ONE SAMPLE, TWO SAMPLE STATISTICAL METHODS
STATISTICAL INFERENCE PARAMETERS p, \mu
Our Data Modeling Strategy has a VALIDATION step (action, phase, problem 3) whose
goal is to find parameters of probability models that are VALID or PLAUSIBLE.
This means that, for these parameter values, the observed data has a relatively high
probability of being observed. This chapter provides a handbook or summary of methods used in
applied statistics to learn from data about the following parameters:
(one sample, continuous variable Y: parameter mean \mu)
(one sample, 0-1 valued variable: probability p)
(two independent samples, 0-1 valued variables: difference of probabilities)
(two independent samples, 0-1 valued variables: pooled p, 2 by 2 table)
(two independent samples, continuous variable Y: difference of means)
(two independent samples, continuous variable Y: equal variances)
(one sample, continuous variable: parameter \sigma)
(bivariate continuous paired data (X,Y): difference of means)
(bivariate 0-1 valued data: difference of probabilities, 2 by 2 table)

(ONE SAMPLE, CONTINUOUS VARIABLE Y: PARAMETER MEAN \mu)
Compute from sample: SAMPLE SIZE $n$, MEAN $M(Y)$, STANDARD DEVIATION $S$
For HYPOTHESIS TEST $H_0: \mu = \mu_0$ COMPUTE the $Z$ TEST STATISTIC (SAMPLE MEAN; TRUE MEAN)
$$Z(M(Y); \mu_0) = \frac{M(Y) - \mu_0}{SE(M(Y))}$$
$SE(M(Y)) = \sigma / \sqrt{n}$ if $\sigma$ known, where $\sigma^2 = VAR[Y]$
$SE(M(Y)) = S / \sqrt{n}$ if $\sigma$ unknown; $\sigma$ is estimated by $S$
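The test statistic can be sketched in a few lines of Python. This is only an illustration: the function names, the data values, and the choice of $\mu_0 = 10$ and $\sigma = 0.5$ are hypothetical, and $S$ is assumed to be the usual sample standard deviation with divisor $n-1$, which the notes do not state explicitly.

```python
import math
import statistics

def standard_error(n, sigma=None, s=None):
    """SE(M(Y)): sigma/sqrt(n) if sigma is known, otherwise S/sqrt(n)."""
    spread = sigma if sigma is not None else s
    if spread is None:
        raise ValueError("provide sigma (known) or s (sample standard deviation)")
    return spread / math.sqrt(n)

def z_statistic(sample, mu0, sigma=None):
    """Z(M(Y); mu0) = (M(Y) - mu0) / SE(M(Y))."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # assumption: divisor n-1
    return (m - mu0) / standard_error(n, sigma=sigma, s=s)

# Hypothetical data, used only to exercise the formula.
y = [9.8, 10.4, 10.1, 9.6, 10.6, 10.3, 9.9, 10.1]
z_known = z_statistic(y, mu0=10.0, sigma=0.5)  # sigma treated as known
z_estimated = z_statistic(y, mu0=10.0)         # sigma estimated by S
```

With $\sigma$ treated as known the statistic is referred to the NORMAL(0,1) distribution; with $\sigma$ estimated from only eight observations the Student distribution would be the reference.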
“SAMPLING DISTRIBUTION” of $Z(M(Y); \mu_0)$ ASSUMING $\mu_0$ is the TRUE MEAN:
$Z \sim$ NORMAL(0,1), exact if $Y$ is Normal and $\sigma$ known
$Z \sim$ NORMAL(0,1), approximate by central limit theorem, $\sigma$ known
$Z \sim$ NORMAL(0,1), approximate if $n \geq 40$, $\sigma$ estimated by $S$
$Z \sim$ STUDENT $t$ distribution with $n-1$ degrees of freedom if $n < 40$ and $\sigma$ estimated by $S$
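The four cases above amount to a small decision rule, which might be sketched as follows. The function name and the returned labels are hypothetical; the Student case assumes $Y$ is Normal.

```python
def reference_distribution(n, sigma_known, y_normal=False):
    """Pick the reference distribution of Z(M(Y); mu0) under H0."""
    if sigma_known:
        kind = "exact" if y_normal else "approximate (central limit theorem)"
        return f"NORMAL(0,1), {kind}"
    if n >= 40:
        # sigma estimated by S, large sample
        return "NORMAL(0,1), approximate"
    # sigma estimated by S, small sample (assumes Y is Normal)
    return f"STUDENT t, {n - 1} degrees of freedom"

print(reference_distribution(25, sigma_known=False))
print(reference_distribution(60, sigma_known=False))
```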
“Rejection Region” for hypothesis $H_0: \mu = \mu_0$ is an interval of values of $Z(M(Y); \mu_0)$
with (1) low specified probability under the distributions above, and (2) maximum
possible probability under a specified appropriate alternative hypothesis of the form
$\mu \neq \mu_0$, $\mu < \mu_0$, or $\mu > \mu_0$.
“ACCEPTANCE REGION” of values of $Z(M(Y); \mu_0)$ to accept $H_0: \mu = \mu_0$ at
significance level .05 (specified probability of
TYPE I ERROR: rejecting $H_0$ assuming $H_0$ is true)
$Q(.025; Z) \leq Z(M(Y); \mu_0) \leq Q(.975; Z)$ if alternate hypothesis $\mu \neq \mu_0$
$Q(.05; Z) \leq Z(M(Y); \mu_0)$ if alternate hypothesis $\mu < \mu_0$
$Z(M(Y); \mu_0) \leq Q(.95; Z)$ if alternate hypothesis $\mu > \mu_0$
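A minimal sketch of the acceptance regions, assuming the NORMAL(0,1) reference distribution and reading the three alternatives as $\mu \neq \mu_0$, $\mu < \mu_0$, and $\mu > \mu_0$ (the direction of the third is inferred from its quantile .95). The function names and the string labels for the alternatives are hypothetical; `NormalDist().inv_cdf` from the Python standard library plays the role of the quantile function $Q(P; Z)$.

```python
from statistics import NormalDist

def acceptance_region(alternative, alpha=0.05):
    """Interval (lo, hi) of Z values for which H0 is accepted."""
    q = NormalDist().inv_cdf  # Q(P; Z) for NORMAL(0,1)
    if alternative == "two-sided":   # mu != mu0
        return (q(alpha / 2), q(1 - alpha / 2))
    if alternative == "less":        # mu < mu0
        return (q(alpha), float("inf"))
    if alternative == "greater":     # mu > mu0
        return (float("-inf"), q(1 - alpha))
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

def accept_h0(z, alternative, alpha=0.05):
    """True if the observed Z falls in the acceptance region."""
    lo, hi = acceptance_region(alternative, alpha)
    return lo <= z <= hi

print(acceptance_region("two-sided"))   # roughly (-1.96, 1.96)
print(accept_h0(-2.0, "less"))          # Z below Q(.05; Z), so H0 is rejected
```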
The 95% confidence interval, defined as the interval of values $\mu_0$ that satisfy:
Accept $H_0: \mu = \mu_0$ at significance level 5%, is computed by
computing the endpoints of the confidence interval from an endpoint function (also called
“confidence quantile of parameter”), which has several formulas:
If $Z(M(Y); \mu_0)$ has the NORMAL(0,1) distribution, the confidence interval endpoint
function for the TRUE MEAN $\mu$ is
$$Q(P; \mu) = M(Y) + SE(M(Y)) \, Q(P; Z)$$
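The Normal endpoint function can be sketched as below. The function name and the numbers ($M(Y) = 10.1$, $\sigma = 0.5$ known, $n = 8$) are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import NormalDist

def endpoint_normal(p, m, se):
    """Q(P; mu) = M(Y) + SE(M(Y)) * Q(P; Z), with Z distributed NORMAL(0,1)."""
    return m + se * NormalDist().inv_cdf(p)

# Hypothetical summary statistics.
m = 10.1
se = 0.5 / math.sqrt(8)  # sigma / sqrt(n), sigma treated as known

lower = endpoint_normal(0.025, m, se)
upper = endpoint_normal(0.975, m, se)
print(f"95% CI for mu: ({lower:.4f}, {upper:.4f})")
```

Because $Q(.025; Z) = -Q(.975; Z)$, the two endpoints are symmetric about $M(Y)$.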
When $Y$ is Normal, the sample size is $n < 40$, and we use $S$ to estimate $\sigma$,
it is standard practice to use a more accurate formula for the confidence interval endpoints,
based on a fact first discovered in 1904 by W.S. Gosset, whose day job was Guinness beer
brewer:
$Z(M(Y); \mu_0)$ has the STUDENT $t$ distribution with $n-1$ degrees of freedom, and
$$Q(P; \mu) = M(Y) + SE(M(Y)) \, Q(P; \text{Student}(n-1))$$
$SE(M(Y)) = \sigma / \sqrt{n}$ if $\sigma$ known
$SE(M(Y)) = S / \sqrt{n}$ if $\sigma$ estimated by $S$
Two sided 95% CI can be expressed
$$Q(.025; \mu) \leq \mu \leq Q(.975; \mu)$$
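For the Student case, the quantile $Q(P; \text{Student}(n-1))$ is not available in the Python standard library, so the sketch below obtains it numerically: Simpson's rule for the distribution function, then bisection to invert it. This is an illustrative stand-in under stated assumptions, not a recommended implementation; in practice a statistics library such as SciPy (`scipy.stats.t.ppf`) would normally be used. All function names and the numbers ($n = 25$, $M(Y) = 10.1$, $S = 0.5$) are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Invert t_cdf by bisection; the bracket [-100, 100] is an assumption
    that is adequate for typical p (e.g. .025 to .975) and df >= 2."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def endpoint_student(p, m, s, n):
    """Q(P; mu) = M(Y) + (S / sqrt(n)) * Q(P; Student(n-1))."""
    return m + (s / math.sqrt(n)) * t_quantile(p, n - 1)

# Hypothetical summary statistics: n = 25, M(Y) = 10.1, S = 0.5.
lower = endpoint_student(0.025, 10.1, 0.5, 25)
upper = endpoint_student(0.975, 10.1, 0.5, 25)
print(f"95% CI for mu: ({lower:.4f}, {upper:.4f})")
```

The Student quantile for 24 degrees of freedom is about 2.064, compared with 1.960 for NORMAL(0,1), so the interval is somewhat wider than the Normal formula would give for the same $S$ and $n$.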
 Fall '07
 Parzen
 Statistics, Normal Distribution, Probability, Statistical hypothesis testing, Emanuel Parzen, endpoint function
