•
Probability—the probability of an event is the proportion of times we would expect the event to
occur in an infinitely long series of identical sampling experiments
•
If all the possible outcomes are equally likely, the probability of the occurrence of an event is equal
to the proportion of the possible outcomes characterized by the event
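The equally-likely rule can be checked by direct counting. The sketch below is only an illustration: the two-dice example and the helper name `probability` are assumptions, not part of the notes.

```python
from fractions import Fraction
from itertools import product

def probability(event, outcomes):
    """Proportion of equally likely outcomes for which event(outcome) is true."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

# All 36 equally likely outcomes of rolling two fair six-sided dice
# (a hypothetical example chosen for illustration).
two_dice = list(product(range(1, 7), repeat=2))

p_sum_is_seven = probability(lambda roll: sum(roll) == 7, two_dice)
print(p_sum_is_seven)  # 1/6
```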
•
Confidence statement: Statistic ± Margin of error
•
Margin of error = $1/\sqrt{n}$
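A short sketch of how this quick rule behaves as the sample size grows. The comparison function `worst_case_margin_95` is an assumption added for context: it evaluates $Z\sqrt{p(1-p)/n}$ at $p = 0.5$ with $Z = 1.96$ (the usual 95% value), which is the case the $1/\sqrt{n}$ rule roughly approximates.

```python
import math

def quick_margin_of_error(n):
    """Quick rule from the notes: margin of error = 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def worst_case_margin_95(n, z=1.96):
    """Assumed comparison: z * sqrt(p(1-p)/n) at p = 0.5, where p(1-p) is largest."""
    return z * math.sqrt(0.25 / n)

# Quadrupling the sample size halves the margin of error.
for n in (100, 400, 1600):
    print(n, round(quick_margin_of_error(n), 4), round(worst_case_margin_95(n), 4))
```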
•
Confidence interval: a range of values constructed from sample data so that the population parameter
falls within the range with a specified probability.
The specified probability is called the level of
confidence
•
Confidence interval for a sample mean:
$\bar{X} \pm Z \, (S/\sqrt{n})$
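The interval for the mean can be computed as in the following minimal sketch. It assumes $Z = 1.96$ (a 95% level of confidence) and uses the sample standard deviation for $S$; the function name and the data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (lower, upper) for X-bar +/- Z * (S / sqrt(n)).

    z=1.96 is an assumed default corresponding to 95% confidence.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = mean_confidence_interval(data)
print(round(lower, 3), round(upper, 3))
```

For small samples a t value is commonly used in place of Z; this sketch follows the Z form given in the notes.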
•
Confidence interval for a sample proportion:
$p \pm Z \sqrt{p(1-p)/n}$
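The same pattern applies to a proportion. This is a sketch under the assumption that $Z = 1.96$ (95% confidence) and that the sample is large enough for the normal approximation; the function name and the counts are hypothetical.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Return (lower, upper) for p +/- Z * sqrt(p(1 - p) / n).

    z=1.96 is an assumed default corresponding to 95% confidence.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical survey: 50 of 100 respondents answer "yes".
lower, upper = proportion_confidence_interval(50, 100)
print(round(lower, 3), round(upper, 3))  # 0.402 0.598
```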
•
Point estimate: a value computed from sample information that is used to estimate the population parameter
•
Standard error of the sample mean—the standard deviation of the sampling distribution of the
sample means.
It is a measure of the variability of the sampling distribution of the sample mean
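A small simulation can make the standard error concrete: draw many samples, record each sample mean, and compare the standard deviation of those means with $\sigma/\sqrt{n}$. The die-roll population, the sample size, and the function name below are assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(population, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(repetitions)]

# Hypothetical population: the faces of a fair six-sided die.
population = [1, 2, 3, 4, 5, 6]
n = 25
means = simulate_sample_means(population, n, repetitions=2000, seed=42)

observed_se = statistics.stdev(means)                           # spread of the sample means
theoretical_se = statistics.pstdev(population) / math.sqrt(n)   # sigma / sqrt(n)
print(round(observed_se, 3), round(theoretical_se, 3))
```

The two printed values should be close, and both shrink as n grows.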
•
Experimental process:
o
Subjects → treatment → observation
•
Variables
o
Explanatory variable (independent variable)
o
Response variable (dependent variable)
o
Lurking or confounding variable
•
Alternative experimental designs
o
Completely randomized design: simplest design strategy; each subject is randomly
assigned to one group; typically, group sizes are identical
o
Block design—used when known extraneous variables may influence the experiment;
subjects are presorted by the influencing variables, then partitioned into similar blocks;
subjects from each block randomly assigned to groups
o
Matched pairs design—each subject receives each treatment;
treatment sequence is
randomly chosen for each subject
o
Double blind design—neither the subjects nor the investigators know which treatment is
administered
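The first two designs above can be sketched as random assignment routines. This is an illustration under stated assumptions, not a prescribed procedure: the function names, the round-robin dealing, and the `block_of` callback are all hypothetical.

```python
import random

def completely_randomized(subjects, n_groups, seed=None):
    """Shuffle all subjects, then deal them round-robin into n_groups groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Slicing with a stride keeps group sizes within one of each other.
    return [shuffled[i::n_groups] for i in range(n_groups)]

def block_design(subjects, block_of, n_groups, seed=None):
    """Sort subjects into blocks by block_of(subject), then randomize within each block."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    groups = [[] for _ in range(n_groups)]
    for members in blocks.values():
        rng.shuffle(members)                  # random order within the block
        for i, subject in enumerate(members):
            groups[i % n_groups].append(subject)
    return groups

# Hypothetical subjects: (id, sex), blocked on sex.
subjects = [(i, "F" if i % 2 else "M") for i in range(12)]
print(completely_randomized(subjects, 3, seed=1))
print(block_design(subjects, lambda s: s[1], 3, seed=1))
```

Blocking guarantees that each treatment group receives the same mix of the blocking variable, which a completely randomized design only achieves on average.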
•
Control—minimize the effects of lurking/confounding variables on the response, most simply by
comparing several treatments
•
Randomize—use impersonal chance to assign subjects to treatments
•
Replicate—repeat the experiment on many subjects to reduce chance variation in the results
•
Statistical significance—an observed effect so large that it would rarely occur by chance
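One way to see what "would rarely occur by chance" means is a permutation test, sketched below. This technique is brought in here for illustration and is not named in the notes: it reshuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the one observed. The function name and the data are hypothetical.

```python
import random
import statistics

def permutation_p_value(treatment, control, n_permutations=2000, seed=0):
    """Estimate how often random relabeling gives a mean difference >= the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(treatment) - statistics.mean(control))
    pooled = list(treatment) + list(control)
    k = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign subjects to groups by impersonal chance
        diff = abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical responses, for illustration only.
treated = [10, 11, 12, 13, 14]
untreated = [1, 2, 3, 4, 5]
print(permutation_p_value(treated, untreated))
```

A small returned proportion suggests the observed effect would rarely arise from chance assignment alone.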
•
Sampling frame:
the list of units from which a sample is chosen
•
Built-in bias
o
Convenience sample: a sample in which the subjects are selected, in part or in whole, at the
convenience of the researcher
o
Voluntary response sample: consists of people who choose themselves by responding to a
general appeal.
Such samples often overrepresent people with strong opinions, most often
negative opinions.
•
Simple Random Sample (SRS): a sample of n units from the sampling frame, chosen in such a
way that every possible group of n units has the same chance of being chosen
o
Random: fair, representative, unbiased
o
Random sample designs: simple; multistage; systematic; stratified; cluster; hybrid
o
Possible shortcomings: bias due to poor sampling frame, cost of sampling
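Three of the sample designs listed above can be sketched with Python's standard `random` module. The function names, the `stratum_of` callback, and the numbered sampling frame are assumptions for illustration; the multistage, cluster, and hybrid designs are not shown.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """SRS: every possible group of n units is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every step-th unit (requires n <= len(frame))."""
    rng = random.Random(seed)
    step = len(frame) // n
    start = rng.randrange(step)
    return [frame[start + i * step] for i in range(n)]

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Split the frame into strata, then draw an SRS of fixed size from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

# Hypothetical sampling frame: units numbered 0..99.
frame = list(range(100))
print(simple_random_sample(frame, 10, seed=1))
print(systematic_sample(frame, 10, seed=1))
print(stratified_sample(frame, lambda u: u % 2, 3, seed=1))
```

None of these routines can correct for a poor sampling frame: units missing from `frame` can never be chosen.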
 Fall '07
 Cleveland
 Data Mining, Null hypothesis, Statistical hypothesis testing, random sample designs
