Chapter 3: Producing data
Sampling Experimentation
Sampling design and experimental design
Producing data: Design of experiments
A carefully planned and executed experiment can provide good evidence for causation.
Obtaining data
Beware of drawing conclusions f
Sampling design
Sampling methods
Simple random sampling
Stratified random sampling
Multi-stage sampling
Sampling distribution and statistical inference
Caution about sampling
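The sampling methods listed above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own procedure: the population labels 1 to 27, the two strata, and the helper names `simple_random_sample` and `stratified_random_sample` are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Draw a separate simple random sample inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

# Hypothetical population: units labelled 1..27 (labels are illustrative only).
population = list(range(1, 28))
srs = simple_random_sample(population, 10, seed=1)

# Hypothetical strata, chosen only to show the mechanics.
strata = {"north": list(range(1, 15)), "south": list(range(15, 28))}
strat = stratified_random_sample(strata, {"north": 5, "south": 5}, seed=1)

print("SRS:", sorted(srs))
print("Stratified:", {k: sorted(v) for k, v in strat.items()})
```

A convenience sample has no counterpart here because it involves no chance mechanism, which is the reason it supports no probability statements.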
Sampling methods
Convenience sampling: Just ask whoever is around.
Example: Man o
Probability
Randomness; Probability models
Randomness and probability
Probability models: sample spaces, events
Assigning probabilities: finite number of outcomes
Basic probability rules
Random phenomena vs deterministic phenomena
Randomness and probability
A
Random variables (4.3, 4.4)
Discrete random variables
Continuous random variables
Probability distributions
Normal probability distributions
Mean of a random variable
Variance of a random variable
Law of large numbers
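As a rough illustration of the mean, the variance, and the law of large numbers for a discrete random variable, the sketch below uses a hypothetical distribution (the number of heads in two tosses of a fair coin). The distribution and the seed are assumptions made for the example.

```python
import random

# Hypothetical distribution: number of heads in two tosses of a fair coin.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

# Mean: sum of value * probability.
mean = sum(x * p for x, p in zip(values, probs))

# Variance: sum of squared deviation from the mean * probability.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Law of large numbers: the average of many independent draws
# settles near the mean of the distribution.
rng = random.Random(0)
draws = rng.choices(values, weights=probs, k=100_000)
sample_mean = sum(draws) / len(draws)

print(mean, variance, sample_mean)
```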
Discrete random variables
A random variabl
Conditional probability (4.5)
Definition of conditional probability
General multiplication rule
Probability trees
Bayes rule
Independence
Conditional probabilities reflect how the probability of an event can change if we know that some other event has occur
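To make the idea concrete, here is a small sketch that enumerates the 36 equally likely outcomes of two fair dice and applies the definition P(B | A) = P(A and B) / P(A). The events A (first die is even) and B (the sum is 8) are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs from two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] % 2 == 0      # first die shows an even number
B = lambda o: o[0] + o[1] == 8   # the two dice sum to 8

p_B = prob(B)
# Definition of conditional probability: P(B | A) = P(A and B) / P(A).
p_B_given_A = prob(lambda o: A(o) and B(o)) / prob(A)

print(p_B, p_B_given_A)
```

Here knowing that A occurred moves the probability of B from 5/36 to 1/6, so A and B are not independent.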
Sampling distributions for counts and proportions (5.1)
Sampling distributions for counts and proportions
Sampling distribution of a count
Sampling distribution of a proportion
Binomial distributions
Normal approximation
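A short sketch of how a binomial probability compares with its Normal approximation. The values n = 100 and p = 0.5 and the cutoff 55 are hypothetical. The continuity correction (using 55.5) is a choice made for this example, not something stated in the slides.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for Z ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 100, 0.5                      # hypothetical count setting
mu = n * p                           # mean of the count
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the count

# Exact binomial probability P(X <= 55).
exact = sum(binom_pmf(k, n, p) for k in range(0, 56))

# Normal approximation with a continuity correction (an added assumption).
approx = normal_cdf(55.5, mu, sigma)

print(round(exact, 4), round(approx, 4))
```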
Binomial distribution
A coin with P(H) =
Sampling distribution of sample mean
We take many random samples of a given size n from a population with mean μ and standard deviation σ. Some sample means will be above the population mean and some will be below, making up the sampling distribution.
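The repeated-sampling idea can be simulated directly. The sketch below assumes a hypothetical skewed population of 50,000 values, a sample size of n = 25, and 2,000 repeated samples. None of these numbers come from the slides.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical right-skewed population (exponential values).
population = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 25  # size of each sample
# Take many random samples of size n and record each sample mean.
sample_means = [statistics.fmean(rng.sample(population, n))
                for _ in range(2_000)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print("population mean:", round(mu, 3), " mean of sample means:", round(mean_of_means, 3))
print("sigma / sqrt(n):", round(sigma / n ** 0.5, 3), " sd of sample means:", round(sd_of_means, 3))
```

In a run like this the sample means scatter on both sides of the population mean, and their spread is close to σ/√n.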
Sampling d
Introduction to inference
Estimating with confidence
Assignment #8
5.48, 5.50, 5.52, 5.60, 6.18, 6.32, 6.66, 6.68 due October 28
Introduction to inference
Estimating with confidence
Confidence interval
A confidence interval is a range of values with an associated p
2.4 and 2.6
Regression diagnostics; residual plots
Caution about correlation and regression
Causation
Residuals
The vertical distances from the points in a scatter plot to the least-squares regression line give us potentially useful information about the
Probability
Stat 131A
Hank Ibser
Notation and Definitions
A = an event which may or may not occur (A)
A^c = the complement, or opposite, of event A (A complement)
P(A) = the probability that event A occurs (probability of A)
P(B | A) = the conditional proba
Looking at data: distributions
Displaying distributions with graphs
Exploratory Data Analysis
Spreadsheet
Individuals in sample
Patient A Patient B Patient C Patient D Patient E Patient F Patient G
DIAGNOSIS
Heart disease Stroke Stroke Lung cancer Heart dise
Looking at data: distributions
Describing distributions with numbers
The pattern of variation of a variable is called its distribution. The distribution of a quantitative variable records its numerical values and how often each value occurs. The shape and spr
Assignment #2
1.120, 1.122, 1.126, 1.132, 1.136, 1.142, 1.146, 2.22, due September 16
Density Curves and Normal Distributions
What we have learned
Plot the data
Look for overall pattern and deviations from the pattern
Compute numerical summaries
Mathematica
Looking at data: relationships
Two variables measured on the same individuals
Both variables are quantitative
One quantitative, one categorical
Both variables are categorical
Graphical and numerical summaries (scatter plot, correlation, etc.)
Look for patter
Looking at data: relationships
Least squares regression
Correlation tells us about strength (scatter) and direction of the linear relationship between two quantitative variables.
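A minimal sketch of computing the correlation r from standardized values. The five data points are made up for illustration, and the helper name `correlation` is hypothetical.

```python
import math

def correlation(xs, ys):
    """Correlation r: average product of standardized x and y values,
    using n - 1 in the denominators."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up data with a strong positive linear pattern.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)                     # close to +1: strong, positive
r_neg = correlation(xs, [-y for y in ys])   # same strength, opposite direction

print(round(r, 4), round(r_neg, 4))
```

The sign of r gives the direction of the linear relationship, and its distance from 0 gives the strength.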
In addition, we would like to have a numerical description of how both variables
Introduction to inference (6.2)
Tests of significance
Hypothesis testing
Use sample data to decide on the validity of a hypothesis.
Lady tasting tea
Given a cup of tea with milk, a lady claims she can discriminate as to whether milk or tea was first added to the
Introduction to inference (6.3 & 6.4)
Significance level and power; use and abuse of tests
Test as a decision
Type I and II errors
Significance level
Power
Use and abuse of tests
Type I and II errors
A Type I error is made when we reject the null hypothesis and the null
1. P(neither subsystem fails) = 1 - P(at least one fails) = 1 - P(A fails) - P(only B fails) = 0.65
2. No. P(A and B) = 0.4 gives P(A or B) = 0.7 + 0.8 - 0.4 = 1.1, which is not possible.
3. a) (0.25)(0.05) + (0.35)(0.04) + (0.4)(0.02) = 0.0345
b) (0.35)(0.04) / [(0.25)(0.05) + (0.35)(0.04) + (0.4)(0.02
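The arithmetic in solution 3 can be checked with a short script. This is a sketch under an assumption: it reads the first factor of each product as the probability of a case and the second factor as a conditional probability given that case. The excerpt does not say what the three cases are, so the variable names here are hypothetical.

```python
# Assumed reading of the products in solution 3:
# first factors = probabilities of three cases, second factors = conditional
# probabilities of the event of interest given each case.
priors = [0.25, 0.35, 0.40]
likelihoods = [0.05, 0.04, 0.02]

# Part (a): total probability, summing prior * likelihood over the cases.
total = sum(p * l for p, l in zip(priors, likelihoods))

# Part (b): Bayes rule, each case's share of the total.
posterior = [p * l / total for p, l in zip(priors, likelihoods)]

print(round(total, 4))
print([round(q, 4) for q in posterior])
```

The second entry of `posterior` corresponds to the ratio with (0.35)(0.04) in the numerator.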
3.66. (a) This is a stratified random sample. (b) Label from 01 through 27; beginning at line 122, we choose 13 (805), 15 (760), 05 (916), 09 (510), 08 (925), 27 (619), 07 (415), 10 (650), 25 (909), and 23 (310). Note: The area codes are in north-south or
Sample questions
1. An electronic assembly consists of two subsystems, say A and B. From previous testing procedures, the following probabilities are assumed to be known: P(A fails) = 0.20, P(only B fails) = 0.15, P(both A and B fail) = 0.15. What is th