The Law of Large Numbers (LLN)
The long-run relative frequency of one of the
outcomes of many repeated independent trials
gets closer and closer to a single value.
The long-run single value is the probability
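A small simulation can make the LLN concrete. This is a minimal sketch that assumes a fair coin (P(heads) = 0.5). It tracks the relative frequency of heads after each toss, and that frequency should settle near 0.5 as the number of tosses grows. The function name `running_relative_frequency` and the seed are illustrative choices, not taken from the course materials.

```python
import random

def running_relative_frequency(n_flips, seed=0):
    """Simulate n_flips tosses of a fair coin (assumed) and return the
    relative frequency of heads after each toss."""
    rng = random.Random(seed)          # seeded so the run is reproducible
    heads = 0
    freqs = []
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:         # heads with probability 1/2
            heads += 1
        freqs.append(heads / i)        # relative frequency so far
    return freqs

freqs = running_relative_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} tosses: relative frequency of heads = {freqs[n - 1]:.4f}")
```

Early frequencies can wander well away from 0.5. The later ones stay close to it, which is the "long-run" part of the law.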
Formal or Mathematical Probability
1. A survey of families revealed that 58% of all families eat turkey at holiday meals, 44% eat ham, and 16% have
both turkey and ham to eat at holiday meals.
3. Draw a Venn diagram for the probabilities in this problem, with regions and probabilities clea
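The four regions of the Venn diagram follow from the survey percentages and the addition rule, P(T or H) = P(T) + P(H) - P(T and H). This sketch works in whole percentages so that no floating-point rounding creeps in. The variable names are illustrative.

```python
# Percentages from the survey (all families = 100%).
turkey, ham, both = 58, 44, 16

turkey_only = turkey - both        # inside the turkey circle, outside ham
ham_only = ham - both              # inside the ham circle, outside turkey
either = turkey + ham - both       # addition rule: P(T or H)
neither = 100 - either             # outside both circles

regions = {
    "turkey only": turkey_only,
    "ham only": ham_only,
    "both": both,
    "neither": neither,
}
for name, pct in regions.items():
    print(f"{name:12s} {pct}%")
```

The four region percentages must sum to 100%, which is a quick check on the diagram.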
Lecture 5 Handout: Scientific Data Analysis I, Fall 2013
1. Car Repairs (Chapter 14, problems 19 and 21, page 352)
A consumer organization estimates that over a 1-year period 17% of cars will need to be repaired once, 7% will
need repairs twice, and 4% will req
MATH 410, Fall 2010, Midterm Exam 1
Name
Show all work to receive maximum credit.
1. Suppose that a Normal model describes student scores in a history class. Parker has a standardized score
(z-score) of +2.5. This means that Parker
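A z-score counts how many standard deviations a value lies from the mean, z = (x - mean) / sd. Under a Normal model, the standard Normal CDF converts z into the fraction of scores below it. This sketch uses a hypothetical class mean of 70 and SD of 8, chosen only for illustration. The z-score itself does not depend on them.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical class parameters, for illustration only.
mean, sd = 70, 8
parker = mean + 2.5 * sd           # a raw score whose z-score is +2.5
z = z_score(parker, mean, sd)

# Under a Normal model: share of scores below z (standard Normal CDF).
below = NormalDist().cdf(z)
print(f"z = {z:+.1f}; about {below:.2%} of scores fall below Parker's")
```

Parker's score is 2.5 standard deviations above the class mean. Under the Normal model, that is roughly the top 0.6% of the class.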
Lecture 1 Examples: Scientific Data Analysis I, Fall 2015
To determine if people's preference in dogs had changed in recent years, organizers of a local dog show asked people
who attended the show to indicate which breed was their favorite. This inf
MATH 410, Spring 2011, Midterm Exam 1
Name
1. Which is true of the data shown in the histogram?
I. The distribution is skewed to the right.
II. The mean is probably smaller than the median.
III. We should use median and IQR to summarize these data.
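A quick numerical check shows how skewness relates to the mean and median. In right-skewed data the long right tail pulls the mean above the median, and the median and IQR are resistant summaries for such data. The data set below is hypothetical, made up only to have a long right tail.

```python
import statistics

# Hypothetical right-skewed data: most values small, a long tail to the right.
data = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]

mean = statistics.mean(data)
median = statistics.median(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles (default "exclusive" method)

print(f"mean = {mean}, median = {median}, IQR = {q3 - q1}")
```

Here the single large value 20 drags the mean well above the median. The median and IQR barely notice it.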
The Ws
To provide context, we need the Ws
Why
How
Who
What (and in what units)
When
Where
of the data.
The first step of any data analysis should be to examine the
Ws; this is a key part of the Think step of any analysis.
Know the Why, Who, What, and How, be
Experiments with Random Outcomes
Random: the experiment's outcome cannot be predicted.
outcome = random variable
All possible outcomes = sample space S
Experiment: toss a 6-sided die once
Random variable: the number that comes up on the die
Sample space = S = {1, 2, 3, 4, 5, 6}
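The die example maps directly onto a few lines of code. This is a sketch that assumes a fair die, so all six outcomes are equally likely. The sample space is a set, each outcome gets probability 1/|S|, and one toss produces one value of the random variable. The names `S`, `P` and the seed are illustrative.

```python
import random
from fractions import Fraction

# Sample space for one toss of a 6-sided die.
S = {1, 2, 3, 4, 5, 6}

# Assumed fair die: equally likely outcomes, each with probability 1/|S|.
P = {outcome: Fraction(1, len(S)) for outcome in S}

rng = random.Random(1)                 # seeded for reproducibility
roll = rng.choice(sorted(S))           # one realization of the random variable
print("rolled", roll, "with probability", P[roll])
```

Using `Fraction` keeps the probabilities exact, so they sum to exactly 1.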
Syllabus for MATH 410, Scientific Data Analysis I, Fall 2014
(Version 9/23/2014, subject to revision: posted at MyStatLab.com and by email)
Lecture: T 4:00-5:50 PM, Disque 103
Labs: W, W, W, M, F, R
Instructor: Ken Swartz
8:00-9:50 AM, Korman 103B (Sec 061)
10:
Straight to the Point
We cannot use a linear model unless the
relationship between the two variables is linear.
Often reexpression can save the day,
straightening bent relationships so that we can
fit and use a simple linear model.
The Linear Model
The sample data is used to estimate the regression
model, which is a model for the population.
The purpose of the model is to predict the mean
response for a given value of the predictor.
The model is a straight line through the data.
Th
The Law of Large Numbers (LLN)
The long-run relative frequency of one of the
outcomes of many repeated independent
trials gets closer and closer to a single value.
The long-run single value is the
Empirical Probability
Formal Rules of Probability
1.
Proba
Why Be Random?
What is it about chance outcomes being
random that makes random selection seem
fair? Two things:
Nobody can guess the outcome before it happens.
When we want things to be fair, usually some
underlying set of outcomes will be equally like
Blood pressure
Rows 1-27: Low
Rows 28-45: Normal
Lecture 4 Handout: Scientific Data Analysis I, Fall 2015
1. The following is a scatterplot of the average final exam score versus midterm score for 11 sections of an
introductory statistics class:
The correlation coefficient for these data is r = 0.829.
4 Summary Statistics for Linear Regression
Slope: On average, for each additional gram of Sugar, Calories increase by 2.5
[Figure: Calories by Sugar for 77 Different Cereals (Slope = 2.5 Cal/g, Intercept = 89.2 Cal)]
Intercept: On average, there are 89.2 Calor
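The two reported coefficients define the prediction line, predicted Calories = 89.2 + 2.5 × Sugar (g). This sketch simply evaluates that line. The constant and function names are illustrative, and the coefficients are the ones on the slide.

```python
# Coefficients reported on the slide: Calories-hat = 89.2 + 2.5 * Sugar (g).
INTERCEPT_CAL = 89.2
SLOPE_CAL_PER_G = 2.5

def predicted_calories(sugar_g):
    """Predicted mean Calories for a cereal with sugar_g grams of sugar."""
    return INTERCEPT_CAL + SLOPE_CAL_PER_G * sugar_g

for sugar in (0, 5, 10):
    print(f"{sugar:>2} g sugar -> {predicted_calories(sugar):.1f} Cal")
```

At 0 g the prediction equals the intercept. Each extra gram adds the slope, 2.5 Cal.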
Syllabus for MATH 410, Scientific Data Analysis I, Fall 2015
(Version 9/21/2015, subject to revision)
Lecture: T 4:00-5:50 PM, Stratton 113
Labs: W, W, W, R, F, M
Instructor: Ken Swartz
8:00-9:50 AM, Korman 110E
10:00-11:50 AM, Korman 110E
4:00-5:50 PM, Korman
Mortality  Calcium  Derby
1702       44       South
1309       59       South
1259       133      South
1427       27       North
1724       6        North
1175       107      South
1486       5        South
1456       90       South
1696       6        North
1236       101      South
1711       13       North
1444       14       North
1591       49       North
1987       8        North
1495       14       North
1369       68       South
1257       50
Data
A
B
1
13
1
2
2
14
[Figure: Malaria Outcomes: Carriers and Non-carriers o… (bar chart of Not Sick / Sick counts by Group: Carrier, Non-carrier; y-axis 0-14)]
[Figure: Calls of One Toss of a Fair C…: Carriers and Non-carriers o… (bar chart of Correct / Incorrect counts by group: Carrier, Non-carrier; y-axis 0-14)]
Player
Nolan Ryan
George Foster
Kirby Puckett
Jose Canseco
Roger Clemens
Ken Griffey, Jr.
Albert Belle
Pedro Martinez
Mike Piazza
Mo Vaughn
Kevin Brown
Carlos Delgado
Alex Rodriguez
Manny Ramirez
Alex Rodriguez
Der
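faithfulCLT <- function(n) {
  # Population: all waiting times in R's built-in faithful data set
  POP <- faithful$waiting
  m <- mean(POP)
  s <- sd(POP)
  # Draw 10,000 samples of size n and record each sample mean
  means <- rep(0, 10000)
  for (i in 1:10000) {
    samp <- sample(POP, n)
    means[i] <- mean(samp)
  }
  # CLT prediction: sample means are approximately Normal(m, s/sqrt(n))
  sm <- s / sqrt(n)
  delm <- 0.006 * sm
  pts <- seq(m - 3 * sm, m + 3 * sm, delm)
  # Source is truncated at "fitMNS <- dnorm"; the lines below are an assumed completion
  fitMNS <- dnorm(pts, mean = m, sd = sm)
  list(means = means, pts = pts, fitMNS = fitMNS)
}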