Almost all physical processes are impacted by randomness
Has anyone taken a physics class?
What kind of phenomena did the equations represent?
Plane crashes: your parents are flying home during a rainstorm
Chance meeting: you are on vacation and meet your
Producing Data: Sampling
The procedure of systematically acquiring and recording information about every member of the
given target population - _.
Sample surveys that select a part of a population of interest to represent the whole are one type
of _.
Looking at Data-Relationships
Given a bivariate dataset of size n as follows: (X1, Y1), (X2, Y2), ..., (Xn, Yn).
We want to study the relationship between X and Y.
Examples:
1. Intelligence of husbands and wives
let X be the IQ of husbands and let Y be the IQ o
QUIZ#4
4.1 (0.25 Marks) A 99% confidence interval for the mean μ of a population is
computed from a random sample and found to be 6 ± 3. We may conclude that
a. there is a 99% probability that μ is between 3 and 9.
b. there is a 99% probability that
QUIZ#5
The information given below is used in Questions 5.1 to 5.4.
A company runs a three-day workshop on strategies for working effectively in teams. On each
day, a different strategy is presented. Forty-eight employees of
Experimental Designs
Q. Is the new weight loss drug effective?
ANS. Let's look at the following design:
<Design 1>
Randomly select 100 people (subjects) and label them.
An Important Distinction: Observational Studies versus Randomized
Experiments
Observation Versus Experiment
An observational study observes individuals and measures variables of interest but does not
An experiment, on the other hand, deli
Regression
Given a dataset (X1, Y1), (X2, Y2), ..., (Xn, Yn). To study the relationship between X and Y, we
begin with a scatterplot.
1. If the scatter diagram is football-shaped, it suggests a linear relationship
between X and Y.
1. Determine whether the following variables are categorical or quantitative.
(a) Number of laps a subject can swim
(b) Color of a randomly selected beetle
(c) Birth weight
(d) Hometown
(e) Life of a battery measured in hours
2. The data below represents
A coin is tossed 1 time
Could observe S = {H, T}
The list of all possible outcomes is called the sample space
A coin is tossed 2 times
S = {(H,T), (T,H), (H,H), (T,T)}
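The sample spaces above can be listed mechanically. The following is a minimal sketch, not part of the original notes; the function name sample_space is hypothetical and was chosen only for illustration.

```python
from itertools import product

def sample_space(n_tosses):
    """List every possible outcome of n_tosses coin tosses as a tuple of H/T."""
    return list(product("HT", repeat=n_tosses))

one_toss = sample_space(1)     # outcomes for a single toss
two_tosses = sample_space(2)   # ordered pairs for two tosses

print(one_toss)
print(two_tosses)
print(len(sample_space(3)))    # each extra toss doubles the number of outcomes
```

Each outcome is an ordered tuple, so (H,T) and (T,H) are counted as different outcomes, as in the two-toss list above.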
Probability
Distributions
Use of mathematics to describe the patterns formed by chance
Probability is based on models!
Model for a coin says
H and T are equally likely
No one flip influences any other (Independence)
Model for a die says
1,2,3,4,5,6 are equally likely
Rolls are independent
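One way to see what these two models imply is to simulate them. This is only a sketch under the stated model assumptions (equally likely outcomes, independent trials); the function names and the seed value are hypothetical choices, not from the notes.

```python
import random

def flip_coins(n, rng):
    """Simulate n independent flips where H and T are equally likely."""
    return "".join(rng.choice("HT") for _ in range(n))

def roll_dice(n, rng):
    """Simulate n independent rolls where faces 1..6 are equally likely."""
    return [rng.randint(1, 6) for _ in range(n)]

rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable
flips = flip_coins(10, rng)
rolls = roll_dice(600, rng)

# Under the die model each face should appear roughly 100 times in 600 rolls.
counts = {face: rolls.count(face) for face in range(1, 7)}

print(flips)
print(counts)
```

The counts will not be exactly equal in any one run; the model only says what to expect in the long run.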
10 coin flips:
HHHHHHHHHH
HHHHTHHHHH
HHTHTHHHH
In the previous plot:
Look for the overall pattern
Look for striking deviations from the pattern
Describe the pattern by the form (are there clusters of points?),
direction and strength of the relationship (Positive/negative
relationship based on the slop
No one can measure everything.
Instead have a fixed list of products & services
Carefully make changes over time
Add computers, cell phones, flat-screen TV,
Drop vinyl records, stereo consoles, board games,
Measure prices on these items
Figure out tota
R-square measures the proportion of variability in y that is explained by x
R-square is exactly r^2, the square of the correlation r
Values close to 1 mean that the points are tight to the line
Values close to 0 mean that the line doesn't explain much
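The claim that R-square equals r^2 can be checked numerically. The sketch below computes the correlation r directly, then computes the proportion of variability explained by the least-squares line as 1 - SSE/SST, and compares the two. The data values are made up for illustration and the function names are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two quantitative variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_squared_from_line(xs, ys):
    """Proportion of variability in y explained by the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
print(r ** 2, r_squared_from_line(xs, ys))  # the two values agree up to rounding
```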
Example: the r^2 value for the Old Faithful geyser d
Model for a relationship on a scatter-plot
Mathematical formula for a shape
Many shapes available
Simple vs. Complex
Curvy vs. straight
Which model to use?
Aim is to summarize main feature of relationship
Appropriate if
y=response variable
x=explanatory
Other use for a regression model: Prediction
where there is no data
where a new data point will be observed
Just use equation
Can use RMSE to get an approximate interval that contains
new predicted values with given probability
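A rough sketch of this use of RMSE follows. It rests on assumptions that are not in the notes: RMSE is computed with n - 2 in the denominator (some texts divide by n), and the interval is taken as prediction plus or minus 2 RMSE, a common rule of thumb for roughly 95% coverage when residuals are approximately normal. The data and function names are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the line (assumption: divide by n - 2)."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))

def predict_with_interval(x_new, intercept, slope, spread, k=2.0):
    """Return (low, prediction, high) using prediction +/- k * spread."""
    centre = intercept + slope * x_new
    return centre - k * spread, centre, centre + k * spread

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 4.1, 5.8, 8.4, 9.9, 12.2]

intercept, slope = fit_line(xs, ys)
spread = rmse(xs, ys, intercept, slope)
low, centre, high = predict_with_interval(7.0, intercept, slope, spread)
print(low, centre, high)
```

This interval ignores the uncertainty in the fitted line itself, so it is only approximate and tends to be too narrow far from the bulk of the data.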
Not quite right formula, b
Consider a histogram to shed some light upon the nature of the
intervals between eruptions
What do we observe?
Suggests potential of multiple
populations; bimodal
What does the scatter-plot tell us?
Observe a linear relationship between waiting time betw
Correlation requires that both variables be quantitative, so that it
makes sense to do the arithmetic indicated by the formula for r
We can't calculate a correlation between incomes of a group of
people and what city they live in, because city is a categorica
Looking at Data-Distributions
Statistics is the science of conducting studies to collect, organize, summarize,
analyze, and draw conclusions from data.
Uses and Misuse of Statistics
Statistical techniques can be used to describe data, compare two or more
TABLES AND FORMULAS FOR MOORE
Basic Practice of Statistics
Exploring Data: Distributions
Look for overall pattern (shape, center, spread)
and deviations (outliers).
Mean (use a calculator):
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum x_i$
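For readers without a calculator at hand, the mean can be computed directly from its definition: add the observations and divide by how many there are. This is a minimal sketch; the data values and the function name are made up for illustration, and the standard-library statistics.mean is used only as a cross-check.

```python
import statistics

def mean_by_formula(values):
    """Add up the observations and divide by their count."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)

data = [4.0, 7.0, 1.0, 8.0]   # made-up observations
x_bar = mean_by_formula(data)

print(x_bar)
print(statistics.mean(data))  # library result for comparison
```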
Standard deviation (use a calcula
