Investigate the relationship between Cereal Variables and per capita GNP (use regression to determine factors
which predict log(GNP/capita)).
1. Identify variables, response & predictors
Response variable: Log-transformed GNP per Capit
STAT 103: Homework #3
1. random: unbiased (every sample of size n has equal chance of being selected) & independent (selection
of 1 unit has no influence on the selection of other units)
systematic: take every xth unit that comes along
Welcome to STAT 101a-109a
Introduction to Statistics
What is Statistics?
Like dreams, statistics are a form
of wish fulfillment (Jean Baudrillard)
On classes server under web page and syllabus.
About The Sections .
VARIANCE and STANDARD DEVIATION (SD)
Most common and useful measure of SPREAD of a
Standard Deviation = √Variance
Sample Variance = s^2,
Standard Deviation = s
The Sample Variance s^2 is
Idea of variance
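A minimal sketch of the idea, assuming the usual textbook definition (sum of squared deviations from the mean, divided by n - 1). The data values and the function names `sample_variance` and `sample_sd` are made up for illustration.

```python
import math

def sample_variance(data):
    """Sample variance s^2: squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Sample standard deviation s: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not from the course
print("s^2 =", sample_variance(values))
print("s   =", sample_sd(values))
```

Taking the square root puts the spread back into the same units as the data, which is why s is usually reported rather than s^2.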
Scatterplot Notation :
The Horizontal axis is ALWAYS
called the X axis.
The Vertical axis is ALWAYS
called the Y axis.
Today: describing relationship of two
quantitative variables :
mammals. Last time
we used regression
on the logs of brain
and body weight :
Once again, the simple linear least-squares regression model :
y = b0 + b1 x
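A rough sketch of how the intercept b0 and slope b1 of the least-squares line can be computed, using the standard closed-form formulas. The function name `least_squares_fit` and the data are hypothetical; with the mammal data, x and y would be the logs of body and brain weight.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_fit(x, y)
print("intercept b0 =", b0, "slope b1 =", b1)
```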
Simple Linear Least-Squares regression assumes that
Sampling and Experiments
If in doubt, consult with a Statistician : you can save
immense heartache, loss of resources, etc. by checking
with an expert first!
Research Design : Some General Advice
Decide What you Want to Know : Explicitly define the
Chapter 3 in Cartoon Guide
Toss a coin:
S = {H, T}.
Watch a tree for a year and see if it dies :
Probability is crucial to statistical inference
S = {Dead, Alive}.
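A short simulation sketch of the coin-toss sample space, assuming a fair coin (an assumption added here, not stated above). It shows the proportion of heads settling near 0.5 over many tosses; the seed and the number of tosses are arbitrary choices.

```python
import random

sample_space = ["H", "T"]   # the two possible outcomes of one toss
rng = random.Random(0)      # fixed seed so the run is repeatable
n_tosses = 10_000

# Each toss picks one outcome from the sample space with equal chance.
tosses = [rng.choice(sample_space) for _ in range(n_tosses)]
prop_heads = tosses.count("H") / n_tosses
print("proportion of heads:", prop_heads)
```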
Inferences are always expressed i
Introduction to Statistics: Homework #1
1. Analysis finds California Students attend school more than U.S. Peers (L.A.
Population: United State
TA: Michelle Roh
1. a. There was 75% support from Republicans and 30% support from Democrats.
Therefore, the poll estimated the difference in support to be 45%.
b. Using Minitab, the 95% confidence interval for the difference in propo
A. Make a boxplot showing the distribution of correct choices by treatment group combination (i.e. grapes
and hours of deprivation).
B. Calculate the mean and standard deviation in each subgroup. Are the equal variance requirements o
The regression assumptions seem to be met, because all of the data fall reasonably within the
predicted values (depicted by the regression line). The prediction bands denote the region in which
we are 95% confident that fut
Problem Set #5
a. Republicans: 75% support
Democrats: 30% support
Diff. in support b/t Reps & Dems = Republican support - Democrat support
= 75% - 30% = 45%
b. Republicans = 310/1000 = 31% of respondents
75% of Republicans
Probability in Practice
(The trick is knowing when to apply which probability rule!)
Probabilities of events so far add to only 73% or 81%; investigation reveals that we forgot the last category (Not at
All : 26% and 10%)
Question : what is
Probability Models for Count Data
Bernoulli Random Variables
Example : Evolution vs. Creationism. A
Gallup poll conducted in 2008 (sample size n =
1000 people nationwide) found that 44% of
American adults believed that humans were
created directly by God w
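A small sketch of the Bernoulli model for a poll like this one, assuming each of n = 1000 respondents independently answers "yes" with probability p = 0.44. Treating the reported 44% as the true p is an assumption made only for illustration.

```python
import math

n = 1000   # sample size (assumed from the poll description)
p = 0.44   # assumed probability of a "yes" for each respondent

# The count of "yes" answers is a sum of n independent Bernoulli(p) variables.
expected_count = n * p
sd_count = math.sqrt(n * p * (1 - p))

# The sample proportion is the count divided by n.
sd_proportion = math.sqrt(p * (1 - p) / n)

print("expected count:", expected_count)
print("SD of count:", round(sd_count, 2))
print("SD of sample proportion:", round(sd_proportion, 4))
```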
Central Limit Theorem Interpretation 1
The Central Limit Theorem
Two more views
Take a sample of size n from any distribution and calculate
the sample mean. Repeat this process many times. As
long as n is large enough :
1. The histogram of all the
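The sampling process described above can be sketched as a simulation. This version assumes a skewed exponential population with mean 1 and SD 1 (chosen only for illustration) and checks the standard result that the sample means center on the population mean with spread close to SD/√n. The helper name `sample_means` is hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_means(n, reps):
    """Draw `reps` samples of size n from an exponential population
    (mean 1, SD 1) and return the list of their sample means."""
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

n = 50
means = sample_means(n, reps=2000)
print("mean of sample means:", statistics.fmean(means))
print("SD of sample means:  ", statistics.stdev(means))
print("population SD / sqrt(n):", 1 / n ** 0.5)
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.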
Example : IQ Tests.
The general population has a mean
IQ score of 100 and the population
standard deviation of scores is
Like proof by contradiction : (think back to geometry)
Example : there are infinitely many prime numbers
Boxplot of illiteracy by Gender
Pooled Standard Deviation Estimate
Example : World Poverty Data,
2000. Compare worldwide illiteracy rates
between men and women at the 99%
confidence level for countries with a GNI per
capita of at least $500
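A minimal sketch of the pooled standard deviation estimate, assuming the usual two-sample formula that weights each group's variance by its degrees of freedom. The inputs below are hypothetical placeholders, not the World Poverty illiteracy figures, and the function name `pooled_sd` is made up.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD from two group SDs (s1, s2) and group sizes (n1, n2)."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)

# Hypothetical group summaries for illustration only.
sp = pooled_sd(s1=3.0, n1=5, s2=4.0, n2=5)
print("pooled SD:", sp)
```

Pooling is only reasonable when the two groups have similar spreads, which is what a side-by-side boxplot helps check.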
Theoretical Regression Model
Inference for Regression
(Return of the Regression!)
Mean of Y is a linear function of X.
Inference for Regression answers questions like
μ_Y = β0 + β1 X
How strong is the evidence that there is a real
correlation between two varia
Midterm Thursday, 10/21 in OML202
Closed note, closed book. Any complicated
formulas needed will be provided.
Test for the correlation coefficient
ρ (pronounced "rho" or "row") is the true correlation,
estimated by r, the sample correlation
Suppose we want
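A sketch of the usual textbook test statistic for the null hypothesis that the true correlation is zero: t = r·√(n − 2) / √(1 − r²), compared with a t distribution on n − 2 degrees of freedom. This may not be the exact form used in the course, and the values of r and n below are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing true correlation = 0, given sample correlation r
    and sample size n. Degrees of freedom are n - 2."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical inputs for illustration only.
r, n = 0.5, 27
t = correlation_t_statistic(r, n)
print("t =", round(t, 4), "with", n - 2, "degrees of freedom")
```

A large |t| relative to the t distribution with n − 2 degrees of freedom counts as evidence against a true correlation of zero.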
Introductory Statistics 09-01-11
o Quantifying variation
o Distinguishing fact from fiction (e.g. firefighting is not the most dangerous
o Collecting, organizing, & interpreting data
Example: Hershey Kisses & rocks
o Appeared to be
Introductory Statistics 09-06-11
o The average
o The balancing point
o Add all data together then divide by how many things you added
o For odd # of observations, it's the middle one
o For even # of observations, it's the one between the two mid
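The mean and median rules in these notes can be sketched as below. For an even number of observations this assumes the common convention of taking the value halfway between the two middle observations. The function names and data are made up for illustration.

```python
def mean(data):
    """Add all data together, then divide by how many values were added."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data; for an even count, the value
    halfway between the two middle observations."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 1, 2]        # hypothetical data
even_data = [4, 1, 3, 2]    # hypothetical data
print(mean(even_data), median(odd_data), median(even_data))
```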
1. (10 points) Find an article in a recent newspaper or periodical or web site where the
results of a statistical study are discussed. Include a copy of the article with your
homework. See how well you can fit the stud
STAT 103: Homework #2
The data in this scatterplot shows a negative association. The variables have a linear relationship
and a moderate correlation.
2. The two variables have a correlation of -0.64.
I feel that this calculation is appropriate and accu
a. Bernoulli, because there are only two possible outcomes, at least 2 dice means that the trials
are independent therefore the probability of success p is identical for each trial.
b. None of these; it's a simple random sample that d