STAT 103a
Homework #7
Investigate the relationship between Cereal Variables and per capita GNP (use regression to determine factors
which predict log(GNP/capita))
1. Identify variables, response & predictors
Response variable: Log-transformed GNP per Capit
STAT 103: Homework #3
1. random: unbiased (every sample of size n has equal chance of being selected) & independent (selection
of 1 unit has no influence on the selection of other units)
systematic: take every xth unit that comes along
stratified: stratify
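A minimal sketch of the first two sampling schemes above, assuming a hypothetical population of 100 numbered units. The function names, the seed, and the step size k are illustrative choices, not part of the homework.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def systematic_sample(units, k, start=0):
    """Take every k-th unit, beginning at index `start`."""
    return units[start::k]

# Hypothetical population: units numbered 1..100 (an assumption for illustration).
population = list(range(1, 101))

srs = simple_random_sample(population, 10, seed=1)
every_tenth = systematic_sample(population, 10, start=4)

print(srs)
print(every_tenth)
```

A stratified sample could reuse simple_random_sample once per stratum, provided each unit's stratum is known.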
Welcome to STAT 101a-109a
Introduction to Statistics
What is Statistics?
Like dreams, statistics are a form of wish fulfillment. (Jean Baudrillard)
Syllabus Overview
On classes server under web page and syllabus.
Updated periodically.
About The Sections .
VARIANCE and STANDARD DEVIATION (SD)
Most common and useful measure of SPREAD of a
distribution
Relationship:
Standard Deviation = sqrt(Variance)
Notation :
Sample Variance = s^2,
Standard Deviation = s
The Sample Variance s^2 is
Idea of variance
i = 2: x_2 = 156 cm
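The relationship between variance and standard deviation can be sketched as follows. The heights are hypothetical (only the 156 cm value appears in the notes), and the n - 1 divisor is the usual sample-variance convention, assumed here because the slide's formula is cut off.

```python
import math
import statistics

# Hypothetical heights in cm; only 156 comes from the notes.
heights_cm = [150, 156, 162, 168, 174]

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

s2 = sample_variance(heights_cm)   # sample variance s^2
s = math.sqrt(s2)                  # standard deviation s = sqrt(s^2)

print(s2, s)
# Cross-check against the standard library.
print(statistics.variance(heights_cm), statistics.stdev(heights_cm))
```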
Data Relationships
Today: describing relationship of two
quantitative variables :
Scatterplots
Correlation
Regression
Scatterplot Notation :
The Horizontal axis is ALWAYS
called the X axis.
The Vertical axis is ALWAYS
called the Y axis.
Associati
Example :
Brain/Body weight
relationship in
mammals. Last time
we used regression
on the logs of brain
and body weight :
Once Again the simple linear least-squares regression model :
y = b_0 + b_1 x
Simple Linear Least-Squares regression assumes that
Fitted Line
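A minimal sketch of how the fitted line's intercept b_0 and slope b_1 might be computed by least squares. The data points are made up for illustration; they are not the brain/body weight data from the lecture.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared vertical residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data, roughly linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
```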
Sampling and Experiments
If in doubt, consult with a Statistician : you can save
immense heartache, loss of resources, etc. by checking
with an expert first!
Research Design : Some General Advice
Decide What you Want to Know : Explicitly define the
para
Examples :
PROBABILITY
Discrete
Chapter 3 in Cartoon Guide
STRONGLY RECOMMENDED
Toss a coin:
S = {H, T}.
Watch a tree for a year and see if it dies :
Probability is crucial to statistical inference
S = {Dead, Alive}.
Inferences are always expressed i
Madison Bickel
Introduction to Statistics: Homework #1
Stat 103
1. Analysis finds California Students attend school more than U.S. Peers (L.A.
Times)
http://www.latimes.com/local/education/la-me-school-attendance20140902-story.html
Population: United State
Madison Bickel
TA: Michelle Roh
Hw #5
1. a. There was 75% support from Republicans and 30% support from Democrats.
Therefore, the poll estimated the difference in support to be 45 percentage points.
b. Using Minitab, the 95% confidence interval for the difference in propo
STAT 103a
Homework #8
A. Make a boxplot showing the distribution of correct choices by treatment group combination (i.e. grapes
and hours of deprivation).
B. Calculate the mean and standard deviation in each subgroup. Are the equal variance requirements o
STAT 103a
Homework #6
1.
a.
The regression assumptions seem to be met, because all of the data fall reasonably within the
predicted values (depicted by the regression line). The prediction bands denote the region in which
we are 95% confident that fut
STAT 103
Problem Set #5
1.
a. Republicans: 75% support
Democrats: 30% support
Diff. in support b/t Reps & Dems = Republican support - Democrat support
= 75% - 30%
= 45%
b. Republicans = 310/1000 = 31% of respondents
75% of Republicans
(.75)(.31)(1000)
232.
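The arithmetic in parts a and b can be checked with a short sketch. The variable names are illustrative; the numbers (1000 respondents, 310 Republicans, 75% and 30% support) are the ones given in the problem set.

```python
respondents = 1000
republicans = 310
rep_support = 0.75   # share of Republicans in support
dem_support = 0.30   # share of Democrats in support

# Part a: difference in support, in percentage points.
diff_points = round((rep_support - dem_support) * 100, 6)

# Part b: share of respondents who are Republican, and the
# implied number of Republican supporters: (.75)(.31)(1000).
rep_share = republicans / respondents
rep_supporters = rep_support * republicans

print(diff_points, rep_share, rep_supporters)
```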
Probability in Practice
(The trick is knowing when to apply which probability rule!)
Probabilities of events so far only add to 73% or 81%, investigation reveals that we forgot the last category (Not at
All : 26% and 10%)
Suggestions :
Question : what is
Probability Models for Count Data
Bernoulli Random Variables
Example : Evolution vs. Creationism. A
Gallup poll conducted in 2008 (sample size n
1000 people nationwide) found that 44% of
American adults believed that humans were
created directly by God w
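A rough sketch of the count model behind this example: if each respondent is treated as a Bernoulli trial with success probability p, the number of "yes" answers out of n is binomial. Using n = 1000 and p = 0.44 as if they were exact is an assumption made only for illustration, since 44% is itself a sample estimate.

```python
import math

n = 1000   # sample size, taken as exactly 1000 (assumption)
p = 0.44   # probability of a "yes", set to the reported 44% (assumption)

expected_count = n * p                    # mean of a binomial count
sd_count = math.sqrt(n * p * (1 - p))     # standard deviation of the count
se_proportion = math.sqrt(p * (1 - p) / n)  # standard deviation of the sample proportion

print(expected_count, round(sd_count, 2), round(se_proportion, 4))
```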
The Central Limit Theorem
Revisited
Two more views
Central Limit Theorem Interpretation 1
Take a sample of size n from any distribution and calculate
the sample mean. Repeat this process many times. As
long as n is large enough :
1. The histogram of all the
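The repeated-sampling idea can be tried out with a small simulation. This is only a sketch: the exponential distribution, n = 50, 2000 repetitions, and the seed are all arbitrary choices made for illustration.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Skewed parent distribution: exponential with mean 1 and SD 1 (assumption).
means = sample_means(lambda rng: rng.expovariate(1.0), n=50, reps=2000)

center = statistics.fmean(means)   # should sit near the parent mean, 1
spread = statistics.stdev(means)   # should sit near 1 / sqrt(50)

print(round(center, 3), round(spread, 3))
```

Plotting a histogram of means would show the roughly bell-shaped pattern, even though the parent distribution is strongly skewed.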
Example : IQ Tests.
The general population has a mean
IQ score of 100 and the population
standard deviation of scores is 16.
Hypothesis Testing
Like proof by contradiction : (think back to geometry)
Example : there are infinitely many prime numbers
(attribu
Pooled Standard Deviation Estimate
[Figure: Boxplot of illiteracy by Gender; vertical axis: Rate]
Example : World Poverty Data,
2000. Compare worldwide illiteracy rates
between men and women at the 99%
confidence level for countries with a GNI per
capita of at least $500
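A minimal sketch of the usual pooled standard deviation estimate for two groups, which weights each group's variance by its degrees of freedom. The standard deviations and group sizes below are hypothetical; they are not the World Poverty Data values.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD: sqrt of the df-weighted average of the two sample variances."""
    numerator = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    return math.sqrt(numerator / (n1 + n2 - 2))

# Hypothetical group summaries (assumptions for illustration only).
sp = pooled_sd(s1=12.0, n1=30, s2=15.0, n2=35)
print(round(sp, 3))
```

The pooled value always lies between the two group standard deviations, and it equals them when they agree.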
Theoretical Regression Model
Inference for Regression
(Return of the Regression!)
Mean of Y is a linear function of X.
Inference for Regression answers questions like
μ_Y = β_0 + β_1 X
How strong is the evidence that there is a real
correlation between two varia
Announcements
Midterm Thursday, 10/21 in OML202
Closed note, closed book. Any complicated
formulas needed will be provided.
Test for the correlation coefficient
ρ (rho, pronounced "row") is the true correlation,
estimated by r, the sample correlation
Suppose we want
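The slide is cut off before the test itself, so the following is only a sketch of the standard t statistic for testing whether the true correlation is zero, with n - 2 degrees of freedom. The values r = 0.5 and n = 27 are hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: true correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample correlation and sample size.
r, n = 0.5, 27
t = correlation_t(r, n)
df = n - 2

print(round(t, 3), df)
```

The statistic would then be compared with a t distribution on df degrees of freedom, using a table or statistical software.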
Introductory Statistics 09-01-11
Statistics
o Quantifying variation
o Distinguishing fact from fiction (e.g. firefighting is not the most dangerous
occupation)
o Collecting, organizing, & interpreting data
Example: Hershey Kisses & rocks
o Appeared to be
Introductory Statistics 09-06-11
Mean
o The average
o The balancing point
o Add all data together then divide by how many things you added
Median
o For odd # of observations, it's the middle one
o For even # of observations, it's the one between the two mid
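The mean and median rules above can be checked on two small made-up data sets, one with an odd and one with an even number of observations.

```python
import statistics

odd_data = [3, 1, 7, 5, 9]    # hypothetical, 5 observations
even_data = [3, 1, 7, 5]      # hypothetical, 4 observations

# Mean: add all data together, then divide by how many values were added.
mean_odd = sum(odd_data) / len(odd_data)

# Median: sort first, then take the middle value (odd count)
# or the average of the two middle values (even count).
median_odd = statistics.median(odd_data)
median_even = statistics.median(even_data)

print(mean_odd, median_odd, median_even)
```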
Yohanna Pepa
Introductory Statistics
1. (10 points) Find an article in a recent newspaper or periodical or web site where the
results of a statistical study are discussed. Include a copy of the article with your
homework. See how well you can fit the stud
STAT 103: Homework #2
1.
The data in this scatterplot shows a negative association. The variables have a linear relationship
and a moderate correlation.
2. The two variables have a correlation of -0.64.
I feel that this calculation is appropriate and accu
STAT 103
Homework #4
1.
a. Bernoulli, because there are only two possible outcomes, at least 2 dice means that the trials
are independent therefore the probability of success p is identical for each trial.
b. None of these; it's a simple random sample that d