Chapter 11 : Multiple Linear Regression
We have:
             height   weight   ...   age     amount of lemonade purchased
person 1:    x11      x12      ...   x1k     y1
person 2:    x21      x22      ...   x2k     y2
   :
where we assume
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i$$
for $i = 1, \ldots, n$ and $\epsilon_i \sim N(0, \sigma^2)$. The xi
Chapter 7 Notes - Inference for Single Samples
You know already for a large sample, you can invoke the CLT so:
$$\bar{X} \sim N(\mu, \sigma^2 / n) \quad \text{(approximately)}.$$
Also for a large sample, you can replace an unknown $\sigma$ by $s$.
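As a rough check of the large-sample claim, here is a small simulation sketch. The choice of an Exponential(1) population (mean 1, variance 1), the sample size n = 50, and the function name `sample_means` are all assumptions made for illustration and are not part of these notes. If the CLT approximation is reasonable, the sample means should have mean close to 1 and variance close to 1/n, and about 95% of them should fall within 1.96 standard errors of the true mean.

```python
import math
import random
import statistics


def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(1) and return their means."""
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(0)      # fixed seed so the run is reproducible
n, reps = 50, 5000          # illustrative choices, not from the notes
mu, sigma = 1.0, 1.0        # mean and standard deviation of Exponential(1)

means = sample_means(n, reps, rng)
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

# Fraction of sample means within 1.96 standard errors of mu
se = sigma / math.sqrt(n)
frac_within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(f"mean of sample means:     {mean_of_means:.4f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma**2 / n})")
print(f"fraction within 1.96 SE:  {frac_within:.3f} (normal theory: about 0.95)")
```

The exponential population is deliberately skewed, so agreement with the normal approximation here comes from averaging, not from the shape of the population.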
You know how to do a hypothesis test for the mean, either:
calculate z-
Basic Concepts of Inference
Statistical inference is the process of making conclusions using data that is subject to
random variation.
Here are some basic definitions.
$$\mathrm{Bias}(\hat{\theta}) := E(\hat{\theta}) - \theta,$$
where $\theta$ is the true parameter value and $\hat{\theta}$ is an estimate
computed from data.
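To make the definition of bias concrete, here is a minimal simulation sketch. The example is an assumption chosen for illustration and does not come from these notes: it estimates the variance of a N(0, 1) population from samples of size 5, once dividing the sum of squares by n and once by n - 1. The function names `variance_estimates` and `simulate_bias` are hypothetical. Averaging each estimate over many samples approximates its expected value, and subtracting the true variance approximates the bias.

```python
import random


def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


def simulate_bias(n=5, sigma=1.0, reps=20000, seed=0):
    """Approximate the bias of both estimators by averaging over many samples."""
    rng = random.Random(seed)
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        est_n, est_n_minus_1 = variance_estimates(sample)
        total_n += est_n
        total_n_minus_1 += est_n_minus_1
    true_var = sigma ** 2
    # Bias = (average of the estimates) - (true parameter value)
    return total_n / reps - true_var, total_n_minus_1 / reps - true_var


bias_n, bias_n_minus_1 = simulate_bias()
print(f"divide by n:     bias approx {bias_n:+.3f} (theory: -sigma^2/n = -0.2)")
print(f"divide by n - 1: bias approx {bias_n_minus_1:+.3f} (theory: 0)")
```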
Chapter 9 Notes, Part 1 - Inference for Proportion and Count Data
We want to estimate the proportion p of a population that has a specific attribute, like
what percent of houses in Cambridge have a mouse in the house?
We are given X1 , . . . , Xp where Xi
Central Limit Theorem
(Convergence of the sample means distribution to the normal distribution)
Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from any distribution with a finite mean $\mu$
and variance $\sigma^2$. As $n \to \infty$, the distribution of:
$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
converges to t
Chapter 9 Notes, 9.3 First Part
Inference for One Way Count Data
Chi-Square Test using the Multinomial Distribution
An example of the multinomial distribution: preference of ice cream flavors:
Cells are numbered 1, . . . , c
P
Cell probabilities are p1 ,
Confidence Intervals
Instead of reporting a point estimator, that is, a single value, we want to report a
confidence interval [L, U] where:
$$P\{L \le \theta \le U\} = 1 - \alpha,$$
that is, the probability of the true value being within [L, U] is pretty large.
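One way to see what "the probability of the true value being within [L, U]" means is to build many intervals from repeated samples and count how often they contain the true value. The sketch below assumes a large-sample z-interval for a mean, with L and U equal to the sample mean minus and plus z times s over the square root of n. This particular interval, the N(10, 2^2) population, and the function name `z_interval` are assumptions for illustration, not taken from this section.

```python
import math
import random
import statistics


def z_interval(data, alpha=0.05):
    """Large-sample interval for the mean: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


rng = random.Random(0)          # fixed seed so the run is reproducible
true_mean, true_sd = 10.0, 2.0  # illustrative population, not from the notes
n, reps = 100, 2000

covered = 0
for _ in range(reps):
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    lower, upper = z_interval(sample, alpha=0.05)
    if lower <= true_mean <= upper:
        covered += 1

coverage = covered / reps
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

With alpha = 0.05 the printed fraction should land near 0.95. The randomness is in L and U, which change from sample to sample, while the true value stays fixed.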
Here, [L, U ] is a 100(1
U 6=
Chapter 9 Notes, 9.4
Inferences for Two Way Count Data
Let's say we want to test the association of income to job satisfaction. We
could do a survey in at least 2 ways:
Sampling Model 1 (n fixed): Draw n people randomly from the population
and ask their inco
15.075 Exam 3
Instructor: Cynthia Rudin
TA: Dimitrios Bisias
November 22, 2011
Grading is based on demonstration of conceptual understanding, so you need to show all of your work.
Problem 1
A company makes high-definition televisions and does not like to ha
Probability Review
15.075 Cynthia Rudin
A probability space, defined by Kolmogorov (1903-1987), consists of:
A set of outcomes S, e.g.,
for the roll of a die, S = {1, 2, 3, 4, 5, 6},
for the roll of two dice, S = {(1,1), (1,2), (2,1), (1,3), ..., (6,6)},
temperat
Chapter 4 - Summarizing Numerical Data
15.075 Cynthia Rudin
Here are some ways we can summarize data numerically.
Sample Mean:
$$\bar{x} := \frac{\sum_{i=1}^{n} x_i}{n}.$$
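As a quick worked instance of the sample mean formula, here is a minimal sketch. The five data values and the function name `sample_mean` are made up for illustration.

```python
def sample_mean(xs):
    """Sample mean: the sum of the observations divided by how many there are."""
    if not xs:
        raise ValueError("sample_mean requires at least one observation")
    return sum(xs) / len(xs)


data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical observations x_1, ..., x_5
x_bar = sample_mean(data)
print(x_bar)  # (2 + 4 + 4 + 5 + 10) / 5 = 5.0
```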
Note: in this class we will work with both the population mean $\mu$ and the sample
mean $\bar{x}$. Do not confuse them! R
Chapter 10 Notes, Regression and Correlation
Regression analysis allows us to estimate the relationship of a response variable
to a set of predictor variables.
Let $x_1, x_2, \ldots, x_n$ be settings of x chosen by the investigator and
$y_1, y_2, \ldots, y_n$ be the correspon
Chapter 8 : Inferences for Two Samples
In previous chapters, we had only one sample and we wanted to see whether
its mean or variance might be above or below a certain value. In Chapter
8 we compare statistics from 2 populations, and we want to know wheth
Chapter 14 Nonparametric Statistics
A.K.A. distribution-free statistics! Does not depend on the population fitting
any particular type of distribution (e.g., normal). Since these methods make
fewer assumptions, they apply more broadly, at the expense of a le
15.075 Exam 2
Instructor: Cynthia Rudin
TA: Dimitrios Bisias
October 25, 2011
Grading is based on demonstration of conceptual understanding, so you need to show all of your work.
Problem 1
You are in charge of a study that compares how two weight-loss tec
15.075 Exam 1
Instructor: Cynthia Rudin
TA: Dimitrios Bisias
September 29, 2011
Grading is based on demonstration of conceptual understanding, so you need to show all of your work.
Problem 1
A very large bin contains 3 different types of disposable flashlight