Linear Regression

Remember that correlation tells us the strength and direction of the
linear relationship between two variables, but it does not tell us how
to predict one variable from another.

In the given output, the explanatory variable is the one listed under
the constant, and the response is the variable named at the top of the
output.

Regression line: y = ax + b, where a = the variable's coefficient
(slope) and b = the constant's coefficient (intercept) from the output.

To predict y value from x, we simply input the x value into the equation
or vice versa
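
As a rough sketch of where a and b come from and how a prediction is
made, the example below fits a least-squares line by hand. The data
(hours studied vs. test score) and the function names fit_line and
predict are made up for illustration; in class the coefficients are
simply read from the software output.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx            # slope: the variable's coefficient
    b = y_bar - a * x_bar    # intercept: the constant's coefficient
    return a, b

def predict(a, b, x):
    """Plug an x value into the regression equation."""
    return a * x + b

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

a, b = fit_line(hours, score)
predicted = predict(a, b, 3.5)   # 3.5 is inside the range of our x data
print(f"y = {a:.2f}x + {b:.2f}; predicted score at 3.5 hours: {predicted:.2f}")
```

Note that the prediction is made at x = 3.5, which lies inside the
observed range of 1 to 5 hours.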

We will not get the same line if we switch the variables: regressing x
on y gives a different equation than regressing y on x.

Regression only works well if there is a reasonably strong linear
relationship between the variables.

R-squared (R²) = % of variability in Y accounted for by X.
Correlation = (sign of a) * sqrt(R²)
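
To see why the sign of a is needed, here is a small sketch on made-up
data with a downward trend. It computes R² as 1 - SSE/SST, rebuilds
the correlation from the slope's sign and sqrt(R²), and compares that
with the correlation computed directly. All names and numbers are
assumptions for illustration.

```python
from math import copysign, sqrt
from statistics import mean

def slope_and_r_squared(xs, ys):
    """Fit y = a*x + b by least squares; return (a, R-squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)       # total variability in y
    a = sxy / sxx
    b = y_bar - a * x_bar
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    return a, 1 - sse / sst

def correlation_direct(xs, ys):
    """Correlation computed straight from its definition."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with a negative relationship.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]

a, r_squared = slope_and_r_squared(xs, ys)
r_from_fit = copysign(sqrt(r_squared), a)   # (sign of a) * sqrt(R²)
r_direct = correlation_direct(xs, ys)
print(f"a = {a:.2f}, R² = {r_squared:.3f}, r = {r_from_fit:.4f}")
```

Taking sqrt(R²) alone would give a positive number here, even though
the relationship is clearly negative.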

Extrapolation is bad!!!! We cannot predict values outside the range
of our data!!

Probability

The probability of an event is always between 0 and 1, and the
probabilities of all possible outcomes sum to 1.

Remember that if two outcomes are disjoint, then the probability of one
or the other is the sum of their probabilities, meaning
P(A or B) = P(A) + P(B). (Think of picking out a red or a blue shirt.)
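
A quick check of the addition rule with the shirt example. The drawer
contents (3 red, 2 blue, 5 green) are invented for illustration; a
single shirt cannot be both red and blue, so the two outcomes are
disjoint.

```python
from fractions import Fraction

# Hypothetical drawer: one shirt is picked at random.
drawer = ["red"] * 3 + ["blue"] * 2 + ["green"] * 5

def prob(color):
    """Probability of picking a shirt of the given color."""
    return Fraction(drawer.count(color), len(drawer))

# Addition rule for disjoint outcomes.
p_rule = prob("red") + prob("blue")

# Same probability found by counting the shirts that are red or blue.
favorable = sum(1 for shirt in drawer if shirt in ("red", "blue"))
p_count = Fraction(favorable, len(drawer))

print(p_rule, p_count)
```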

If events are independent, then the probability of both happening is
the product of their probabilities, meaning P(A and B) = P(A) * P(B).
(Think of rain today and rain tomorrow.)
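
The multiplication rule can be checked by listing every outcome. The
sketch below uses two dice rather than rain, because two separate dice
rolls are clearly independent; the events chosen (first die shows 6,
second die is even) are just an illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a function of one outcome."""
    hits = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(hits, len(outcomes))

def first_is_six(outcome):
    return outcome[0] == 6

def second_is_even(outcome):
    return outcome[1] % 2 == 0

p_a = prob(first_is_six)
p_b = prob(second_is_even)
p_both = prob(lambda o: first_is_six(o) and second_is_even(o))

print(p_a, p_b, p_both, p_a * p_b)
```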

The law of averages (LOA) tells us that probabilities stabilize in the
long run.

The LOA does not mean that after a string of heads a tail becomes more
likely on the next flip.

Each flip is independent, so the next flip has the same probability it
always has.
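
Both points can be seen in a small simulation. This is only a sketch:
the number of flips, the streak length of 3, and the seed are arbitrary
choices, and the function names are made up.

```python
import random

def flip_coins(n, seed=0):
    """Simulate n fair coin flips; True means heads."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n)]

def proportion_heads(flips):
    return sum(flips) / len(flips)

def proportion_heads_after_streak(flips, streak=3):
    """Proportion of heads among flips that directly follow `streak` heads."""
    followers = [
        flips[i]
        for i in range(streak, len(flips))
        if all(flips[i - streak:i])
    ]
    return sum(followers) / len(followers)

flips = flip_coins(100_000)

# Long run: the overall proportion of heads settles near 0.5.
overall = proportion_heads(flips)

# After three heads in a row, heads is still about a 50/50 outcome.
after_streak = proportion_heads_after_streak(flips, streak=3)

print(f"overall: {overall:.3f}, after 3 heads: {after_streak:.3f}")
```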

Sampling Distributions (use when looking at the average of a sample or
at proportions)

The sampling distribution of the sample mean will be approximately
normal no matter what the original distribution looks like, as long as
the sample size n is large enough. It will have mean = original mean,
and sd_new = sd_old / sqrt(n).
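
A simulation sketch of the mean and sd claims. It assumes a skewed
population (exponential with mean 1 and sd 1), samples of size n = 25,
and 5000 repeated samples; these choices and the function name are
illustrative only.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_means(n, num_samples=5000, seed=1):
    """Draw many samples of size n from a skewed population
    (exponential, mean 1, sd 1) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(num_samples)
    ]

n = 25
means = sample_means(n)

center = mean(means)        # should be close to the original mean, 1
spread = stdev(means)       # should be close to sd_old / sqrt(n)
expected_spread = 1 / sqrt(n)

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f} (sd_old / sqrt(n) = {expected_spread:.3f})")
```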

Two concepts to remember about sampling distributions:
o String idea: as the sample size gets larger and larger, the sampling
  distribution gets pulled upward at the true mean, and the sides
  shrink. The sides shrinking corresponds to the standard deviation
  getting smaller and smaller.
o In or out of box: when we are trying to figure out which is more
  likely with either a small or a large sample, draw a line where the
  average or median is. Then draw a box around the range of values
  whose probability you are trying to compare. If the line is inside
  the box, you want the larger sample; if it is outside the box, you
  want the smaller sample.
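
The in-or-out-of-box rule can be checked numerically. The sketch below
assumes the sample mean is normal with sd = sigma / sqrt(n) and uses
made-up numbers (population mean 100, sd 15, sample sizes 10 and 100).

```python
from math import sqrt
from statistics import NormalDist

def mean_dist(mu, sigma, n):
    """Approximate sampling distribution of the mean for sample size n."""
    return NormalDist(mu, sigma / sqrt(n))

def prob_between(dist, lo, hi):
    """Probability that the sample mean lands inside the box [lo, hi]."""
    return dist.cdf(hi) - dist.cdf(lo)

mu, sigma = 100, 15            # hypothetical population
small = mean_dist(mu, sigma, 10)
large = mean_dist(mu, sigma, 100)

# Box 95..105 contains the line at 100: the larger sample is more likely.
inside_small = prob_between(small, 95, 105)
inside_large = prob_between(large, 95, 105)

# Box 110..120 does not contain the line: the smaller sample is more likely.
outside_small = prob_between(small, 110, 120)
outside_large = prob_between(large, 110, 120)

print(f"inside box:  n=10 {inside_small:.4f}, n=100 {inside_large:.4f}")
print(f"outside box: n=10 {outside_small:.4f}, n=100 {outside_large:.2e}")
```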

When we look at proportions, the sampling distribution has mean p,
which is the true proportion, and sd = sqrt(p(1 - p) / n).
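
A short sketch of the proportion formula, with p = 0.5 and sample sizes
100 and 400 chosen only as an example. The function name proportion_sd
is made up.

```python
from math import sqrt

def proportion_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

sd_100 = proportion_sd(0.5, 100)
sd_400 = proportion_sd(0.5, 400)   # four times the sample size

print(f"n = 100: sd = {sd_100:.3f}")
print(f"n = 400: sd = {sd_400:.3f}")
```

Because n sits under a square root, quadrupling the sample size only
cuts the sd in half.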