The test statistic used to test our hypothesis is a chi-square statistic, represented by $X^2$:

$$X^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

What happens to $X^2$ if the observed and expected values are very similar?
What happens to $X^2$ if the observed and expected values are very different?
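
One way to explore these two questions is to compute the statistic for some made-up counts. The sketch below is a minimal Python illustration, not part of the original notes: the function name `chi_square_statistic` and both sets of counts are hypothetical, chosen only to show how the statistic behaves.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, invented for illustration only.
expected = [50, 50, 50, 50]
similar = [49, 51, 50, 50]     # close to the expected counts
different = [20, 80, 35, 65]   # far from the expected counts

x2_similar = chi_square_statistic(similar, expected)
x2_different = chi_square_statistic(different, expected)

print(f"similar counts:   X^2 = {x2_similar:.2f}")
print(f"different counts: X^2 = {x2_different:.2f}")
```

Because every term is a squared difference divided by a positive expected count, the statistic can never be negative. It stays close to 0 when the observed counts track the expected counts and grows as they move apart.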
To find the appropriate p-value for our hypothesis test, we must compare our test statistic to a chi-square distribution, $\chi^2$.
12.1 Chi-Square ($\chi^2$) Distribution
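The $\chi^2$ distribution is a skewed distribution whose shape depends on its degrees of freedom.
The degrees of freedom are equal to $df = (r - 1) \times (c - 1)$, where $r$ is the number of rows and $c$ is the number of columns in the table of observed counts (not counting totals). The p-value is always calculated as p-value = $P(\chi^2 \ge X^2)$.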

We can use software to find the area under a $\chi^2$ distribution, with the Excel command CHISQ.DIST.RT:
CHISQ.DIST.RT(x, deg_freedom)
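
Outside Excel, the same right-tail area can be computed directly. The following is a minimal Python sketch using only the standard library. The function names `degrees_of_freedom` and `chi2_right_tail` are assumptions of this example (they are not Excel or library functions), and the tail calculation handles whole-number degrees of freedom only, using the standard closed-form expressions for the chi-square tail.

```python
import math


def degrees_of_freedom(r, c):
    """df = (r - 1) * (c - 1) for a table with r rows and c columns."""
    return (r - 1) * (c - 1)


def chi2_right_tail(x, df):
    """P(chi-square with `df` degrees of freedom >= x), for integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    df = int(df)
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first correction term
    for k in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * k + 1)  # next term divides by the next odd number
    return tail


# Right-tail area beyond x = 3.841 for a 2 x 2 table (df = 1).
print(chi2_right_tail(3.841, degrees_of_freedom(2, 2)))
```

The arguments mirror the Excel call: the first is the value of the test statistic and the second is the degrees of freedom.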

EXAMPLE: Return to our data above, regarding company size and Facebook page. Our interest is in
determining whether there is a relationship between these two variables - that is, does a company’s size
influence whether or not it will have a Facebook page?
Company Size and Social Media

                  Facebook Page
Company Size      Yes      No      Total
Large              30      76        106
Small             364     130        494
Total             394     206        600
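
The calculation for this example can be sketched in a few lines of Python. This is an illustrative sketch rather than the method used in the notes. It assumes the usual expected count for a test of independence (row total × column total ÷ grand total), and it uses the fact that for 1 degree of freedom the right-tail area of the chi-square distribution equals `erfc(sqrt(x / 2))`. The variable names are hypothetical.

```python
import math

# Observed counts from the table above (totals excluded).
observed = [[30, 76],
            [364, 130]]

row_totals = [sum(row) for row in observed]         # 106, 494
col_totals = [sum(col) for col in zip(*observed)]   # 394, 206
grand_total = sum(row_totals)                       # 600

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: P(chi-square >= x2) = erfc(sqrt(x2 / 2)).
p_value = math.erfc(math.sqrt(x2 / 2))

print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print("X^2 =", round(x2, 2), " df =", df, " p-value =", p_value)
```

The expected counts produced here are the second table that a spreadsheet test for independence needs alongside the observed counts.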

It is much faster to conduct this analysis in software. We can use the function CHISQ.TEST, which will return the p-value for our test for independence. To use this function, we need to have a table of our observed and expected values.
The Excel command CHISQ.TEST(G3:H4,G7:H8) yields the following results:
Assumptions Required for a Valid Chi-Square Test:
In order for our chi-square analysis to be valid, the following assumptions must be satisfied:
• Our sample must be a simple random sample.
• For a 2 × 2 table, the expected value for each cell must be at least 5. If our table is larger than 2 × 2, then the mean of all expected values must be at least 5, and each individual expected value must be at least 1.
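
The expected-value conditions above can be checked mechanically once the expected counts are known. The helper below is a minimal Python sketch. The function name `expected_counts_ok` is hypothetical, and the check covers only the expected-value conditions, since whether the sample is a simple random sample cannot be determined from the counts.

```python
def expected_counts_ok(expected):
    """Check the expected-value conditions for a table of expected counts.

    `expected` is a list of rows, without totals.
    """
    cells = [e for row in expected for e in row]
    n_rows, n_cols = len(expected), len(expected[0])
    if n_rows == 2 and n_cols == 2:
        # 2 x 2 table: every expected value must be at least 5.
        return all(e >= 5 for e in cells)
    # Larger table: mean expected value at least 5, and every value at least 1.
    mean_expected = sum(cells) / len(cells)
    return mean_expected >= 5 and all(e >= 1 for e in cells)


# Rounded expected counts for the company size example (computed from its totals).
print(expected_counts_ok([[69.61, 36.39], [324.39, 169.61]]))

# Hypothetical 2 x 3 table, invented for illustration: one cell is below 1.
print(expected_counts_ok([[0.5, 10.0, 10.0], [10.0, 10.0, 10.0]]))
```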

13 Introduction to Regression & Correlation
In regression analysis, we explore the possible relationship between a quantitative response variable and one (or more) explanatory variables. The explanatory variable(s) can be quantitative or qualitative, but for now we will only consider quantitative variables.
When we have one explanatory variable:
When we have more than one explanatory variable:
Regression analysis allows us to describe the relationship between $X$ and $Y$ with a model.
13.1 The Linear Regression Model
To motivate our regression model, reconsider the following example, which we first saw when we introduced the idea of correlation:
(Adapted from Question 13, page 675 in Business Statistics, 2nd Canadian Ed. (2014). Sharpe, N.R., DeVeaux, R.D., Velleman, P.F., Wright, D. Pearson Toronto).
Data on the number of sales associates working, and the number of
sales (in $1000s), were recorded for 10 randomly selected small book stores. The objective of the study
was to determine if there was a linear relationship between the number of sales associates on the floor
(explanatory variable) and the amount of business done in sales (response variable). A partial table of
the data is presented below.