My project discusses the regression of the Engel Coefficient (Y) on Disposable
Income (X1), New Residential Area (X2), and Year-end Public Saving Deposit (X3) in
China for the years 1978, 1980, and 1985-2012. The entire statistic com
Regression with Categorical Variables
1. To express concepts that are not generally quantitative, we
use categorical (qualitative) variables.
2. In regression, categorical variables are built using dummy variables.
3. Always use one fewer dummy variable than the number of categories.
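As a minimal sketch of rule 3, assuming a hypothetical categorical variable (region) with three levels, the hypothetical helper `dummy_encode` below builds k - 1 = 2 dummy columns and treats the first listed level as the baseline (all zeros). Using all k dummies together with an intercept would make the columns perfectly collinear, which is why one level is left out.

```python
def dummy_encode(values, levels):
    """Encode each value as k - 1 dummy (0/1) variables.

    levels[0] is the baseline category; it is represented by all zeros.
    """
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    non_baseline = levels[1:]
    return [[1 if v == level else 0 for level in non_baseline] for v in values]


# Hypothetical data: three regions -> two dummies (East, West); North is the baseline.
regions = ["East", "West", "North", "East"]
rows = dummy_encode(regions, levels=["North", "East", "West"])
for value, row in zip(regions, rows):
    print(value, row)
```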
Multiple Comparison Tests
Fisher's Least Significant Difference Procedure
An ANOVA test only tells us whether there is a difference among
the groups (i.e., treatments) that were tested. The results do not tell us
which, if any, group or treatm
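For reference, the usual form of Fisher's LSD for comparing treatment means $i$ and $j$ after a one-way ANOVA is sketched below. It assumes MSE is the mean square error from the ANOVA table, $n_i$ and $n_j$ are the two sample sizes, and $t_{\alpha/2}$ has $n_T - k$ degrees of freedom ($n_T$ = total sample size, $k$ = number of treatments).

```latex
\mathrm{LSD} = t_{\alpha/2}\sqrt{\mathrm{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},
\qquad \text{conclude } \mu_i \ne \mu_j \text{ if } |\bar{x}_i - \bar{x}_j| > \mathrm{LSD}.
```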
Adjusted R Square
Y = Travel Time
X1 = Miles Travelled
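A minimal sketch of the adjustment, assuming the usual formula $R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$ with $n$ observations and $p$ independent variables (here $p = 1$, Miles Travelled). The values $R^2 = 0.66$ and $n = 10$ are hypothetical, not results from the Travel Time data.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical values: R^2 = 0.66 from n = 10 observations with one predictor (X1)
adj = adjusted_r_squared(0.66, n=10, p=1)
print(round(adj, 4))
```

The adjusted value is always at or below $R^2$, which is what makes it useful when comparing models with different numbers of predictors.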
Coefficient of Determination
We have determined the best regression line for our Consumer Durables
data set using the Least Squares Method. The question now becomes: how
well does this regression line fit (represent) the data?
We recall th
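The standard measure is the coefficient of determination, $r^2 = \mathrm{SSR}/\mathrm{SST}$, which relies on SST = SSR + SSE holding for a least-squares fit with an intercept. The sketch below uses hypothetical data (not the Consumer Durables set) together with the fitted values of the least-squares line for that data, $\hat{y} = 2.2 + 0.6x$.

```python
def coefficient_of_determination(y, y_hat):
    """Return (SST, SSR, SSE, r2) for observed y and least-squares fitted y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error (residual) sum of squares
    ssr = sst - sse  # regression sum of squares; valid because y_hat is a least-squares fit
    return sst, ssr, sse, ssr / sst


# Hypothetical data; fitted values come from the least-squares line for these points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
sst, ssr, sse, r2 = coefficient_of_determination(y, y_hat)
print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f} r^2={r2:.2f}")
```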
Analysis of Variance
I. We have just studied the case where we wish to make
inferences about the means of two populations. Now we
will look at the case where we have more than two
populations to consider.
II. An example
Consider a problem in which it is d
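For more than two populations, one-way ANOVA compares between-treatments variability (MSTR) with within-treatments variability (MSE). The sketch below computes the F statistic for three hypothetical samples; comparing it with an $F$ critical value on $k - 1$ and $n_T - k$ degrees of freedom is left to a table or statistics package.

```python
def one_way_anova_f(groups):
    """F = MSTR / MSE for a one-way ANOVA on k independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatments sum of squares and mean square
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    mstr = sstr / (k - 1)
    # Within-treatments (error) sum of squares and mean square
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (n_total - k)
    return mstr / mse


# Hypothetical samples from three populations
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_stat = one_way_anova_f(samples)
print(f_stat)
```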
Chi Square Test of Independence
Bank Location Problem
Calculating chi square
I. Confidence Intervals for the Mean: Population Standard
Deviation, $\sigma$, Known
We would like to estimate the value of $\mu$ from sample data.
We begin with the value of $\bar{X}$, which we can calculate from sample data.
We know that the value $\bar{X}$, hav
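With $\sigma$ known, the interval estimate of $\mu$ is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. A minimal sketch using the standard-library `statistics.NormalDist` for the $z$ value; the sample mean, $\sigma$, and $n$ are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Interval estimate x_bar +/- z_(alpha/2) * sigma / sqrt(n), sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical inputs: sample mean 50, known sigma 10, sample size 25
low, high = z_interval(50, 10, 25)
print(f"({low:.2f}, {high:.2f})")
```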
Consumer Durables Expenditures Problem
Recall from our earlier lecture that we wish to fit the following
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
y = consumer durables expenditures (CDX) ($100)
x1 = net income (NI) ($1000)
x2 = family size
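The least-squares estimates $b_0, b_1, b_2$ solve the normal equations $(X^\top X)b = X^\top y$. The sketch below solves them with plain Gaussian elimination so it needs no third-party packages. The NI and family-size values are hypothetical, generated exactly from $y = 1 + 0.5x_1 + 2x_2$, so the fit should recover those coefficients; `fit_two_predictors` and `solve_linear_system` are hypothetical helper names.

```python
def solve_linear_system(a, b):
    """Solve a x = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Least-squares (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    rows = [(1.0, u, v) for u, v in zip(x1, x2)]  # design matrix X with intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: x1 = net income ($1000), x2 = family size
ni = [10, 20, 30, 40, 50, 60]
size = [2, 4, 3, 5, 4, 6]
cdx = [1 + 0.5 * u + 2 * v for u, v in zip(ni, size)]
b0, b1, b2 = fit_two_predictors(ni, size, cdx)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```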
Two Sample Tests on the Mean
I. Previously we have studied single population statistical inference on the
mean. That is, we have used the sample mean $\bar{X}$ to infer something about the
population mean $\mu$.
II. Two Sample Tests
We can also make inferences about m
Some Regression Model Assumptions
Recall the following from our first lecture on Simple Linear
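First Order Linear Model
$y = \beta_0 + \beta_1 x + \epsilon$
$y$ = Dependent variable, with mean $E(y) = \beta_0 + \beta_1 x$
$x$ = Independent variable
$\beta_0$ = $y$-intercept of the line
$\beta_1$ = Slope of the line
$\epsilon$ = Random error component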
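The parameters $\beta_0$ and $\beta_1$ are unknown and are estimated from sample data. A minimal sketch of the least-squares estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$, on hypothetical data:

```python
def least_squares_line(x, y):
    """Return (b0, b1), the least-squares estimates of beta0 and beta1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx         # estimated slope
    b0 = y_bar - b1 * x_bar  # estimated y-intercept
    return b0, b1


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```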
Simple Linear Regression
1. Regression Analysis Provides a Methodology to:
Relate the mean value of a single dependent variable to the values of
one or more independent variables.
2. Exact Relationships vs. Statistical Relationships
Exact: $c^2 = a^2 + b^2$;
Statistical Inference and Sampling Distributions
I. What is Statistical Inference?
We want to estimate or make a statement about a population parameter, say the
population mean or the population standard deviation.
The way to do this without experienci
Test for Independence
Testing for the Independence of Two Variables
In this test, we use the chi-square distribution to test whether two categorical
variables are independent of each other.
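Under independence, the expected count in row $i$, column $j$ is (row $i$ total)(column $j$ total)/$n$, and the test statistic is $\chi^2 = \sum (f_{ij} - e_{ij})^2 / e_{ij}$ with $(r - 1)(c - 1)$ degrees of freedom. A minimal sketch on a hypothetical 2 x 2 table (not the bank location data):

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # e_ij under independence
            chi2 += (f_obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 contingency table of observed counts
table = [[20, 30],
         [30, 20]]
stat, df = chi_square_independence(table)
print(stat, df)
```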
Let's look at the following problem:
Suppose a savings bank in a metrop
S = {ace of clubs, ace of diamonds, ace of hearts, ace of spades}
S = {2 of clubs, 3 of clubs, . . . , 10 of clubs, J of clubs, Q of clubs, K of clubs, A of clubs}
There are 12 cards that are a jack, queen, or king: three in each of the four suits.
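These counts can be checked by enumerating a standard 52-card deck. The sketch below labels cards the same way as the sets S above (e.g. "J of clubs", "A of clubs"); `rank_of` is a hypothetical helper.

```python
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

# Each outcome is one card, e.g. "J of clubs"
deck = [f"{rank} of {suit}" for suit, rank in product(SUITS, RANKS)]


def rank_of(card):
    return card.split(" of ")[0]


aces = [c for c in deck if rank_of(c) == "A"]
clubs = [c for c in deck if c.endswith("of clubs")]
face_cards = [c for c in deck if rank_of(c) in {"J", "Q", "K"}]
print(len(deck), len(aces), len(clubs), len(face_cards))
```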
15, 20, 25, 25, 27, 28, 30, 34
2nd position = 20
6th position = 28
Order the data from low 6.7 to high 36.6
Use 5th and 6th positions.
Mode = 7.2 (occurs 2 times)
Use 3rd position. Q1 = 7.2
Use 8th position. Q3 = 17.2
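The positions used above are consistent with a common textbook rule for locating the $p$th percentile among $n$ ordered values: compute $i = (p/100)n$; if $i$ is not an integer, round up; if it is, average the values in positions $i$ and $i + 1$. Treating this as the rule in use is an assumption. The sketch keeps $i$ exact with integer arithmetic (so $p$ must be a whole-number percent), avoiding floating-point results such as 7.000000000000001.

```python
def percentile_positions(p, n):
    """1-based position(s) of the p-th percentile (integer 0 < p < 100) among n ordered values."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    whole, remainder = divmod(p * n, 100)  # i = p * n / 100, kept exact
    if remainder:                          # i is not an integer: round up
        return (whole + 1,)
    return (whole, whole + 1)              # i is an integer: average positions i and i + 1


def percentile(data, p):
    values = sorted(data)
    positions = percentile_positions(p, len(values))
    return sum(values[k - 1] for k in positions) / len(positions)


# Eight values from the first problem: for example, p = 20 lands on the 2nd position
print(percentile([15, 20, 25, 25, 27, 28, 30, 34], 20))
# Ten ordered values: positions used for the median, Q1 and Q3
for name, p in [("median", 50), ("Q1", 25), ("Q3", 75)]:
    print(name, percentile_positions(p, 10))
```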
3. a. 360 × 58/120 = 174
b. 360 × 42/120 = 126
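Parts a and b appear to apply the pie-chart rule: slice angle = 360 × (category frequency)/(total responses). A one-function sketch, with `slice_degrees` as a hypothetical name:

```python
def slice_degrees(frequency, total):
    """Central angle, in degrees, of a pie-chart slice for one category."""
    return 360 * frequency / total


# Frequencies 58 and 42 out of 120 responses, as in parts a and b
print(slice_degrees(58, 120), slice_degrees(42, 120))
```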
Management should be very pleased with the survey results. 40% + 46% = 86%
Category A values for x are always associated with category 1 values for y. Category B values for x are usually
associated with category 1 values for y. Category C values for x are usually associated with category 2 values for y.
less than or equal to 19
less than or equal to 29
less than or equal to 39
less than or equal to 49
less than or equal to 59
Cumulative Relative Frequency
$f(x) \ge 0$ for all values of $x$.
$\sum f(x) = 1$. Therefore, it is a proper probability distribution.
Probability $x \le 25$ is $f(20) + f(25) = .20 + .15 = .35$
Probability $x = 30$ is $f(30) = .25$
Probability $x > 30$ is $f(35)$ = .
$f(0) = .3487$
$f(2) = .1937$
$P(x \le 2) = f(0) + f(1) + f(2) = .3487 + .3874 + .1937 = .9298$
$P(x \ge 1) = 1 - f(0) = 1 - .3487 = .6513$
$E(x) = np = 10(.1) = 1$
$\mathrm{Var}(x) = np(1 - p) = 10(.1)(.9) = .9$
$\sigma = \sqrt{.9} = .95$
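The binomial figures above (with $n = 10$ and $p = .1$, as implied by $E(x) = np = 10(.1)$) can be reproduced directly from $f(x) = \binom{n}{x}p^x(1-p)^{n-x}$. A minimal sketch using `math.comb` (Python 3.8+):

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.1  # from E(x) = n p = 10 (.1)
f = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_at_most_2 = f[0] + f[1] + f[2]  # P(x <= 2)
p_at_least_1 = 1 - f[0]           # P(x >= 1), complement of x = 0
mean = n * p
variance = n * p * (1 - p)
sd = sqrt(variance)
print(f"f(0)={f[0]:.4f} f(2)={f[2]:.4f} P(x<=2)={p_at_most_2:.4f} "
      f"P(x>=1)={p_at_least_1:.4f} E={mean:.1f} Var={variance:.2f} sd={sd:.2f}")
```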
No, because $P(A \mid B) \ne P(A)$.
$P(A \cap B) = 0$
No. $P(A \mid B) \ne P(A)$; the events, although mutually exclusive, are not independent.
Mutually exclusive events with nonzero probabilities are dependent.
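The last statement follows because, when $P(A) > 0$ and $P(B) > 0$, mutual exclusivity gives $P(A \cap B) = 0 \ne P(A)P(B)$. A small check with hypothetical probabilities $P(A) = .3$ and $P(B) = .4$; `is_independent` is a hypothetical helper.

```python
def is_independent(p_a, p_b, p_a_and_b, tol=1e-12):
    """A and B are independent exactly when P(A and B) = P(A) P(B)."""
    return abs(p_a_and_b - p_a * p_b) <= tol


# Hypothetical probabilities for two mutually exclusive events
p_a, p_b = 0.3, 0.4
p_a_and_b = 0.0                # mutually exclusive: P(A and B) = 0
p_a_given_b = p_a_and_b / p_b  # P(A | B) = P(A and B) / P(B) = 0

print(p_a_given_b, p_a, is_independent(p_a, p_b, p_a_and_b))
```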
The upper-tail p-value is the area to the right of the test statistic.
Using the normal table with z = 1.48: p-value = 1.0000 - .9306 = .0694
p-value > .01, do not reject H0
Reject H0 if $z \ge 2.33$
1.48 < 2.33, do not reject H0
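Both the p-value approach and the critical-value approach can be checked with the standard-library `statistics.NormalDist`. The sketch assumes the same $z = 1.48$ and $\alpha = .01$ as above; the two approaches should agree.

```python
from statistics import NormalDist

z = 1.48      # test statistic from the problem
alpha = 0.01  # level of significance

std_normal = NormalDist()                   # mean 0, standard deviation 1
p_value = 1 - std_normal.cdf(z)             # upper-tail area to the right of z
z_critical = std_normal.inv_cdf(1 - alpha)  # about 2.326; tables round to 2.33

reject_by_p_value = p_value <= alpha
reject_by_critical_value = z >= z_critical
print(f"p-value = {p_value:.4f}, z critical = {z_critical:.2f}")
print("reject H0" if reject_by_p_value else "do not reject H0")
```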