Regression Analysis III
Model building
Variable selection
Residual Analysis
General Linear Model
Models in which the parameters (β0, β1, ..., βp) all
have exponents of one are called linear models.
A general linear model involving p independent
variables
Experimental Design and Analysis of
Variance
Elements of a designed experiment
Completely Randomized Designs
Multiple comparison of means
Randomized Block Design
Factorial Experiments
Elements of experimental design
Statistical studies can be classifi
Test of Hypotheses
Hypothesis: An assumption, theory, or claim concerning a
parameter of a population.
For example:
Average annual salary of business undergraduate students =
$60,000
Average miles per gallon of a U.S.-built SUV = 20 miles /
gallon.
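As an illustration of how a claim such as "average annual salary = $60,000" can be checked against data, here is a minimal sketch that computes a one-sample t statistic. The function name one_sample_t and the salary figures are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic for H0: population mean == mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (x_bar - mu0) / (s / math.sqrt(n))


# Hypothetical salaries, invented for this sketch (not real data).
salaries = [58000, 61000, 59500, 62000, 60500, 57000, 63000, 61000]

# A t statistic far from zero would cast doubt on H0: mean = 60,000.
t_stat = one_sample_t(salaries, 60000)
print(round(t_stat, 3))
```

The statistic would then be compared with a t distribution with n - 1 degrees of freedom to reach a decision.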
A na
Optimization
In this section we are going to look at optimization problems. In
optimization problems we are looking for the largest value or the
smallest value that a function can take. We saw how to solve
one kind of optimization problem in the Absolute
Classification
The linear regression model assumes that the response variable Y is quantitative.
But in many situations, the response variable is instead qualitative
Often qualitative variables are referred to as categorical; we will use these terms
inter
Basics of Statistics
Statistical learning refers to a set of tools for modeling and understanding
complex datasets
It is a recently developed area in statistics and blends with parallel developments
in computer science and, in particular, machine learning
The Definition of the Definite Integral
In this section we will formally define the definite integral and
give many of the properties of definite integrals. Let's start off
with the definition of a definite integral.
Definite Integral
Given a function
that
Resampling Method
Resampling methods are an indispensable tool in modern statistics
They involve repeatedly drawing samples from a training set and refitting a model
of interest on each sample in order to obtain additional information about the
fitted mod
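One widely used resampling method is the bootstrap. The sketch below is a minimal illustration, assuming the "model" is just a summary statistic (the sample mean by default): it repeatedly draws samples with replacement, recomputes the statistic on each, and uses the spread of those values as a standard-error estimate. The function name bootstrap_se and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_resamples=1000, seed=0):
    """Estimate the standard error of `stat` by bootstrap resampling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [rng.choice(data) for _ in range(n)]
        # "Refit" on the resample: here that just means recomputing the statistic.
        estimates.append(stat(resample))
    # The spread of the refitted values approximates the standard error.
    return statistics.stdev(estimates)


# Hypothetical data, for illustration only.
print(bootstrap_se([2.1, 3.4, 1.9, 5.0, 4.2, 3.3]))
```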
Linear Regression
Linear regression is a useful tool for predicting a quantitative response
The importance of having a good understanding of linear regression before
studying more complex learning methods cannot be overstated
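To make the idea concrete, the following minimal sketch fits a straight line to one predictor by ordinary least squares using the standard closed-form estimates. The function name fit_line and the data are hypothetical and used only for illustration.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The least squares line passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
b0, b1 = fit_line([1, 2, 3, 4, 5], [2.2, 4.1, 6.3, 7.9, 10.1])
print(b0, b1)
```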
Simple Linear Regression
It i
Statistical Learning
Example: Suppose that we are statistical consultants hired by a client to provide
advice on how to improve sales of a particular product. The Advertising data set
consists of the sales of that product in 200 different markets, along w
Multiple Linear Regression
Simple linear regression is a useful approach for predicting a response on the
basis of a single predictor variable
However, in practice we often have more than one predictor.
One option is to run three separate simple linear re
Linearity
Linear models are relatively simple to describe and implement, and have
advantages over other approaches in terms of interpretation and inference
However, standard linear regression can have significant limitations in terms of
predictive power
P
Linear Model
The linear model has distinct advantages in terms of inference and, on real-world
problems, is often surprisingly competitive in relation to nonlinear methods.
Alternative fitting procedures can yield better prediction accuracy and model
int
Support Vector Machine
Support Vector Machines have been shown to perform well in a variety of
settings, and are often considered one of the best out-of-the-box classifiers.
The support vector machine is a generalization of a simple and intuitive classifi
Models
There is no free lunch in statistics: no one method dominates all others over all
possible data sets
On a particular data set, one specific method may work best, but some other
method may work better on a similar but different data set
Quality of F
Testing categorical probability: one-way table
Consider a multinomial experiment with k outcomes
that corresponds to a single qualitative variable.
The experiment would consist of n identical trials.
There are k outcomes for each trial. The probability
Distribution and sampling
Discrete random variable
Continuous random variable
Sampling and central limit theorem.
Random Variables
A random variable is a numerical description of
the outcome of an experiment.
A mapping from each outcome to a real value.
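A small sketch of this idea: for the experiment of tossing a coin twice, the random variable "number of heads" assigns a number to each outcome. The code assumes a fair coin, so all four outcomes are equally likely; the names outcomes, num_heads, and pmf are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All outcomes of tossing a coin twice: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def num_heads(outcome):
    """The random variable: map an outcome to its number of heads."""
    return outcome.count("H")


# Assuming a fair coin, each outcome has probability 1/4.
counts = Counter(num_heads(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

for x in sorted(pmf):
    print(x, pmf[x])
```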
A disc
Comparing k Population Means
Example: Reed Manufacturing
Janet Reed would like to know if there is any
significant difference in the mean number of
hours worked per week for the department
managers at her three manufacturing plants
(in Buffalo, Pittsburgh
Multiple Regression
Multiple Regression Model
Least Squares Method
Multiple Coefficient of Determination
Model Assumptions
Testing for Significance
Multiple Regression Model
The equation that describes how the dependent
variable y is related to the indepe
Multiple Regression II
F test for overall significance
Using the Estimated Regression Equation
for Estimation and Prediction
Qualitative Independent Variables
Testing for Significance: F Test
Hypotheses
H0: β1 = β2 = ... = βp = 0
Ha: One or more of the pa
Statistical Methods in Business
Professor: Xiaodong Lin
Course #: 33:623:385
Office: Levin 252
Statistics and Probability review
Descriptive statistics
Chapter 2
Probability:
Chapter 3
Random variables and sampling:
Chapter 4
One sample inferences:
C
In a manufacturing plant we wish t
Section H1
Stat Methods in Business (33:136:385)
Summer 2015
Homework #5
Due date: August 11, 2015
Name:
RUID:

Instructions:
1. Show all your work. Clearly indicate your final answer.
2. Carry all computations to at least two decimal places.
1. List the
Section H1
Stat Methods in Business (33:136:385)
Summer 2015
Take Home Exam_Final
Due: August 11, 2015
Name: SOLUTION
RUID: INSTRUCTOR

Instructions:
1. Show all your work. Clearly indicate your final answer.
2. Carry all computations to at least three d