Exploratory Data Analysis Tools (Chapter 1)
The term exploratory data analysis (EDA) was originally coined by John Tukey (Exploratory Data Analysis, Addison-Wesley, 1977). It refers to the initial exploration of data
usually by means of graphical tools be
One Sample Inferences: Review
The purpose of statistical inference is to draw conclusions about a population based on
a sample of data from that population.
There are two basic types of inference used:
1. Confidence Intervals: Making statements with some
One Sample Odds and Ends: Review
This handout reviews some additional aspects of one sample inferences, many of which will
extend naturally to a regression or ANOVA setting. These extra topics include prediction
intervals, assessing normality, variance st
Some Regression Estimation Examples (2.8)
[Figure: plot with axes labeled wage and educ]
Example 1: Recall the US weekly wages data
collected as part of the Current Population
Survey in 1988. Suppose we want to model
the weekly wages as a function of ye
Linear Model Diagnostics (Chapter 4)
The previous two chapters discussed methods of estimation and inference used for linear
models. Here, we address what is known as model diagnostics. The term model diagnostics
refers to methodology for examining whethe
Influential Observations (4.2.3) : Recall that influential observations are data values
whose removal has a large effect on the model fit, typically through the partial slopes or
the predicted values themselves. A number of influence statistics have been developed
Factors Affecting Linear Model Inferences (3.6-3.8)
Statistical studies may be roughly divided into one of two types according to the possible
scope of inference as either observational studies (3.7) or controlled experiments (3.6).
In addition to consideri
Confidence Intervals and Confidence Regions (3.4, 3.5)
This handout reviews the construction of a normal-based confidence interval for individual
linear model parameters, then introduces the use of simultaneous confidence regions for two
or more parameters, as we
Sequential and Partial Sums of Squares
In most applications, there are two types of sums of squares of potential interest. These two
sums of squares types, sequential & partial, are outlined in this handout.
Sequential Sums of Squares : Sometimes there is
Hypothesis Tests to Compare Two Models (3.1, 3.2)
With an understanding of how to incorporate both quantitative and categorical explanatory
variables into a model, we now introduce the use of F-tests to compare two general linear
models where one model is
Inference in Regression Models (Chapter 3)
The previous chapter discussed the method of least squares as a method of model parameter
estimation in the general linear model. Here, we explore t-distribution and F-distribution
based methods of inference on bo
Solutions - Homework #3
1. Problem 3.1: A multiple linear regression model with lpsa as the response variable and the variables
(lcavol,lweight,age,lbph,svi,lcp,gleason, pgg45) as explanatory variables was fit in R giving the Coefficients table below.
(a) Let
Solutions - Homework #2
1. Problem 2.1: The regression model was fit using the lm function in R with the output given below
and the code found on the course webpage.
(a) The percentage of variation in the response explained by the four predictors is R^2 = 0.
Solutions - Homework #1
1. Problem 1
(a) A quick inspection of the concentrations does not indicate any gross departures from normality
nor any unusual observations. Since the sample size is small (n = 15), we appeal to the robustness
of t-procedures to j
Geometric Interpretation of Linear Regression (2.3) : For any linear model y =
Xβ + ε, one goal is to find the values of the parameters in β that in some sense provide the
closest agreement between the observed responses in y and those predicted by the model.
T
Gauss-Markov Theorem (2.6)
Least squares provides but one way to estimate the parameters in a linear model. However, if model assumptions are met, there are a number of reasons why least squares is
preferred over other estimators:
1. It makes sense geomet
Finding Unusual Observations (4.2)
There are two types of potentially troublesome points in linear models: outliers and influential observations. This handout introduces tools for identifying both types of points
through the use of residuals and a set of st
Estimation in Linear Models (Chapter 2)
This handout introduces the notion of a linear statistical model, provides the matrix formulation of the general linear model, introduces the method of least squares as an estimation
method including its geometric i
Applied Linear Models - STAT 542
Linear models are the most common type of statistical model used in practice due to their
simplicity and a well-developed theory for performing statistical inferences based on a linear
model. In more recent years, due both