The University of Texas at San Antonio San Antonio
Applied Regression
STATISTICS 4713

Fall 2016
Chapter 3
Multiple Linear Regression
3.1 Multiple Regression Models
Suppose that the yield in pounds of
conversion in a chemical process depends on
temperature and the catalyst concentration. A
multiple regression model that might
describe this relatio
10/13/2014
Chapter 9
Multicollinearity
Linear Regression Analysis 5E
Montgomery, Peck & Vining
9.1 Introduction
Multicollinearity is a problem that plagues many
regression models. It impacts the estimates of the
individual regression coefficients.
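One common way to quantify this effect is the variance inflation factor (VIF). The sketch below is only an illustration under simplifying assumptions: it covers the two-regressor case, where the VIF of either regressor reduces to 1 / (1 - r^2) with r the sample correlation between them. The data and the names `pearson_r` and `vif_two_regressors` are made up for the example and do not come from the slides.

```python
import math

def pearson_r(x, y):
    """Sample correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x1, x2):
    """VIF shared by both regressors in a two-regressor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical data: x2 is nearly a linear function of x1, x3 is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

vif_collinear = vif_two_regressors(x1, x2)  # very large: near-linear dependence
vif_mild = vif_two_regressors(x1, x3)       # close to 1: almost no dependence
print(vif_collinear, vif_mild)
```

A VIF near 1 means the variance of the coefficient estimate is barely inflated; values above roughly 5 to 10 are a commonly used warning sign, although that cutoff is a rule of thumb rather than a formal test.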
Use
9/24/2014
Chapter 4
Model Adequacy Checking
4.1 Introduction
Assumptions
1. Relationship between response and regressors
is linear (at least approximately).
2. Error term ε has zero mean
3. Error te
10/8/2014
Chapter 8
Indicator Variables
8.1 The General Concept of Indicator Variables
Qualitative variables are also known as
categorical variables. Qualitative variables
do not have a scale of measu
9/24/2014
Chapter 5
Transformations and Weighting
to Correct Model Inadequacies
5.1 Introduction
Data Transfo
Review of Vectors and Matrix Algebra
Definition: Vector: An array of n real numbers $x_1, x_2, \ldots, x_n$ is called an
$n \times 1$ column vector and is written as
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
We can also write it as a row vector, where $'$ denotes a transpose,
x' = (x1 , x2 , .
10/13/2014
Chapter 10
Variable Selection and
Model Building
10.1 Introduction
10.1.1 Model-Building Problem
Two conflicting goals in regression model building:
1. Want as many regressors as possibl
Chapter 6
Diagnostics for Leverage
and Influence
6.1 Importance of Detecting Influential
Observations
Leverage Point:
unusual x-value;
very little effect
on regression
coefficients.
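How unusual an x-value is can be measured by its leverage, the corresponding diagonal element of the hat matrix. The sketch below is an illustration only, assuming simple linear regression with an intercept, where the leverage has the closed form h_ii = 1/n + (x_i - x̄)^2 / S_xx. The data, the function name `leverages`, and the 2p/n cutoff (a common rule of thumb) are assumptions made for the example, not taken from the slides.

```python
def leverages(x):
    """Hat-matrix diagonals for simple linear regression with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Hypothetical regressor values; the last one is far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)

p = 2                    # parameters in the model: intercept and slope
cutoff = 2 * p / len(x)  # rule-of-thumb threshold, not a formal test
flagged = [i for i, hi in enumerate(h) if hi > cutoff]
print(flagged)           # zero-based indices of high-leverage observations
```

High leverage alone says nothing about the response: whether such a point also changes the fitted coefficients depends on how far its y-value falls from the line implied by the remaining data.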
Chapter 1. Introduction
Regression Analysis describes a collection of statistical tools that
allow us to model and draw inferences about the relationship between
different variables.
The term regression was first proposed by an English scientist Sir
Franc
10/8/2014
Chapter 7
Polynomial Regression Models
7.1 Introduction
A second-order polynomial in one variable:
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
A second-order polynomial in two variables:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 +$
Class Project 3
1. Multiple Regression Analysis of Wine Data
White wine data will be analyzed. All wines are produced in a particular area of Portugal. Data
are collected on 12 different properties of the wines one of which is Quality, based on sensory
da
Class Project 4
Using the same data set as in Class Project 3, conduct the following residual analysis using the SAS code below.
1. Residual Analysis
a. Examine normal probability plot of errors and determine if errors are normally
distributed. Verify with norm
Class Project 2
1. Simple Linear Regression CAPM
In this project we will estimate the parameters of the capital asset pricing model (CAPM) for several
securities. CAPM states that the expected return of a security or a portfolio equals the rate on a
risk
Class Project 5
Logistic Regression
In 1846, the Donner party (Donner and Reed families) left Springfield, Illinois for California in
covered wagons. After reaching Fort Bridger, Wyoming, the leaders decided to find a new route
to Sacramento. They became
Class Project 1
1. Hypothesis testing & Confidence Interval of Population Mean
Given that the EPA has a maximum benzene concentration allowed of 1 ppm, answer the following
questions using SAS code below and verify with your own calculations. Show all you
The University of Texas at San Antonio San Antonio
data mining
STATISTICS 4143

Fall 2015
title: "markdown example"
author: "dj"
date: "August 31, 2016"
output: word_document
install.packages("package's name", repos=c("http://rstudio.org/_packages",
"http://cran.rstudio.com"))
This post examines the features of [R Markdown]
(http://www.rstudio.o
title: "Homework 4"
author: "Teresa Martinez"
date: "October 2, 2016"
output: word_document
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
QUESTION #6
1.
(a) Estimate the probability that a student who studies for 40h and has an
undergr
title: "Habits"
author: John Doe
date: March 22, 2005
output: word_document

install.packages("knitr")
install.packages("rmarkdown")
install.packages("ggplot2")
install.packages("lattice")
# EnsurePackage(x)  Installs and loads a package
# if necessary
# R programming
install.packages("mosaic") #one time only
require("mosaic") #every session you use
require("mosaicData")
?Births78
#documentation on Births78
data(Births78) #make Births78 data available
head(Births78) #print first 6 lines of the data
xyplot
#program
#URL file reading
births <- read.table('http://www.calvin.edu/~rpruim/data/births.txt',
header=TRUE)
head(births) # live births in the US each day of 1978.
#single variable selection
b78 <- Births78$births
b78
with(Births78, births)
#Creating your o
title: "Untitled"
author: "Teresa Martinez"
date: "September 14, 2016"
output: word_document
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for
authorin
#data analysis example
pima <- read.csv("pima.csv", header = TRUE)
pima #this will print pima data on console pane
summary(pima)
sort(pima$bp)
# data correction
pima$bp[pima$bp == 0] <- NA     # a recorded 0 is treated as a missing value
pima$glu[pima$glu == 0] <- NA
pima$skin[pima$skin == 0] <- NA
pima$ins
Lesson 1
R and RStudio
R is a statistical computer program made
available through the Internet under the General
Public License (GPL). That is, it is supplied with a
license that allows you to use it freely, distribute
it, or even sell it, as long as the
Data Mining / Data Analytics
Using R
DJ KO
Management Science and Statistics
UTSA
Data Analytics/ScienceWiki
Data analytics is the practice of deriving valuable
insights from data.
Data science is emerging to meet the challenges
of processing very larg
Programming R functions
Start Teaching with R
Read Start Teaching with R, pp. 106-115
The Most Important Template
goal( y ~ x, data = mydata )
What do you want R to do? (goal)
This determines the function to use
What must R know to do that?
This determines t
Grammar in Data
tidyr
What is Data Science?
dplyr
Will need the packages
install.packages("devtools")
install.packages("tidyr")
install.packages("dplyr")
require(tidyr)
require(dplyr)
require(mosaic)
require(devtools)
devtools::install_github("rstudio/EDAW