MACC 590 Homework #7
Erika Reagan
Question 1 (Light EDA)
There are no rows with missing values.
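A check like the one described above can be sketched in base R. The data frame `df` and its values are hypothetical stand-ins, since the homework data set is not shown here.

```r
# Hypothetical toy data standing in for the homework data set.
df <- data.frame(
  x = c(1, 2, NA, 4),
  y = c("a", "b", "c", NA)
)

# Number of missing values in each column.
missing_per_column <- colSums(is.na(df))

# Number of rows that contain at least one missing value.
rows_with_missing <- sum(!complete.cases(df))

print(missing_per_column)
print(rows_with_missing)
```

If `rows_with_missing` is 0 on the real data, no rows need to be dropped or imputed.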
Question 1 (Clustering)
This graph suggests that I should use either 4 or 5 clusters. Having more
than 5 clusters would be unnecessary. We
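The graph is not reproduced here. One common way to choose a cluster count is an elbow plot of the total within-cluster sum of squares, and the sketch below assumes that is the kind of graph meant. It uses the built-in `iris` measurements as a stand-in for the homework data, and the seed and range of k are arbitrary choices.

```r
# Assumption: an elbow plot is the graph in question; iris is only a stand-in.
set.seed(590)
X <- scale(iris[, 1:4])  # standardize so no single variable dominates

ks <- 1:8
# Total within-cluster sum of squares for each candidate number of clusters.
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 20)$tot.withinss)

plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The number of clusters is usually read off where the curve flattens, because extra clusters beyond that point reduce the within-cluster sum of squares only slightly.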

Erika Reagan
MACC 590
Homework #2
1. One may question the presumption that the reduction in heart attacks from 2003 to 2007 is due to the
Clean Indoor Air Act for many reasons. The first reason is sampling bias. This sampling
bias could have occurred

MACC 590 Homework #6
Erika Reagan
1a
1b
This means that the person is male and aged 9.5 or older, so the chance of survival was 18%. This group represents 59% of the sample.
This means that the person is male and younger than 9.5
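The two numbers reported for a tree node (the survival rate inside the node and the node's share of the sample) can be recomputed by hand. The sketch below uses a small hypothetical data frame, not the homework data, so its results do not match the 18% and 59% above.

```r
# Hypothetical toy data; column names and values are illustrative only.
people <- data.frame(
  sex      = c("male", "male", "male", "female", "female",
               "male", "female", "male", "male", "female"),
  age      = c(30, 5, 45, 22, 8, 60, 35, 19, 2, 50),
  survived = c(0, 1, 0, 1, 1, 0, 1, 1, 1, 0)
)

# Rows that fall into the node "male and age >= 9.5".
in_node <- people$sex == "male" & people$age >= 9.5

node_share    <- mean(in_node)                   # fraction of the sample in the node
node_survival <- mean(people$survived[in_node])  # survival rate within the node

print(node_share)
print(node_survival)
```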

MACC 590 Regression Assignment #2
Erika Reagan
Part 1
We need a p-value below .01, so we drop the circled variables: WT, DIS and ACC.
*ETV is for Engine Type V only.
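The screening rule above (keep only predictors with a p-value below .01) can be sketched as follows. The assignment's data set is not shown, so the built-in `mtcars` data and the chosen predictors are stand-ins, and `to_drop` is a hypothetical name.

```r
# Stand-in model on mtcars; the assignment's own variables are not available here.
fit <- lm(mpg ~ wt + disp + hp + qsec, data = mtcars)

# p-values of the slope coefficients (the intercept row is removed).
p_values <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

# Predictors that fail the p < .01 threshold.
to_drop <- names(p_values)[p_values >= 0.01]

print(round(p_values, 4))
print(to_drop)
```

After dropping variables, the model should be refit and the p-values checked again, because they change once predictors are removed.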
All of the p-values are <.01 and all of the VIF values are <10, so we wi
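The VIF check mentioned above can be reproduced in base R from its definition, 1 / (1 - R^2), where R^2 comes from regressing each predictor on the others. The helper `vif_manual` is a hypothetical name, and `mtcars` with these three predictors is only a stand-in for the assignment data.

```r
# Hypothetical helper: VIF of each predictor, computed from its definition.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on the remaining predictors.
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(mtcars, c("wt", "hp", "qsec"))
print(round(vifs, 2))

# The rule of thumb used in the text: every VIF should be below 10.
print(all(vifs < 10))
```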

MACC 590 Regression Assignment #1
Erika Reagan
Part 1
We need a p-value below .01, so we drop the circled variables: HmRun,
Runs, RBI, PutOuts, Assists and Errors.
The p-value for AtBats is greater than .01,
so we need to drop this variable
and r

Roadmap for EDA for Homework #4
Data review
# DATA REVIEW: Read data
data <- read.csv("hitters.csv", na.strings = c("NA", "NaN", " "), header = TRUE)
attach(data)
head(data)
One variable at a time: histogram/boxplot for continuous variables; bar charts for qualita
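The one-variable-at-a-time step can be sketched with base R graphics. The data frame `toy` is simulated here so the example runs without `hitters.csv`. It borrows the column names Hits and League, but the values are made up.

```r
# Simulated stand-in data; values are made up, not taken from hitters.csv.
set.seed(1)
toy <- data.frame(
  Hits   = rpois(50, lambda = 100),
  League = sample(c("A", "N"), 50, replace = TRUE)
)

# Continuous variable: histogram and boxplot.
hist(toy$Hits, main = "Hits", xlab = "Hits")
boxplot(toy$Hits, main = "Hits")

# Qualitative variable: bar chart of category counts.
league_counts <- table(toy$League)
barplot(league_counts, main = "League")
```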

MACC 590 Homework #5
Erika Reagan
Part A
Part B
Part C
There are 90 samples with fewer than 4.5 years. Log.Salary is around 5.1.
There are 90 samples with more than 4.5 years but fewer than 118 hits. Log.Salary is around 6.
There are 83 samp

MACC 590 Regression Assignment #3
Erika Reagan
Step 1
Step 2
Step 3
The best 3 subsets each contain at least 4 variables.
The top three are:
Model 1: Contains variables NC, HP, ACC, ETV
Model 2: Contains variables WT, DIS, NC, HP, ETV
Model 3: Contains
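A best-subsets comparison of this kind can be sketched in base R by fitting every combination of predictors and ranking the fits. The text does not say which ranking criterion the assignment used, so adjusted R-squared is an assumption here, and `mtcars` with these five predictors is only a stand-in for the assignment data.

```r
# Stand-in predictors from mtcars; the assignment's variables are not available.
predictors <- c("wt", "disp", "hp", "qsec", "drat")

# Every non-empty subset of the predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Assumed criterion: adjusted R-squared of each candidate model.
adj_r2 <- sapply(subsets, function(vars) {
  summary(lm(reformulate(vars, response = "mpg"), data = mtcars))$adj.r.squared
})

# Three best subsets, highest adjusted R-squared first.
ranking <- order(adj_r2, decreasing = TRUE)[1:3]
top3 <- data.frame(
  model  = sapply(subsets[ranking], paste, collapse = ", "),
  adj_r2 = adj_r2[ranking]
)
print(top3)
```

An exhaustive search like this fits 2^p - 1 models, so it is only practical for a small number of predictors.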

Data Analytics
Homework #4
Erika Reagan
Xs and Ys
Xs:
Continuous: AtBat, Hits, HmRun, Runs, RBI, Walks, Years, PutOuts, Assists, Errors
Qualitative: League, Division, NewLeague
Ys:
Continuous: Salary and/or log(Salary)
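The split into continuous and qualitative variables, and the log-transformed response, can be sketched as follows. The data frame `toy` holds a few made-up rows with some of the column names listed above, and `LogSalary` is a hypothetical column name.

```r
# Made-up rows using a few of the listed column names.
toy <- data.frame(
  AtBat  = c(300, 450, 520),
  Hits   = c(70, 120, 150),
  League = factor(c("N", "A", "N")),
  Salary = c(100, 400, 900)
)

# Numeric columns are treated as continuous, the rest as qualitative.
is_continuous <- sapply(toy, is.numeric)
continuous  <- names(toy)[is_continuous]
qualitative <- names(toy)[!is_continuous]

# Log-transformed response (natural log).
toy$LogSalary <- log(toy$Salary)

print(continuous)
print(qualitative)
```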
Analysis Plan
Variables one at a tim