Stat 415 HW6 Solution
1. We initially apply K-means clustering with 3 clusters. We also recode the original
class labels as A=1, B=2, C=3. Note that after applying K-means, the cluster
STATS 415 - Homework 8
Due Friday, April 10, 2015
1. The data set is a collection of 4601 emails of which 1813 were considered spam, i.e., unsolicited commercial email. The data set consists of 58 var
HW3 Solution
1.
(a) For LDA, we assume $X \mid Y = k \sim N(\mu_k, \sigma^2)$ for $k = 1, -1$; we also need the prior probability $\pi_k$ for each class $k$. So the parameters are $\mu_1, \mu_{-1}, \pi_1, \pi_{-1}, \sigma^2$.
For QDA, we assume
Stats 415 - Data
Ji Zhu, Michigan Statistics
Data
Ji Zhu
Professor of Statistics
455 West Hall
What is Data?
Collection of data objects
Stats 415 - Homework 3
Due Monday, February 25, 2013
1. For the one-dimensional (training) data below, give the linear discriminant analysis and quadratic discriminant analysis classifiers.
x    y
-3   -1
-2
Stats 415 - Homework 2
Due Monday, February 11, 2013
1. The fish dataset is from a study conducted to distinguish different
species of fish. It contains seven variables:
Species: species of the fish
Weig
Stats 415 - HW4 Solution
1. Consider the following simulation example: first we generate
10 means from a bivariate Gaussian distribution $N((1, 0)^T, \mathbf{I})$
and label this class green. Similarly, 10 more
Stats 415 - Homework 5
Due Wednesday, April 10, 2013
Spam Email. The data set is a collection of 4601 emails of which 1813
were considered spam, i.e., unsolicited commercial email. The data
set consi
Stat 415 HW5 Solution
1. The training data set and test data set have 57 continuous variables and one class label with
3067 and 1534 observations, respectively.
The goal is to predict whether an email
HW1 Solution
1. Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have mo
Lab 7
Greg Hunt
February 23, 2017
Subset Selection Methods
Here we apply the best subset selection approach to the Hitters data. We wish to predict a baseball player's
Salary on the basis of various st
STATS 415: Classification - LDA, QDA and LR
Ji Zhu
Professor of Statistics
455 West Hall
Ji Zhu (University of Michigan)
Examples of Classificat
Lab 3: Principal Components Analysis
Greg Hunt
January 23, 2017
PCA Viewpoint
At its heart, PCA is a dimensionality reduction method. That means it takes the variables we measure,
$X_1, \ldots, X_N$, and lo
Lab 1
Greg Hunt
January 11, 2017
R code for this lab note can be found online: http://www-bcf.usc.edu/~gareth/ISL/data.html
Chapter 2 Lab: Introduction to R
Installing R on your Personal Computer
Downl
Lab 8: Splines and GAM
Xuefei Zhang
March 10, 2017
We begin by loading the ISLR library, which contains the Wage data.
library(ISLR)
attach(Wage)
Polynomial Regression and Step Functions
fit=lm(wage~po
Homework 3
Stats 415, Winter 2015
Zhiyuan Wang
Problem 1
(1) Even if the true relationship between x and y is linear, adding the extra predictors $x^2$ and $x^3$ will necessarily make the training-data RSS non-inc
Homework 1
Stats 415, Winter 2015
Zhiyuan Wang
Problem 1
(1) Time in terms of AM or PM.
Binary, qualitative, ordinal
(2) Brightness as measured by a light meter.
Continuous, quantitative, ratio
(3) Br
Homework 6
Stats 415, Winter 2015
Zhiyuan Wang
Problem 1
(1) The standard logistic regression is given by:
.
(2) The logistic regression discarding the first observation is given by:
.
(3) The predict
Homework 5
Stats 415, Winter 2015
Zhiyuan Wang
Problem 1
(1) We have the following plots:
From the plots, we see that displacement, horsepower, cylinders, and weight seem to be able
to predict mpg
Statistics 408
Homework Set I
Winter 2016
1. What is a system? Provide an example.
A system is a collection of components that come together repeatedly for a purpose. An example
would be an assembly l
Conner Marion
Stats 415 HW #2
1) a. college <- read.csv("~/Desktop/College.csv", header = TRUE, sep = ",")
b.
c. i. summary(college)
ii. pairs(college[, 1:10]) -
iii. boxplot(college$Outstate~college$
Conner Marion
Stats 415 HW 3
1. a) There is a large number of observations but only a small number of
features p. Due to the large sample size, the flexible model will be
more accurate because
STATS 500, HOMEWORK #8, due Wednesday, April 8, 1st
1. Use the aatemp data from 1881 to 2000, with temp as the response and year as the predictor,
and consider the following models:
Orthogonal Polyn
STAT 415: Tree-Based Methods
Ji Zhu
Professor of Statistics
455 West Hall
Ji Zhu (University of Michigan)
Tree-Based Methods
Classification tree (CART)
Ensemb
STAT 415: Support Vector Machines
Ji Zhu
Professor of Statistics
455 West Hall
Ji Zhu (University of Michigan)
Separating Hyperplanes
Imagine a situation where you have a t
STAT 415: Cluster Analysis
Ji Zhu
Professor of Statistics
455 West Hall
Ji Zhu (University of Michigan)
What is Cluster Analysis?
Finding groups of objects such
Lab 6
Xuefei Zhang
February 17, 2017
The Validation Set Approach
We explore the use of the validation set approach in order to estimate the test error rates that result from
fitting various linear mod
Stats 415 Lab 4: LDA and QDA
Xuefei Zhang
February 3, 2017
The Stock Market Data
We begin by examining some numerical and graphical summaries of the Smarket data, which is part of the
ISLR library. Th
STATS 415: Overview
Ji Zhu
Professor of Statistics
455 West Hall
Ji Zhu (University of Michigan)
What is Data Mining?
Data mining is a multi-disciplinary field of stud
STATS 415: Course Information
Ji Zhu
Professor of Statistics
455 West Hall
Ji Zhu (University of Michigan)
Personnel
Ji Zhu
Lecture, TThu 11:30-1pm, 1324 East Hall
Of
Lab 5
Greg Hunt
February 8, 2017
Logistic Regression
Logistic regression is one of the most basic and widely used methods for classification. As the name implies,
its application in R mimics that of lin