Stats 415 - Homework 3
Due Monday, February 25, 2013
1. For the one-dimensional (training) data below, give the linear discriminant analysis and quadratic discriminant analysis classifiers.
 x    y
-3   -1
-2   -1
 0   -1
 1   -1
-1    1
 2    1
 3    1
 4    1
 5    1
(a) What are the par
HW3 Solution
1.
(a) For LDA, we assume $X \mid Y = k \sim N(\mu_k, \sigma^2)$ for $k = 1, -1$; we also need the prior probability $\pi_k$ for each class $k$. So the parameters here are $\mu_1$, $\mu_{-1}$, $\pi_1$, $\pi_{-1}$, $\sigma^2$.
For QDA, we assume $X \mid Y = k \sim N(\mu_k, \sigma_k^2)$ for $k = 1, -1$; we also need to have th
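As a rough illustration of the parameters listed above, the following sketch estimates them from the training data in the problem statement. It is not taken from the posted solution: the function name `class_stats` is hypothetical, and the use of unbiased variance estimates (dividing by n - K for the pooled LDA variance and by n_k - 1 for each QDA variance) is an assumption; a maximum-likelihood version would divide by n and n_k instead.

```python
from statistics import mean

# Training data from the problem statement: (x, y) pairs with y in {-1, 1}.
data = [(-3, -1), (-2, -1), (0, -1), (1, -1),
        (-1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]


def class_stats(data):
    """Return per-class size, prior, mean, and sum of squared deviations."""
    n = len(data)
    stats = {}
    for k in sorted({y for _, y in data}):
        xs = [x for x, y in data if y == k]
        mu = mean(xs)
        ss = sum((x - mu) ** 2 for x in xs)
        stats[k] = {"n": len(xs), "prior": len(xs) / n, "mu": mu, "ss": ss}
    return stats


stats = class_stats(data)

# LDA: one variance shared by both classes (assumed estimator: pooled, divide by n - K).
n, K = len(data), len(stats)
pooled_var = sum(s["ss"] for s in stats.values()) / (n - K)

# QDA: a separate variance per class (assumed estimator: divide by n_k - 1).
class_var = {k: s["ss"] / (s["n"] - 1) for k, s in stats.items()}

for k, s in stats.items():
    print(f"class {k}: prior={s['prior']:.3f}, mean={s['mu']:.3f}, QDA var={class_var[k]:.3f}")
print(f"LDA pooled variance={pooled_var:.3f}")
```

With these estimates in hand, each classifier assigns a new x to the class with the larger value of prior times Gaussian density, using the pooled variance for LDA and the class-specific variances for QDA.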
Stat 415 HW6 Solution
1. Initially we apply the K-means clustering method with 3 clusters. We also recode the original
class labels as: A=1, B=2, C=3. Note that after applying the K-means, the cluster labels are
assigned independently of the original clas
STATS 415 - Homework 8
Due Friday, April 10, 2015
1. The data set is a collection of 4601 emails of which 1813 were considered spam, i.e., unsolicited commercial email. The data set consists of 58 variables of which 57 are continuous predictors and one is
Stats 415 - Data
Ji Zhu, Michigan Statistics
Data
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
What is Data?
Collection of data objects and their attributes
Object is also known as record, p
Stats 415 - HW4 Solution
1. Consider the following simulation example: First we generate
10 means from a bivariate Gaussian distribution $N((1, 0)^T, I)$
and label this class green. Similarly, 10 more are drawn from
$N((0, 1)^T, I)$ and labeled class red. T
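The first step of this simulation, drawing the 10 class means for each color, might be sketched as follows. This covers only the step stated above; the helper name `draw_means` and the seeds are hypothetical, chosen for reproducibility, and the standard-library `random` module stands in for whatever software the course actually used.

```python
import random


def draw_means(center, n=10, seed=None):
    """Draw n points from a bivariate Gaussian N(center, I)."""
    rng = random.Random(seed)
    # With identity covariance the two coordinates are independent,
    # each with unit variance, so they can be sampled separately.
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            for _ in range(n)]


green_means = draw_means((1.0, 0.0), seed=1)  # class green, centered at (1, 0)^T
red_means = draw_means((0.0, 1.0), seed=2)    # class red, centered at (0, 1)^T

print(len(green_means), len(red_means))
```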
Stats 415 - Homework 2
Due Monday, February 11, 2013
1. The fish dataset is from a study conducted to distinguish different
species of fish. It contains seven variables:
Species: species of the fish
Weight: weight of the fish (in grams)
Length1: length from t
Stats 415 - Homework 5
Due Wednesday, April 10, 2013
Spam Email. The data set is a collection of 4601 emails of which 1813
were considered spam, i.e., unsolicited commercial email. The data
set consists of 58 variables of which 57 are continuous predicto
Stat 415 HW5 Solution
1. The training data set and test data set have 57 continuous variables and one class label with
3067 and 1534 observations, respectively.
The goal is to predict whether an email is spam (1) or not (0). First, we fit a single classic
HW1 Solution
1. Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more
than one interpretation, so briefly indicate your reas
STAT 415: Cluster Analysis
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Ji Zhu (University of Michigan)
Cluster Analysis
What is Cluster Analysis
Finding groups of objects such that the objects in a group will be
similar (or relate
STAT 415: Support Vector Machines
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Separating Hyperplanes
Imagine a situation where you have a two-class classification problem
with two predictors X1 an
STAT 415: Tree-Based Methods
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Tree-Based Methods
Classification tree (CART)
Ensemble methods: bagging, random forest, boosting
STATS 500, HOMEWORK #8, due Wednesday, April 8, 1st
1. Use the aatemp data from 1881 to 2000, with temp as the response and year as the predictor,
and consider the following models:
Orthogonal Polynomials to the 4-th degree.
Orthogonal Polynomials to t
Conner Marion
Stats 415 HW 3
1. a) There are a large number of observations, but a small number of
observable features p. Due to the large sample size, the flexible model will be
more accurate because there are lower chances to model the overfit/noise to
Conner Marion
Stats 415 HW #2
1) a. college <- read.csv("~/Desktop/College.csv", header = TRUE, sep = ",")
b.
c. i. summary(college)
ii. pairs(college[, 1:10]) -
iii. boxplot(college$Outstate~college$Private,col=
c("blue","green"),main="OutstateversusPriv
Stats 415 - Classification Part I
Ji Zhu, Michigan Statistics
Classification: Part I
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Examples of Classification
Predicting tumor cells as
Stats 415 - Homework 1
Due Wednesday, January 30, 2013
1. Classify the following variables as binary, discrete, or continuous.
Also classify them as qualitative (nominal or ordinal) or quantitative
(interval or ratio). Some cases may have more than one in
Stats 415 - Classification Part II
Ji Zhu, Michigan Statistics
Classification: Part II
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Generative methods
Discriminative methods
Sta
Stats 415 - Association Analysis
Ji Zhu, Michigan Statistics
Association Analysis
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Association Rule Mining
Given a set of transacti
Stats 415 - Classification Part III
Ji Zhu, Michigan Statistics
Classification: Part III
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Discriminative Methods
Logistic regression
K-
Stats 415 - Cluster Analysis
Ji Zhu, Michigan Statistics
Cluster Analysis
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
What is Cluster Analysis
Finding groups of objects such that
Stats 415 - Homework 4
Due Wednesday, March 13, 2013
1. Consider the following simulation example: First we generate 10 means
from a bivariate Gaussian distribution $N((1, 0)^T, I)$ and label this
class green. Similarly, 10 more are drawn from $N((0, 1)^T$
Stats 415 - Exploring Data
Ji Zhu, Michigan Statistics
Exploring Data
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
What is Data Exploration?
Preliminary exploration of the data to be
Stats 415 - Homework 6
Due Wednesday, April 17, 2013
Wine. These data (file wine-train) are the results of a chemical analysis of wines grown in the same region in Italy but derived from three
different cultivars. The three possible classes of the various
HW2 Solution
1. The fish dataset is from a study conducted to distinguish different species of fish. There are
87 observations and 7 attributes in total. The attributes are Species, Weight, Length1, Length2,
Length3, Height and Width. The purpose of this analys
STATS 415: Course Information
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
Personnel
Ji Zhu
Lecture, TThu 11:30-1pm, 1324 East Hall
Office hour: TBA
Greg Hunt
Discussion, F 10-11am, B760 E
STATS 415: Overview
Ji Zhu
Professor of Statistics
455 West Hall
jizhu@umich.edu
What is Data Mining?
Data mining is a multi-disciplinary field of study concerned with the
design of algorithms that allow co
Statistics 408
Homework Set I
Winter 2016
1. What is a system? Provide an example.
A system is a collection of components that come together repeatedly for a purpose. An example
would be an assembly line creating a car.
2. Provide an example of something