36-462/36-662 Data Mining Homework 5 Solution
TA: Cong Lu
Problem 3
Figure 1 shows the CV error curve, and the number of degrees of freedom is 19. Figure 2 shows the curve computed from the right-hand side of the formula. We can hardly tell the difference between the t
Data Mining: 36-462/36-662
Homework 4 Solutions
Jack Rae
April 15, 2013
Problem 1 [30]
(a) [10]
Recall the least squares criterion
$$S(\beta) = \|y - X\beta\|_2^2 \qquad (1)$$
If we let $\hat{\beta}$ be the minimizer of (1), and $\tilde{\beta} = \hat{\beta} + c\,v$, where $v \in \mathbb{R}^p$ satisfies $Xv = 0$ and $c \in \mathbb{R}$, then we see
$y - X\tilde{\beta}$
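The setup above turns on a simple fact that is easy to check numerically: if $Xv = 0$, then $X(\hat{\beta} + c\,v) = X\hat{\beta}$, so $S(\hat{\beta} + c\,v) = S(\hat{\beta})$ for every $c$. The R sketch below is a minimal illustration under assumptions of our own (a simulated design whose third column duplicates the first, a fixed seed, and `lm.fit` to obtain one minimizer); it is not part of the posted solution.

```r
# Hypothetical example, not from the homework: a rank-deficient design
# whose third column duplicates the first.
set.seed(42)
n <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
X <- cbind(x1, x2, x1)
y <- 2 * x1 - x2 + rnorm(n)

# Least squares criterion S(beta) = ||y - X beta||_2^2
S <- function(beta) sum((y - X %*% beta)^2)

# One minimizer: lm.fit pivots out the aliased third column and reports its
# coefficient as NA; setting it to 0 still gives a least squares solution.
beta.hat <- lm.fit(X, y)$coefficients
beta.hat[is.na(beta.hat)] <- 0

# v lies in the null space of X, since X v = x1 - x1 = 0.
v <- c(1, 0, -1)
stopifnot(max(abs(X %*% v)) < 1e-12)

# Moving along v changes beta but leaves the criterion unchanged.
for (c.val in c(-3, 0.5, 10)) {
  cat(sprintf("c = %5.1f   S = %.10f\n", c.val, S(beta.hat + c.val * v)))
}
cat(sprintf("S(beta.hat)  = %.10f\n", S(beta.hat)))
```

Any vector in the null space of $X$ works the same way, which is why (1) has infinitely many minimizers whenever the columns of $X$ are linearly dependent.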
Data Mining: 36-462/36-662
Homework 1 Solutions
Cong Lu
February 6, 2013
Problem 1 (25 points)
(a) (5 points) See the attached code for computing dtm1 and dtm2. We compute dtm1
by normalizing each row first and then scaling each column by its IDF weight
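The two steps for dtm1 can be sketched in R. The attached code is not reproduced here, so this is a minimal sketch under two assumptions: rows are normalized to unit Euclidean length, and the IDF weight of word $w$ is $\log(N / n_w)$, where $N$ is the number of documents and $n_w$ the number of documents containing $w$. The toy count matrix is hypothetical.

```r
# Toy document-term matrix (hypothetical counts): rows = documents, columns = words.
dtm <- matrix(c(2, 0, 1, 0,
                0, 3, 1, 1,
                1, 1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("doc", 1:3), c("data", "mining", "the", "pca")))

# Step 1: normalize each row to unit Euclidean length (assumed choice of norm).
row.norms <- sqrt(rowSums(dtm^2))
dtm.norm <- dtm / row.norms   # recycling divides row i by row.norms[i]

# Step 2: IDF weight per word, log(N / n_w) (assumed definition).
N <- nrow(dtm)
idf <- log(N / colSums(dtm > 0))

# Scale column j by idf[j]; sweep() applies the multiplication column-wise.
dtm1 <- sweep(dtm.norm, 2, idf, `*`)
print(round(dtm1, 3))
```

A word that appears in every document, like `the` here, gets weight $\log 1 = 0$ and drops out entirely.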
Data Mining: 36-462/36-662
Homework 3 Solution
Cong Lu
March 20, 2013
Problem 1 (25 points)
We generate the data $\{(x, y) : x = \cos\theta,\ y = \sin\theta,\ \theta \in [0, 6\pi]\}$, shown in Figure 1.
First, we apply traditional PCA to the data and get the first principal compon
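As a point of reference for what linear PCA sees on this data, here is a minimal R sketch using `prcomp`. The sample of 600 evenly spaced angles is an assumption (the solution's sampling is not shown). Because the points wrap evenly around the unit circle, the sample covariance matrix is close to $\tfrac{1}{2}I$, so the two principal components explain nearly equal shares of the variance and no single linear direction captures the circular structure.

```r
# Assumed sampling: n evenly spaced angles on [0, 6*pi] (three laps of the circle).
n <- 600
theta <- seq(0, 6 * pi, length.out = n)
dat <- cbind(x = cos(theta), y = sin(theta))

# Ordinary (linear) PCA; prcomp centers the columns by default.
pca <- prcomp(dat)

# Proportion of variance explained by each principal component.
pve <- pca$sdev^2 / sum(pca$sdev^2)
print(round(pve, 3))
```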
Similarity and Invariance; Searching for Similar Images
36-350: Data Mining
2 September 2009
Reading: Section 14.5 in the textbook.
So far, we have seen how to search and categorize texts by representing them
as feature vectors: bag-of-words vectors, approp
Lecture 3: Page Rank
36-350, Data Mining
31 August 2009
The combination of the bag-of-words representation, cosine distance, and
inverse document frequency weighting forms the core of lots of information retrieval systems, because it works pretty well. How
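A minimal R sketch of that combination: weight a bag-of-words count matrix by IDF and rank documents by cosine similarity to a query. The helper names `cosine.sim` and `search.cosine`, the toy counts, and the IDF formula $\log(N / n_w)$ are all assumptions for illustration; the lecture's own code is not shown here.

```r
# Cosine similarity: inner product divided by the product of Euclidean lengths.
cosine.sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical helper: IDF-weight the counts, then rank documents by cosine
# similarity to the query. Assumes IDF = log(N / n_w), that the query uses the
# same column order as dtm, and that no weighted vector is all zeros
# (cosine similarity would then be NaN).
search.cosine <- function(dtm, query) {
  idf <- log(nrow(dtm) / colSums(dtm > 0))
  w.docs <- sweep(dtm, 2, idf, `*`)
  w.query <- query * idf
  sims <- apply(w.docs, 1, cosine.sim, b = w.query)
  sort(sims, decreasing = TRUE)
}

dtm <- matrix(c(2, 0, 1, 0,
                0, 3, 1, 1,
                1, 1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("doc", 1:3), c("data", "mining", "the", "pca")))
query <- c(data = 1, mining = 0, the = 1, pca = 0)
res <- search.cosine(dtm, query)
print(res)
```

Here the query's only informative word is `data`, because `the` occurs in every document and receives zero IDF weight; `doc1` scores 1, `doc3` about 0.71, and `doc2` 0.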
Lecture 2: More Similarity Searching; Multidimensional Scaling
36-350: Data Mining
28 August 2009
Reading: Principles of Data Mining, sections 14.1-14.4 (skipping 14.3.3 for
now) and 3.7.
Let's recap where we left off with similarity searching for documents. We repre
Predicting Quantitative Features: Regression
36-350, Data Mining
6 October 2008
Reading: sections 6.1-6.3 and 11.1 in Principles of Data Mining.
Optional Reading: chapter 1 of Berk.
We've already looked at some examples of predictive modeling in the form of
Additive Models
36-350, Data Mining, Fall 2009
2 November 2009
Readings: Principles of Data Mining, pp. 393-395; Berk, ch. 2.
Contents
1 Partial Residuals and Backfitting for Linear Models
2 Additive Models
3 The Curse of Dimensionality
4 Example: Cali
Making Better Features
36-350: Data Mining
16 September 2009
Reading: Sections 2.4, 3.4 and 3.5 in the textbook.
Contents
1 Standardizing and Transforming
2 Relationships Among Features and Low-Dimensional Summaries
3 Projections: Linear Dimensionality
Lecture 1: Similarity Searching and Information Retrieval
36-350, Data Mining
26 August 2009
Readings: Principles of Data Mining, ch. 1, and sections 14.1
and 14.3.0-14.3.1.
One of the fundamental problems with having a lot of data is finding what
you're look
Data Mining: 36-462/36-662
Homework 2 Solutions
Jack Rae
February 19, 2013
Problem 1
(a) The following code implements the ch.index function.
ch.index = function(x,kmax,iter.max=100,nstart=10,algorithm="Lloyd") {
ch = numeric(length=kmax-1)
n = nrow(x)
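The function is cut off after its first lines. For orientation, here is a minimal, hypothetical sketch of the Calinski-Harabasz index it is presumably named after, $CH(K) = \frac{B(K)/(K-1)}{W(K)/(n-K)}$, where $B(K)$ and $W(K)$ are the between- and within-cluster sums of squares from K-means. It reuses the argument names and defaults visible in the signature and relies on the `betweenss` and `tot.withinss` components returned by `kmeans`; it is not the posted solution, and the two-blob data are made up.

```r
# Hypothetical sketch, not the posted solution: CH index for K = 2, ..., kmax.
ch.index.sketch <- function(x, kmax, iter.max = 100, nstart = 10,
                            algorithm = "Lloyd") {
  ch <- numeric(length = kmax - 1)
  n <- nrow(x)
  for (k in 2:kmax) {
    fit <- kmeans(x, centers = k, iter.max = iter.max, nstart = nstart,
                  algorithm = algorithm)
    # CH(K) = [B(K) / (K - 1)] / [W(K) / (n - K)]
    ch[k - 1] <- (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
  }
  list(k = 2:kmax, ch = ch)
}

# Two well-separated Gaussian blobs in 2D (hypothetical data).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))
res <- ch.index.sketch(x, kmax = 5)
print(res$k[which.max(res$ch)])
```

A larger CH means clusters that are tight relative to their separation, so one picks the $K$ that maximizes it; on these two blobs the maximum should fall at $K = 2$.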
Dimension reduction 1: Principal component analysis
Ryan Tibshirani
Data Mining: 36-462/36-662
February 5, 2013
Optional reading: ISL 10.2, ESL 14.5
Clustering as dimension reduction
We've thought about clustering observations, given features. But in m
R Environment
- R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
- an effective data handling and storage facility
- a suite of operators for calculations on arrays, in particular matrices
- a large, coher