Correlation and Regression Example and Homework
Example: Test scores and hours watching TV. The table gives the number of hours each of 12 students spent watching TV during the weekend and the score each student earned on a test the following Monday.
Hours spent watching TV, x: 0, 1, …
Tes
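The correlation coefficient and least-squares line for paired data like this can be computed directly from their definitions: r = Sxy / sqrt(Sxx * Syy), slope = Sxy / Sxx, and intercept = ȳ - slope * x̄. The Python sketch below assumes the data are two equal-length lists; the function names are hypothetical, and the numbers in the usage lines are made-up toy values, not the students' data (which are incomplete here).

```python
from math import sqrt


def _sums(x, y):
    """Means and centered sums of squares/products for paired lists x, y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Sample Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares line y-hat = slope * x + intercept."""
    mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical toy data: more TV hours, lower score
hours = [0, 1, 2, 3]
scores = [90, 85, 80, 75]
print(pearson_r(hours, scores))           # -1.0
print(least_squares_line(hours, scores))  # (-5.0, 90.0)
```

A value of r near -1 would indicate a strong negative linear relationship between hours of TV and test score.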
Comparing the Mean and Median
Symmetric distributions (mean = median):
  Bell-shaped curve
  U-shaped curve
  Uniform
Skewed distributions (one tail is longer than the other):
  Right-skewed: median < mean (typically)
  Left-skewed: median > mean (typically)
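The mean-versus-median comparison can be used as a quick skewness check: a long right tail pulls the mean above the median, and a long left tail pulls it below. The Python sketch below applies this rule of thumb; the function name and sample values are hypothetical, and equal mean and median is consistent with symmetry but does not prove it.

```python
from statistics import mean, median


def skew_direction(data):
    """Rule-of-thumb skew check based on comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right"    # long right tail pulls the mean up
    if m < med:
        return "left"     # long left tail pulls the mean down
    return "neither"      # mean = median: consistent with, not proof of, symmetry


print(skew_direction([1, 2, 2, 3, 3, 3, 20]))   # right
print(skew_direction([-20, 1, 2, 2, 3, 3, 3]))  # left
print(skew_direction([1, 2, 3, 4, 5]))          # neither
```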
PCA
PCA can be motivated and understood in several different (yet related) ways. 1) The first is
to linearly transform correlated variables into a set of uncorrelated ones. Having uncorrelated
predictors is useful for dealing with multicollinearity in
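A minimal NumPy sketch of this first view of PCA: projecting centered data onto the eigenvectors of the sample covariance matrix gives component scores that are uncorrelated, and each score's variance equals the corresponding eigenvalue. The function name and the simulated two-predictor data are hypothetical; the sketch assumes NumPy is installed.

```python
import numpy as np


def pca_scores(X):
    """Principal component scores of the n x p data matrix X, plus their variances."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort components by decreasing variance
    return Xc @ eigvecs[:, order], eigvals[order]


rng = np.random.default_rng(0)
z = rng.normal(size=500)
# Hypothetical data: two strongly correlated predictors
X = np.column_stack([z + 0.1 * rng.normal(size=500), z + 0.1 * rng.normal(size=500)])
scores, variances = pca_scores(X)
print(np.corrcoef(X, rowvar=False)[0, 1])       # close to 1
print(np.corrcoef(scores, rowvar=False)[0, 1])  # essentially 0
```

Regressing on the uncorrelated scores instead of the original predictors is one way to sidestep the multicollinearity between them.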
Problem 1: Missing rate (in percentage) for variables
Using the attached R code, I computed the missing rate for each variable. The rates are as
follows:
Variable
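The missing rate of a variable is 100 × (number of missing entries) / (number of records). The Python sketch below is not the attached R code; it assumes records are dictionaries, and the set of values treated as missing (None, an empty string, or "?") is a hypothetical choice that should be adjusted to the data set's own coding.

```python
def missing_rates(rows, columns, missing=(None, "", "?")):
    """Percentage of missing entries for each column.

    rows: list of dicts mapping column name -> value.
    missing: values counted as missing (an assumption; adjust to the data).
    """
    n = len(rows)
    if n == 0:
        raise ValueError("no records")
    rates = {}
    for col in columns:
        n_missing = sum(1 for r in rows if r.get(col) in missing)
        rates[col] = 100.0 * n_missing / n
    return rates


# Hypothetical records
rows = [
    {"age": 39, "workclass": "?"},
    {"age": None, "workclass": "Private"},
    {"age": 50, "workclass": "Private"},
    {"age": 38, "workclass": "?"},
]
print(missing_rates(rows, ["age", "workclass"]))  # {'age': 25.0, 'workclass': 50.0}
```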
We will use the apriori algorithm to create association rules for the Adult data set available
on the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Adult). The
data is adult.data and the attr
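The apriori algorithm finds frequent itemsets level by level, using the fact that every subset of a frequent itemset must itself be frequent; a rule X ⇒ Y is then kept when its confidence, support(X ∪ Y) / support(X), meets a threshold. The pure-Python sketch below is a from-scratch illustration of that idea, not the implementation used in the assignment; the function names, transactions, item labels, and thresholds are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {i for t in transactions for i in t}
    current = {s for s in (frozenset([i]) for i in items) if support(s) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must already be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Rules (lhs, rhs, support, confidence) with confidence >= min_confidence."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # lhs is frequent because itemset is
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out


# Hypothetical Adult-style transactions
data = [
    {"sex=Male", "income=>50K"},
    {"sex=Male", "income=<=50K"},
    {"sex=Female", "income=<=50K"},
    {"sex=Male", "income=>50K"},
]
freq = apriori(data, min_support=0.5)
for lhs, rhs, s, c in rules(freq, min_confidence=0.6):
    print(sorted(lhs), "=>", sorted(rhs), "support", s, "confidence", round(c, 3))
```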
Principal Component Analysis (PCA) is a dimension-reduction method. It is not itself a
prediction method, but several regression methods (for example, principal components
regression) depend directly on it. We first motivate dimension reduction.
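As a sketch of dimension reduction with PCA, the NumPy code below computes principal components through the singular value decomposition of the centered data, keeps the first k component scores, and reports the proportion of variance explained (PVE) by each component. The function name and the simulated data (10 variables driven by 2 hidden factors) are hypothetical; NumPy is assumed to be installed.

```python
import numpy as np


def pca_reduce(X, k):
    """Scores on the first k principal components of X, and the PVE of every component."""
    Xc = X - X.mean(axis=0)                       # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)               # variance of each component
    pve = var / var.sum()                         # proportion of variance explained
    return Xc @ Vt[:k].T, pve


rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
# Hypothetical data: 10 observed variables generated by 2 latent factors plus small noise
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))
Z, pve = pca_reduce(X, k=2)
print(Z.shape)        # (200, 2)
print(pve[:2].sum())  # close to 1
```

Here two components retain nearly all of the variation in ten variables, which is the kind of situation in which dimension reduction pays off.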
Notation
The
Downloading and Installing R from CRAN
R, a free version of S o
INTRODUCTION
Data mining and statistical learning essentially pro
#
# R code for principal component analysis (PCA)
# Some code was modified from Dr. Marloes Maathuis's class notes:
# http://stat.ethz.ch/~maathuis/
# and the text An Introduction to Statistical Learning
# http://www-bcf.usc.edu/~gareth/ISL/Chapter%2010
If you are not familiar with R, please watch these videos on your own.
1) Introduction to R
https://www.youtube.com/watch?v=BlI3OVztQfM
2) Writing R Functions
https://www.youtube.com/watch?v=w6nYISxAJmA
3) Linear Regression in R
https://www.youtube.com/watch
Clustering of variables around latent components
Ricco RAKOTOMALALA
Tutoriels Tanagra - http://tutoriels-data-mining.blogspot.fr/
Overview
1. Clustering variables
2. Correlations, distances and latent variables
3. HAC based on latent v
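One simple way to cluster variables is to treat 1 - r² as a distance between two variables, so that strongly correlated variables (positively or negatively) are close, and then run hierarchical agglomerative clustering (HAC). The NumPy sketch below uses average linkage on that distance; it is a simplified stand-in, not necessarily the latent-component criterion used in the Tanagra tutorial, and the function name and simulated data are hypothetical.

```python
import numpy as np


def cluster_variables(X, n_clusters):
    """Group the columns of X by average-linkage HAC on the distance 1 - r^2."""
    R = np.corrcoef(X, rowvar=False)
    D = 1.0 - R ** 2                    # r and -r are treated as equally close
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance between members of the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]                          # b > a, so index a stays valid
    return [sorted(c) for c in clusters]


rng = np.random.default_rng(2)
f1, f2 = rng.normal(size=(2, 300))
# Hypothetical data: columns 0 and 1 follow f1 (column 1 with opposite sign),
# columns 2 and 3 follow f2
X = np.column_stack([f1, -f1, f2, f2]) + 0.2 * rng.normal(size=(300, 4))
print(cluster_variables(X, 2))  # [[0, 1], [2, 3]]
```

Each resulting group could then be summarized by a latent variable, for example the first principal component of its members.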