# 4/26/17 Stat 101C Discussion Code
# By: Jeffrey Chao
library(ISLR)
# The validation set approach
# First, fit a logistic regression model.
# We'll be using the college data set in the ISLR package.
CART
Classification and
Regression Trees
banknote data (Site Info)
Large Scale Overview
Using trees to do classification
Easy modification to using trees to do regression
Bootstrapping trees to improv
Lecture 9.2
unsupervised learning
context
There is no response variable, and we have
no gold standard to help us determine if
classification is correct or not.
Predictors are numerical (but there ar
neural nets
The simplest neural networks look something like this
$$Y_i = f\Big(\sum_j x_j w_j\Big)$$
where
$$f(x) = \frac{1}{1 + \exp(-x)}$$
But that's not a helpful way to picture it. This is better:
Input Layer
Hidden Layer
Ou
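As a rough illustration of the single-unit formula above, here is a minimal R sketch of one neuron with a logistic activation. The input vector `x`, the weights `w`, and the function names are hypothetical and chosen only for this example.

```r
# Logistic (sigmoid) activation: f(x) = 1 / (1 + exp(-x))
sigmoid <- function(x) 1 / (1 + exp(-x))

# Output of a single unit: f(sum_j x_j * w_j)
neuron_output <- function(x, w) sigmoid(sum(x * w))

# Hypothetical inputs and weights, for illustration only
x <- c(1, 2, -1)
w <- c(0.5, -0.25, 0.1)
y <- neuron_output(x, w)
```

Because the activation maps any real number into (0, 1), the output `y` can be read as a probability-like score.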
Boosting
The story so far
CART, classification and regression trees, increase flexibility over standard regression
approaches
Random forests consist of many randomly generated trees. The randomness is
Feature Selection
Outline
Ridge Regression
Lasso
Principal Components
Ridge Regression
Uses a linear model
$$y = \beta_0 + \sum_{j=1}^{p} \beta_j X_j$$
with parameters chosen to minimize (for fixed value
of lambda):
$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$
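For a fixed lambda, the ridge coefficients have a closed form when the predictors are centered, which can be sketched in base R. This is only an illustration under assumptions: `ridge_fit` is a hypothetical helper, the built-in `mtcars` data stands in for the course data, and the predictors are centered but not standardized (in practice they usually are standardized).

```r
# Ridge slopes for a fixed lambda: (X'X + lambda * I)^{-1} X'y on centered data.
# Centering keeps the intercept out of the penalty.
ridge_fit <- function(X, y, lambda) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  p  <- ncol(Xc)
  beta <- solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc))
  drop(beta)
}

X <- as.matrix(mtcars[, c("wt", "hp", "disp")])  # stand-in predictors
y <- mtcars$mpg                                  # stand-in response

beta_ols   <- ridge_fit(X, y, lambda = 0)    # lambda = 0 reproduces least squares
beta_ridge <- ridge_fit(X, y, lambda = 100)  # larger lambda shrinks the slopes
```

Comparing `beta_ols` with `beta_ridge` shows the shrinkage effect: the sum of squared slopes gets smaller as lambda grows.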
Feature selection
load pgatour2006
outline
Study selecting variables with linear models first
Study linear models fit with Least Squares
Study approaches other than LS
Study non-linear models
download
Practice Problems
Choose one or two to work on. Problems 1 and 2 use the house.csv dataset;
problem 3 uses morehouses.csv. Hint: Before beginning, consider a transformation
of the response variab
5/23/2017
Midterm Solutions Spring 2017
# P
1
TRANSFORMATIONS TO OBTAIN EQUAL VARIANCE
General method for finding variance-stabilizing transformations: If Y has mean $\mu$
and variance $\sigma^2$, and if U = f(Y), then by the first order Taylor approximation
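A small simulation can make the idea concrete. The standard first-order (delta method) result is Var(f(Y)) ≈ f'(μ)² σ². For Poisson counts, where the variance equals the mean, this suggests f(y) = sqrt(y), with an approximate variance of 1/4 regardless of the mean. The sketch below checks this; the Poisson example and the chosen means are assumptions made here, not taken from the notes.

```r
set.seed(1)
lambdas <- c(10, 40, 160)  # hypothetical Poisson means

# For each mean, compare the variance before and after the square-root transform
sim <- sapply(lambdas, function(lam) {
  y <- rpois(1e5, lam)
  c(var_raw = var(y), var_sqrt = var(sqrt(y)))
})
colnames(sim) <- paste0("lambda=", lambdas)
sim
```

The raw variances track the means, while the variances after the transform stay close to 0.25 across all three settings.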
Central Limit Theorem
These notes give a heuristic derivation of the central limit theorem. They
are heuristic since we need to be more careful with the error estimates as
well as some other points d
556: MATHEMATICAL STATISTICS I
ASYMPTOTIC APPROXIMATIONS AND THE DELTA METHOD
To approximate the distribution of elements in a sequence of random variables $\{X_n\}$ for large $n$, we
d
attempt to f
Lecture 4.2
Cross Validation
require(ISLR)
data(Auto)
and maybe
pgatour2006 from CCLE
The dilemma
You've fit a model, and you've found the MSE for
your data.
But you know that when the testing data come
principal components
and review
lizards.csv, houses.csv and morehouses.csv
The first principal component is a vector that points along
the axis that has the most variation. Scores are projected
onto t
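To illustrate the first principal component and its scores, here is a short sketch using `prcomp` from base R. The two simulated variables are an assumption made for this example and stand in for the course data sets.

```r
set.seed(1)
x1 <- rnorm(200)
x2 <- 2 * x1 + rnorm(200, sd = 0.5)  # strongly related to x1
X  <- cbind(x1, x2)

pca <- prcomp(X, center = TRUE, scale. = FALSE)
pc1     <- pca$rotation[, 1]  # unit vector along the direction of most variation
scores1 <- pca$x[, 1]         # each centered observation projected onto pc1

# The same scores computed by hand: center the data, then project onto pc1
Xc <- scale(X, center = TRUE, scale = FALSE)
scores_manual <- drop(Xc %*% pc1)
```

The variance of `scores1` is the largest variance obtainable by projecting the centered data onto any single unit vector.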
# 05/02/17 Stat 101C Discussion Code
# By: Jeffrey Chao
library(ISLR)
library(boot)
oj = OJ
# Doing resampling with the boot package.
# To use the boot function with something like the median, you are
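A minimal sketch of bootstrapping a median with `boot()` follows. The statistic passed to `boot()` must accept the data and a vector of resampled indices as its first two arguments. The name `median_stat` is hypothetical, and `mtcars$mpg` is used as stand-in data so the example runs without ISLR.

```r
library(boot)  # recommended package shipped with standard R installations

# boot() calls this with the original data and a vector of resampled indices
median_stat <- function(data, indices) median(data[indices])

set.seed(1)
x <- mtcars$mpg  # stand-in data, not the OJ data used in the discussion
boot_out  <- boot(data = x, statistic = median_stat, R = 1000)
se_median <- sd(boot_out$t[, 1])  # bootstrap estimate of the standard error
```

`boot_out$t0` holds the median of the original sample, and `boot_out$t` holds the 1000 bootstrap replicates.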
HW 5
Lindsey London (303769968)
February 7, 2017
Stat 102B TA Session 1
Problem 1
#read in file
library(readr)
class <- read_csv("~/stats 102b/distancedata-hwk.csv")
# Parsed with column specification
Stat 102B - Computation and Optimization in Statistics
Projections
NAME (Last, First): UCLA ID: Date:
J. Sanchez
UCLA Department of Statistics
< xc , yc >
n1
< xc , yc >
Corr(x, y) = cos(angle(xc , y
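The relationship between correlation and the angle between centered vectors can be checked numerically. The sketch below uses simulated data (an assumption for illustration): `xc` and `yc` are the centered versions of `x` and `y`.

```r
set.seed(1)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)

xc <- x - mean(x)  # centered vectors
yc <- y - mean(y)

inner     <- sum(xc * yc)                                  # inner product <xc, yc>
cos_angle <- inner / (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))   # cosine of the angle
cov_xy    <- inner / (length(x) - 1)                       # sample covariance
```

Here `cos_angle` agrees with `cor(x, y)` and `cov_xy` agrees with `cov(x, y)` up to floating-point error.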
Syllabus 101C
Spring 2017
Introduction to Statistical Models and Data Mining
Robert Gould, Math Sciences 8945, [email protected]
Office Hours: Thursdays: 2-4pm
In 1
#lecture 4.2
#cross validation
require(ISLR)
data(Auto)
#randomly split data into training and testing (validation)
set.seed(1)
#n=number of observations (length)
n<-dim(Auto)[1]
#Split data randomly
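One way the random split and the validation MSE could look is sketched below. This is not the original lecture code: `mtcars` stands in for `Auto` so the example runs without ISLR, and the model `mpg ~ hp` is a hypothetical choice.

```r
set.seed(1)
dat <- mtcars  # stand-in for Auto
n   <- nrow(dat)

# Randomly assign half of the rows to training, the rest to testing
train_idx <- sample(n, size = floor(n / 2))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit on the training half, then measure error on the held-out half
fit      <- lm(mpg ~ hp, data = train)
test_mse <- mean((test$mpg - predict(fit, newdata = test))^2)
```

A different seed produces a different split and therefore a different `test_mse`, which is the variability that cross-validation tries to average out.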
#lecture 4.1
#Bootstrapping: we can use it to estimate "SE=SD"; and "CI"
#estimate "SE":
require(ISLR)
data(Auto)
#estimate the standard error of "median"
bs.est=c()
set.seed(1234)
#prepare to take a r
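A hand-rolled bootstrap for the standard error of the median might look like the sketch below. It is not the original lecture code: `mtcars$mpg` stands in for a column of `Auto`, and `B = 1000` resamples is an arbitrary choice.

```r
set.seed(1234)
x <- mtcars$mpg  # stand-in for a column of Auto
B <- 1000        # number of bootstrap resamples (arbitrary choice)

bs_est <- numeric(B)
for (b in seq_len(B)) {
  # Resample the data with replacement, keeping the original sample size
  resample  <- sample(x, size = length(x), replace = TRUE)
  bs_est[b] <- median(resample)
}

se_hat <- sd(bs_est)                         # bootstrap standard error
ci_95  <- quantile(bs_est, c(0.025, 0.975))  # percentile confidence interval
```

Preallocating `bs_est` with `numeric(B)` avoids growing a vector inside the loop.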
Lecture 4.1
Comparing classification methods; bootstrapping
Last Time
You used the bechdel data to see if you could
predict whether a film would pass the bechdel test.
You were asked to compare LDA, Q
# 04/12/2017 Discussion Code
# By: Jeffrey Chao
require(ggplot2)
# Here, we will use the American Time-Use Survey data mentioned in the first
# lecture as an example. (This is the "atus copy.csv" file
Lecture 2.2
Classification: logistic regression, Bayes Classifier
knn classification algorithm
Set k equal to an integer.
To classify an observation x_0, find the k nearest
neighbors to x_0 (using euc
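The usual kNN rule (find the k nearest neighbors by Euclidean distance, then take a majority vote) can be sketched in base R as follows. The function `knn_classify`, the use of the built-in `iris` data, and the new observation `x0` are assumptions made for illustration.

```r
knn_classify <- function(train_x, train_y, x0, k) {
  # Euclidean distance from x0 to every training observation
  d <- sqrt(rowSums(sweep(train_x, 2, x0)^2))
  nearest <- order(d)[seq_len(k)]     # indices of the k closest points
  votes   <- table(train_y[nearest])  # class counts among the neighbors
  names(votes)[which.max(votes)]      # majority class (ties go to the first level)
}

train_x <- as.matrix(iris[, 1:4])
train_y <- iris$Species
x0   <- c(5.0, 3.4, 1.5, 0.2)  # hypothetical new observation
pred <- knn_classify(train_x, train_y, x0, k = 5)
```

Small values of k give a flexible, high-variance classifier, while larger values smooth the decision boundary.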
upload corealsample.csv into Rstudio
Lecture 3.1
Graphics and Linear Discriminant Analysis
Charles Joseph Minard
(1781-1870)
The greatest statistical graphic ever?
1869
William Playfair, 1759-1823
Fra
lecture 3.2
data:banknote (CCLE under Site Info)
data: library(fivethirtyeight); data(bechdel); help(bechdel)
Determining Errors in the Testing Data
Two approaches
1. Use data to estimate
Train
estima
Chapter 8
The exponential family: Basics
In this chapter we extend the scope of our modeling toolbox to accommodate a variety of
additional data types, including counts, time intervals and rates. We i
Stat 102B - Computation and Optimization in Statistics J. Sanchez
Homework 1 UCLA Department of Statistics
Instructions
(1) You must turn in this copy with your answers.
(2) Homework must be stapled
Fall 2017 STATS 130 Getting Up to Speed with SPSS, Stata, SAS, and R
Course Information
Instructor: Maria Cha/ Math Sciences 8967/ [email protected]
Lecture: MWF 3-3:50pm / Powell 320C
TA: Yidan