Machine and Statistical Learning: Advanced Statistical Methods
MATH 751

Spring 2013
Mathematics of Random Forests
1. Probability: Chebyshev inequality
Theorem 1 (Chebyshev inequality): If $X$ is a random variable with standard deviation $\sigma$ and mean $\mu$, then for any $\epsilon > 0$,
$$P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}.$$
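Below is a quick numerical sanity check of Theorem 1. It is a sketch and not part of the original notes. It computes the exact tail probability $P(|X - \mu| \ge \epsilon)$ and the bound $\sigma^2/\epsilon^2$ for a fair six-sided die, using exact rational arithmetic. The die and the chosen values of $\epsilon$ are illustrative assumptions.

```python
from fractions import Fraction

# Illustrative assumption: X is the outcome of a fair six-sided die.
outcomes = [Fraction(k) for k in range(1, 7)]
p = Fraction(1, 6)  # probability of each outcome

mu = sum(p * x for x in outcomes)               # mean mu
var = sum(p * (x - mu) ** 2 for x in outcomes)  # variance sigma^2

def tail_prob(eps):
    """Exact P(|X - mu| >= eps) for the die."""
    return sum(p for x in outcomes if abs(x - mu) >= eps)

def chebyshev_bound(eps):
    """Chebyshev upper bound sigma^2 / eps^2."""
    return var / eps ** 2

for eps in (Fraction(1), Fraction(2), Fraction(5, 2)):
    print(eps, tail_prob(eps), chebyshev_bound(eps))
```

For small $\epsilon$ the bound exceeds 1 and says nothing. It only becomes informative once $\epsilon$ is a sizable multiple of $\sigma$.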
Probability background
Theorem 2 (Bounded convergence th
MA 751
Part 4
Measurability and Hilbert Spaces
1. Measurable functions and integrals
Let $C$ be the set of continuous functions on $\mathbb{R}$. Let $M$ be the set of measurable functions:
Def: The set $M$ of measurable functions on $\mathbb{R}$ (or an interval of $\mathbb{R}$) is the set of point
MA 751
Part 3
Infinite Dimensional Vector Spaces
1. Motivation: Statistical machine learning and reproducing kernel Hilbert spaces
Microarray experiment:
Question on gene expression: when is the DNA in a gene $g$ transcribed and thus expressed (as RNA) in a
MA 751
Part 2
Inner products
1. Dot product and angle:
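Can show using basic trigonometry in $\mathbb{R}^3$:
$$\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \, \|\mathbf{w}\| \cos\theta,$$
where $\theta$ is the angle between $\mathbf{v}$ and $\mathbf{w}$.
[this can be understood entirely geometrically]
[fig. 4]
Ex 1: $\mathbf{v} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\mathbf{w} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, so $\theta = \pi/4$.
Inner product is $\mathbf{v} \cdot \mathbf{w} = 1$.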
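The formula can be inverted to recover the angle from the inner product: $\theta = \arccos\big(\mathbf{v}\cdot\mathbf{w} / (\|\mathbf{v}\|\,\|\mathbf{w}\|)\big)$. The minimal sketch below reproduces Ex 1 with $\mathbf{v} = (1, 0)$ and $\mathbf{w} = (1, 1)$. The helper names `dot`, `norm` and `angle` are illustrative choices.

```python
import math

def dot(v, w):
    """Euclidean inner product v . w = sum of v_i * w_i."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Euclidean length ||v|| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle theta between nonzero v and w, from v . w = ||v|| ||w|| cos(theta)."""
    c = dot(v, w) / (norm(v) * norm(w))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, c)))

v, w = (1, 0), (1, 1)
print(dot(v, w), angle(v, w), math.pi / 4)
```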
MA 751
Part 1
Linear Algebra
1. Linear Algebra:
Recall: $\mathbb{R}$ = set of real numbers; $\mathbb{R}^3$ = set of all triples of real numbers.
Definition (part 1): A vector space is a collection of objects $V$ with the property that addition and scalar multiplication are defined.
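In $\mathbb{R}^3$ both operations act componentwise. The sketch below spot-checks a few of the standard vector space axioms on sample triples: commutativity, associativity, zero vector, inverses and distributivity. These axioms are standard but are not listed in this part of the notes. Representing vectors as Python tuples and the names `add` and `scale` are illustrative choices. Passing checks on samples is evidence, not a proof.

```python
def add(u, v):
    """Componentwise addition of triples in R^3."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(a, v):
    """Scalar multiplication a * v in R^3."""
    return tuple(a * vi for vi in v)

u, v, w = (1, 2, 3), (4, 5, 6), (-1, 0, 2)
a, b = 2, -3
zero = (0, 0, 0)

# Spot-check some axioms on these sample vectors (a check, not a proof).
assert add(u, v) == add(v, u)                                # commutativity
assert add(add(u, v), w) == add(u, add(v, w))                # associativity
assert add(u, zero) == u                                     # zero vector
assert add(u, scale(-1, u)) == zero                          # additive inverse
assert scale(a, add(u, v)) == add(scale(a, u), scale(a, v))  # distributivity over vectors
assert scale(a + b, u) == add(scale(a, u), scale(b, u))      # distributivity over scalars
assert scale(a * b, u) == scale(a, scale(b, u))              # compatibility of scalars
assert scale(1, u) == u                                      # identity scalar
print(add(u, v), scale(a, u))
```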
Machine learning: Boosting
1. Basic definition:
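Assume again we have a classification task (e.g. cancer classification) with data
$$D = \{(\mathbf{x}_i, y_i)\}_i$$
and $y_i = \pm 1$.
Boosting
Assume we have a classifier $p(\mathbf{x})$ which takes a feature vector $\mathbf{x}$ and classifies
$$y = \begin{cases} 1 & \text{if } p(\mathbf{x}) > 0, \\ -1 & \text{if } p(\mathbf{x}) < 0. \end{cases}$$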
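Below is a minimal sketch of this sign rule and of its training error on labeled data $D$ with labels $\pm 1$. The classifier is written `p` here. Several pieces are assumptions for illustration only: the linear score $p(\mathbf{x}) = \mathbf{w}\cdot\mathbf{x} + b$, its weights, and the toy data set. The rule also leaves the tie $p(\mathbf{x}) = 0$ open, so the code sends ties to $+1$, which is a further assumption.

```python
def classify(p, x):
    """Return +1 if p(x) > 0 and -1 if p(x) < 0.
    Assumption: the tie p(x) = 0 is sent to +1."""
    return 1 if p(x) >= 0 else -1

def training_error(p, data):
    """Fraction of pairs (x_i, y_i) in data with classify(p, x_i) != y_i."""
    mistakes = sum(1 for x, y in data if classify(p, x) != y)
    return mistakes / len(data)

# Hypothetical linear score p(x) = w . x + b on 2-d feature vectors.
w, b = (1.0, -1.0), 0.0

def p(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical toy data set D = {(x_i, y_i)} with labels y_i = +1 or -1.
D = [((2.0, 1.0), 1), ((0.5, 2.0), -1), ((3.0, 0.0), 1), ((1.0, 1.5), 1)]
print([classify(p, x) for x, _ in D], training_error(p, D))
```

Here the last point is misclassified, so the training error is 1/4. Boosting is concerned with combining such imperfect classifiers into a better one.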
Bayesian Distributions: Prior and Posterior
We will discuss the details of the derivation of equation (8.27) as a brief summary of the
Bayesian approach to statistics.
The probability model is that, for a given parameter ! , the distribution of a random
d
Suggestions, PS 3
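3.3 (a) Recall
$$\hat{\beta} = (X^T X)^{-1} X^T \mathbf{y}.$$
We are considering $\hat{\theta} = \mathbf{a}^T \hat{\beta}$ as an estimator for $\theta = \mathbf{a}^T \beta$, along with an alternative estimator $\tilde{\theta} = \mathbf{c}^T \mathbf{y}$. We assume $\tilde{\theta}$ is unbiased:
$$E(\tilde{\theta}) = E(\mathbf{c}^T \mathbf{y}) = \mathbf{a}^T \beta$$
(why?). Show
$$E(\tilde{\theta}) = E(\mathbf{c}^T \mathbf{y}) = E\big(\mathbf{c}^T (X\beta + \epsilon)\big) = \mathbf{c}^T X \beta,$$
so
$$\mathbf{a}^T \beta = \mathbf{c}^T X \beta,$$
and since this holds for every $\beta$, $\mathbf{a}^T = \mathbf{c}^T X$.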
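The unbiasedness condition $\mathbf{c}^T X = \mathbf{a}^T$ can be checked numerically. The pure-Python sketch below uses exact fractions. It assumes a small hypothetical design matrix $X$ (intercept plus one covariate $t = 0, 1, 2, 3$) and $\mathbf{a} = (1, 1)$. It builds the least squares weights $\mathbf{c}_0 = X(X^TX)^{-1}\mathbf{a}$, so that $\mathbf{c}_0^T\mathbf{y} = \mathbf{a}^T\hat\beta$. It then adds a hypothetical vector $\mathbf{d}$ with $\mathbf{d}^T X = 0$ to get another unbiased estimator $\mathbf{c} = \mathbf{c}_0 + \mathbf{d}$.

Under the further assumption $\mathrm{Var}(\epsilon) = \sigma^2 I$, which is not stated in this excerpt, $\mathrm{Var}(\mathbf{c}^T\mathbf{y}) = \sigma^2\|\mathbf{c}\|^2$. Comparing squared norms therefore previews the variance comparison the exercise (the Gauss–Markov theorem) works toward.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Matrix product A B, with matrices stored as lists of rows."""
    return [[sum(aik * bkj for aik, bkj in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def sq_norm(v):
    """Squared Euclidean norm ||v||^2."""
    return sum(vi * vi for vi in v)

# Hypothetical design: intercept plus one covariate t = 0, 1, 2, 3.
X = [[F(1), F(t)] for t in range(4)]
Xt = transpose(X)
a = [F(1), F(1)]                                  # hypothetical target theta = a^T beta

c0 = matvec(X, matvec(inv2(matmul(Xt, X)), a))   # a^T beta_hat = c0^T y
d = [F(1), F(-2), F(1), F(0)]                     # chosen so that d^T X = 0
c = [ci + di for ci, di in zip(c0, d)]            # another linear estimator c^T y

# Unbiasedness requires c^T X = a^T (computed here as X^T c); both satisfy it.
print(matvec(Xt, c0), matvec(Xt, c))

# Variances up to the common factor sigma^2: the least squares weights are smaller.
print(sq_norm(c0), sq_norm(c))
```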
MA 751
M. Kon
Suggestions, Problem Set 2
2.5 (a) This derivation will be similar to the one done in class, with the additional use of equation (3.8) in the last line.
First, some comments about notation: see the Notes on Matrix Notation on the web page for
Some notes on our matrix notation
We will introduce some notation here. First consider a random vector
$$\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = (y_1, \dots, y_n)^T.$$
Note that in some cases $\mathbf{y}$ is considered a fixed vector, but here we consider it to be random, i.e. that the entries $y_i$ are ra
Decision Trees and Random Forests
Reference: Leo Breiman, http://www.stat.berkeley.edu/~breiman/RandomForests
1. Decision trees
Example (Geurts, Fillet, et al., Bioinformatics 2005):
Patients to be classified: normal vs. diseased
Decision trees
Classificat
Part 5
Statistical machine learning and kernel methods
Primary references:
John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis
Christopher Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Know