Mean Estimation
Theorem: Let X be a random variable taking values in [0, 1] and let x_0, ..., x_m be i.i.d. values of X. Define the sequence (µ_m)_{m ∈ ℕ} by µ_{m+1} = (1 − α_m) µ_m + α_m x_m, with µ_0 = x_0, α_m ∈ [0, 1], Σ_{m ≥ 0} α_m = +∞ and Σ_{m ≥ 0} α_m² < +∞. Then,

µ_m → E[X] almost surely (a.s.).
Proof: By the independence assumption, for m ≥ 0,

Var[µ_{m+1}] = (1 − α_m)² Var[µ_m] + α_m² Var[x_m]
             ≤ (1 − α_m) Var[µ_m] + α_m²,

using (1 − α_m)² ≤ 1 − α_m for α_m ∈ [0, 1] and Var[x_m] ≤ 1 since X takes values in [0, 1]. We also have α_m → 0, since Σ_{m ≥ 0} α_m² < +∞.
• Let ε > 0 and suppose there exists N ∈ ℕ such that Var[µ_m] ≥ ε for all m ≥ N. Then, for m ≥ N,

Var[µ_{m+1}] ≤ Var[µ_m] − α_m ε + α_m²,

which implies

Var[µ_{m+N}] ≤ Var[µ_N] − ε Σ_{n=N}^{m+N−1} α_n + Σ_{n=N}^{m+N−1} α_n² → −∞ as m → ∞,

contradicting Var[µ_{m+N}] ≥ 0.
• Thus, for all N ∈ ℕ there exists m_0 ≥ N such that Var[µ_{m_0}] < ε. Choose N large enough so that α_m ≤ ε for all m ≥ N. Then,

Var[µ_{m_0+1}] ≤ (1 − α_{m_0}) ε + α_{m_0} ε = ε.

• Therefore, Var[µ_m] ≤ ε for all m ≥ m_0 (L2 convergence).
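A small simulation sketch of the update rule in the theorem (an illustration, not from the slides; the Bernoulli distribution and the 1/(m+1) step-size schedule are assumptions made for the example):

import numpy as np

# Running-mean update: mu_{m+1} = (1 - alpha_m) * mu_m + alpha_m * x_m, with mu_0 = x_0.
rng = np.random.default_rng(0)
p = 0.3                                   # X ~ Bernoulli(p), so E[X] = p
x = rng.binomial(1, p, size=100_000).astype(float)

mu = x[0]                                 # mu_0 = x_0
for m in range(len(x) - 1):
    alpha_m = 1.0 / (m + 1)               # sum alpha_m = inf, sum alpha_m^2 < inf
    mu = (1.0 - alpha_m) * mu + alpha_m * x[m]

print("estimate:", mu)                    # close to E[X] = 0.3
print("E[X]    :", p)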
Notes
• Special case: α_m = 1/m (strong law of large numbers; see the short check below).
• Connection with stochastic approximation.
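As a quick check of this special case (an addition, not from the slides): with the indexing µ_{m+1} = (1 − α_m) µ_m + α_m x_m, µ_0 = x_0, take α_m = 1/(m+1) (the 1/m schedule shifted by one so it is defined at m = 0). Since α_0 = 1 gives µ_1 = x_0, induction with the hypothesis µ_m = (1/m) Σ_{i=0}^{m−1} x_i yields

µ_{m+1} = (1 − 1/(m+1)) µ_m + (1/(m+1)) x_m
        = (m/(m+1)) · (1/m) Σ_{i=0}^{m−1} x_i + (1/(m+1)) x_m
        = (1/(m+1)) Σ_{i=0}^{m} x_i,

so µ_m is exactly the sample mean, and µ_m → E[X] a.s. by the strong law of large numbers.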
Stochastic Approximation

Problem: find the solution of x = H(x) with x ∈ ℝ^N while:
• H(x) cannot be computed, e.g., H is not accessible;
• an i.i.d. sample of noisy observations H(x_i) + w_i, i ∈ [1, m], is available, with E[w] = 0.
• Idea: algorithm based on an iterative technique:

x_{t+1} = (1 − α_t) x_t + α_t [H(x_t) + w_t]
        = x_t + α_t [H(x_t) + w_t − x_t].

• More generally, x_{t+1} = x_t + α_t D(x_t, w_t).
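A minimal simulation sketch of this iteration (an illustration, not from the slides; the affine contraction H, the noise level, and the 1/t step size are assumptions made for the example):

import numpy as np

# Hypothetical fixed-point problem: H(x) = A x + b with ||A|| < 1,
# so x* = (I - A)^{-1} b is the unique solution of x = H(x).
rng = np.random.default_rng(0)
N = 3
A = 0.5 * np.eye(N)                       # contraction
b = np.array([1.0, -2.0, 0.5])
x_star = np.linalg.solve(np.eye(N) - A, b)

def H(x):
    return A @ x + b

x = np.zeros(N)
for t in range(1, 20_001):
    alpha_t = 1.0 / t                     # sum alpha_t = inf, sum alpha_t^2 < inf
    w_t = rng.normal(scale=0.1, size=N)   # zero-mean noise
    x = x + alpha_t * (H(x) + w_t - x)    # x_{t+1} = x_t + alpha_t [H(x_t) + w_t - x_t]

print("estimate   :", x)
print("fixed point:", x_star)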