- We can bound the mean by passing to the variance:
\begin{align}
\mathbb{E}[D(\omega_{1:m})] &\le \sqrt{\mathbb{E}[D(\omega_{1:m})^2]} && \text{[Jensen's inequality]} \tag{425} \\
&= \sqrt{\mathbb{E}\left\| \frac{1}{m} \sum_{i=1}^m (\alpha(\omega_i) \phi_{\omega_i} - f^*) \right\|^2} && \text{[expand]} \tag{426} \\
&= \sqrt{\frac{1}{m^2} \sum_{i=1}^m \mathbb{E}\|\alpha(\omega_i) \phi_{\omega_i} - f^*\|^2} && \text{[variance of i.i.d. sum]} \tag{427} \\
&\le \frac{C}{\sqrt{m}} && \text{[use } |\alpha(\omega_i)| \le C\text{]}. \tag{428}
\end{align}
- Applying McDiarmid's inequality (Theorem 8), we get that
\[
\mathbb{P}\left[ D(\omega_{1:m}) \ge \frac{C}{\sqrt{m}} + \epsilon \right] \le \exp\left( \frac{-2\epsilon^2}{\sum_{i=1}^m (2C/m)^2} \right). \tag{429}
\]
Rearranging yields the theorem.
– Remark: the definition of $\alpha$ here differs from the Rahimi/Recht paper.
– Corollary:
- Suppose we had a loss function $\ell(y, v)$ which is 1-Lipschitz in the second argument (e.g., the hinge loss). Define the expected risk in the usual way:
\[
L(f) \stackrel{\text{def}}{=} \mathbb{E}_{(x,y) \sim p^*}[\ell(y, f(x))]. \tag{430}
\]
Then the approximation error is bounded:
\begin{align}
L(\hat f) - L(f^*) &\le \mathbb{E}[|\ell(y, \hat f(x)) - \ell(y, f^*(x))|] && \text{[definition, add } |\cdot|\text{]} \tag{431} \\
&\le \mathbb{E}[|\hat f(x) - f^*(x)|] && \text{[fix } y\text{, } \ell \text{ is Lipschitz]} \tag{432} \\
&\le \|\hat f - f^*\| && \text{[concavity of } \sqrt{\cdot}\text{]}. \tag{433}
\end{align}
– So far, we have analyzed the approximation error due to having a finite $m$, while assuming an infinite amount of data. Separately, there is the estimation error due to having $n$ data points:
\[
L(\hat f_{\text{ERM}}) - L(\hat f) \le O_p\!\left( \frac{C}{\sqrt{n}} \right), \tag{434}
\]
where $\hat f_{\text{ERM}}$ minimizes the empirical risk over the random hypothesis class $\hat{\mathcal{F}}$. So the total error, which includes approximation error and estimation error, is
\[
L(\hat f_{\text{ERM}}) - L(f^*) = O_p\!\left( \frac{C}{\sqrt{n}} + \frac{C}{\sqrt{m}} \right). \tag{435}
\]
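As a minimal numerical sketch of the variance bound above (the $\mathbb{E}[D(\omega_{1:m})] \le C/\sqrt{m}$ step): the argument only uses that the $m$ terms are i.i.d. and bounded by $C$, so we can check the rate with a hypothetical stand-in for $\alpha(\omega_i)\phi_{\omega_i} - f^*$, here uniform draws on $[-C, C]$ with mean zero. This is an illustration of the concentration rate, not the actual random-features construction.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 1.0  # bound on each i.i.d. term, playing the role of |alpha(omega_i)| <= C

def expected_deviation(m, trials=2000):
    """Monte Carlo estimate of E[D(omega_{1:m})], where D is the deviation
    of the empirical mean of m bounded i.i.d. terms from its expectation
    (zero here, standing in for f^*)."""
    X = rng.uniform(-C, C, size=(trials, m))  # each term bounded by C, mean 0
    return np.abs(X.mean(axis=1)).mean()      # average |mean deviation| over trials

# The deviation should sit below C/sqrt(m) for each m, matching (428).
for m in [10, 100, 1000]:
    assert expected_deviation(m) <= C / np.sqrt(m)
```

The assertions pass comfortably: for mean-zero uniform draws the true expected deviation is about $0.46\,C/\sqrt{m}$, so the bound $C/\sqrt{m}$ is loose but of the right order in $m$.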