IE 598: Big Data Optimization                                           Fall 2016
Lecture 13: Case Study on Logistic Regression – October 06
Lecturer: Niao He        Scribe: Jialin Song

Overview: In this lecture we summarize the algorithms covered so far for smooth convex optimization problems. To better understand the pros and cons of the different algorithms, we conduct a case study on the logistic regression model and investigate their empirical performance.

13.1 Summary

We summarize the performance of the algorithms for structured and smooth convex optimization, namely the Interior Point Method (IPM), Gradient Descent (GD), Accelerated Gradient Descent (AGD), Projected Gradient Descent (PGD), the Frank-Wolfe algorithm (FW), and Block Coordinate Gradient Descent (BCGD), in terms of convergence rate, iteration complexity, and iteration cost in Table 13.1.

Table 13.1: Performance of algorithms on structured convex optimization and smooth convex optimization

Structured convex optimization (LP/SOCP/SDP):
  IPM:
    convergence rate: O(ν exp(-t/√ν))
    iteration complexity: O(√ν log(1/ε))
    iteration cost: one Newton step

Smooth convex optimization (entries are for the convex case, then the strongly convex case):
  GD:
    convergence rate: O(LD^2/t);  O(((κ-1)/(κ+1))^{2t})
    iteration complexity: O(LD^2/ε);  O((L/μ) log(1/ε))
    iteration cost: one gradient
  AGD:
    convergence rate: O(LD^2/t^2);  O(((√κ-1)/(√κ+1))^{2t})
    iteration complexity: O(D√(L/ε));  O(√(L/μ) log(1/ε))
    iteration cost: one gradient
  PGD:
    convergence rate: O(LD^2/t);  O((1-μ/L)^t)
    iteration complexity: O(LD^2/ε);  O((L/μ) log(1/ε))
    iteration cost: one gradient and one projection
  FW:
    convergence rate: O(LD^2/t)
    iteration complexity: O(LD^2/ε)
    iteration cost: one gradient and one linear minimization
  BCGD:
    convergence rate: O(bLD^2/t);  O((1-μ/(bL))^t)
    iteration complexity: O(bLD^2/ε);  O((bL/μ) log(1/ε))
    iteration cost: O(1) block gradients (randomized), O(b) block gradients (cyclic), O(b) block gradients (Gauss-Southwell)

Notations:
- L is the smoothness constant (the gradient of the objective is L-Lipschitz);
- μ is the strong convexity parameter;
- κ = L/μ is the condition number;
- ν is the barrier parameter used by IPM, ε is the target accuracy, and b is the number of blocks in BCGD;
- D is either ‖x_0 - x^*‖_2 or the diameter of the compact set X.
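To make the per-iteration costs in Table 13.1 concrete, here is a minimal sketch, not from the original notes, that runs GD and AGD on a synthetic L-smooth and μ-strongly convex quadratic. The matrix A, the vector b, the step size 1/L, and the constant momentum parameter are all choices made purely for illustration.

import numpy as np

# Minimal sketch (illustration only): GD vs. AGD on a synthetic quadratic
#   f(x) = 0.5 * x^T A x - b^T x,   grad f(x) = A x - b,
# where L and mu are the largest and smallest eigenvalues of A.
rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 50))
A = Q @ Q.T + 0.1 * np.eye(50)          # positive definite, hence strongly convex
b = rng.standard_normal(50)
eigs = np.linalg.eigvalsh(A)
L, mu = eigs[-1], eigs[0]
x_star = np.linalg.solve(A, b)          # exact minimizer, used only for reference

def grad(x):
    return A @ x - b

# Gradient descent: one gradient per iteration, step size 1/L.
x = np.zeros(50)
for _ in range(300):
    x = x - grad(x) / L

# Accelerated gradient descent (constant-momentum variant for strongly convex f):
# also one gradient per iteration, plus a cheap extrapolation step.
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
x_acc, x_prev = np.zeros(50), np.zeros(50)
for _ in range(300):
    y = x_acc + beta * (x_acc - x_prev)
    x_prev, x_acc = x_acc, y - grad(y) / L

print("GD  distance to x*:", np.linalg.norm(x - x_star))
print("AGD distance to x*:", np.linalg.norm(x_acc - x_star))

With this well-conditioned A both methods converge quickly; shrinking the 0.1 regularization term makes the problem ill-conditioned, and the gap predicted by the O(κ) versus O(√κ) iteration complexities in Table 13.1 becomes visible.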
13.2 Logistic Regression

In this section, we first introduce the basic fundamentals of classification models and then formulate the logistic regression model.

13.2.1 Preliminaries on Classification Models

We consider a binary classification problem: we are given data (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) with x_i ∈ R^d and y_i ∈ {-1, 1}. The input x_i is called the feature vector and the output y_i is called the label. Given a family of functions f(x, w) parameterized by w, binary classification aims to find the best w such that f(·, w) relates x to y.

Based on the input feature vector and the type of function we adopt, the output is predicted according to a prediction rule. For the binary classification problem, the prediction rule is

    y = 1 if f(x, w) ≥ 0,   y = -1 if f(x, w) < 0.

An error term can be defined to indicate whether the output is predicted correctly or not:

    error = 1 if y f(x, w) < 0,   error = 0 if y f(x, w) ≥ 0.

Our goal is to find the parameter w that minimizes the total error made during prediction, which leads to the optimization problem

    min_w  Σ_{i=1}^n 1{y_i f(x_i, w) < 0}.
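The prediction rule and the 0-1 error above translate directly into code. The sketch below is illustrative only: it assumes the common linear choice f(x, w) = w^T x, which the notes leave generic, and the toy data and weight vector w are made up for the example.

import numpy as np

# Sketch of the prediction rule and the 0-1 error, assuming f(x, w) = w^T x.
def predict(X, w):
    # y_hat = +1 if f(x, w) >= 0, else -1
    return np.where(X @ w >= 0, 1, -1)

def total_error(X, y, w):
    # error_i = 1 if y_i * f(x_i, w) < 0, else 0; return the total count
    return int(np.sum(y * (X @ w) < 0))

# Toy data: five samples in R^2 with labels in {-1, +1}.
X = np.array([[ 1.0,  2.0],
              [ 2.0, -1.0],
              [-1.5,  0.5],
              [-2.0, -1.5],
              [ 1.0,  0.2]])
y = np.array([1, -1, 1, -1, 1])
w = np.array([-0.5, 1.0])

print("predictions:", predict(X, w))
print("total 0-1 error:", total_error(X, y, w))

Note that this 0-1 objective is piecewise constant in w, hence neither smooth nor convex, which motivates smooth convex surrogates such as the logistic loss used in logistic regression.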
