Optimization for Machine Learning
Edited by Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright
Since its earliest days as a discipline, machine learning has made use of optimization formulations and algorithms. Likewise, machine learning has contributed to optimization, driving the development of new optimization approaches that address the significant challenges presented by machine learning applications. This cross-fertilization continues to deepen, producing a growing literature at the intersection of the two fields while attracting leading researchers to the effort.
Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. While techniques proposed twenty or more years ago continue to be refined, the increased complexity, size, and variety of today’s machine learning models demand a principled reassessment of existing assumptions and techniques.
This book makes a start toward such a reassessment. Besides describing the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods, the book devotes significant attention to newer themes such as regularized optimization, robust optimization, a variety of gradient and subgradient methods, and the use of splitting techniques and second-order information. We aim to provide an up-to-date account of the optimization techniques useful to machine learning: those that are established and prevalent, as well as those that are rising in importance.
To illustrate our aim more concretely, we review in Sections 1.1 and 1.2 two major paradigms that provide focus to research at the confluence of machine learning and optimization: support vector machines (SVMs) and regularized optimization. Our brief review charts the importance of these problems and discusses how both connect to the later chapters of this book.
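As a rough sketch of these two paradigms (the notation here, with weight vector w, offset b, labels y_i ∈ {−1, +1}, and regularization parameters C and λ, is generic and may differ from that used in the chapters that follow), the soft-margin SVM for training data (x_1, y_1), ..., (x_n, y_n) can be posed as the unconstrained problem

    \min_{w,\,b} \;\; \frac{1}{2}\|w\|_2^2 \;+\; C \sum_{i=1}^{n} \max\{0,\; 1 - y_i (w^\top x_i + b)\},

while regularized optimization refers to the broader template

    \min_{w} \;\; f(w) \;+\; \lambda\, r(w),

where f measures fit to the data (for the SVM above, the sum of hinge losses) and the regularizer r, for example the ℓ1 norm, encodes prior structure such as sparsity.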
We then discuss other themes (applications, formulations, and algorithms) that recur throughout the book, outlining the contents of the various chapters and the relationship between them.
Audience. This book is aimed at a broad audience of researchers and students in the machine learning and optimization communities, but the material covered is widely applicable and should be valuable to researchers in other related areas too. Some chapters have a didactic flavor, covering recent advances at a level accessible to anyone having a passing acquaintance with tools and techniques in linear algebra, real analysis, and probability.
Other chapters are more specialized, containing cutting-edge material. We hope that, from the wide range of work presented in the book, researchers will gain a broader perspective of the field, and that new connections will be made and new ideas sparked.
For background relevant to the many topics discussed in this book, we refer the reader to the many good textbooks in optimization, machine learning, and related subjects. We mention in particular Bertsekas (1999) and Nocedal and Wright (2006) for optimization over continuous variables, and Ben-Tal et al. (2009) for robust optimization. In machine learning, we refer for background to Vapnik (1999), Schölkopf and Smola (2002), Cristianini and Shawe-Taylor (2000), and Hastie et al. (2009). Some fundamentals of graphical models and the use of optimization therein can be found in Wainwright and Jordan (2008) and Koller and Friedman (2009).
Contents

1 Introduction: Optimization and Machine Learning
S. Sra, S. Nowozin, and S. J. Wright
1.1 Support Vector Machines
1.2 Regularized Optimization
1.3 Summary of the Chapters
1.4 References
2 Convex Optimization with Sparsity-Inducing Norms
F. Bach, R. Jenatton, J. Mairal, and G. Obozinski
2.1 Introduction
2.2 Generic Methods
2.3 Proximal Methods
2.4 (Block) Coordinate Descent Algorithms
2.5 Reweighted-ℓ2 Algorithms
2.6 Working-Set Methods
2.7 Quantitative Evaluation
2.8 Extensions
2.9 Conclusion
2.10 References
3 Interior-Point Methods for Large-Scale Cone Programming
M. Andersen, J. Dahl, Z. Liu, and L. Vandenberghe
3.1 Introduction
3.2 Primal-Dual Interior-Point Methods
3.3 Linear and Quadratic Programming
3.4 Second-Order Cone Programming
3.5 Semidefinite Programming
3.6 Conclusion
3.7 References
4 Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
D. P. Bertsekas
4.1 Introduction
4.2 Incremental Subgradient-Proximal Methods
4.3 Convergence for Methods with Cyclic Order
4.4 Convergence for Methods with Randomized Order
4.5 Some Applications
4.6 Conclusions
4.7 References
5 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods
A. Juditsky and A. Nemirovski
5.1 Introduction
5.2 Mirror Descent Algorithm: Minimizing over a Simple Set
5.3 Problems with Functional Constraints
5.4 Minimizing Strongly Convex Functions
5.5 Mirror Descent Stochastic Approximation
5.6 Mirror Descent for Convex-Concave Saddle-Point Problems
5.7 Setting up a Mirror Descent Method
5.8 Notes and Remarks
5.9 References
6 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem’s Structure
A. Juditsky and A. Nemirovski
6.1 Introduction
6.2 Saddle-Point Reformulations of Convex Minimization Problems
6.3 Mirror-Prox Algorithm
6.4 Accelerating the Mirror-Prox Algorithm
6.5 Accelerating First-Order Methods by Randomization
6.6 Notes and Remarks
6.7 References
7 Cutting-Plane Methods in Machine Learning
V. Franc, S. Sonnenburg, and T. Werner
7.1 Introduction to Cutting-Plane Methods
7.2 Regularized Risk Minimization
7.3 Multiple Kernel Learning
7.4 MAP Inference in Graphical Models
7.5 References
8 Introduction to Dual Decomposition for Inference
D. Sontag, A. Globerson, and T. Jaakkola
8.1 Introduction
8.2 Motivating Applications
8.3 Dual Decomposition and Lagrangian Relaxation
8.4 Subgradient Algorithms
8.5 Block Coordinate Descent Algorithms
8.6 Relations to Linear Programming Relaxations
8.7 Decoding: Finding the MAP Assignment
8.8 Discussion
8.10 References
9 Augmented Lagrangian Methods for Learning, Selecting, and Combining Features
R. Tomioka, T. Suzuki, and M. Sugiyama
9.1 Introduction
9.2 Background
9.3 Proximal Minimization Algorithm
9.4 Dual Augmented Lagrangian (DAL) Algorithm
9.5 Connections
9.6 Application
9.7 Summary
9.9 References
10 The Convex Optimization Approach to Regret Minimization
E. Hazan
10.1 Introduction
10.2 The RFTL Algorithm and Its Analysis
10.3 The “Primal-Dual” Approach
10.4 Convexity of Loss Functions
10.5 Recent Applications
10.6 References
11 Projected Newton-type Methods in Machine Learning
M. Schmidt, D. Kim, and S. Sra
11.1 Introduction
11.2 Projected Newton-type Methods
11.3 Two-Metric Projection Methods
11.4 Inexact Projection Methods
11.5 Toward Nonsmooth Objectives
11.6 Summary and Discussion
11.7 References
12 Interior-Point Methods in Machine Learning
J. Gondzio
12.1 Introduction
12.2 Interior-Point Methods: Background
12.3 Polynomial Complexity Result
12.4 Interior-Point Methods for Machine Learning
12.5 Accelerating Interior-Point Methods
12.6 Conclusions
12.7 References
13 The Tradeoffs of Large-Scale Learning
L. Bottou and O. Bousquet
13.1 Introduction
13.2 Approximate Optimization
13.3 Asymptotic Analysis
13.4 Experiments
13.5 Conclusion
13.6 References
14 Robust Optimization in Machine Learning
C. Caramanis, S. Mannor, and H. Xu
14.1 Introduction
14.2 Background on Robust Optimization
14.3 Robust Optimization and Adversary Resistant Learning
14.4 Robust Optimization and Regularization
14.5 Robustness and Consistency
14.6 Robustness and Generalization
14.7 Conclusion
14.8 References
15 Improving First and Second-Order Methods by Modeling Uncertainty
N. Le Roux, Y. Bengio, and A. Fitzgibbon
15.1 Introduction
15.2 Optimization Versus Learning
15.3 Building a Model of the Gradients
15.4 The Relative Roles of the Covariance and the Hessian
15.5 A Second-Order Model of the Gradients
15.6 An Efficient Implementation of Online Consensus Gradient: TONGA
15.7 Experiments
15.8 Conclusion
15.9 References
16 Bandit View on Noisy Optimization
J.-Y. Audibert, S. Bubeck, and R. Munos
16.1 Introduction
16.2 Concentration Inequalities
16.3 Discrete Optimization
16.4 Online Optimization
16.5 References
17 Optimization Methods for Sparse Inverse Covariance Selection
K. Scheinberg and S. Ma
17.1 Introduction
17.2 Block Coordinate Descent Methods
17.3 Alternating Linearization Method
17.4 Remarks on Numerical Performance
17.5 References
18 A Pathwise Algorithm for Covariance Selection
V. Krishnamurthy, S. D. Ahipaşaoğlu, and A. d’Aspremont
18.1 Introduction
18.2 Covariance Selection
18.3 Algorithm
18.4 Numerical Results
18.5 Online Covariance Selection
18.6 References