Optimization for Machine Learning
Edited by Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright
Since its earliest days as a discipline, machine learning has made use of optimization formulations and algorithms. Likewise, machine learning has contributed to optimization, driving the development of new optimization approaches that address the significant challenges presented by machine learning applications. This cross-fertilization continues to deepen, producing a growing literature at the intersection of the two fields while attracting leading researchers to the effort.
Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. While techniques proposed twenty or more years ago continue to be refined, the increased complexity, size, and variety of today’s machine learning models demand a principled reassessment of existing assumptions and techniques.
This book makes a start toward such a reassessment. Besides describing the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods, the book devotes significant attention to newer themes such as regularized optimization, robust optimization, a variety of gradient and subgradient methods, and the use of splitting techniques and second-order information. We aim to provide an up-to-date account of the optimization techniques useful to machine learning: those that are established and prevalent, as well as those that are rising in importance.
To illustrate our aim more concretely, we review in Sections 1.1 and 1.2 two major paradigms that provide focus to research at the confluence of machine learning and optimization: support vector machines (SVMs) and regularized optimization. Our brief review charts the importance of these problems and discusses how both connect to the later chapters of this book.
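As a rough sketch of these two paradigms (the notation here, with weight vector w, offset b, labels y_i ∈ {−1, +1}, and regularization parameters C and λ, is generic and may differ from that used in the chapters that follow), the soft-margin SVM for training data (x_1, y_1), ..., (x_n, y_n) can be posed as the unconstrained problem

    \min_{w,\,b} \;\; \frac{1}{2}\|w\|_2^2 \;+\; C \sum_{i=1}^{n} \max\{0,\; 1 - y_i (w^\top x_i + b)\},

while regularized optimization refers to the broader template

    \min_{w} \;\; f(w) \;+\; \lambda\, r(w),

where f measures fit to the data (for the SVM above, the sum of hinge losses) and the regularizer r, for example the ℓ1 norm, encodes prior structure such as sparsity.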
We then discuss other themes (applications, formulations, and algorithms) that recur throughout the book, outlining the contents of the various chapters and the relationship between them.
Audience. This book is aimed at a broad audience of researchers and students in the machine learning and optimization communities, but the material covered is widely applicable and should be valuable to researchers in other related areas too. Some chapters have a didactic flavor, covering recent advances at a level accessible to anyone having a passing acquaintance with tools and techniques in linear algebra, real analysis, and probability.
Other chapters are more specialized, containing cutting-edge material. We hope that, from the wide range of work presented in the book, researchers will gain a broader perspective of the field, and that new connections will be made and new ideas sparked.
For background relevant to the many topics discussed in this book, we refer the reader to the many good textbooks in optimization, machine learning, and related subjects. We mention in particular Bertsekas (1999) and Nocedal and Wright (2006) for optimization over continuous variables, and Ben-Tal et al. (2009) for robust optimization. In machine learning, we refer for background to Vapnik (1999), Schölkopf and Smola (2002), Cristianini and Shawe-Taylor (2000), and Hastie et al. (2009). Some fundamentals of graphical models and the use of optimization therein can be found in Wainwright and Jordan (2008) and Koller and Friedman (2009).
Contents

1 Introduction: Optimization and Machine Learning
S. Sra, S. Nowozin, and S. J. Wright
1.1 Support Vector Machines
1.2 Regularized Optimization
1.3 Summary of the Chapters
1.4 References
2 Convex Optimization with Sparsity-Inducing Norms
F. Bach, R. Jenatton, J. Mairal, and G. Obozinski
2.1 Introduction
2.2 Generic Methods
2.3 Proximal Methods
2.4 (Block) Coordinate Descent Algorithms
2.5 Reweighted-ℓ2 Algorithms
2.6 Working-Set Methods
2.7 Quantitative Evaluation
2.8 Extensions
2.9 Conclusion
2.10 References
3 Interior-Point Methods for Large-Scale Cone Programming
M. Andersen, J. Dahl, Z. Liu, and L. Vandenberghe
3.1 Introduction
3.2 Primal-Dual Interior-Point Methods
3.3 Linear and Quadratic Programming
3.4 Second-Order Cone Programming
3.5 Semidefinite Programming
3.6 Conclusion
3.7 References
4 Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
D. P. Bertsekas
4.1 Introduction
4.2 Incremental Subgradient-Proximal Methods
4.3 Convergence for Methods with Cyclic Order
4.4 Convergence for Methods with Randomized Order
4.5 Some Applications
4.6 Conclusions
4.7 References
5 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods
A. Juditsky and A. Nemirovski
5.1 Introduction
5.2 Mirror Descent Algorithm: Minimizing over a Simple Set
5.3 Problems with Functional Constraints
5.4 Minimizing Strongly Convex Functions
5.5 Mirror Descent Stochastic Approximation
5.6 Mirror Descent for Convex-Concave Saddle-Point Problems
5.7 Setting up a Mirror Descent Method
5.8 Notes and Remarks
5.9 References
6 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem’s Structure
A. Juditsky and A. Nemirovski
6.1 Introduction
6.2 Saddle-Point Reformulations of Convex Minimization Problems
6.3 Mirror-Prox Algorithm
6.4 Accelerating the Mirror-Prox Algorithm
6.5 Accelerating First-Order Methods by Randomization
6.6 Notes and Remarks
6.7 References
7 Cutting-Plane Methods in Machine Learning
V. Franc, S. Sonnenburg, and T. Werner
7.1 Introduction to Cutting-Plane Methods
7.2 Regularized Risk Minimization
7.3 Multiple Kernel Learning
7.4 MAP Inference in Graphical Models
7.5 References
8 Introduction to Dual Decomposition for Inference
D. Sontag, A. Globerson, and T. Jaakkola
8.1 Introduction
8.2 Motivating Applications
8.3 Dual Decomposition and Lagrangian Relaxation
8.4 Subgradient Algorithms
8.5 Block Coordinate Descent Algorithms
8.6 Relations to Linear Programming Relaxations
8.7 Decoding: Finding the MAP Assignment
8.8 Discussion
8.10 References
9 Augmented Lagrangian Methods for Learning, Selecting, and Combining Features
R. Tomioka, T. Suzuki, and M. Sugiyama
9.1 Introduction
9.2 Background
9.3 Proximal Minimization Algorithm
9.4 Dual Augmented Lagrangian (DAL) Algorithm
9.5 Connections
9.6 Application
9.7 Summary
9.9 References
10 The Convex Optimization Approach to Regret Minimization
E. Hazan
10.1 Introduction
10.2 The RFTL Algorithm and Its Analysis
10.3 The “Primal-Dual” Approach
10.4 Convexity of Loss Functions
10.5 Recent Applications
10.6 References
11 Projected Newton-type Methods in Machine Learning
M. Schmidt, D. Kim, and S. Sra
11.1 Introduction
11.2 Projected Newton-type Methods
11.3 Two-Metric Projection Methods
11.4 Inexact Projection Methods
11.5 Toward Nonsmooth Objectives
11.6 Summary and Discussion
11.7 References
12 Interior-Point Methods in Machine Learning
J. Gondzio
12.1 Introduction
12.2 Interior-Point Methods: Background
12.3 Polynomial Complexity Result
12.4 Interior-Point Methods for Machine Learning
12.5 Accelerating Interior-Point Methods
12.6 Conclusions
12.7 References
13 The Tradeoffs of Large-Scale Learning
L. Bottou and O. Bousquet
13.1 Introduction
13.2 Approximate Optimization
13.3 Asymptotic Analysis
13.4 Experiments
13.5 Conclusion
13.6 References
14 Robust Optimization in Machine Learning
C. Caramanis, S. Mannor, and H. Xu
14.1 Introduction
14.2 Background on Robust Optimization
14.3 Robust Optimization and Adversary Resistant Learning
14.4 Robust Optimization and Regularization
14.5 Robustness and Consistency
14.6 Robustness and Generalization
14.7 Conclusion
14.8 References
15 Improving First and Second-Order Methods by Modeling Uncertainty
N. Le Roux, Y. Bengio, and A. Fitzgibbon
15.1 Introduction
15.2 Optimization Versus Learning
15.3 Building a Model of the Gradients
15.4 The Relative Roles of the Covariance and the Hessian
15.5 A Second-Order Model of the Gradients
15.6 An Efficient Implementation of Online Consensus Gradient: TONGA
15.7 Experiments
15.8 Conclusion
15.9 References
16 Bandit View on Noisy Optimization
J.-Y. Audibert, S. Bubeck, and R. Munos
16.1 Introduction
16.2 Concentration Inequalities
16.3 Discrete Optimization
16.4 Online Optimization
16.5 References
17 Optimization Methods for Sparse Inverse Covariance Selection
K. Scheinberg and S. Ma
17.1 Introduction
17.2 Block Coordinate Descent Methods
17.3 Alternating Linearization Method
17.4 Remarks on Numerical Performance
17.5 References
18 A Pathwise Algorithm for Covariance Selection
V. Krishnamurthy, S. D. Ahipaşaoğlu, and A. d’Aspremont
18.1 Introduction
18.2 Covariance Selection
18.3 Algorithm
18.4 Numerical Results
18.5 Online Covariance Selection
18.6 References