|
|
Line 1: |
Line 1: |
| '''Subgradient methods''' are [[iterative method]]s for solving [[convex optimization|convex minimization]] problems. Originally developed by [[Naum Z. Shor]] and others in the 1960s and 1970s, subgradient methods are convergent when applied even to a non-differentiable objective function. When the objective function is differentiable, subgradient methods for unconstrained problems use the same search direction as the method of [[gradient descent|steepest descent]].
| | Hi there, I am Felicidad Oquendo. Alabama has always been his home and his family members enjoys it. Camping is some thing that I've done for years. Bookkeeping is what he does.<br><br>My homepage [http://Wtfesports.altervista.org/index.php?mod=users&action=view&id=8092 extended auto warranty] |
| | |
| Subgradient methods are slower than Newton's method when applied to minimize twice continuously differentiable convex functions. However, Newton's method fails to converge on problems that have non-differentiable kinks.
| |
| | |
| In recent years, some [[interior-point methods]] have been suggested for convex minimization problems, but subgradient projection methods and related bundle methods of descent remain competitive. For convex minimization problems with very large number of dimensions, subgradient-projection methods are suitable, because they require little storage.
| |
| | |
| Subgradient projection methods are often applied to large-scale problems with decomposition techniques. Such decomposition methods often allow a simple distributed method for a problem.
| |
| | |
| ==Classical subgradient rules==
| |
| | |
| Let <math>f:\mathbb{R}^n \to \mathbb{R}</math> be a [[convex function]] with domain <math>\mathbb{R}^n</math>. A classical subgradient method iterates
| |
| :<math>x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} \ </math>
| |
| where <math>g^{(k)}</math> denotes a [[subgradient]] of <math> f \ </math> at <math>x^{(k)} \ </math>. If <math>f \ </math> is differentiable, then its only subgradient is the gradient vector <math>\nabla f</math> itself.
| |
| It may happen that <math>-g^{(k)}</math> is not a descent direction for <math>f \ </math> at <math>x^{(k)}</math>. We therefore maintain a list <math>f_{\rm{best}} \ </math> that keeps track of the lowest objective function value found so far, i.e.
| |
| :<math>f_{\rm{best}}^{(k)} = \min\{f_{\rm{best}}^{(k-1)} , f(x^{(k)}) \}.</math>
| |
| which is resultant convex optimized.
| |
| | |
| ===Step size rules===
| |
| | |
| Many different types of step-size rules are used by subgradient methods. This article notes five classical step-size rules for which convergence [[mathematical proof|proof]]s are known:
| |
| | |
| *Constant step size, <math>\alpha_k = \alpha.</math>
| |
| *Constant step length, <math>\alpha_k = \gamma/\lVert g^{(k)} \rVert_2</math>, which gives <math>\lVert x^{(k+1)} - x^{(k)} \rVert_2 = \gamma.</math>
| |
| *Square summable but not summable step size, i.e. any step sizes satisfying
| |
| :<math>\alpha_k\geq0,\qquad\sum_{k=1}^\infty \alpha_k^2 < \infty,\qquad \sum_{k=1}^\infty \alpha_k = \infty.</math>
| |
| *Nonsummable diminishing, i.e. any step sizes satisfying
| |
| :<math>\alpha_k\geq0,\qquad \lim_{k\to\infty} \alpha_k = 0,\qquad \sum_{k=1}^\infty \alpha_k = \infty.</math>
| |
| *Nonsummable diminishing step lengths, i.e. <math>\alpha_k = \gamma_k/\lVert g^{(k)} \rVert_2</math>, where
| |
| :<math>\gamma_k\geq0,\qquad \lim_{k\to\infty} \gamma_k = 0,\qquad \sum_{k=1}^\infty \gamma_k = \infty.</math>
| |
| For all five rules, the step-sizes are determined "off-line", before the method is iterated; the step-sizes do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step-size rules used for descent methods for differentiable functions: Many methods for minimizing differentiable functions satisfy Wolfe's sufficient conditions for convergence, where step-sizes typically depend on the current point and the current search-direction.
| |
| | |
| ===Convergence results===
| |
| | |
| For constant step-length and scaled subgradients having [[Euclidean norm]] equal to one, the subgradient method converges to an arbitrarily close approximation to the minimum value, that is
| |
| :<math>\lim_{k\to\infty} f_{\rm{best}}^{(k)} - f^* <\epsilon</math> by a result of [[Naum Z. Shor|Shor]].<ref>
| |
| The approximate convergence of the constant step-size (scaled) subgradient method is stated as Exercise 6.3.14(a) in [[Dimitri P. Bertsekas|Bertsekas]] (page 636): {{cite book
| |
| | last = Bertsekas
| |
| | first = Dimitri P.
| |
| | authorlink = Dimitri P. Bertsekas
| |
| | title = Nonlinear Programming
| |
| | edition = Second
| |
| | publisher = Athena Scientific
| |
| | year = 1999
| |
| | location = Cambridge, MA.
| |
| | isbn = 1-886529-00-0
| |
| }} On page 636, Bertsekas attributes this result to Shor: {{cite book
| |
| | last = Shor
| |
| | first = Naum Z.
| |
| | authorlink = Naum Z. Shor
| |
| | title = Minimization Methods for Non-differentiable Functions
| |
| | publisher = [[Springer-Verlag]]
| |
| | isbn = 0-387-12763-1
| |
| | year = 1985
| |
| }}
| |
| </ref> | |
| These classical subgradient methods have poor performance and are no longer recommended for general use.<ref name="Lem"/><ref name="KLL"/>
| |
| | |
| ==Subgradient-projection & bundle methods==
| |
| During the 1970s, [[Claude Lemaréchal]] and Phil. Wolfe proposed "bundle methods" of descent for problems of convex minimization.<ref>
| |
| {{cite book
| |
| | last = Bertsekas
| |
| | first = Dimitri P.
| |
| | authorlink = Dimitri P. Bertsekas
| |
| | title = Nonlinear Programming
| |
| | edition = Second
| |
| | publisher = Athena Scientific
| |
| | year = 1999
| |
| | location = Cambridge, MA.
| |
| | isbn = 1-886529-00-0
| |
| }}
| |
| </ref> Their modern versions and full convergence analysis were provided by Kiwiel.
| |
| <ref>
| |
| {{cite book|last=Kiwiel|first=Krzysztof|title=Methods of Descent for Nondifferentiable Optimization|publisher=[[Springer Verlag]]|location=Berlin|year=1985|pages=362|isbn=978-3540156420 |mr=0797754}}
| |
| </ref> Contemporary bundle-methods often use "[[level set|level]] control" rules for choosing step-sizes, developing techniques from the "subgradient-projection" method of Boris T. Polyak (1969). However, there are problems on which bundle methods offer little advantage over subgradient-projection methods.<ref name="Lem">
| |
| {{cite book| last=Lemaréchal|first=Claude|authorlink=Claude Lemaréchal|chapter=Lagrangian relaxation|pages=112–156|title=Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000|editor=Michael Jünger and Denis Naddef|series=Lecture Notes in Computer Science|volume=2241|publisher=Springer-Verlag| location=Berlin|year=2001|isbn=3-540-42877-1|mr=1900016|doi=10.1007/3-540-45586-8_4|ref=harv}}</ref><ref name="KLL">
| |
| {{cite journal|last1=Kiwiel|first1=Krzysztof C.|last2=Larsson |first2=Torbjörn|last3=Lindberg|first3=P. O.|title=Lagrangian relaxation via ballstep subgradient methods|url=http://mor.journal.informs.org/cgi/content/abstract/32/3/669 |journal=Mathematics of Operations Research|volume=32|year=2007|number=3|pages=669–686|month=August|mr=2348241|doi=10.1287/moor.1070.0261|ref=harv}}
| |
| </ref>
| |
| | |
| ==Constrained optimization==
| |
| ===Projected subgradient===
| |
| One extension of the subgradient method is the '''projected subgradient method''', which solves the constrained optimization problem
| |
| :minimize <math>f(x) \ </math> subject to
| |
| :<math>x\in\mathcal{C}</math>
| |
| | |
| where <math>\mathcal{C}</math> is a convex set. The projected subgradient method uses the iteration
| |
| | |
| :<math>x^{(k+1)} = P \left(x^{(k)} - \alpha_k g^{(k)} \right) </math>
| |
| | |
| where <math>P</math> is projection on <math>\mathcal{C}</math> and <math>g^{(k)}</math> is any subgradient of <math>f \ </math> at <math>x^{(k)}.</math>
| |
| | |
| ===General constraints===
| |
| | |
| The subgradient method can be extended to solve the inequality constrained problem
| |
| | |
| :minimize <math>f_0(x) \ </math> subject to
| |
| :<math>f_i (x) \leq 0,\quad i = 1,\dots,m</math>
| |
| | |
| where <math>f_i</math> are convex. The algorithm takes the same form as the unconstrained case
| |
| | |
| :<math>x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} \ </math>
| |
| | |
| where <math>\alpha_k>0</math> is a step size, and <math>g^{(k)}</math> is a subgradient of the objective or one of the constraint functions at <math>x. \ </math> Take
| |
| | |
| :<math>g^{(k)} =
| |
| \begin{cases}
| |
| \partial f_0 (x) & \text{ if } f_i(x) \leq 0 \; \forall i = 1 \dots m \\
| |
| \partial f_j (x) & \text{ for some } j \text{ such that } f_j(x) > 0
| |
| \end{cases}</math>
| |
| | |
| where <math>\partial f</math> denotes the subdifferential of <math>f \ </math>. If the current point is feasible, the algorithm uses an objective subgradient; if the current point is infeasible, the algorithm chooses a subgradient of any violated constraint.
| |
| | |
| ==References== | |
| <references/>
| |
| | |
| ==Further reading==
| |
| * {{cite book
| |
| | last = Bertsekas
| |
| | first = Dimitri P.
| |
| | authorlink = Dimitri P. Bertsekas
| |
| | title = Nonlinear Programming
| |
| | publisher = Athena Scientific
| |
| | year = 1999
| |
| | location = Cambridge, MA.
| |
| | isbn = 1-886529-00-0
| |
| }}
| |
| | |
| * {{cite book
| |
| | last = Shor
| |
| | first = Naum Z.
| |
| | authorlink = Naum Z. Shor
| |
| | title = Minimization Methods for Non-differentiable Functions
| |
| | publisher = [[Springer-Verlag]]
| |
| | isbn = 0-387-12763-1
| |
| | year = 1985
| |
| }}
| |
| | |
| * {{cite book|last=[[Andrzej Piotr Ruszczyński|Ruszczyński]]|first=Andrzej|title=Nonlinear Optimization|publisher=[[Princeton University Press]]|location=Princeton, NJ|year=2006|pages=xii+454|isbn=978-0691119151 |mr=2199043}}
| |
| | |
| ==External links==
| |
| * [http://www.stanford.edu/class/ee364a/ EE364A] and [http://www.stanford.edu/class/ee364b/ EE364B], Stanford's convex optimization course sequence.
| |
| {{optimization algorithms|convex}}
| |
| | |
| [[Category:Mathematical optimization]]
| |
| [[Category:Convex optimization]]
| |