[[Image:Newton optimization vs grad descent.svg|right|thumb|A comparison of [[gradient descent]] (green) and Newton's method (red) for minimizing a function (with small step sizes). Newton's method uses curvature information to take a more direct route.]]
<!--__NOTOC__-->
In [[calculus]], [[Newton's method]] is an [[iterative method]] for finding [[Zero of a function|zeros]] (solutions to [[equation]]s of the form <math>f(x)=0</math>). In [[Mathematical optimization|optimization]], a procedure also called Newton's method is applied to the [[derivative]] of a function to find its zeros (solutions to <math>f'(x)=0</math>), also known as the [[stationary point]]s of the [[differentiable function]] <math>f(x)</math>.
==Method==

Newton's method attempts to construct a [[sequence]] <math>x_n</math> from an initial guess <math>x_0</math> that converges towards some <math>x_*</math> such that <math>f'(x_*)=0</math>. This <math>x_*</math> is called a [[stationary point]] of <math>f(\cdot)</math>.

The second-order [[Taylor expansion]] <math>f_T(x)</math> of <math>f(\cdot)</math> around <math>x_n</math> (where <math>\Delta x = x-x_n</math>) is

:<math>f_T(x_n+\Delta x)=f_T(x)=f(x_n)+f'(x_n)\Delta x+\frac 1 2 f''(x_n) \Delta x^2,</math>

which attains its extremum when its derivative with respect to <math>\Delta x</math> is equal to zero, i.e. when <math>\Delta x</math> solves the linear equation

:<math>f'(x_n)+f''(x_n)\Delta x=0.</math>

(The Taylor expansion is regarded here as a quadratic in <math>\Delta x</math> with constant coefficients, so differentiating it with respect to <math>\Delta x</math> yields the linear equation above.)

Thus, provided that <math>f(x)</math> is a [[smooth function|twice-differentiable function]] well approximated by its second-order Taylor expansion and the initial guess <math>x_0</math> is chosen close enough to <math>x_*</math>, the sequence <math>(x_n)</math> defined by the Newton step

:<math>\Delta x = - \frac{f'(x_n)}{f''(x_n)}, \qquad x_{n+1} = x_n + \Delta x = x_n - \frac{f'(x_n)}{f''(x_n)}, \quad n = 0, 1, \dots</math>

will converge towards a root of <math>f'</math>, i.e. a point <math>x_*</math> for which <math>f'(x_*)=0</math>.
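For illustration, a minimal sketch of this one-dimensional iteration in Python might look as follows; the function name <code>newton_1d</code>, its arguments and the stopping rule are choices made for this example, not part of any standard library.

<syntaxhighlight lang="python">
def newton_1d(f1, f2, x0, tol=1e-10, max_iter=50):
    """Illustrative sketch of one-dimensional Newton's method for optimization.

    f1, f2 -- callables returning f'(x) and f''(x)
    x0     -- initial guess
    Returns an approximate stationary point of f.
    """
    x = x0
    for _ in range(max_iter):
        step = f1(x) / f2(x)      # Newton step: f'(x_n) / f''(x_n)
        x = x - step              # x_{n+1} = x_n - f'(x_n) / f''(x_n)
        if abs(step) < tol:       # stop once the step is negligible
            break
    return x

# Example: f(x) = x^4 - 3x^3 + 2, so f'(x) = 4x^3 - 9x^2 and f''(x) = 12x^2 - 18x.
x_star = newton_1d(lambda x: 4*x**3 - 9*x**2,
                   lambda x: 12*x**2 - 18*x,
                   x0=3.0)
print(x_star)  # converges to 2.25, where f'(x) = 0 and f''(x) > 0
</syntaxhighlight>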
==Geometric interpretation==

The geometric interpretation of Newton's method is that at each iteration one approximates <math>f(\mathbf{x})</math> by a [[quadratic function]] around <math>\mathbf{x}_n</math>, and then takes a step towards the maximum/minimum of that quadratic function (in higher dimensions, this may also be a [[saddle point]]). Note that if <math>f(\mathbf{x})</math> happens to ''be'' a quadratic function, then the exact extremum is found in one step.
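For instance, for the one-dimensional quadratic <math>f(x)=ax^2+bx+c</math> with <math>a>0</math>, a single Newton step from any starting point <math>x_0</math> gives

:<math>x_1 = x_0 - \frac{2ax_0+b}{2a} = -\frac{b}{2a},</math>

which is exactly the minimizer of <math>f</math>.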
==Higher dimensions==

The above [[iteration|iterative scheme]] can be generalized to several dimensions by replacing the derivative with the [[gradient]], <math>\nabla f(\mathbf{x})</math>, and the [[Multiplicative inverse|reciprocal]] of the second derivative with the [[Invertible matrix|inverse]] of the [[Hessian matrix]], <math>H f(\mathbf{x})</math>. One obtains the iterative scheme

:<math>\mathbf{x}_{n+1} = \mathbf{x}_n - [H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n), \quad n \ge 0.</math>

Usually Newton's method is modified to include a small step size <math>0<\gamma\le 1</math> instead of <math>\gamma=1</math>:

:<math>\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma[H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n).</math>

This is often done to ensure that the [[Wolfe conditions]] are satisfied at each step <math>\mathbf{x}_n \to \mathbf{x}_{n+1}</math> of the iteration.

Where applicable, Newton's method converges much faster towards a local maximum or minimum than [[gradient descent]]. In fact, every local minimum has a neighborhood <math>N</math> such that, if we start with <math>\mathbf{x}_0 \in N</math>, Newton's method with step size <math>\gamma=1</math> converges [[rate of convergence|quadratically]] (provided that the Hessian is [[invertible matrix|invertible]] and a [[Lipschitz continuity|Lipschitz continuous]] function of <math>\mathbf{x}</math> in that neighborhood).
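To make the scheme concrete, the following is a minimal NumPy sketch of the iteration with step size <math>\gamma</math>; the function name <code>newton_nd</code>, its arguments and the test function are illustrative choices, and the Newton direction is obtained with a linear solve rather than an explicit matrix inverse, as discussed below.

<syntaxhighlight lang="python">
import numpy as np

def newton_nd(grad, hess, x0, gamma=1.0, tol=1e-10, max_iter=100):
    """Illustrative sketch of Newton's method in several dimensions.

    grad  -- callable returning the gradient of f at x (1-D array)
    hess  -- callable returning the Hessian of f at x (2-D array)
    gamma -- step size, 0 < gamma <= 1
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # gradient near zero: stationary point
            break
        p = np.linalg.solve(hess(x), g)   # Newton direction: solve H p = grad f
        x = x - gamma * p                 # x_{n+1} = x_n - gamma * p
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 (y - x^2)^2.
grad = lambda v: np.array([2*(v[0]-1) - 40*v[0]*(v[1]-v[0]**2),
                           20*(v[1]-v[0]**2)])
hess = lambda v: np.array([[2 - 40*(v[1]-v[0]**2) + 80*v[0]**2, -40*v[0]],
                           [-40*v[0], 20]])
print(newton_nd(grad, hess, x0=[-1.0, 1.0]))  # approaches [1, 1]
</syntaxhighlight>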
Finding the inverse of the Hessian in high dimensions can be an expensive operation. In such cases, instead of directly inverting the Hessian it is better to compute the vector <math>\mathbf{p}_{n} = [H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n)</math> as the solution to the [[system of linear equations]]

:<math>[H f(\mathbf{x}_n)] \mathbf{p}_{n} = \nabla f(\mathbf{x}_n),</math>

which may be solved by various factorizations or approximately (but to great accuracy) using [[iterative methods]]. Many of these methods are only applicable to certain types of equations; for example, the [[Cholesky factorization]] and the [[Conjugate gradient method|conjugate gradient method]] will only work if <math>[H f(\mathbf{x}_n)]</math> is a positive definite matrix. While this may seem like a limitation, it is often a useful indication that something has gone wrong: if a minimization problem is being approached and <math>[H f(\mathbf{x}_n)]</math> is not positive definite, then the iterations are converging to a [[saddle point]] and not a minimum.
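As an illustration of this approach, the sketch below computes the Newton direction with SciPy's Cholesky routines, assuming SciPy is available; a failed factorization doubles as the positive-definiteness check mentioned above, and the function name <code>newton_direction</code> is chosen for this example only.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_direction(H, g):
    """Illustrative sketch: solve H p = g via Cholesky instead of inverting H.

    cho_factor raises LinAlgError if H is not positive definite, which for a
    minimization problem signals a saddle point or non-convex region.
    """
    c, low = cho_factor(H)          # H = L L^T, fails if H is not positive definite
    return cho_solve((c, low), g)   # two triangular solves, cheaper than forming H^{-1}

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite Hessian
g = np.array([1.0, 2.0])            # gradient
p = newton_direction(H, g)
print(np.allclose(H @ p, g))        # True: p solves the Newton system
</syntaxhighlight>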
On the other hand, if a [[constrained optimization]] is done (for example, with [[Lagrange multipliers]]), the problem may become one of saddle point finding, in which case the Hessian will be symmetric indefinite and <math>\mathbf p_n</math> will need to be computed with a method that works for such systems, such as the '''LDL'''<sup>T</sup> variant of [[Cholesky factorization]] or the [[conjugate residual method]].

There also exist various [[quasi-Newton method]]s, where an approximation for the Hessian (or its inverse directly) is built up from changes in the gradient.

If the Hessian is close to a non-[[invertible matrix]], the inverted Hessian can be numerically unstable and the solution may diverge. In this case, certain workarounds have been tried in the past, with varied success on particular problems. One can, for example, modify the Hessian by adding a correction matrix <math>B_n</math> so as to make <math>H f(\mathbf{x}_n) + B_n</math> positive definite. One approach is to diagonalize <math>H f(\mathbf{x}_n)</math> and choose <math>B_n</math> so that <math>H f(\mathbf{x}_n) + B_n</math> has the same eigenvectors as <math>H f(\mathbf{x}_n)</math>, but with each negative eigenvalue replaced by some <math>\epsilon>0</math>.
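A minimal sketch of this eigenvalue-based correction, assuming a symmetric Hessian stored as a NumPy array; the function name <code>make_positive_definite</code> and the choice to clip all eigenvalues below <math>\epsilon</math> (rather than only the negative ones) are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def make_positive_definite(H, eps=1e-6):
    """Illustrative sketch: replace negative (and tiny) eigenvalues of a
    symmetric matrix with eps, keeping the same eigenvectors.

    The result corresponds to H f(x_n) + B_n in the text.
    """
    eigvals, eigvecs = np.linalg.eigh(H)            # spectral decomposition of symmetric H
    eigvals = np.maximum(eigvals, eps)              # clip eigenvalues from below at eps
    return eigvecs @ np.diag(eigvals) @ eigvecs.T   # reassemble the corrected Hessian

H = np.array([[2.0,  0.0],
              [0.0, -1.0]])                         # indefinite Hessian (saddle point)
H_pd = make_positive_definite(H)
print(np.linalg.eigvalsh(H_pd))                     # [1e-06, 2.0]: now positive definite
</syntaxhighlight>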
An approach exploited in the [[Levenberg–Marquardt algorithm]] (which uses an approximate Hessian) is to add a scaled identity matrix to the Hessian, <math>\mu \mathbf I</math>, with the scale <math>\mu</math> adjusted at every iteration as needed. When <math>\mu</math> is large relative to the Hessian, the iterations behave like [[gradient descent]] with step size <math>\tfrac 1 \mu</math>. This results in slower but more reliable convergence where the Hessian does not provide useful information.
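A sketch of this damping idea in NumPy is given below; the rule for adjusting <math>\mu</math> (increase when a step fails to reduce <math>f</math>, decrease when it succeeds) is one common heuristic rather than the schedule of any particular implementation, and the function names are chosen for this example.

<syntaxhighlight lang="python">
import numpy as np

def damped_newton_step(H, g, mu):
    """Newton-like step using the regularized Hessian H + mu * I."""
    return np.linalg.solve(H + mu * np.eye(H.shape[0]), g)

def minimize_damped(f, grad, hess, x0, mu=1.0, tol=1e-8, max_iter=200):
    """Illustrative sketch of Newton iteration with adaptive damping."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = damped_newton_step(hess(x), g, mu)
        if f(x - p) < f(x):            # step decreased f: accept it, trust the model more
            x = x - p
            mu = max(mu * 0.5, 1e-12)
        else:                          # step failed: increase damping (more like gradient descent)
            mu *= 2.0
    return x

# Example: minimize f(x, y) = x^2 + 5 y^2 starting from (3, -2).
f    = lambda v: v[0]**2 + 5*v[1]**2
grad = lambda v: np.array([2*v[0], 10*v[1]])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 10.0]])
print(minimize_damped(f, grad, hess, x0=[3.0, -2.0]))  # approaches [0, 0]
</syntaxhighlight>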
==Other approximations==

Some functions are poorly approximated by quadratics, particularly when far from a maximum or minimum. In these cases, approximations other than quadratic may be more appropriate.<ref>{{cite journal | author = Thomas P. Minka | title = Beyond Newton's Method | date = 2002-04-17 | url = http://research.microsoft.com/en-us/um/people/minka/papers/minka-newton.pdf | format = [[Portable Document Format|PDF]] | accessdate = 2009-02-20}}</ref>
==See also==

*[[Quasi-Newton method]]
*[[Gradient descent]]
*[[Gauss–Newton algorithm]]
*[[Levenberg–Marquardt algorithm]]
*[[Trust region]]
*[[Optimization (mathematics)|Optimization]]
*[[Nelder–Mead method]]
==Notes==
{{reflist}}
==References==
* Avriel, Mordecai (2003). ''Nonlinear Programming: Analysis and Methods''. Dover Publishing. ISBN 0-486-43227-0.
* {{cite book|last1=Bonnans|first1=J. Frédéric|last2=Gilbert|first2=J. Charles|last3=Lemaréchal|first3=Claude|authorlink3=Claude Lemaréchal|last4=Sagastizábal|first4=Claudia A.|title=Numerical optimization: Theoretical and practical aspects|url=http://www.springer.com/mathematics/applications/book/978-3-540-35445-1|edition=Second revised ed. of translation of 1997 French|series=Universitext|publisher=Springer-Verlag|location=Berlin|year=2006|pages=xiv+490|isbn=3-540-35445-X|doi=10.1007/978-3-540-35447-5|mr=2265882}}
* {{Cite book|last1=Fletcher|first1=Roger|title=Practical methods of optimization|publisher=[[John Wiley & Sons]]|location=New York|edition=2nd|isbn=978-0-471-91547-8|year=1987}}
* Nocedal, Jorge & Wright, Stephen J. (1999). ''Numerical Optimization''. Springer-Verlag. ISBN 0-387-98793-2.
{{optimization algorithms}}

{{DEFAULTSORT:Newton's Method In Optimization}}
[[Category:Optimization algorithms and methods]]

[[fr:Méthode de Newton]]