In [[numerical analysis|numerical]] [[optimization (mathematics)|optimization]], the '''[[Charles George Broyden|Broyden]]–[[Roger Fletcher (mathematician)|Fletcher]]–[[Donald Goldfarb|Goldfarb]]<ref>{{cite web|url=http://www.columbia.edu/~goldfarb/ |title=Donald Goldfarb: Home Page |publisher=Columbia.edu |date= |accessdate=2013-07-18}}</ref>–[[David F. Shanno|Shanno]]<ref>{{cite web|url=http://rutcor.rutgers.edu/~shanno/ |title=David Shanno |publisher=Rutcor.rutgers.edu |date= |accessdate=2013-07-18}}</ref>''' ('''BFGS''') '''algorithm''' is an [[iterative method]] for solving unconstrained [[nonlinear optimization]] problems.

The BFGS method [[approximation theory|approximates]] [[Newton's method in optimization|Newton's method]], a class of [[hill climbing|hill-climbing optimization]] techniques that seeks a [[stationary point]] of a (preferably twice continuously differentiable) function. For such problems, a [[Kuhn–Tucker conditions|necessary condition for optimality]] is that the [[gradient]] be zero.

Newton's method and the BFGS method are not guaranteed to converge unless the function has a quadratic [[Taylor expansion]] near an [[Local optimum|optimum]]. These methods use both the first and second derivatives of the function. However, BFGS has been shown to have good performance even for non-smooth optimization problems {{citation needed|date=October 2012}}.

In [[quasi-Newton methods]], the [[Hessian matrix]] of second [[derivative]]s does not need to be evaluated directly. Instead, the Hessian matrix is approximated using rank-one updates specified by gradient evaluations (or approximate gradient evaluations). [[Quasi-Newton methods]] are generalizations of the [[secant method]] for finding the root of the first derivative in multidimensional problems. In more than one dimension, the secant equation does not specify a unique solution, and quasi-Newton methods differ in how they constrain the solution. The BFGS method is one of the most popular members of this class.<ref>{{harvtxt|Nocedal|Wright|2006}}, page 24</ref> Also in common use is [[L-BFGS]], a limited-memory version of BFGS that is particularly suited to problems with very large numbers of variables (e.g., >1000). The L-BFGS-B<ref>R. H. Byrd, P. Lu and J. Nocedal. [http://www.ece.northwestern.edu/~nocedal/PSfiles/limited.ps.gz A Limited Memory Algorithm for Bound Constrained Optimization] (1995), SIAM Journal on Scientific and Statistical Computing, 16, 5, pp. 1190–1208.</ref> variant handles simple box constraints.
==Rationale==

The search direction '''p'''<sub>''k''</sub> at stage ''k'' is given by the solution of the analogue of the Newton equation

:<math> B_k \mathbf{p}_k = - \nabla f(\mathbf{x}_k),</math>

where <math>B_k</math> is an approximation to the [[Hessian matrix]] which is updated iteratively at each stage, and <math>\nabla f(\mathbf{x}_k)</math> is the gradient of the function evaluated at '''x'''<sub>''k''</sub>. A [[line search]] in the direction '''p'''<sub>''k''</sub> is then used to find the next point '''x'''<sub>''k''+1</sub>. Instead of requiring the full Hessian matrix at the point '''x'''<sub>''k''+1</sub> to be computed as ''B''<sub>''k''+1</sub>, the approximate Hessian at stage ''k'' is updated by the addition of two matrices:

:<math>B_{k+1}=B_k+U_k+V_k.\,\!</math>

Both ''U<sub>k</sub>'' and ''V<sub>k</sub>'' are symmetric rank-one matrices, but they are constructed from different vectors. The symmetric rank-one assumption here means that each of them may be written in the form

:<math>C=\alpha\,\mathbf{a}\mathbf{a}^\mathrm{T}</math>

for some vector '''a''' and scalar ''α''. So, equivalently, ''U<sub>k</sub>'' and ''V<sub>k</sub>'' together construct a rank-two update matrix, which is more robust against the scale problems often encountered in [[gradient descent]] searching (''e.g.'', in [[Broyden's method]], which uses a rank-one update).

The quasi-Newton condition imposed on this update is

:<math>B_{k+1} (\mathbf{x}_{k+1}-\mathbf{x}_k ) = \nabla f(\mathbf{x}_{k+1}) -\nabla f(\mathbf{x}_k ).</math>
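In terms of <math>\mathbf{s}_k = \mathbf{x}_{k+1}-\mathbf{x}_k</math> and <math>\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) -\nabla f(\mathbf{x}_k)</math>, the notation used in the algorithm below, this condition reads

:<math>B_{k+1} \mathbf{s}_k = \mathbf{y}_k,</math>

the multidimensional form of the secant equation.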
==Algorithm==

From an initial guess <math>\mathbf{x}_0</math> and an approximate Hessian matrix <math>B_0</math>, the following steps are repeated as <math>\mathbf{x}_k</math> converges to the solution:

# Obtain a direction <math>\mathbf{p}_k</math> by solving <math>B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k).</math>
# Perform a [[line search]] to find an acceptable stepsize <math>\alpha_k</math> in the direction found in the first step, then update <math>\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\mathbf{p}_k.</math>
# Set <math> \mathbf{s}_k=\alpha_k \mathbf{p}_k.</math>
# Set <math>\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k).</math>
# Update <math>B_{k+1} = B_k + \frac{\mathbf{y}_k \mathbf{y}_k^{\mathrm{T}}}{\mathbf{y}_k^{\mathrm{T}} \mathbf{s}_k} - \frac{B_k \mathbf{s}_k \mathbf{s}_k^{\mathrm{T}} B_k }{\mathbf{s}_k^{\mathrm{T}} B_k \mathbf{s}_k}.</math>

<math>f(\mathbf{x})</math> denotes the objective function to be minimized. Convergence can be checked by observing the norm of the gradient, <math>\left\|\nabla f(\mathbf{x}_k)\right\|</math>. In practice, <math>B_0</math> can be initialized with <math>B_0 = I</math>, so that the first step is equivalent to a [[gradient descent]] step, but further steps are progressively refined by <math>B_{k}</math>, the approximation to the Hessian.
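Substituting the update of step 5 into the quasi-Newton condition shows that it is satisfied by construction: <math>B_{k+1}\mathbf{s}_k = B_k\mathbf{s}_k + \mathbf{y}_k - B_k\mathbf{s}_k = \mathbf{y}_k</math>. The steps above can also be summarized in a short sketch. The following Python code is illustrative only: the function and parameter names are chosen here, and a simple backtracking (Armijo) line search stands in for the Wolfe-condition line search usually used in practice.

<syntaxhighlight lang="python">
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=100):
    """Minimize f starting from x0, maintaining the Hessian approximation B_k."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                  # B_0 = I, so the first step is a gradient-descent step
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:     # convergence test on the gradient norm
            break
        p = np.linalg.solve(B, -g)      # step 1: solve B_k p_k = -grad f(x_k)
        alpha = 1.0                     # step 2: backtracking (Armijo) line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * g.dot(p):
            alpha *= 0.5
        s = alpha * p                   # step 3: s_k = alpha_k p_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                   # step 4: y_k = grad f(x_{k+1}) - grad f(x_k)
        if y.dot(s) > 1e-12:            # skip the update if the curvature condition fails
            Bs = B.dot(s)
            B += np.outer(y, y) / y.dot(s) - np.outer(Bs, Bs) / s.dot(Bs)  # step 5
        x, g = x_new, g_new
    return x

# Example: minimize the convex quadratic f(x) = 1/2 x^T A x - b^T x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = bfgs(lambda x: 0.5 * x.dot(A.dot(x)) - b.dot(x),
              lambda x: A.dot(x) - b,
              np.zeros(2))
</syntaxhighlight>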
The first step of the algorithm is carried out using the inverse of the matrix <math>B_k</math>, which is usually obtained efficiently by applying the [[Sherman–Morrison formula]] to step 5 of the algorithm, giving

: <math>B_{k+1}^{-1} = \left(I-\frac{\mathbf{s}_k \mathbf{y}_k^{\mathrm{T}}}{\mathbf{y}_k^{\mathrm{T}} \mathbf{s}_k}\right) B_{k}^{-1} \left(I-\frac{\mathbf{y}_k \mathbf{s}_k^{\mathrm{T}}}{\mathbf{y}_k^{\mathrm{T}} \mathbf{s}_k}\right) + \frac{\mathbf{s}_k \mathbf{s}_k^{\mathrm{T}}}{\mathbf{y}_k^{\mathrm{T}} \mathbf{s}_k}.</math>

This can be computed efficiently without temporary matrices by recognizing that <math>B_k^{-1}</math> is symmetric, and that <math>\mathbf{y}_k^{\mathrm{T}} B_k^{-1} \mathbf{y}_k</math> and <math>\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k</math> are scalars, using an expansion such as

: <math>B_{k+1}^{-1} = B_k^{-1} + \frac{(\mathbf{s}_k^{\mathrm{T}}\mathbf{y}_k+\mathbf{y}_k^{\mathrm{T}} B_k^{-1} \mathbf{y}_k)(\mathbf{s}_k \mathbf{s}_k^{\mathrm{T}})}{(\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k)^2} - \frac{B_k^{-1} \mathbf{y}_k \mathbf{s}_k^{\mathrm{T}} + \mathbf{s}_k \mathbf{y}_k^{\mathrm{T}}B_k^{-1}}{\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k}.</math>
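As an illustration, the expanded formula translates directly into a few lines of NumPy (the function and variable names below are chosen here for readability and are not part of any particular library). Maintaining <math>B_k^{-1}</math> in this way replaces the linear solve in step 1 with a matrix–vector product, <math>\mathbf{p}_k = -B_k^{-1}\nabla f(\mathbf{x}_k)</math>.

<syntaxhighlight lang="python">
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Given H = B_k^{-1}, s = s_k and y = y_k, return B_{k+1}^{-1} via the expanded formula."""
    sy = s.dot(y)        # scalar s_k^T y_k
    Hy = H.dot(y)        # vector B_k^{-1} y_k
    yHy = y.dot(Hy)      # scalar y_k^T B_k^{-1} y_k
    return (H
            + ((sy + yHy) / sy**2) * np.outer(s, s)       # first correction term
            - (np.outer(Hy, s) + np.outer(s, Hy)) / sy)    # second correction term
</syntaxhighlight>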
In statistical estimation problems (such as maximum likelihood or Bayesian inference), [[credible interval]]s or [[confidence interval]]s for the solution can be estimated from the [[matrix inverse|inverse]] of the final Hessian matrix. However, these quantities are technically defined by the true Hessian matrix, and the BFGS approximation may not converge to the true Hessian matrix.
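For example, in maximum-likelihood estimation the covariance matrix of the estimate is often approximated by the inverse of the Hessian of the negative log-likelihood (the observed information) at the optimum, so that an approximate standard error for the ''i''-th parameter is

:<math>\operatorname{se}(\hat{\theta}_i) \approx \sqrt{\left[H(\hat{\theta})^{-1}\right]_{ii}},</math>

which is why the final true Hessian, rather than the BFGS approximation ''B<sub>k</sub>'', should be used when accurate intervals are required.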
== Implementations ==
In the MATLAB [[Optimization Toolbox]], the [http://www.mathworks.com/help/toolbox/optim/ug/fminunc.html fminunc] function uses BFGS with cubic [[line search]] when the problem size is set to [http://www.mathworks.com/help/toolbox/optim/ug/brnoxr7-1.html#brnpcye "medium scale."]
The [[GNU Scientific Library|GSL]] implements BFGS as [http://www.gnu.org/software/gsl/manual/html_node/Multimin-Algorithms-with-Derivatives.html gsl_multimin_fdfminimizer_vector_bfgs2]. The [http://code.google.com/p/ceres-solver/ ceres] solver implements both BFGS and [[L-BFGS]] for the subclass of nonlinear least squares problems. In [[SciPy]], the [http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_bfgs.html#scipy.optimize.fmin_bfgs scipy.optimize.fmin_bfgs] function implements BFGS.
It is also possible to run BFGS using any of the [[L-BFGS]] algorithms by setting the memory parameter L to a very large number.
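For instance, a minimal SciPy call might look like the following sketch (it uses the Rosenbrock test function and its gradient, which ship with scipy.optimize; the tolerance value here is only illustrative):

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import fmin_bfgs, rosen, rosen_der

x0 = np.array([-1.2, 1.0])           # a standard starting point for the Rosenbrock function
x_min = fmin_bfgs(rosen, x0,
                  fprime=rosen_der,  # analytic gradient; omit to fall back on finite differences
                  gtol=1e-6)         # stop once the gradient norm drops below gtol
print(x_min)                         # approaches the minimizer [1.0, 1.0]
</syntaxhighlight>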
A high-precision arithmetic version of BFGS ([http://www.loshchilov.com/pbfgs.html pBFGS]), implemented in C++ and integrated with the high-precision arithmetic package [http://crd-legacy.lbl.gov/~dhbailey/mpdist/ ARPREC], is robust against numerical instability (e.g., round-off errors).
==See also==

* [[Quasi-Newton methods]]
* [[Davidon–Fletcher–Powell formula]]
* [[L-BFGS]]
* [[Gradient descent]]
* [[Nelder–Mead method]]
* [[Pattern search (optimization)]]
* [[BHHH algorithm]]
==Notes==

<references/>
==Bibliography==

* {{Citation | last1=Avriel |first1=Mordecai |year=2003|title=Nonlinear Programming: Analysis and Methods|publisher= Dover Publishing|isbn= 0-486-43227-0}}
* {{Citation|last1=Bonnans|first1=J. Frédéric|last2=Gilbert|first2=J. Charles|last3=Lemaréchal|first3=Claude| authorlink3=Claude Lemaréchal|last4=Sagastizábal|first4=Claudia A.|title=Numerical optimization: Theoretical and practical aspects|url=http://www.springer.com/mathematics/applications/book/978-3-540-35445-1|edition=Second revised ed. of translation of 1997 <!-- ''Optimisation numérique: Aspects théoriques et pratiques'' --> French| series=Universitext|publisher=Springer-Verlag|location=Berlin|year=2006|pages=xiv+490|isbn=3-540-35445-X|doi=10.1007/978-3-540-35447-5|mr=2265882}}
* {{Citation| last=Broyden | first=C. G. | authorlink=Charles George Broyden | year=1970 | title=The convergence of a class of double-rank minimization algorithms | journal=Journal of the Institute of Mathematics and Its Applications | volume=6 | pages=76–90 | doi=10.1093/imamat/6.1.76}}
* {{Citation | last1=Fletcher|first1= R.|title= A New Approach to Variable Metric Algorithms|journal=Computer Journal|year=1970|volume=13|pages=317–322 | doi=10.1093/comjnl/13.3.317 | issue=3}}
* {{Citation | last1=Fletcher | first1=Roger | title=Practical methods of optimization | publisher=[[John Wiley & Sons]] | location=New York | edition=2nd | isbn=978-0-471-91547-8 | year=1987}}
* {{Citation|author-link=Donald Goldfarb|last=Goldfarb|first= D.|title=A Family of Variable Metric Updates Derived by Variational Means|journal=Mathematics of Computation|year=1970|volume=24|pages=23–26|doi=10.1090/S0025-5718-1970-0258249-6|issue=109}}
* {{Citation|last1=Luenberger|first1=David G.|authorlink1=David G. Luenberger|last2=Ye|first2=Yinyu|authorlink2=Yinyu Ye|title=Linear and nonlinear programming|edition=Third|series=International Series in Operations Research & Management Science|volume=116|publisher=Springer|location=New York|year=2008|pages=xiv+546|isbn=978-0-387-74502-2| mr = 2423726}}
* {{Citation | last1=Nocedal | first1=Jorge | last2=Wright | first2=Stephen J. | title=Numerical Optimization | publisher=[[Springer-Verlag]] | location=Berlin, New York | edition=2nd | isbn=978-0-387-30303-1 | year=2006}}
* {{Citation | last1=Shanno|first1= David F.|title=Conditioning of quasi-Newton methods for function minimization|date=July 1970|journal=Math. Comput.|volume=24|pages= 647–656|mr=42:8905|doi=10.1090/S0025-5718-1970-0274029-X | issue=111 }}
* {{Citation | last1=Shanno|first1= David F.|first2= Paul C. |last2=Kettler|title=Optimal conditioning of quasi-Newton methods|date=July 1970|journal=Math. Comput.|volume=24|pages=657–664|mr=42:8906|doi=10.1090/S0025-5718-1970-0274030-6 | issue=111 }}
== External links ==

* [http://www.loshchilov.com/pbfgs.html Source code of high-precision BFGS], a C++ implementation of BFGS with high-precision arithmetic
{{Optimization algorithms}}
{{DEFAULTSORT:Broyden-Fletcher-Goldfarb-Shanno algorithm}}
[[Category:Optimization algorithms and methods]]