{{one source|date=November 2010}}
In [[statistical theory]], the '''Huber loss function''' is a function used in [[robust statistics|robust]] estimation that allows construction of an estimate which reduces the effect of [[outlier]]s, while treating non-outliers in a more standard way.

==Definition==

The '''Huber loss function''' describes the penalty incurred by an [[estimator|estimation procedure]]. [[Peter J. Huber|Huber]] (1964<ref>{{Citation|last=Huber|first=Peter J.|year=1964|title=Robust Estimation of a Location Parameter|journal=Annals of Mathematical Statistics|volume=35|pages=73–101}}</ref>) defines the loss function piecewise by

:<math> L_\delta (a) = (1/2){a^2} \qquad \qquad \text{ for } |a| \le \delta ,</math>
:<math> L_\delta (a) = \delta (|a| - \delta/2 ), \qquad \text{otherwise}. </math>

This function is quadratic for small values of ''a'', and linear for large values, with equal values and slopes of the different sections at the two points where |''a''| = ''δ''. In use, the variable <math>a</math> often refers to the residuals, that is, to the difference between the observed and predicted values, <math> a = y - \hat{y} </math>.
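The piecewise definition above can be sketched directly in Python (the function name `huber_loss` and the default value of `delta` are illustrative, not from the source):

```python
def huber_loss(a, delta=1.0):
    """Huber loss: quadratic for |a| <= delta, linear with slope delta beyond."""
    if abs(a) <= delta:
        return 0.5 * a * a
    return delta * (abs(a) - 0.5 * delta)
```

At |a| = delta both branches evaluate to delta²/2, which is the matching of values and slopes between the two sections described above.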

==Motivation==

For [[parameter estimation|estimating parameters]], it is desirable for a loss function to have the following properties (for all values of <math>a</math> in the [[parameter space]]):

# It is greater than or equal to the [[0-1 loss function]] (which is defined as <math>L(a)=0</math> if <math>a=0</math> and <math>L(a)=1</math> otherwise).
# It is continuous (or lower [[semi-continuity|semicontinuous]]).

Two very commonly used loss functions are the [[mean squared error|squared loss]], <math>L(a) = a^2</math>, and the [[absolute deviation|absolute loss]], <math>L(a)=|a|</math>. The absolute loss is not differentiable at exactly one point, <math>a=0</math>, where it is [[subdifferential|subdifferentiable]] with its [[convex analysis|convex]] [[subdifferential]] equal to the interval <math>[-1,+1]</math>; nevertheless, the absolute-value loss function results in a median-unbiased estimator, which can be evaluated for particular data sets by [[linear programming]]. The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of <math>a</math>'s (as in <math>\sum_{i=1}^n L(a_i) </math>), the sample mean is influenced too much by a few particularly large <math>a</math>-values when the distribution is heavy-tailed. In terms of [[estimation theory]], the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
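The contrast can be seen numerically: the minimizer of the summed squared loss is the sample mean, while the minimizer of the summed absolute loss is the sample median. A small illustration (the data values are invented for the example):

```python
data = [1.0, 1.2, 0.9, 1.1, 100.0]  # one gross outlier

# The mean minimizes the sum of squared losses; it is dragged toward the outlier.
mean = sum(data) / len(data)

# The median minimizes the sum of absolute losses; it stays near the bulk of the data.
median = sorted(data)[len(data) // 2]
```

Here the mean is 20.84, far from every data point except the outlier, while the median is 1.1.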

As defined above, the Huber loss function is [[convex function|convex]] in a uniform neighborhood of its minimum <math>a=0</math>; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points <math> a=-\delta </math> and <math> a = \delta </math>. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) with the robustness of the median-unbiased estimator (using the absolute value function).
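One standard way to compute a Huber M-estimate of location is iteratively reweighted least squares; the sketch below assumes that scheme (the function name and the data are illustrative, not from the source):

```python
def huber_location(data, delta=1.0, iters=50):
    """Huber M-estimate of location via iteratively reweighted least squares.

    Each point gets weight 1 in the quadratic zone (|residual| <= delta) and
    weight delta/|residual| in the linear zone, shrinking outlier influence.
    """
    mu = sum(data) / len(data)  # start from the sample mean
    for _ in range(iters):
        w = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in data]
        mu = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
    return mu
```

On data with a gross outlier the estimate settles near the bulk of the points, combining much of the efficiency of the mean with the robustness of the median, as described above.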

The [[log cosh loss function]], which is defined as <math> L(a) = \log(\cosh(a)) </math>, has behavior like that of the Huber loss function.
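The similarity can be checked numerically: log cosh behaves like <math>a^2/2</math> near zero and like <math>|a| - \log 2</math> far from zero (a pure-Python sketch; the function name is illustrative):

```python
import math

def log_cosh(a):
    """Log-cosh loss: roughly quadratic near 0, roughly linear for large |a|."""
    return math.log(math.cosh(a))
```

Unlike the Huber loss it needs no threshold parameter, but its transition between the two regimes cannot be tuned.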

==Pseudo-Huber loss function==

The '''Pseudo-Huber loss function''' can be used as a smooth approximation of the Huber loss function, and ensures that derivatives of all orders are continuous. It is defined as{{citation needed|date=February 2012}}

:<math>L_\delta (a) = \delta^2(\sqrt{1+(a/\delta)^2}-1).</math>

As such, this function approximates <math>a^2/2</math> for small values of <math>a</math>, and approximates a straight line with slope <math>\delta</math> for large values of <math>a</math>.
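A direct transcription of the formula in Python (the name `pseudo_huber` is illustrative); both limiting behaviors can be checked numerically:

```python
import math

def pseudo_huber(a, delta=1.0):
    """Smooth approximation of the Huber loss; infinitely differentiable."""
    return delta**2 * (math.sqrt(1.0 + (a / delta)**2) - 1.0)
```

For small inputs the value is close to a²/2, and for large inputs the function gains approximately delta per unit increase in a.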

==Applications==

The Huber loss function is used in [[robust statistics]], [[M-estimator|M-estimation]] and [[additive model]]ling.<ref>Friedman, J. H. (2001), "Greedy Function Approximation: A Gradient Boosting Machine", The Annals of Statistics, Vol. 29, No. 5 (Oct. 2001), 1189–1232.</ref>

==See also==
* [[Robust regression]]
* [[M-estimator]]
* [[Robust statistics#M-estimators|Visual comparison of different M-estimators]]

{{More footnotes|date=November 2010}}

==References==

<references/>

{{DEFAULTSORT:Huber Loss Function}}
[[Category:Robust statistics]]
[[Category:M-estimators]]
[[Category:Loss functions]]