# Huber loss function

In statistical theory, the Huber loss function is a loss function used in robust estimation. It allows construction of an estimate in which the effect of outliers is reduced, while non-outliers are treated in a more standard way.

## Definition

The Huber loss function describes the penalty incurred by an estimation procedure. Huber (1964) defines the loss function piecewise by

$L_{\delta }(a)={\frac {1}{2}}a^{2}\qquad \qquad {\text{for }}|a|\leq \delta ,$

$L_{\delta }(a)=\delta \left(|a|-{\frac {\delta }{2}}\right)\qquad {\text{otherwise}}.$

This function is quadratic for small values of $a$ and linear for large values, with equal values and equal slopes of the two sections at the two points where $|a|=\delta$. In use, the variable $a$ often refers to the residuals, that is, to the difference between the observed and predicted values, i.e. $a=y-{\hat {y}}$.
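The piecewise definition above translates directly into code. A minimal sketch assuming NumPy (the function name and vectorized form are choices for illustration, not part of the original definition):

```python
import numpy as np

def huber_loss(a, delta=1.0):
    """Huber loss applied elementwise to residuals a.

    Quadratic for |a| <= delta, linear beyond, with matching value
    and slope at the two crossover points |a| = delta.
    """
    a = np.asarray(a, dtype=float)
    quadratic = 0.5 * a**2
    linear = delta * (np.abs(a) - 0.5 * delta)
    return np.where(np.abs(a) <= delta, quadratic, linear)
```

For example, with `delta=1.0`, a residual of 0.5 falls on the quadratic branch (`0.5 * 0.5**2 = 0.125`), while a residual of 3.0 falls on the linear branch (`1.0 * (3.0 - 0.5) = 2.5`); at the boundary `|a| = delta` both branches agree.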

## Motivation

For estimating parameters, it is desirable for a loss function to have good properties for all values of $a$ over the parameter space, such as convexity, differentiability, and insensitivity to outliers. As the following comparison shows, neither of the two classical loss functions has all of these properties at once.

Two very commonly used loss functions are the squared loss, $L(a)=a^{2}$, and the absolute loss, $L(a)=|a|$. The absolute loss is not differentiable at exactly one point, $a=0$, where it is subdifferentiable with its convex subdifferential equal to the interval $[-1,+1]$; it results in a median-unbiased estimator, which can be evaluated for particular data sets by linear programming. The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum _{i=1}^{n}L(a_{i})$), the sample mean is influenced too much by a few particularly large values of $a$ when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
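The outlier sensitivity described above can be seen numerically: the minimizer of the summed squared loss is the sample mean, while the minimizer of the summed absolute loss is the sample median. A minimal sketch assuming NumPy, with hypothetical data chosen to contain one gross outlier:

```python
import numpy as np

# Data with one gross outlier. The sample mean minimizes the summed
# squared loss; the sample median minimizes the summed absolute loss.
data = np.array([1.0, 2.0, 2.5, 3.0, 100.0])

mean = data.mean()        # pulled far toward the outlier
median = np.median(data)  # barely affected

print(mean, median)  # 21.7 vs 2.5
```

A single outlier drags the mean far from the bulk of the data, while the median stays put, illustrating why the absolute loss is more robust.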

As defined above, the Huber loss function is convex in a uniform neighborhood of its minimum $a=0$; at the boundary of this neighborhood, it has a differentiable extension to an affine function at the points $a=-\delta$ and $a=\delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (obtained using the quadratic loss function) with the robustness of the median-unbiased estimator (obtained using the absolute-value loss function).

The log-cosh loss function, defined as $L(a)=\log(\cosh(a))$, behaves similarly to the Huber loss function.
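This resemblance can be checked numerically: $\log(\cosh(a))$ is approximately $a^{2}/2$ near zero and approximately $|a|-\log 2$ far from zero, mirroring the quadratic-then-linear shape of the Huber loss. A small sketch assuming NumPy:

```python
import numpy as np

def log_cosh(a):
    # log(cosh(a)) ~ a**2 / 2 for small a, ~ |a| - log(2) for large a,
    # mirroring the quadratic-then-linear shape of the Huber loss.
    return np.log(np.cosh(a))

small, large = 0.01, 20.0
print(log_cosh(small), small**2 / 2)            # nearly equal
print(log_cosh(large), abs(large) - np.log(2))  # nearly equal
```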

## Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function, and ensures that derivatives of all orders are continuous. It is defined as

$L_{\delta }(a)=\delta ^{2}\left({\sqrt {1+(a/\delta )^{2}}}-1\right).$

## Applications

The Huber loss function is used in robust statistics, M-estimation and additive modelling.
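In practice these losses are only a few lines of code. As a closing illustration, a minimal sketch (assuming NumPy) of the Pseudo-Huber loss defined above, with numerical checks of its Huber-like behavior:

```python
import numpy as np

def pseudo_huber(a, delta=1.0):
    # Smooth approximation of the Huber loss: close to a**2 / 2 for
    # small |a|, asymptotically linear with slope delta for large |a|,
    # and infinitely differentiable everywhere.
    return delta**2 * (np.sqrt(1.0 + (a / delta)**2) - 1.0)

# Near zero it matches the quadratic branch of the Huber loss ...
print(pseudo_huber(0.1), 0.5 * 0.1**2)          # ~0.00499 vs 0.005
# ... and far from zero its slope approaches delta.
print(pseudo_huber(1000.0) - pseudo_huber(999.0))  # ~1.0
```

Unlike the piecewise Huber loss, no case distinction is needed, which is why this form is sometimes preferred when smooth derivatives are required.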