# Jensen's inequality

Figure: Jensen's inequality generalizes the statement that a secant line of a convex function lies above the graph.

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906. Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean after convex transformation; it is a simple corollary that the opposite is true of concave transformations.

Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points. For $t\in [0,1]$, the secant line consists of the weighted means of the convex function,

$tf(x_{1})+(1-t)f(x_{2}),$

while the graph of the function is the convex function of the weighted means,

$f\left(tx_{1}+(1-t)x_{2}\right).$

Thus, for two points, Jensen's inequality reads $f\left(tx_{1}+(1-t)x_{2}\right)\leq tf(x_{1})+(1-t)f(x_{2})$.

In the context of probability theory, it is generally stated in the following form: if X is a random variable and φ is a convex function, then

$\varphi \left(\mathbb {E} [X]\right)\leq \mathbb {E} \left[\varphi (X)\right].$

## Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.

### Finite form

For a real convex function φ, numbers x1, x2, ..., xn in its domain, and positive weights ai, Jensen's inequality can be stated as:

$\varphi \left({\frac {\sum a_{i}x_{i}}{\sum a_{j}}}\right)\leq {\frac {\sum a_{i}\varphi (x_{i})}{\sum a_{j}}}\qquad \qquad (1)$

and the inequality is reversed if φ is concave, which is

$\varphi \left({\frac {\sum a_{i}x_{i}}{\sum a_{j}}}\right)\geq {\frac {\sum a_{i}\varphi (x_{i})}{\sum a_{j}}}.\qquad \qquad (2)$

As a particular case, if the weights ai are all equal, then (1) and (2) become

$\varphi \left({\frac {\sum x_{i}}{n}}\right)\leq {\frac {\sum \varphi (x_{i})}{n}}\qquad \qquad (3)$

$\varphi \left({\frac {\sum x_{i}}{n}}\right)\geq {\frac {\sum \varphi (x_{i})}{n}}\qquad \qquad (4)$

For instance, the function log(x) is concave on the positive reals, so for positive x1, ..., xn, substituting φ(x) = log(x) in (4) establishes the familiar arithmetic mean-geometric mean inequality in logarithmic form; exponentiating both sides recovers the usual form:

$\log \!\left({\frac {\sum _{i=1}^{n}x_{i}}{n}}\right)\geq {\frac {\sum _{i=1}^{n}\log \!\left(x_{i}\right)}{n}},\quad {\text{equivalently}}\quad {\frac {x_{1}+x_{2}+\cdots +x_{n}}{n}}\geq {\sqrt[{n}]{x_{1}\cdot x_{2}\cdots x_{n}}}.$

A common application has x as a function of another variable (or set of variables) t, that is, xi = g(ti). All of this carries directly over to the general continuous case: the weights ai are replaced by a non-negative integrable function f(x), such as a probability distribution, and the summations are replaced by integrals.
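The finite form lends itself to a direct numerical check. The sketch below assumes Python 3.9+ and uses illustrative points and weights that are not from the text; the helper name `jensen_sides` is hypothetical. It evaluates both sides of (1) for the convex function x², both sides of (2) for the concave log, and the equal-weight arithmetic mean-geometric mean case.

```python
import math
from typing import Callable, Sequence


def jensen_sides(phi: Callable[[float], float],
                 xs: Sequence[float],
                 ws: Sequence[float]) -> tuple[float, float]:
    """Return (phi(weighted mean of xs), weighted mean of phi(xs)), the two sides of (1)/(2)."""
    total = sum(ws)
    lhs = phi(sum(w * x for w, x in zip(ws, xs)) / total)
    rhs = sum(w * phi(x) for w, x in zip(ws, xs)) / total
    return lhs, rhs


xs = [1.0, 2.0, 4.0, 8.0]   # illustrative points
ws = [0.5, 1.0, 2.0, 0.5]   # positive weights; they need not sum to 1

# Convex phi(x) = x**2: inequality (1) gives lhs <= rhs.
lhs, rhs = jensen_sides(lambda x: x * x, xs, ws)
print(f"convex:  {lhs:.4f} <= {rhs:.4f}")

# Concave phi = log: inequality (2) gives lhs >= rhs.
lhs_log, rhs_log = jensen_sides(math.log, xs, ws)
print(f"concave: {lhs_log:.4f} >= {rhs_log:.4f}")

# Equal weights with log, then exponentiate: arithmetic mean >= geometric mean.
am = sum(xs) / len(xs)
gm = math.exp(sum(math.log(x) for x in xs) / len(xs))
print(f"AM = {am:.4f} >= GM = {gm:.4f}")
```

The weights are deliberately left unnormalized, since (1) and (2) divide by their sum.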

### Measure-theoretic and probabilistic form

Let (Ω, A, μ) be a measure space such that μ(Ω) = 1. If g is a real-valued function that is μ-integrable, and if φ is a convex function on the real line, then:

$\varphi \left(\int _{\Omega }g\,d\mu \right)\leq \int _{\Omega }\varphi \circ g\,d\mu .$

In real analysis, we may require an estimate on

$\varphi \left(\int _{a}^{b}f(x)\,dx\right),$

where a, b ∈ R with a < b, and f : [a, b] → R is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of [a, b] need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get

$\varphi \left(\int _{a}^{b}f(x)\,dx\right)\leq {\frac {1}{b-a}}\int _{a}^{b}\varphi ((b-a)f(x))\,dx.$

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega ,{\mathfrak {F}},\mathbb {P} )$ be a probability space, X an integrable real-valued random variable and φ a convex function. Then:

$\varphi \left(\mathbb {E} [X]\right)\leq \mathbb {E} \left[\varphi (X)\right].$

In this probability setting, the measure μ is intended as a probability $\mathbb {P}$, the integral with respect to μ as an expected value $\mathbb {E}$, and the function g as a random variable X.
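The rescaled estimate on an interval [a, b] can be checked numerically. This is a minimal sketch, assuming a plain composite midpoint rule for the integrals (a quadrature choice made here for illustration, not part of the statement); the helper names `midpoint_integral` and `interval_jensen` are hypothetical, and the choice φ(x) = x², f(x) = x on [0, 2] is illustrative.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] by the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def interval_jensen(phi, f, a, b, n=10_000):
    """Both sides of phi(int_a^b f) <= (1/(b-a)) int_a^b phi((b-a) f(x)) dx, assuming a < b."""
    lhs = phi(midpoint_integral(f, a, b, n))
    rhs = midpoint_integral(lambda x: phi((b - a) * f(x)), a, b, n) / (b - a)
    return lhs, rhs


# Illustrative choice: phi(t) = t**2, f(x) = x on [0, 2].
# Exact values: phi(int f) = 2**2 = 4 and (1/2) * int_0^2 (2x)**2 dx = 16/3.
lhs, rhs = interval_jensen(lambda t: t * t, lambda x: x, 0.0, 2.0)
print(lhs, "<=", rhs)
```

The factor (b − a) inside φ and the prefactor 1/(b − a) together correspond to integrating against the uniform probability measure dx/(b − a).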

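Equality holds if and only if φ is affine (linear plus a constant) on some convex set A with $\mathbb {P} (X\in A)=1$; in particular, for a strictly convex φ, equality holds if and only if X is almost surely constant (a degenerate random variable).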
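The cases of equality can be seen on small discrete distributions. In the sketch below, the helpers `expect` and `jensen_gap` are hypothetical names and all distributions are illustrative. It computes the gap E[φ(X)] − φ(E[X]) for a strictly convex φ with a non-degenerate X, for an affine φ, for |x| with X supported on positive values (where |x| coincides with an affine function), and for a constant X.

```python
def expect(values, probs, g=lambda v: v):
    """E[g(X)] for a discrete X that takes values[i] with probability probs[i]."""
    return sum(p * g(v) for v, p in zip(values, probs))


def jensen_gap(phi, values, probs):
    """E[phi(X)] - phi(E[X]); nonnegative whenever phi is convex."""
    return expect(values, probs, phi) - phi(expect(values, probs))


values, probs = [-1.0, 0.5, 2.0], [0.2, 0.5, 0.3]   # illustrative, non-degenerate X

gap_square = jensen_gap(lambda x: x * x, values, probs)          # strictly convex phi
gap_affine = jensen_gap(lambda x: 3 * x - 1, values, probs)      # affine phi
gap_abs = jensen_gap(abs, [1.0, 2.0, 4.0], [0.25, 0.5, 0.25])    # |x| is affine where X lives
gap_const = jensen_gap(lambda x: x * x, [2.0], [1.0])            # degenerate X

print(gap_square, gap_affine, gap_abs, gap_const)
```

Only the first gap is strictly positive; the affine case may differ from zero by floating-point rounding.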

### General inequality in a probabilistic setting

More generally, let T be a real topological vector space and X an integrable T-valued random variable. Then, for any convex function φ : T → R and any sub-σ-algebra ${\mathfrak {G}}$ of ${\mathfrak {F}}$,

$\varphi \left(\mathbb {E} \left[X|{\mathfrak {G}}\right]\right)\leq \mathbb {E} \left[\varphi (X)|{\mathfrak {G}}\right].$

Here $\mathbb {E} [\cdot |{\mathfrak {G}}]$ stands for the expectation conditioned on the σ-algebra ${\mathfrak {G}}$. This general statement reduces to the previous ones when the topological vector space T is the real axis and ${\mathfrak {G}}$ is the trivial σ-algebra {∅, Ω}.

(Caution: in this generality, additional assumptions on the convex function and/or the topological vector space are needed; see Example (1.3) on p. 53 in.)
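On a finite probability space whose σ-algebra ${\mathfrak {G}}$ is generated by a partition, the conditional expectation is just the probability-weighted average over each cell, so the conditional inequality can be checked cell by cell. The sketch below takes T = R (the simplest special case) and a fair die with ${\mathfrak {G}}$ generated by {even, odd}; the setup and the helper name `conditional_expectation` are illustrative assumptions.

```python
from collections import defaultdict


def conditional_expectation(omega, prob, h, block):
    """E[h | G] on a finite probability space, where G is generated by a partition.

    block(w) labels the partition cell containing outcome w; the result maps each
    cell label to the probability-weighted average of h over that cell.
    """
    mass = defaultdict(float)
    weighted = defaultdict(float)
    for w in omega:
        mass[block(w)] += prob[w]
        weighted[block(w)] += prob[w] * h(w)
    return {cell: weighted[cell] / mass[cell] for cell in mass}


# Illustrative setup: a fair die, X(w) = w, G generated by {even, odd}.
omega = range(1, 7)
prob = {w: 1 / 6 for w in omega}


def X(w):
    return float(w)


def parity(w):
    return w % 2          # 0 = even cell, 1 = odd cell


def phi(x):
    return x * x          # a convex function


lhs = {c: phi(m) for c, m in conditional_expectation(omega, prob, X, parity).items()}
rhs = conditional_expectation(omega, prob, lambda w: phi(X(w)), parity)
for c in sorted(lhs):
    print(c, lhs[c], "<=", rhs[c])
```

Passing a `block` function that returns the same label for every outcome corresponds to the trivial σ-algebra {∅, Ω} and recovers the unconditional inequality.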

## Proofs

Figure: A graphical "proof" of Jensen's inequality for the probabilistic case. The dashed curve along the X axis is the hypothetical distribution of X, while the dashed curve along the Y axis is the corresponding distribution of Y values. Note that the convex mapping Y(X) increasingly "stretches" the distribution for increasing values of X.

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where X is a real-valued random variable (see figure). Assuming a hypothetical distribution of X values, one can immediately identify the position of $\mathbb {E} [X]$ and its image $\varphi (\mathbb {E} [X])$ in the graph. Because a convex mapping Y = φ(X) stretches out the distribution of Y values increasingly for increasing values of X, the distribution of Y is broader in the interval corresponding to X > X0 and narrower in X < X0 for any X0; in particular, this is also true for $X_{0}=\mathbb {E} [X]$. Consequently, in this picture the expectation of Y always shifts upwards with respect to the position of $\varphi (\mathbb {E} [X])$. A similar reasoning holds if the distribution of X covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

$\mathbb {E} [Y]=\mathbb {E} [\varphi (X)]\geq \varphi (\mathbb {E} [X]),$

with equality when φ is linear (affine) on the set of values taken by X, e.g. when its graph is a straight line, or when X follows a degenerate distribution (i.e. is a constant).
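For the particular convex function φ(x) = x², the upward shift described above can be computed exactly, assuming X has a finite second moment. Writing m = E[X] and expanding the square,

$\mathbb {E} \left[(X-m)^{2}\right]=\mathbb {E} [X^{2}]-2m\,\mathbb {E} [X]+m^{2}=\mathbb {E} [X^{2}]-m^{2}=\mathbb {E} [\varphi (X)]-\varphi (\mathbb {E} [X]),$

so in this case the gap between the two sides is the variance of X, which is nonnegative and vanishes exactly when X is almost surely constant.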

The proofs below formalize this intuitive notion.

### Proof 1 (finite form)

If λ1 and λ2 are two arbitrary nonnegative real numbers such that λ1 + λ2 = 1, then convexity of φ implies

$\forall x_{1},x_{2}:\qquad \varphi \left(\lambda _{1}x_{1}+\lambda _{2}x_{2}\right)\leq \lambda _{1}\,\varphi (x_{1})+\lambda _{2}\,\varphi (x_{2}).$ This can be easily generalized: if λ1, ..., λn are nonnegative real numbers such that λ1 + ... + λn = 1, then

$\varphi (\lambda _{1}x_{1}+\lambda _{2}x_{2}+\cdots +\lambda _{n}x_{n})\leq \lambda _{1}\,\varphi (x_{1})+\lambda _{2}\,\varphi (x_{2})+\cdots +\lambda _{n}\,\varphi (x_{n}),$

for any x1, ..., xn. This finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for n = 2. Suppose it is true for some n; one needs to prove it for n + 1. At least one of the λi is strictly smaller than 1, say λ1, so that 1 − λ1 > 0; therefore, by the convexity inequality:

${\begin{aligned}\varphi \left(\sum _{i=1}^{n+1}\lambda _{i}x_{i}\right)&=\varphi \left(\lambda _{1}x_{1}+(1-\lambda _{1})\sum _{i=2}^{n+1}{\frac {\lambda _{i}}{1-\lambda _{1}}}x_{i}\right)\\&\leq \lambda _{1}\,\varphi (x_{1})+(1-\lambda _{1})\varphi \left(\sum _{i=2}^{n+1}{\frac {\lambda _{i}}{1-\lambda _{1}}}x_{i}\right).\end{aligned}}$

Since

$\sum _{i=2}^{n+1}{\frac {\lambda _{i}}{1-\lambda _{1}}}=1,$ one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.
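The induction can be mirrored as a recursion. In this sketch (Python 3.9+; the function name `finite_jensen` and the sample data are hypothetical), each level splits off one weight λk < 1, checks the two-point convexity step, and recurses on the renormalized weights λi/(1 − λk). It picks the smallest weight rather than λ1, which guarantees λk ≤ 1/n < 1, and it bottoms out at a single point (where both sides coincide) rather than at n = 2.

```python
from typing import Callable, Sequence


def finite_jensen(phi: Callable[[float], float],
                  lam: Sequence[float],
                  xs: Sequence[float],
                  tol: float = 1e-12) -> tuple[float, float]:
    """Return (phi(sum lam_i x_i), sum lam_i phi(x_i)) by mirroring the induction.

    Assumes lam_i >= 0 and sum(lam) == 1 (up to rounding).
    """
    if len(xs) == 1:
        return phi(xs[0]), phi(xs[0])               # base case: both sides equal
    k = min(range(len(xs)), key=lambda i: lam[i])   # smallest weight, so lam_k <= 1/n < 1
    lk = lam[k]
    rest_lam = [w / (1 - lk) for i, w in enumerate(lam) if i != k]
    rest_xs = [x for i, x in enumerate(xs) if i != k]
    y = sum(w * x for w, x in zip(rest_lam, rest_xs))

    # Two-point step: phi(lk*x_k + (1-lk)*y) <= lk*phi(x_k) + (1-lk)*phi(y).
    lhs = phi(lk * xs[k] + (1 - lk) * y)
    assert lhs <= lk * phi(xs[k]) + (1 - lk) * phi(y) + tol

    # Induction hypothesis on the remaining points: phi(y) <= sum of rest_lam_i * phi(x_i).
    inner_lhs, inner_rhs = finite_jensen(phi, rest_lam, rest_xs, tol)
    assert inner_lhs <= inner_rhs + tol
    return lhs, lk * phi(xs[k]) + (1 - lk) * inner_rhs


lhs, rhs = finite_jensen(lambda x: x ** 4, [0.1, 0.2, 0.3, 0.4], [-2.0, 0.5, 1.0, 3.0])
print(lhs, "<=", rhs)
```

The returned right-hand side is rebuilt level by level, so it equals the full weighted sum of φ(xi) only because of the renormalization used in the proof.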

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$\varphi \left(\int x\,d\mu _{n}(x)\right)\leq \int \varphi (x)\,d\mu _{n}(x),$ where μn is a measure given by an arbitrary convex combination of Dirac deltas:

$\mu _{n}=\sum _{i=1}^{n}\lambda _{i}\delta _{x_{i}}.$

Since convex functions on an open interval are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures, the general statement is obtained by a limiting procedure, choosing the approximating measures μn so that both sides of the inequality converge.
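The limiting procedure can be illustrated for one concrete distribution. This sketch assumes X uniform on [0, 1], φ = exp, and approximating measures μn with equal weights 1/n on the midpoints (i + 1/2)/n; that discretization is an illustrative choice, not the general weak-density argument, and the helper name `discretized_uniform_sides` is hypothetical. Each μn satisfies the finite form, and both sides approach φ(E[X]) = e^{1/2} and E[φ(X)] = e − 1.

```python
import math


def discretized_uniform_sides(phi, n):
    """Both sides of the finite form for mu_n = (1/n) * sum_i delta_{x_i}.

    The atoms x_i = (i + 1/2) / n, i = 0..n-1, are an equal-weight
    discretization of the uniform distribution on [0, 1].
    """
    xs = [(i + 0.5) / n for i in range(n)]
    lhs = phi(sum(xs) / n)              # phi of the integral of x against mu_n
    rhs = sum(phi(x) for x in xs) / n   # integral of phi(x) against mu_n
    return lhs, rhs


results = {n: discretized_uniform_sides(math.exp, n) for n in (1, 2, 10, 1000)}
for n, (lhs, rhs) in results.items():
    print(n, lhs, "<=", rhs)

# Limits for X uniform on [0, 1]: phi(E[X]) = exp(1/2) and E[phi(X)] = e - 1.
print(math.exp(0.5), math.e - 1)
```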

### Proof 2 (measure-theoretic form)

Let g be a real-valued μ-integrable function on a probability space Ω, and let φ be a convex function on the real numbers. Since φ is convex, at each real number x we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of φ at x, but which are at or below the graph of φ at all points.

Now, if we define

$x_{0}:=\int _{\Omega }g\,d\mu ,$ because of the existence of subderivatives for convex functions, we may choose a and b such that

$ax+b\leq \varphi (x),$ for all real x and

$ax_{0}+b=\varphi (x_{0}).$ But then we have that

$\varphi \circ g(x)\geq ag(x)+b$ for all x ∈ Ω. Integrating over Ω, using the monotonicity of the integral and μ(Ω) = 1 (μ is a probability measure), gives

$\int _{\Omega }\varphi \circ g\,d\mu \geq \int _{\Omega }(ag+b)\,d\mu =a\int _{\Omega }g\,d\mu +b\int _{\Omega }d\mu =ax_{0}+b=\varphi (x_{0})=\varphi \left(\int _{\Omega }g\,d\mu \right),$ as desired.
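The supporting-line argument can be traced on a finite probability space. This minimal sketch assumes a differentiable convex φ (here exp, so the tangent slope is φ′(x0)), a four-point Ω with illustrative weights, and a hypothetical helper `supporting_line`; it checks that the line lies below φ on a sample grid and evaluates each term of the chain above.

```python
import math


def supporting_line(phi, dphi, x0):
    """Coefficients (a, b) of the tangent line a*x + b to a differentiable convex phi at x0."""
    a = dphi(x0)
    return a, phi(x0) - a * x0


# Illustrative finite probability space Omega = {0, 1, 2, 3} with weights mu and g: Omega -> R.
mu = [0.1, 0.4, 0.3, 0.2]
g = [-1.0, 0.0, 2.0, 5.0]
phi = dphi = math.exp                    # exp is convex and is its own derivative

x0 = sum(m * v for m, v in zip(mu, g))  # x0 = integral of g dmu
a, b = supporting_line(phi, dphi, x0)

# The line stays at or below phi on a sample grid and touches it at x0.
grid = [k / 10 for k in range(-50, 80)]
assert all(a * x + b <= phi(x) + 1e-12 for x in grid)

lhs = phi(x0)                                         # phi(integral of g) = a*x0 + b
middle = sum(m * (a * v + b) for m, v in zip(mu, g))  # integral of (a g + b)
rhs = sum(m * phi(v) for m, v in zip(mu, g))          # integral of phi(g)
print(lhs, "=", middle, "<=", rhs)
```

For a convex φ that is not differentiable at x0, any subderivative at x0 can replace φ′(x0) as the slope a.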

### Proof 3 (general inequality in a probabilistic setting)

Let X be an integrable random variable that takes values in a real topological vector space T. Since φ : T → R is convex, for any $x,y\in T$, the quantity

${\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }}$ does not increase as θ decreases to 0+. In particular, the directional derivative of φ at x in the direction y is well defined by

$(D\varphi )(x)\cdot y:=\lim _{\theta \downarrow 0}{\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }}=\inf _{\theta >0}{\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }}.$
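The monotonicity of the difference quotient can be seen on a concrete example. This sketch takes T = R² and the illustrative convex function φ(v) = ‖v‖², for which the quotient equals 2⟨x, y⟩ + θ‖y‖². It therefore decreases as θ ↓ 0 and its infimum over θ > 0 is (Dφ)(x)·y = 2⟨x, y⟩. The helper names are hypothetical.

```python
def phi(v):
    """phi(v) = ||v||^2 on R^2; an illustrative convex function."""
    return v[0] ** 2 + v[1] ** 2


def quotient(x, y, theta):
    """Difference quotient (phi(x + theta*y) - phi(x)) / theta for theta > 0."""
    shifted = (x[0] + theta * y[0], x[1] + theta * y[1])
    return (phi(shifted) - phi(x)) / theta


x, y = (1.0, -2.0), (3.0, 0.5)
thetas = [2.0 ** -k for k in range(20)]    # theta decreasing towards 0+
qs = [quotient(x, y, t) for t in thetas]

# The quotients do not increase as theta shrinks ...
assert all(q_big >= q_small for q_big, q_small in zip(qs, qs[1:]))
# ... and approach (Dphi)(x)·y, which for this phi equals 2<x, y>.
exact = 2 * (x[0] * y[0] + x[1] * y[1])
print(qs[0], qs[-1], exact)
```

Using negative θ instead would give quotients at or below the limit, which is why the infimum is taken over θ > 0 only.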