# Minimax estimator

In statistical decision theory, where we are faced with the problem of estimating a deterministic parameter (vector) $\theta \in \Theta$ from observations $x\in {\mathcal {X}},$ an estimator (estimation rule) $\delta ^{M}\,\!$ is called minimax if its maximal risk is minimal among all estimators of $\theta \,\!$ . In a sense this means that $\delta ^{M}\,\!$ is an estimator which performs best in the worst possible case allowed in the problem.

## Problem setup

Unfortunately, in general the risk cannot be minimized uniformly, since it depends on the unknown parameter $\theta \,\!$ itself (if we knew the actual value of $\theta \,\!$ , we would not need to estimate it). Therefore, additional criteria for defining an optimal estimator are required. One such criterion is the minimax criterion.

## Definition

Definition : An estimator $\delta ^{M}:{\mathcal {X}}\rightarrow \Theta \,\!$ is called minimax with respect to a risk function $R(\theta ,\delta )\,\!$ if it achieves the smallest maximum risk among all estimators, meaning it satisfies

$\sup _{\theta \in \Theta }R(\theta ,\delta ^{M})=\inf _{\delta }\sup _{\theta \in \Theta }R(\theta ,\delta ).\,$

## Least favorable distribution

Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a Bayes estimator with respect to a least favorable prior distribution of $\theta \,\!$ . To demonstrate this notion, denote the average risk of the Bayes estimator $\delta _{\pi }\,\!$ with respect to a prior distribution $\pi \,\!$ as

$r_{\pi }=\int R(\theta ,\delta _{\pi })\,d\pi (\theta ).\,$

Corollary: If a Bayes estimator has constant risk, it is minimax. Note that this is not a necessary condition.

Example 1, Unfair coin: Consider the problem of estimating the "success" rate of a binomial variable, $x\sim B(n,\theta )\,\!$ . This may be viewed as estimating the rate at which an unfair coin falls on "heads" or "tails". In this case, the Bayes estimator with respect to a Beta-distributed prior, $\theta \sim {\text{Beta}}({\sqrt {n}}/2,{\sqrt {n}}/2)\,\!$ , is

$\delta ^{M}={\frac {x+0.5{\sqrt {n}}}{n+{\sqrt {n}}}},\,$

with constant Bayes risk

$r={\frac {1}{4(1+{\sqrt {n}})^{2}}}\,$

and, according to the Corollary, is minimax.
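As a quick numerical sanity check (an illustrative sketch, not part of the original derivation; the function name `minimax_risk` is my own), the exact squared-error risk of this estimator can be evaluated by summing over the binomial distribution:

```python
import math

def minimax_risk(n, theta):
    # Exact squared-error risk of delta(x) = (x + 0.5*sqrt(n)) / (n + sqrt(n))
    # under x ~ Binomial(n, theta), computed by summing over the pmf.
    s = math.sqrt(n)
    risk = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * theta**x * (1 - theta)**(n - x)
        delta = (x + 0.5 * s) / (n + s)
        risk += pmf * (delta - theta) ** 2
    return risk

n = 100
predicted = 1 / (4 * (1 + math.sqrt(n)) ** 2)  # the constant risk r above
for theta in (0.1, 0.3, 0.5, 0.9):
    assert abs(minimax_risk(n, theta) - predicted) < 1e-12
```

For every value of $\theta \,\!$ the risk comes out equal to $1/(4(1+{\sqrt {n}})^{2})\,\!$ , confirming the constant-risk property on which the Corollary relies.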

Definition: A sequence of prior distributions ${\pi }_{n}\,\!$ is called least favorable if for any other distribution $\pi '\,\!$ ,

$\lim _{n\rightarrow \infty }r_{\pi _{n}}\geq r_{\pi '}.\,$

Notice that no uniqueness is guaranteed here. For example, the ML estimator of a normal mean discussed below may be attained as the limit of Bayes estimators with respect to a uniform prior $\pi _{n}\sim U[-n,n]\,\!$ with increasing support, and also with respect to a zero-mean normal prior $\pi _{n}\sim N(0,n\sigma ^{2})\,\!$ with increasing variance. Thus neither the resulting minimax ML estimator nor the least favorable prior is unique.
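To see why the normal prior sequence recovers the ML estimator, note the standard conjugate-prior calculation (shown here for a scalar observation $x\sim N(\theta ,\sigma ^{2})\,\!$ with prior $\theta \sim N(0,n\sigma ^{2})\,\!$ ): the Bayes (posterior-mean) estimator is

$\delta _{\pi _{n}}(x)={\frac {n\sigma ^{2}}{n\sigma ^{2}+\sigma ^{2}}}\,x={\frac {n}{n+1}}\,x\rightarrow x=\delta _{ML},\,$

so the ML estimator is the pointwise limit of these Bayes estimators as the prior variance grows.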

Example 2, Normal mean: Consider estimating the mean $\theta \,\!$ of a $p$ -dimensional normal vector $x\sim N(\theta ,I_{p}\sigma ^{2})\,\!$ . The maximum-likelihood (ML) estimator is simply $\delta _{ML}=x\,\!$ , with risk

$R(\theta ,\delta _{ML})=E\|\delta _{ML}-\theta \|^{2}=\sum _{i=1}^{p}E(x_{i}-\theta _{i})^{2}=p\sigma ^{2}.\,$

The risk is constant, but the ML estimator is actually not a Bayes estimator, so the Corollary above does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence $\pi _{n}\sim N(0,n\sigma ^{2})\,\!$ , and hence is indeed minimax. Nonetheless, minimaxity does not always imply admissibility. In fact, in this example the ML estimator is known to be inadmissible (not admissible) whenever $p>2\,\!$ : the famous James–Stein estimator dominates the ML estimator whenever $p>2\,\!$ . Though both estimators have the same risk $p\sigma ^{2}\,\!$ when $\|\theta \|\rightarrow \infty \,\!$ , and both are minimax, the James–Stein estimator has smaller risk for any finite $\|\theta \|\,\!$ . This fact is illustrated in the following figure.
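The dominance of the James–Stein estimator can be illustrated numerically. The sketch below (my own illustration, assuming squared-error loss and the classic shrinkage factor $1-(p-2)\sigma ^{2}/\|x\|^{2}\,\!$ ) compares Monte Carlo risk estimates of the two estimators:

```python
import math
import random

def mc_risks(p=10, sigma=1.0, theta_norm=2.0, trials=20000, seed=1):
    # Monte Carlo squared-error risks of the ML estimator (delta = x) and the
    # James-Stein estimator for x ~ N(theta, sigma^2 * I_p), p > 2.
    rng = random.Random(seed)
    theta = [theta_norm / math.sqrt(p)] * p  # any theta with the given norm
    ml_loss = js_loss = 0.0
    for _ in range(trials):
        x = [rng.gauss(t, sigma) for t in theta]
        norm2 = sum(v * v for v in x)
        shrink = 1.0 - (p - 2) * sigma**2 / norm2  # James-Stein factor
        ml_loss += sum((v - t) ** 2 for v, t in zip(x, theta))
        js_loss += sum((shrink * v - t) ** 2 for v, t in zip(x, theta))
    return ml_loss / trials, js_loss / trials

ml_risk, js_risk = mc_risks()
print(ml_risk, js_risk)  # ML risk is close to p * sigma^2 = 10; JS risk is smaller
```

With these settings the ML risk stays near $p\sigma ^{2}\,\!$ while the James–Stein risk is markedly lower, consistent with the dominance claim above.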

## Some examples

In general, it is difficult, and often impossible, to determine the minimax estimator. Nonetheless, in many cases a minimax estimator has been determined.

Example 3, Bounded normal mean: Consider estimating the mean of a normal vector $x\sim N(\theta ,I_{n}\sigma ^{2})\,\!$ when it is known that $\|\theta \|^{2}\leq M\,\!$ . The Bayes estimator with respect to a prior that is uniformly distributed on the edge of the bounding sphere is known to be minimax whenever $M\leq n\,\!$ . The analytical expression for this estimator is

$\delta ^{M}={\frac {nJ_{n+1}(n\|x\|)}{\|x\|J_{n}(n\|x\|)}},\,$

where $J_{n}(t)\,\!$ is the modified Bessel function of the first kind of order $n$ .

## Asymptotic minimax estimator

The difficulty of determining the exact minimax estimator has motivated the study of asymptotically minimax estimators: an estimator $\delta '$ is called $c$ -asymptotic (or approximate) minimax if

$\sup _{\theta \in \Theta }R(\theta ,\delta ')\leq c\inf _{\delta }\sup _{\theta \in \Theta }R(\theta ,\delta ).$

For many estimation problems, especially in the non-parametric setting, various approximate minimax estimators have been established. The design of an approximate minimax estimator is intimately related to the geometry of $\Theta$ , such as its metric entropy number.

## Relationship to robust optimization

Robust optimization is an approach to solving optimization problems under uncertainty about the underlying parameters. For instance, MMSE Bayesian estimation of a parameter requires knowledge of the parameter's correlation function. If this correlation function is not perfectly known, a popular minimax robust optimization approach is to define a set characterizing the uncertainty about the correlation function, and then to pursue a minimax optimization over the uncertainty set and over the estimator, respectively. Similar minimax optimizations can be pursued to make estimators robust to other imprecisely known parameters; such techniques have been studied, for example, in the signal processing literature.

In R. Fandom Noubiap and W. Seidel (2001), an algorithm for calculating a $\Gamma$ -minimax decision rule was developed for the case in which $\Gamma$ is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum of the integrals of the risk function with respect to all distributions in $\Gamma$ . $\Gamma$ -minimax decision rules are of interest in robustness studies in Bayesian statistics.