Pseudo-differential operator: Difference between revisions

{{summarize|to|Covariance matrix#Estimation|date=February 2013}}
In [[statistics]], sometimes the [[covariance matrix]] of a [[multivariate random variable]] is not known but has to be [[estimation theory|estimated]]. '''Estimation of covariance matrices''' then deals with the question of how to approximate the actual covariance matrix on the basis of a sample from the [[Joint probability distribution|multivariate distribution]]. Simple cases, where observations are complete, can be dealt with by using the [[sample covariance matrix]]. The sample covariance matrix (SCM) is an [[unbiased estimator|unbiased]] and [[Efficiency (statistics)|efficient estimator]] of the covariance matrix if the space of covariance matrices is viewed as an [[Differential geometry#Intrinsic versus extrinsic|extrinsic]] [[convex cone]] in '''R'''<sup>''p''&times;''p''</sup>; however, measured using the [[Symmetric space|intrinsic geometry]] of [[Positive-definite matrix|positive-definite matrices]], the SCM is a [[Biased estimator|biased]] and inefficient estimator.<ref name="Smith 2005">{{cite journal| title=Covariance, Subspace, and Intrinsic Cramér–Rao Bounds| journal=IEEE Trans. Signal Processing|date=May 2005| volume=53 |pages=1610–1630| url=http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1420804&tag=1| author=Smith, Steven Thomas| issue=5| doi=10.1109/TSP.2005.845428}}</ref> In addition, if the random variable has a [[normal distribution]], the sample covariance matrix has a [[Wishart distribution]] and a slightly differently scaled version of it is the [[maximum likelihood estimate]]. Cases involving [[missing data]] require deeper considerations. Another issue is the [[robust statistics|robustness]] to [[outlier]]s:<ref>''Robust Estimation and Outlier Detection with Correlation Coefficients'', Susan J. Devlin, R. Gnanadesikan, J. R. Kettenring, Biometrika, Vol. 62, No. 3 (Dec., 1975), pp. 531–545</ref> "Sample covariance matrices are extremely sensitive to outliers".<ref>''Robust Statistics'', [[Peter J. Huber]], Wiley, 1981 (republished in paperback, 2004)</ref><ref>''Modern Applied Statistics with S'', [[William N. Venables]], [[Brian D. Ripley]], Springer, 2002, ISBN 0-387-95457-0, ISBN 978-0-387-95457-8, page 336</ref>
Statistical analyses of multivariate data often involve exploratory studies of the way in which the variables change in relation to one another and this may be followed up by explicit statistical models involving the covariance matrix of the variables. Thus the estimation of covariance matrices directly from observational data plays two roles:
:* to provide initial estimates that can be used to study the inter-relationships;
:* to provide sample estimates that can be used for model checking.


Estimates of covariance matrices are required at the initial stages of [[principal component analysis]] and [[factor analysis]], and are also involved in versions of [[regression analysis]] that treat the [[dependent variable]]s in a data-set, jointly with the [[independent variable]] as the outcome of a random sample.
 
==Estimation in a general context==
Given a [[Sample (statistics)|sample]] consisting of ''n'' independent observations ''x''<sub>1</sub>,..., ''x''<sub>''n''</sub> of a ''p''-dimensional [[random vector]] ''X'' &isin; '''R'''<sup>''p''&times;1</sup> (a ''p''&times;1 column-vector), an [[Bias of an estimator|unbiased]] [[estimator]] of the (''p''×''p'') [[covariance matrix]]
 
:<math>\operatorname{cov}(X) = \operatorname{E}\left[(X-\operatorname{E}[X])(X-\operatorname{E}[X])^\mathrm{T}\right]</math>
 
is the [[Sample mean and covariance|sample covariance matrix]]
 
:<math>\mathbf{Q} = {1 \over {n-1}}\sum_{i=1}^n (x_i-\overline{x})(x_i-\overline{x})^\mathrm{T},</math>
 
where <math>\textstyle x_i</math> is the ''i''-th observation of the ''p''-dimensional random vector, and
 
:<math>\overline{x} =\left[ \begin{array} [c]{c}\bar{x}_{1}\\ \vdots\\ \bar{x}_{p}\end{array} \right]  = {1 \over {n}}\sum_{i=1}^n x_i</math>
 
is the [[Sample mean and covariance|sample mean]].
This is true regardless of the distribution of the random variable ''X'', provided of course that the theoretical means and covariances exist. The reason for the factor ''n''&nbsp;&minus;&nbsp;1 rather than ''n'' is essentially the same as the reason for the same factor appearing in unbiased estimates of [[Variance#Population_variance_and_sample_variance|sample variances]] and [[Sample mean and sample covariance|sample covariances]], which relates to the fact that the mean is not known and is replaced by the sample mean.
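As a numerical sketch in Python with NumPy (the data are simulated purely for illustration), the unbiased estimator above can be computed directly and checked against <code>numpy.cov</code>, which divides by ''n''&nbsp;&minus;&nbsp;1 by default:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))              # n observations of a p-dimensional vector

x_bar = X.mean(axis=0)                   # sample mean (a p-vector)
centered = X - x_bar
Q = centered.T @ centered / (n - 1)      # unbiased sample covariance matrix

# numpy.cov uses the same n - 1 convention by default
assert np.allclose(Q, np.cov(X, rowvar=False))
```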
 
In cases where the distribution of the [[random variable]]  ''X'' is known to be within a certain family of distributions, other estimates may be derived on the basis of that assumption. A well-known instance is when the [[random variable]] ''X'' is [[multivariate normal distribution|normally distributed]]: in this case the [[maximum likelihood]] [[estimator]] of the covariance matrix is slightly different from the unbiased estimate, and is given by
 
:<math>\mathbf{Q_n} = {1 \over n}\sum_{i=1}^n (x_i-\overline{x})(x_i-\overline{x})^\mathrm{T}.</math>
 
A derivation of this result is given below. Clearly, the difference between the unbiased estimator and the maximum likelihood estimator diminishes for large ''n''.
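The relationship between the two estimators can be verified numerically; a sketch in Python with NumPy (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))

Q  = np.cov(X, rowvar=False)             # unbiased estimator: divides by n - 1
Qn = np.cov(X, rowvar=False, ddof=0)     # maximum likelihood estimate: divides by n

# They differ only by the scalar factor (n - 1)/n, which tends to 1 as n grows
assert np.allclose(Qn, (n - 1) / n * Q)
```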
 
In the general case, the unbiased estimate of the covariance matrix provides an acceptable estimate when the data vectors in the observed data set are all complete: that is, they contain no [[missing values|missing elements]]. One approach to estimating the covariance matrix is to treat the estimation of each variance or pairwise covariance separately, and to use all the observations for which both variables have valid values. Assuming the missing data are [[missing at random]], this results in an estimate for the covariance matrix which is unbiased. However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix.
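The pairwise-complete approach described above can be sketched in Python with NumPy (missingness is simulated at random; names are illustrative). The resulting matrix is symmetric by construction, but nothing guarantees non-negative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 3
X = rng.normal(size=(n, p))
X[rng.random(size=(n, p)) < 0.3] = np.nan    # knock out ~30% of entries at random

# For each pair (j, k), use only the rows where both entries are present
C = np.empty((p, p))
for j in range(p):
    for k in range(p):
        ok = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
        xj, xk = X[ok, j], X[ok, k]
        C[j, k] = ((xj - xj.mean()) * (xk - xk.mean())).sum() / (ok.sum() - 1)

lam_min = np.linalg.eigvalsh(C).min()    # may be negative: no PSD guarantee
assert np.allclose(C, C.T)               # symmetry does hold by construction
```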
 
When estimating the [[cross-covariance]] of a pair of signals that are [[wide-sense stationary]], missing samples need ''not'' be random (e.g., sub-sampling by an arbitrary factor is valid).
 
==Maximum-likelihood estimation for the multivariate normal distribution==
{{main|Multivariate normal distribution}}
A random vector ''X'' &isin; '''R'''<sup>''p''</sup> (a ''p''&times;1 "column vector") has a multivariate normal distribution with a nonsingular covariance matrix Σ precisely if Σ &isin; '''R'''<sup>''p'' &times; ''p''</sup> is a [[positive-definite matrix]] and the [[probability density function]] of ''X'' is
 
:<math>f(x)=(2\pi)^{-p/2}\, \det(\Sigma)^{-1/2} \exp\left(-{1 \over 2} (x-\mu)^\mathrm{T} \Sigma^{-1} (x-\mu)\right)</math>
 
where ''μ'' ∈ '''R'''<sup>''p''&times;1</sup> is the [[expected value]] of ''X''. The [[covariance matrix]] ''Σ'' is the multidimensional analog of what in one dimension would be the [[variance]], and <math>(2\pi)^{-p/2}\det(\Sigma)^{-1/2}</math> normalizes the density <math>f(x)</math> so that it integrates to 1.
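The density above can be evaluated and cross-checked against SciPy's implementation (a sketch; the particular ''&mu;'', Σ and evaluation point are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

p = 2
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # any positive-definite matrix

x = np.array([0.5, 0.0])
d = x - mu
f = ((2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
     * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))

# Agrees with SciPy's multivariate normal density
assert np.isclose(f, multivariate_normal(mu, Sigma).pdf(x))
```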
 
Suppose now that ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are [[statistical independence|independent]] and identically distributed samples from the distribution above.  Based on the [[observed value]]s ''x''<sub>1</sub>, ..., ''x''<sub>''n''</sub> of this [[Sample (statistics)#Mathematical description|sample]], we wish to estimate Σ.
 
===First steps===
 
The likelihood function is:
 
: <math> \mathcal{L}(\mu,\Sigma)=(2\pi)^{-np/2}\, \prod_{i=1}^n \det(\Sigma)^{-1/2} \exp\left(-{1 \over 2} (x_i-\mu)^\mathrm{T} \Sigma^{-1} (x_i-\mu)\right) </math>
 
It is fairly readily shown that the [[maximum likelihood|maximum-likelihood]] estimate of the mean vector ''μ'' is the "[[sample mean]]" vector:
 
:<math>\overline{x}=(x_1+\cdots+x_n)/n.</math>
 
See [[normal distribution#Estimation of parameters|the section on estimation in the article on the normal distribution]] for details; the process here is similar.
 
Since the estimate <math>\bar{x}</math> does not depend on Σ, we can just substitute it for ''μ'' in the [[likelihood function]], getting
 
: <math>\mathcal{L}(\overline{x},\Sigma) \propto \det(\Sigma)^{-n/2} \exp\left(-{1 \over 2} \sum_{i=1}^n (x_i-\overline{x})^\mathrm{T} \Sigma^{-1} (x_i-\overline{x})\right),</math>
 
and then seek the value of &Sigma; that maximizes the likelihood of the data (in practice it is easier to work with log&nbsp;<math>\mathcal{L}</math>).
 
===The trace of a 1 &times; 1 matrix===
 
Now we come to the first surprising step: regard the [[scalar (mathematics)|scalar]] <math>(x_i-\overline{x})^\mathrm{T} \Sigma^{-1} (x_i-\overline{x})</math> as the [[trace (matrix)|trace]] of a 1&times;1 matrix.
 
This makes it possible to use the identity tr(''AB'') = tr(''BA'') whenever ''A'' and ''B'' are matrices so shaped that both products exist.
We get
 
:<math>
\mathcal{L}(\overline{x},\Sigma)\propto \det(\Sigma)^{-n/2} \exp\left(-{1 \over 2} \sum_{i=1}^n \operatorname{tr}((x_i-\overline{x})^\mathrm{T} \Sigma^{-1} (x_i-\overline{x})) \right)</math>
 
:<math>=\det(\Sigma)^{-n/2} \exp\left(-{1 \over 2} \sum_{i=1}^n \operatorname{tr}((x_i-\overline{x}) (x_i-\overline{x})^\mathrm{T} \Sigma^{-1}) \right)</math>
 
(so now we are taking the trace of a ''p''&times;''p'' matrix)
 
:<math>=\det(\Sigma)^{-n/2} \exp\left(-{1 \over 2} \operatorname{tr} \left( \sum_{i=1}^n (x_i-\overline{x}) (x_i-\overline{x})^\mathrm{T} \Sigma^{-1} \right) \right)</math>
 
:<math>=\det(\Sigma)^{-n/2} \exp\left(-{1 \over 2} \operatorname{tr} \left( S \Sigma^{-1} \right) \right)</math>
 
where
 
:<math>S=\sum_{i=1}^n (x_i-\overline{x}) (x_i-\overline{x})^\mathrm{T} \in \mathbf{R}^{p\times p}.</math>
<math>S</math> is sometimes called the [[scatter matrix]], and is positive definite if there exists a subset of the data consisting of <math>p</math> linearly independent observations (which we will assume).
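The trace manipulation can be verified numerically; a sketch in Python with NumPy (Σ here is an arbitrary positive-definite matrix chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3
X = rng.normal(size=(n, p))
xb = X.mean(axis=0)
Sigma = np.eye(p) + 0.2          # identity plus 0.2 everywhere: positive definite
Si = np.linalg.inv(Sigma)

# Sum of the scalar quadratic forms equals tr(S Sigma^{-1}), by tr(AB) = tr(BA)
quad = sum((x - xb) @ Si @ (x - xb) for x in X)
S = sum(np.outer(x - xb, x - xb) for x in X)
assert np.isclose(quad, np.trace(S @ Si))
```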
 
===Using the spectral theorem===
 
It follows from the [[spectral theorem]] of [[linear algebra]] that a positive-definite symmetric matrix ''S'' has a unique positive-definite symmetric square root ''S''<sup>1/2</sup>.  We can again use the [[trace (matrix)|"cyclic property"]] of the trace to write
 
:<math>\det(\Sigma)^{-n/2} \exp\left(-{1 \over 2} \operatorname{tr} \left( S^{1/2} \Sigma^{-1} S^{1/2} \right) \right).</math>
 
Let ''B'' = ''S''<sup>1/2</sup> ''Σ''<sup> &minus;1</sup> ''S''<sup>1/2</sup>.  Then the expression above becomes
 
:<math>\det(S)^{-n/2} \det(B)^{n/2} \exp\left(-{1 \over 2} \operatorname{tr} (B) \right).</math>
 
The positive-definite matrix ''B'' can be diagonalized, so the problem reduces to finding the value of ''B'' that maximizes
 
:<math>\det(B)^{n/2} \exp\left(-{1 \over 2} \operatorname{tr} (B) \right)</math>
 
Since the trace of a square matrix equals the sum of its eigenvalues ([[Trace_(matrix)#Eigenvalue_relationships|"trace and eigenvalues"]]), and the determinant equals their product, the expression factorizes over the eigenvalues, and the problem reduces to finding the eigenvalues λ<sub>1</sub>, ..., λ<sub>''p''</sub> that each maximize
 
:<math>\lambda_i^{n/2} \exp(-\lambda_i/2).</math>
 
This is just a one-variable calculus problem and we get λ<sub>''i''</sub> = ''n'' for all ''i''. Thus, if ''Q'' is the matrix of eigenvectors, then
 
:<math>B = Q (n I_p) Q^{-1} = n I_p </math>
 
i.e., ''n'' times the ''p''&times;''p'' identity matrix.
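The conclusion ''B'' = ''n I''<sub>''p''</sub> can be checked numerically: substituting Σ = ''S''/''n'' into ''B'' = ''S''<sup>1/2</sup>Σ<sup>&minus;1</sup>''S''<sup>1/2</sup> gives ''n'' times the identity. A sketch in Python with NumPy, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X = rng.normal(size=(n, p))
xb = X.mean(axis=0)
S = (X - xb).T @ (X - xb)        # scatter matrix, positive definite for n > p

# Symmetric square root of S via the spectral theorem
w, V = np.linalg.eigh(S)
S_half = V @ np.diag(np.sqrt(w)) @ V.T

B = S_half @ np.linalg.inv(S / n) @ S_half
assert np.allclose(B, n * np.eye(p))     # B = n I_p at the maximizer Sigma = S/n
```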
 
===Concluding steps===
 
Finally we get
 
:<math>\Sigma=S^{1/2} B^{-1} S^{1/2}=S^{1/2}((1/n)I_p)S^{1/2}=S/n,\,</math>
 
i.e., the ''p''&times;''p'' "sample covariance matrix"
 
:<math>{S \over n} = {1 \over n}\sum_{i=1}^n (X_i-\overline{X})(X_i-\overline{X})^\mathrm{T}</math>
 
is the maximum-likelihood estimator of the "population covariance matrix" ''&Sigma;''.  At this point we are using a capital ''X'' rather than a lower-case ''x'' because we are thinking of it "as an estimator rather than as an estimate", i.e., as something random whose probability distribution we could profit by knowing.  The random matrix ''S'' can be shown to have a [[Wishart distribution]] with ''n'' &minus; 1 degrees of freedom.<ref>
[[Kanti Mardia|K.V. Mardia]], [[John Kent (statistician)|J.T. Kent]], and [[John Bibby (mathematician)|J.M. Bibby]] (1979) ''[[Multivariate Analysis]]'', [[Academic Press]].
</ref> That is:
 
:<math>\sum_{i=1}^n (X_i-\overline{X})(X_i-\overline{X})^\mathrm{T} \sim W_p(\Sigma,n-1).</math>
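Since the mean of a <math>W_p(\Sigma, n-1)</math> matrix is <math>(n-1)\Sigma</math>, this can be checked by simulation; a Monte Carlo sketch in Python with NumPy (seed and tolerances chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 20, 2, 2000
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)            # to draw X_i ~ N(0, Sigma)

# Average the scatter matrix over many replications; its mean is (n - 1) * Sigma
S_mean = np.zeros((p, p))
for _ in range(reps):
    X = rng.normal(size=(n, p)) @ L.T
    xb = X.mean(axis=0)
    S_mean += (X - xb).T @ (X - xb) / reps

assert np.allclose(S_mean, (n - 1) * Sigma, rtol=0.1, atol=0.1)
```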
 
===Alternative derivation===
 
An alternative derivation of the maximum likelihood estimator can be performed via [[matrix calculus]] formulae (see also [[Determinant#Derivative|differential of a determinant]] and [[Invertible matrix#Derivative of the matrix inverse|differential of the inverse matrix]]). It also verifies the aforementioned fact about the maximum likelihood estimate of the mean. Re-write the likelihood in the log form using the trace trick:
 
:<math>\ln \mathcal{L}(\mu,\Sigma) = \operatorname{const} -{n \over 2} \ln \det(\Sigma) -{1 \over 2} \operatorname{tr}  \left[ \Sigma^{-1} \sum_{i=1}^n (x_i-\mu) (x_i-\mu)^\mathrm{T} \right]. </math>
 
The differential of this log-likelihood is
 
:<math>d \ln \mathcal{L}(\mu,\Sigma) = -{n \over 2} \operatorname{tr} \left[ \Sigma^{-1} \left\{ d \Sigma \right\} \right]</math>
:<math> -{1 \over 2} \operatorname{tr} \left[ - \Sigma^{-1} \{ d \Sigma \} \Sigma^{-1} \sum_{i=1}^n (x_i-\mu)(x_i-\mu)^\mathrm{T} - 2 \Sigma^{-1} \sum_{i=1}^n (x_i - \mu) \{ d \mu \}^\mathrm{T} \right]. </math>
 
It naturally breaks down into the part related to the estimation of the mean, and to the part related to the estimation of the variance. The [[first order condition]] for maximum, <math>d \ln \mathcal{L}(\mu,\Sigma)=0</math>, is satisfied when the terms multiplying <math>d \mu</math> and <math>d \Sigma</math> are identically zero. Assuming (the maximum likelihood estimate of) <math>\Sigma</math> is non-singular, the first order condition for the estimate of the mean vector is
 
:<math> \sum_{i=1}^n (x_i - \mu) = 0,</math>
 
which leads to the maximum likelihood estimator
 
:<math>\widehat \mu = \bar X = {1 \over n} \sum_{i=1}^n X_i.</math>
 
This lets us simplify <math>\sum_{i=1}^n (x_i-\mu)(x_i-\mu)^\mathrm{T} = \sum_{i=1}^n (x_i-\bar x)(x_i-\bar x)^\mathrm{T} = S</math> as defined above. Then the terms involving <math>d \Sigma</math> in <math>d \ln L</math> can be combined as
 
:<math> -{1 \over 2} \operatorname{tr} \left( \Sigma^{-1} \left\{ d \Sigma \right\} \left[ nI_p - \Sigma^{-1} S \right] \right). </math>
 
The first order condition <math>d \ln \mathcal{L}(\mu,\Sigma)=0</math> will hold when the term in the square bracket is (matrix-valued) zero. Pre-multiplying the latter by <math>\Sigma</math> and dividing by <math>n</math> gives
 
:<math>\widehat \Sigma = {1 \over n} S,</math>
 
which of course coincides with the canonical derivation given earlier.
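The maximizer can be spot-checked numerically: perturbing <math>\widehat\mu</math> or <math>\widehat\Sigma</math> away from the values above lowers the log-likelihood. A sketch in Python with NumPy (simulated data; the small symmetric perturbation is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 60, 2
X = rng.normal(size=(n, p))

def loglik(mu, Sigma):
    """Multivariate normal log-likelihood, up to an additive constant."""
    d = X - mu
    return (-n / 2 * np.log(np.linalg.det(Sigma))
            - 0.5 * np.trace(np.linalg.inv(Sigma) @ d.T @ d))

mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / n

# Any small symmetric perturbation of Sigma_hat, or shift of mu_hat, lowers it
base = loglik(mu_hat, Sigma_hat)
E = np.array([[1e-3, 5e-4], [5e-4, -1e-3]])
assert loglik(mu_hat, Sigma_hat + E) < base
assert loglik(mu_hat + 1e-3, Sigma_hat) < base
```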
 
Dwyer<ref name="Thomas 2007">{{cite journal| title=Some applications of matrix derivatives in multivariate analysis| journal=Journal of the American Statistical Association|date=June 1967| volume=62 |pages=607–625| doi=10.2307/2283988| author=Dwyer, Paul S.| issue=318| publisher=Journal of the American Statistical Association, Vol. 62, No. 318| jstor=2283988}}</ref> points out that a decomposition into two terms such as appears above is "unnecessary" and derives the estimator in two lines of working. Note that it may not be trivial to show that the estimator derived in this way is the unique global maximizer of the likelihood function.
<!--
==Maximum likelihood estimation: general case==
{{main|Maximum likelihood}}
The first-order conditions for a MLE of parameter ''&theta;'' are that the first derivative of the log-likelihood function should be null at ''&theta;''<sub>MLE</sub>. Intuitively, the second derivative of the log-likelihood function indicates its curvature : the higher it is, the better identified ''&theta;''<sub>MLE</sub> since the likelihood function will be inverse-V-shaped around ''&theta;''<sub>MLE</sub>.
Formally, it can be proved that
 
:<math>\sqrt{T}(\theta_\text{MLE}-\theta) \rightarrow \mathcal{N}(0,\Omega) \, </math>
 
where <math>\Omega</math> can be estimated by
 
:<math>\left(-\frac{1}{T}\sum_{t=1}^\mathrm{T} \frac{\partial^2 \ell_t}{\partial \theta \, \partial \theta '} (\theta_\text{MLE})\right)^{-1}.</math> -->
 
==Intrinsic covariance matrix estimation==
 
===Intrinsic expectation===
 
Given a [[Sample (statistics)|sample]] of ''n'' independent observations ''x''<sub>1</sub>,..., ''x''<sub>''n''</sub> of a ''p''-dimensional zero-mean Gaussian random variable ''X'' with covariance '''R''', the [[maximum likelihood]] [[estimator]] of '''R''' is given by
 
:<math>\hat{\mathbf{R}} = {1 \over n}\sum_{i=1}^n x_ix_i^\mathrm{T}.</math>
 
The parameter '''R''' belongs to the set of [[Positive-definite matrix|positive-definite matrices]], which is a [[Riemannian manifold]], not a [[vector space]]; hence the usual vector-space notions of [[Expected value|expectation]], i.e. <math>\mathrm{E}[\hat{\mathbf{R}}]</math>, and [[estimator bias]] must be generalized to manifolds to make sense of the problem of covariance matrix estimation. This can be done by defining the expectation of a manifold-valued estimator <math>\hat{\mathbf{R}}</math> with respect to the manifold-valued point '''R''' as
 
:<math>\mathrm{E}_{\mathbf{R}}[\hat{\mathbf{R}}]\ \stackrel{\mathrm{def}}{=}\ \exp_{\mathbf{R}}\mathrm{E}\left[\exp_{\mathbf{R}}^{-1}\hat{\mathbf{R}}\right]</math>
 
where
 
:<math>\exp_{\mathbf{R}}(\hat{\mathbf{R}}) =\mathbf{R}^{\frac{1}{2}}\exp\left(\mathbf{R}^{-\frac{1}{2}}\hat{\mathbf{R}}\mathbf{R}^{-\frac{1}{2}}\right)\mathbf{R}^{\frac{1}{2}}</math>
:<math>\exp_{\mathbf{R}}^{-1}(\hat{\mathbf{R}}) =\mathbf{R}^{\frac{1}{2}}\log\left(\mathbf{R}^{-\frac{1}{2}}\hat{\mathbf{R}}\mathbf{R}^{-\frac{1}{2}}\right)\mathbf{R}^{\frac{1}{2}}</math>
 
are the [[exponential map]] and inverse exponential map, respectively, "exp" and "log" denote the ordinary [[matrix exponential]] and [[matrix logarithm]], and E[·] is the ordinary expectation operator defined on a vector space, in this case the [[tangent space]] of the manifold.<ref name="Smith 2005"/>
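A sketch of these maps in Python, using NumPy and SciPy's matrix exponential and logarithm (the matrices are arbitrary positive-definite examples), confirms they are mutually inverse:

```python
import numpy as np
from scipy.linalg import expm, logm

def psd_sqrt(R):
    """Symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(R)
    return V @ np.diag(np.sqrt(w)) @ V.T

def exp_map(R, T):      # exp_R: tangent vector T at R -> point on the manifold
    Rh = psd_sqrt(R)
    Ri = np.linalg.inv(Rh)
    return Rh @ expm(Ri @ T @ Ri) @ Rh

def log_map(R, P):      # exp_R^{-1}: point P on the manifold -> tangent vector at R
    Rh = psd_sqrt(R)
    Ri = np.linalg.inv(Rh)
    return Rh @ logm(Ri @ P @ Ri) @ Rh

R = np.array([[2.0, 0.5], [0.5, 1.0]])
P = np.array([[1.5, -0.2], [-0.2, 0.8]])
assert np.allclose(exp_map(R, log_map(R, P)), P)   # the maps are inverses
```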
 
===Bias of the sample covariance matrix===
 
The [[intrinsic bias]] [[vector field]] of the SCM estimator <math>\hat{\mathbf{R}}</math> is defined to be
 
:<math>\mathbf{B}(\hat{\mathbf{R}}) =\exp_{\mathbf{R}}^{-1}\mathrm{E}_{\mathbf{R}}\left[\hat{\mathbf{R}}\right] =\mathrm{E}\left[\exp_{\mathbf{R}}^{-1}\hat{\mathbf{R}}\right]</math>
 
The intrinsic estimator bias is then given by <math>\exp_{\mathbf{R}}\mathbf{B}(\hat{\mathbf{R}})</math>.
 
For [[Complex number|complex]] Gaussian random variables, this bias vector field can be shown<ref name="Smith 2005"/> to equal
 
:<math>\mathbf{B}(\hat{\mathbf{R}}) =-\beta(p,n)\mathbf{R}</math>
 
where
 
:<math>\beta(p,n) =\frac{1}{p}\left(p\log n + p -\psi(n-p+1) +(n-p+1)\psi(n-p+2)  +\psi(n+1) -(n+1)\psi(n+2)\right)</math>
 
and ψ(·) is the [[digamma function]]. The intrinsic bias of the sample covariance matrix equals
 
:<math>\exp_{\mathbf{R}}\mathbf{B}(\hat{\mathbf{R}}) =e^{-\beta(p,n)}\mathbf{R}</math>
 
and the SCM is asymptotically unbiased as ''n'' → ∞.
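The bias factor can be evaluated with SciPy's digamma function; a sketch (for one illustrative choice of ''p'') showing <math>e^{-\beta(p,n)}</math> approaching 1 as ''n'' grows:

```python
import numpy as np
from scipy.special import digamma as psi

def beta(p, n):
    """The bias exponent beta(p, n) from the formula above."""
    return (p * np.log(n) + p - psi(n - p + 1) + (n - p + 1) * psi(n - p + 2)
            + psi(n + 1) - (n + 1) * psi(n + 2)) / p

p = 4
factors = [np.exp(-beta(p, n)) for n in (10, 100, 1000)]
# The bias factor moves toward 1 as the sample size grows
assert abs(factors[-1] - 1) < abs(factors[0] - 1)
```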
 
Similarly, the intrinsic [[Efficiency (statistics)|inefficiency]] of the sample covariance matrix depends upon the [[Riemannian curvature]] of the space of positive-definite matrices.
 
==Shrinkage estimation==
 
If the sample size ''n'' is small and the number of considered variables ''p'' is large, the above empirical estimators of covariance and correlation are very unstable.  Specifically, it is possible to furnish estimators that improve considerably upon the maximum likelihood estimate in terms of mean squared error. Moreover, for ''n''&nbsp;<&nbsp;''p'', the empirical estimate of the covariance matrix becomes [[singular matrix|singular]], i.e. it cannot be inverted to compute the [[precision matrix]].
 
As an alternative, many methods have been suggested to improve the estimation of the covariance matrix.  All of these approaches rely on the concept of shrinkage.  This is implicit in [[Bayesian method]]s and in penalized [[maximum likelihood]] methods and explicit in the [[James–Stein estimator|Stein-type shrinkage approach]].
 
A simple version of a shrinkage estimator of the covariance matrix is constructed as follows. One considers a [[convex combination]] of the empirical estimator (<math>A</math>) with a suitably chosen target (<math>B</math>), e.g., a diagonal matrix. Subsequently, the mixing parameter (<math>\delta</math>) is selected to maximize the expected accuracy of the shrunken estimator. This can be done by [[cross-validation (statistics)|cross-validation]], or by using an analytic estimate of the shrinkage intensity. The resulting regularized estimator (<math>\delta A + (1 - \delta) B</math>) can be shown to outperform the maximum likelihood estimator for small samples. For large samples, the shrinkage intensity will reduce to zero, hence in this case the shrinkage estimator will be identical to the empirical estimator. Apart from increased efficiency, the shrinkage estimate has the additional advantage that it is always positive definite and well conditioned.
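A minimal sketch of such a shrinkage estimator in Python with NumPy (the diagonal target and the fixed mixing parameter are illustrative choices; in practice the mixing parameter would come from cross-validation or an analytic formula):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 10, 25                       # fewer observations than variables
X = rng.normal(size=(n, p))

A = np.cov(X, rowvar=False)         # empirical estimator: singular since n < p
B = np.diag(np.diag(A))             # target: diagonal matrix of empirical variances
delta = 0.3                         # mixing parameter (illustrative fixed value)
shrunk = delta * A + (1 - delta) * B

assert np.linalg.matrix_rank(A) < p            # the SCM cannot be inverted
assert np.linalg.eigvalsh(shrunk).min() > 0    # the shrunken estimate is PD
```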
 
A review on this topic is given, e.g., in Schäfer and Strimmer 2005.<ref>
J. Schäfer and K. Strimmer (2005) ''[http://www.bepress.com/sagmb/vol4/iss1/art32 A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics]'', Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 32.</ref> A covariance shrinkage estimator is implemented in the [[R programming language|R]] package [http://cran.r-project.org/web/packages/corpcor/index.html "corpcor"] and in the [http://scikit-learn.org/stable/modules/covariance.html scikit-learn] library for [[Python (programming language)|Python]].
 
==See also==
*[[Propagation of uncertainty]]
*[[Sample mean and sample covariance]]
 
==References==
<references/>
{{statistics|correlation|state=expanded}}
 
{{DEFAULTSORT:Estimation Of Covariance Matrices}}
[[Category:Estimation for specific parameters]]
[[Category:Statistical deviation and dispersion]]

Revision as of 06:32, 21 January 2014

Template:Summarize In statistics, sometimes the covariance matrix of a multivariate random variable is not known but has to be estimated. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of a sample from the multivariate distribution. Simple cases, where observations are complete, can be dealt with by using the sample covariance matrix. The sample covariance matrix (SCM) is an unbiased and efficient estimator of the covariance matrix if the space of covariance matrices is viewed as an extrinsic convex cone in Rp×p; however, measured using the intrinsic geometry of positive-definite matrices, the SCM is a biased and inefficient estimator.[1] In addition, if the random variable has normal distribution, the sample covariance matrix has Wishart distribution and a slightly differently scaled version of it is the maximum likelihood estimate. Cases involving missing data require deeper considerations. Another issue is the robustness to outliers:[2] "Sample covariance matrices are extremely sensitive to outliers".[3][4]

Statistical analyses of multivariate data often involve exploratory studies of the way in which the variables change in relation to one another and this may be followed up by explicit statistical models involving the covariance matrix of the variables. Thus the estimation of covariance matrices directly from observational data plays two roles:

  • to provide initial estimates that can be used to study the inter-relationships;
  • to provide sample estimates that can be used for model checking.

Estimates of covariance matrices are required at the initial stages of principal component analysis and factor analysis, and are also involved in versions of regression analysis that treat the dependent variables in a data-set, jointly with the independent variable as the outcome of a random sample.

Estimation in a general context

Given a sample consisting of n independent observations x1,..., xn of a p-dimensional random vector XRp×1 (a p×1 column-vector), an unbiased estimator of the (p×p) covariance matrix

is the sample covariance matrix

where is the i-th observation of the p-dimensional random vector, and

is the sample mean. This is true regardless of the distribution of the random variable X, provided of course that the theoretical means and covariances exist. The reason for the factor n − 1 rather than n is essentially the same as the reason for the same factor appearing in unbiased estimates of sample variances and sample covariances, which relates to the fact that the mean is not known and is replaced by the sample mean.

In cases where the distribution of the random variable X is known to be within a certain family of distributions, other estimates may be derived on the basis of that assumption. A well-known instance is when the random variable X is normally distributed: in this case the maximum likelihood estimator of the covariance matrix is slightly different from the unbiased estimate, and is given by

A derivation of this result is given below. Clearly, the difference between the unbiased estimator and the maximum likelihood estimator diminishes for large n.

In the general case, the unbiased estimate of the covariance matrix provides an acceptable estimate when the data vectors in the observed data set are all complete: that is they contain no missing elements. One approach to estimating the covariance matrix is to treat the estimation of each variance or pairwise covariance separately, and to use all the observations for which both variables have valid values. Assuming the missing data are missing at random this results in an estimate for the covariance matrix which is unbiased. However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix.

When estimating the cross-covariance of a pair of signals that are wide-sense stationary, missing samples do not need be random (e.g., sub-sampling by an arbitrary factor is valid).

Maximum-likelihood estimation for the multivariate normal distribution

Mining Engineer (Excluding Oil ) Truman from Alma, loves to spend time knotting, largest property developers in singapore developers in singapore and stamp collecting. Recently had a family visit to Urnes Stave Church. A random vector XRp (a p×1 "column vector") has a multivariate normal distribution with a nonsingular covariance matrix Σ precisely if Σ ∈ Rp × p is a positive-definite matrix and the probability density function of X is

where μRp×1 is the expected value of X. The covariance matrix Σ is the multidimensional analog of what in one dimension would be the variance, and normalizes the density so that it integrates to 1.

Suppose now that X1, ..., Xn are independent and identically distributed samples from the distribution above. Based on the observed values x1, ..., xn of this sample, we wish to estimate Σ.

First steps

The likelihood function is:

It is fairly readily shown that the maximum-likelihood estimate of the mean vector μ is the "sample mean" vector:

See the section on estimation in the article on the normal distribution for details; the process here is similar.

Since the estimate does not depend on Σ, we can just substitute it for μ in the likelihood function, getting

and then seek the value of Σ that maximizes the likelihood of the data (in practice it is easier to work with log ).

The trace of a 1 × 1 matrix

Now we come to the first surprising step: regard the scalar as the trace of a 1×1 matrix.

This makes it possible to use the identity tr(AB) = tr(BA) whenever A and B are matrices so shaped that both products exist. We get

(so now we are taking the trace of a p×p matrix)

where

is sometimes called the scatter matrix, and is positive definite if there exists a subset of the data consisting of linearly independent observations (which we will assume).

Using the spectral theorem

It follows from the spectral theorem of linear algebra that a positive-definite symmetric matrix S has a unique positive-definite symmetric square root S1/2. We can again use the "cyclic property" of the trace to write

Let B = S1/2 Σ −1 S1/2. Then the expression above becomes

The positive-definite matrix B can be diagonalized, and then the problem of finding the value of B that maximizes

Since the trace of a square matrix equals the sum of eigen-values ("trace and eigenvalues"), the equation reduces to the problem of finding the eigen values λ1, ..., λp that maximize

This is just a calculus problem and we get λi = n for all i. Thus, assume Q is the matrix of eigen vectors, then

i.e., n times the p×p identity matrix.

Concluding steps

Finally we get

$$\Sigma = S^{1/2} B^{-1} S^{1/2} = S^{1/2} \left(\frac{1}{n} I_p\right) S^{1/2} = \frac{S}{n},$$

i.e., the p×p "sample covariance matrix"

$$\hat{\Sigma} = \frac{S}{n} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^\mathrm{T}$$

is the maximum-likelihood estimator of the "population covariance matrix" Σ. At this point we are using a capital X rather than a lower-case x because we are thinking of it "as an estimator rather than as an estimate", i.e., as something random whose probability distribution we could profit by knowing. The random matrix S can be shown to have a Wishart distribution with n − 1 degrees of freedom.[5] That is:

$$\sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^\mathrm{T} \sim W_p(\Sigma, n-1).$$
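As a sanity check, the estimator can be computed directly with NumPy. The sketch below contrasts the 1/n (maximum-likelihood) and 1/(n − 1) (unbiased) normalizations; the true covariance matrix is an arbitrary positive-definite choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
# Arbitrary positive-definite "population" covariance, for illustration only
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

xbar = X.mean(axis=0)            # maximum-likelihood estimate of the mean
D = X - xbar
S = D.T @ D                      # scatter matrix
Sigma_mle = S / n                # maximum-likelihood estimator (1/n)
Sigma_unbiased = S / (n - 1)     # unbiased estimator (1/(n-1))

# np.cov uses the unbiased 1/(n-1) normalization by default
assert np.allclose(Sigma_unbiased, np.cov(X, rowvar=False))
```

For large n the two normalizations differ negligibly; the distinction matters mainly in small samples.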

Alternative derivation

An alternative derivation of the maximum likelihood estimator can be performed via matrix calculus formulae (see also differential of a determinant and differential of the inverse matrix). It also verifies the aforementioned fact about the maximum likelihood estimate of the mean. Re-write the likelihood in the log form using the trace trick:

$$\ln \mathcal{L}(\mu,\Sigma) = \operatorname{const} - \frac{n}{2}\ln\det(\Sigma) - \frac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\sum_{i=1}^n (x_i-\mu)(x_i-\mu)^\mathrm{T}\right].$$

The differential of this log-likelihood is

$$d\ln\mathcal{L}(\mu,\Sigma) = -\frac{n}{2}\operatorname{tr}\left[\Sigma^{-1}\,d\Sigma\right] - \frac{1}{2}\operatorname{tr}\left[-\Sigma^{-1}\,d\Sigma\,\Sigma^{-1}\sum_{i=1}^n (x_i-\mu)(x_i-\mu)^\mathrm{T} - 2\,\Sigma^{-1}\sum_{i=1}^n (x_i-\mu)\,d\mu^\mathrm{T}\right].$$

It naturally breaks down into the part related to the estimation of the mean, and the part related to the estimation of the variance. The first order condition for a maximum, $d\ln\mathcal{L}(\mu,\Sigma) = 0$, is satisfied when the terms multiplying $d\mu$ and $d\Sigma$ are identically zero. Assuming (the maximum likelihood estimate of) $\Sigma$ is non-singular, the first order condition for the estimate of the mean vector is

$$\sum_{i=1}^n (x_i - \mu) = 0,$$

which leads to the maximum likelihood estimator

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i.$$

This lets us simplify $\sum_{i=1}^n (x_i-\mu)(x_i-\mu)^\mathrm{T} = S$, with $S$ as defined above. Then the terms involving $d\Sigma$ in $d\ln\mathcal{L}$ can be combined as

$$-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}\,d\Sigma\left[n I_p - \Sigma^{-1} S\right]\right).$$

The first order condition will hold when the term in the square bracket is (matrix-valued) zero. Pre-multiplying the latter by $\Sigma$ and dividing by $n$ gives

$$\hat{\Sigma} = \frac{1}{n} S = \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\mathrm{T},$$

which of course coincides with the canonical derivation given earlier.
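That Σ̂ = S/n is indeed a maximizer of the profiled log-likelihood ℓ(Σ) = −(n/2) ln det Σ − (1/2) tr(Σ⁻¹S) can be confirmed numerically: small random symmetric perturbations of Σ̂ never increase ℓ. A rough sketch with simulated data (the perturbation scale is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.standard_normal((n, p))
xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)    # scatter matrix
Sigma_hat = S / n                # candidate maximizer

def loglik(Sigma):
    """Profiled log-likelihood, up to an additive constant."""
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * logdet - 0.5 * np.trace(np.linalg.solve(Sigma, S))

l0 = loglik(Sigma_hat)
for _ in range(20):              # small random symmetric perturbations
    E = rng.standard_normal((p, p))
    E = (E + E.T) / 2
    assert loglik(Sigma_hat + 1e-3 * E) <= l0 + 1e-9
```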

Dwyer [6] points out that decomposition into two terms such as appears above is "unnecessary" and derives the estimator in two lines of working. Note that it may not be trivial to show that the estimator so derived is the unique global maximizer of the likelihood function.

Intrinsic covariance matrix estimation

Intrinsic expectation

Given a sample of n independent observations x_1, ..., x_n of a p-dimensional zero-mean Gaussian random variable X with covariance R, the maximum likelihood estimator of R is given by

$$\hat{R} = \frac{1}{n}\sum_{i=1}^n x_i x_i^\mathrm{T}.$$

The parameter R belongs to the set of positive-definite matrices, which is a Riemannian manifold, not a vector space; hence the usual vector-space notions of expectation, i.e. $\mathrm{E}[\hat{R}]$, and estimator bias must be generalized to manifolds to make sense of the problem of covariance matrix estimation. This can be done by defining the expectation of a manifold-valued estimator $\hat{R}$ with respect to the manifold-valued point R as

$$\mathrm{E}_R[\hat{R}] \;\stackrel{\mathrm{def}}{=}\; \exp_R \mathrm{E}\left[\exp_R^{-1}\hat{R}\right]$$

where

$$\exp_R(\hat{R}) = R^{1/2}\exp\left(R^{-1/2}\hat{R}R^{-1/2}\right)R^{1/2}, \qquad \exp_R^{-1}(\hat{R}) = R^{1/2}\log\left(R^{-1/2}\hat{R}R^{-1/2}\right)R^{1/2}$$

are the exponential map and inverse exponential map, respectively; "exp" and "log" denote the ordinary matrix exponential and matrix logarithm, and E[·] is the ordinary expectation operator defined on a vector space, in this case the tangent space of the manifold.[1]
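These maps are straightforward to implement with eigendecompositions (the function names below are mine, not from the reference); the sketch checks that the exponential and inverse exponential maps are mutually inverse:

```python
import numpy as np

def _spd_fun(A, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def exp_map(R, Z):
    """Exponential map at R applied to a symmetric tangent vector Z."""
    Rh = _spd_fun(R, np.sqrt)     # R^{1/2}
    Rhi = np.linalg.inv(Rh)       # R^{-1/2}
    return Rh @ _spd_fun(Rhi @ Z @ Rhi, np.exp) @ Rh

def log_map(R, S):
    """Inverse exponential map: the tangent vector at R pointing to S."""
    Rh = _spd_fun(R, np.sqrt)
    Rhi = np.linalg.inv(Rh)
    return Rh @ _spd_fun(Rhi @ S @ Rhi, np.log) @ Rh

R = np.array([[2.0, 0.3], [0.3, 1.0]])
S = np.array([[1.5, -0.2], [-0.2, 0.8]])
assert np.allclose(exp_map(R, log_map(R, S)), S)      # exp and log are inverse
assert np.allclose(log_map(R, R), np.zeros((2, 2)))   # R maps to the zero tangent vector
```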

Bias of the sample covariance matrix

The intrinsic bias vector field of the SCM estimator $\hat{R}$ is defined to be

$$B(\hat{R}) = \exp_R^{-1}\mathrm{E}_R[\hat{R}] = \mathrm{E}\left[\exp_R^{-1}\hat{R}\right].$$

The intrinsic estimator bias is then given by $\exp_R B(\hat{R})$.

For complex Gaussian random variables, this bias vector field can be shown[1] to equal

where

and ψ(·) is the digamma function. The intrinsic bias of the sample covariance matrix equals

and the SCM is asymptotically unbiased as n → ∞.

Similarly, the intrinsic inefficiency of the sample covariance matrix depends upon the Riemannian curvature of the space of positive-definite matrices.

Shrinkage estimation

If the sample size n is small and the number of considered variables p is large, the above empirical estimators of covariance and correlation are very unstable. Indeed, it is possible to construct estimators that improve considerably upon the maximum likelihood estimate in terms of mean squared error. Moreover, for n < p (fewer observations than variables), the empirical estimate of the covariance matrix becomes singular, i.e. it cannot be inverted to compute the precision matrix.

As an alternative, many methods have been suggested to improve the estimation of the covariance matrix. All of these approaches rely on the concept of shrinkage. This is implicit in Bayesian methods and in penalized maximum likelihood methods and explicit in the Stein-type shrinkage approach.

A simple version of a shrinkage estimator of the covariance matrix is constructed as follows. One considers a convex combination of the empirical estimator (S) with some suitably chosen target (T), e.g., the diagonal matrix, giving the shrunken estimator λT + (1 − λ)S. Subsequently, the mixing parameter (λ) is selected to maximize the expected accuracy of the shrunken estimator. This can be done by cross-validation, or by using an analytic estimate of the shrinkage intensity. The resulting regularized estimator can be shown to outperform the maximum likelihood estimator for small samples. For large samples, the shrinkage intensity reduces to zero, so that the shrinkage estimator becomes identical to the empirical estimator. Apart from increased efficiency, the shrinkage estimate has the additional advantage that it is always positive definite and well conditioned.
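A minimal sketch of such a shrinkage estimator, with a diagonal target and a fixed mixing parameter chosen purely for illustration (in practice λ would come from cross-validation or an analytic formula):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50                    # fewer samples than variables: the SCM is singular
X = rng.standard_normal((n, p))

S = np.cov(X, rowvar=False)      # empirical covariance matrix (rank at most n - 1)
T = np.diag(np.diag(S))          # shrinkage target: the diagonal of S
lam = 0.3                        # mixing parameter (illustrative; normally estimated)
Sigma_shrunk = lam * T + (1 - lam) * S

# The empirical estimate is singular, but the shrunken one is positive definite
assert np.linalg.matrix_rank(S) < p
assert np.all(np.linalg.eigvalsh(Sigma_shrunk) > 0)
```

In practice one would typically reach for an off-the-shelf implementation such as sklearn.covariance.LedoitWolf, which estimates the shrinkage intensity analytically.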

A review of this topic is given, e.g., in Schäfer and Strimmer (2005).[7] A covariance shrinkage estimator is implemented in the R package "corpcor" and in the scikit-learn library for Python.

References

  1. Smith, Steven Thomas (May 2005). "Covariance, Subspace, and Intrinsic Cramér–Rao Bounds". IEEE Trans. Signal Processing 53 (5): 1610–1630.
  2. Robust Estimation and Outlier Detection with Correlation Coefficients, Susan J. Devlin, R. Gnanadesikan, J. R. Kettenring, Biometrika, Vol. 62, No. 3 (Dec., 1975), pp. 531–545
  3. Robust Statistics, Peter J. Huber, Wiley, 1981 (republished in paperback, 2004)
  4. "Modern applied statistics with S", William N. Venables, Brian D. Ripley, Springer, 2002, ISBN 0-387-95457-0, ISBN 978-0-387-95457-8, page 336
  5. K.V. Mardia, J.T. Kent, and J.M. Bibby (1979) Multivariate Analysis, Academic Press.
  6. Dwyer, P. S. (1967). "Some Applications of Matrix Derivatives in Multivariate Analysis". Journal of the American Statistical Association 62 (318): 607–625.
  7. J. Schäfer and K. Strimmer (2005) A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 32.
