In [[information theory]], the '''asymptotic equipartition property''' ('''AEP''') is a general property of the output samples of a [[stochastic process|stochastic source]]. It is fundamental to the concept of ''[[typical set]]'' used in theories of [[data compression|compression]].


Roughly speaking, the theorem states that although there are many possible sequences of results that may be produced by a random process, the one actually produced is most probably from a loosely defined set of outcomes that all have approximately the same chance of being the one actually realized. (This is a consequence of the [[law of large numbers]] and [[ergodic theory]].) Although there are individual outcomes which have a higher probability than any outcome in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set. One way of intuitively understanding the property is through [[Cramér's large deviation theorem]], which states that the probability of a large deviation from the mean decays exponentially with the number of samples. Such results are studied in [[large deviations theory]]; intuitively, it is the large deviations that would violate equipartition, but these are unlikely.
 
In the field of [[Pseudorandom number generator|pseudorandom number generation]], a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random. Thus, although the typical set is loosely defined, practical notions arise concerning ''sufficient'' typicality.
 
== Definition ==
Given a discrete-time stationary ergodic stochastic process ''X'' on the [[probability space]] (Ω, ''B'', ''p''), the AEP is the assertion that
 
:<math>-\frac{1}{n} \log p(X_1^n) \to H(X) \quad \mbox{ as } \quad n\to\infty</math>
 
where <math>X_1^n</math> denotes the process limited to duration {1, ..., ''n''}, and ''H''(''X'') or simply ''H'' denotes the [[entropy rate]] of ''X'', which must exist for all discrete-time [[stationary process]]es, including the ergodic ones. The AEP is proven for finite-valued (i.e. |Ω| < ∞) stationary ergodic stochastic processes in the [[#AEP for discrete-time finite-valued stationary ergodic sources|Shannon-McMillan-Breiman theorem]] using ergodic theory, and for any [[independent identically distributed random variables|i.i.d.]] source directly using the law of large numbers, both in the discrete-valued case (where ''H'' is simply the [[entropy]] of a symbol) and in the continuous-valued case (where ''H'' is the differential entropy instead). The definition of the AEP can also be extended to certain classes of continuous-time stochastic processes for which a typical set exists for long enough observation times. The convergence is proven to be [[almost sure]] in all cases.
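
For example, if ''X'' is an i.i.d. Bernoulli(''p'') source, a sample <math>x_1^n</math> containing ''k'' ones has probability <math>p^k (1-p)^{n-k}</math>, so

:<math>-\frac{1}{n} \log p(x_1^n) = -\frac{k}{n}\log p - \frac{n-k}{n}\log (1-p),</math>

which converges almost surely to the entropy <math>H(X) = -p\log p - (1-p)\log(1-p)</math>, since ''k''/''n'' → ''p'' by the law of large numbers.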
 
== AEP for discrete-time i.i.d. sources ==
For an [[independent identically distributed random variables|i.i.d.]] source ''X'', its [[time series]] ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> is i.i.d. with [[entropy]] ''H''(''X'') in the discrete-valued case and [[differential entropy]] in the continuous-valued case. The weak [[law of large numbers]] gives the AEP with convergence in probability,
 
:<math>\lim_{n\to\infty}\Pr\left[\left|-\frac{1}{n} \log p(X_1, X_2, ..., X_n) - H(X)\right|> \epsilon\right]=0 \qquad \forall \epsilon>0.</math>
 
since, by independence,
:<math>-\frac{1}{n} \log p(X_1, X_2, ..., X_n) = \frac{1}{n} \sum_{i=1}^n -\log p(X_i)</math>
is the sample mean of the i.i.d. random variables <math>-\log p(X_i)</math>, whose common expectation is the entropy ''H''(''X'').
The strong law of large numbers asserts the stronger almost sure convergence,
 
:<math>\Pr\left[\lim_{n\to\infty} - \frac{1}{n} \log p(X_1, X_2, ..., X_n) = H(X)\right]=1</math>
 
which implies the result from the weak law of large numbers.
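
The convergence is easy to observe numerically. The following minimal sketch (assuming Python with NumPy; the source parameter ''p'' = 0.3 is an arbitrary choice) compares the normalized log-probability of a simulated i.i.d. Bernoulli sample with the entropy:

<syntaxhighlight lang="python">
import numpy as np

p = 0.3                                            # hypothetical parameter: P(X_i = 1)
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # entropy in bits per symbol

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    x = rng.random(n) < p                          # n i.i.d. Bernoulli(p) samples
    # -(1/n) log2 p(X_1, ..., X_n) is the average per-symbol surprisal
    sample_entropy = -np.mean(np.where(x, np.log2(p), np.log2(1 - p)))
    print(f"n = {n:>9}: -(1/n) log p = {sample_entropy:.4f}, H(X) = {H:.4f}")
</syntaxhighlight>

As ''n'' grows, the printed estimate concentrates around ''H''(''X'') ≈ 0.8813 bits, as the AEP predicts.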
 
== AEP for discrete-time finite-valued stationary ergodic sources ==
Consider a finite-valued sample space Ω, i.e. |Ω| < ∞, for the discrete-time [[stationary ergodic process]] <math>X:=\{X_n\}</math> defined on the [[probability space]] (Ω, ''B'', ''p''). The AEP for such a stochastic source, known as the '''Shannon-McMillan-Breiman theorem''', can be shown using the sandwich proof of Algoet and Cover, outlined as follows:
* Let ''x'' denote some measurable set ''x'' = ''X''(''A'') for some ''A'' ∈ ''B''.
* Parameterize the joint probability by ''n'' and ''x'' as
::<math>j(n,x):=p\left(x_0^{n-1} \right).</math>
* Parameterize the conditional probability by ''i'', ''k'' and ''x'' as
::<math>c(i,k,x) := p \left (x_i|x_{i-k}^{i-1} \right).</math>
* Take the limit of the conditional probability as ''k'' → ∞ and denote it as  
::<math>c(i,x):=p \left(x_i|x_{-\infty}^{i-1} \right ).</math>
* Argue the two notions of entropy rate
::<math>\lim_{n\to\infty} \mathrm{E}[-\log j(n,X)]\quad \text{and} \quad \lim_{n\to\infty} \mathrm{E}[-\log c(n,n,X)]</math>
:exist and are equal for any stationary process, including the stationary ergodic process ''X''. Denote their common value by ''H''.
* Argue that both
::<math>\begin{align}
c(i,k,X) &:= \left \{p \left(X_i|X_{i-k}^{i-1} \right ) \right \} \\
c(i,X) &:= \left \{p \left (X_i|X_{-\infty}^{i-1} \right ) \right \}
\end{align}</math>  
:where ''i'' is the time index, are stationary ergodic processes, whose sample means converge [[almost surely]] to some values denoted by ''H<sup>k</sup>'' and ''H<sup>∞</sup>'' respectively.
* Define the ''k''-th order Markov approximation <math>a(n,k,x)</math> to the probability <math>j(n,x)</math> as
::<math>a(n,k,x):=p \left(x_0^{k-1} \right)\prod_{i=k}^{n-1}p \left (x_i|x_{i-k}^{i-1} \right )=j(k,x)\prod_{i=k}^{n-1} c(i,k,x).</math>
* Argue that <math>a(n,k,X(\Omega))</math> is finite from the finite-value assumption.
* Express <math>-\frac1n\log a(n,k,X)</math> in terms of the sample mean of <math>c(i,k,X)</math> and show that it converges almost surely to ''H<sup>k</sup>''.
* Define the probability measure
::<math>a(n,x):=p \left (x_0^{n-1}|x_{-\infty}^{-1} \right ).</math>
* Express <math>-\frac1n\log a(n,X)</math> in terms of the sample mean of <math>c(i,X)</math> and show that it converges almost surely to ''H<sup>∞</sup>''.
* Argue that <math>H^k\searrow H</math> as ''k'' → ∞ using the stationarity of the process.
* Argue that ''H'' = ''H<sup>∞</sup>'' using [[Lévy's martingale convergence theorem]] and the finite-value assumption.
* Show that
::<math>\mathrm{E}\left[\frac{a(n,k,X)}{j(n,X)}\right]=a(n, k,X(\Omega))</math>
:which is finite as argued before.
* Show that
::<math>\mathrm{E}\left[\frac{j(n,X)}{a(n,X)}\right]=1</math>
:by conditioning on the infinite past <math>X_{-\infty}^{-1}</math> and iterating the expectation.
* Show that
::<math>\forall \alpha > 0 \ : \ \Pr\left[\frac{a(n,k,X)}{j(n,X)}\geq \alpha \right]\leq \frac{a(n, k,X(\Omega))}{\alpha}</math>
:using [[Markov's inequality]] and the expectation derived previously.
* Similarly, show that
::<math>\forall \alpha > 0 \ : \ \Pr\left[\frac{j(n,X)}{a(n,X)}\geq \alpha \right]\leq \frac{1}{\alpha},</math>
:which is equivalent to
::<math>\forall \alpha > 0 \ : \ \Pr\left[\frac1n\log\frac{j(n,X)}{a(n,X)}\geq \frac{1}{n}\log\alpha \right]\leq \frac{1}{\alpha}.</math>
* Show that limsup of
::<math>\frac1n \log \frac{a(n,k,X)}{j(n,X)} \quad \text{and} \quad \frac{1}{n} \log\frac{j(n,X)}{a(n,X)}</math>
:are non-positive almost surely by setting α = ''n''<sup>β</sup> for any β > 1 and applying the [[Borel-Cantelli lemma]].
* Show that liminf and limsup of
::<math>-\frac{1}{n} \log j(n,X)</math>
:are lower and upper bounded almost surely by ''H<sup>∞</sup>'' and ''H<sup>k</sup>'' respectively by breaking up the logarithms in the previous result.
* Complete the proof by pointing out that the upper and lower bounds are shown previously to approach ''H'' as ''k'' → ∞.
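
Gathering the steps above: almost surely, for every ''k'',

:<math>H^\infty \;\leq\; \liminf_{n\to\infty} -\frac{1}{n}\log j(n,X) \;\leq\; \limsup_{n\to\infty} -\frac{1}{n}\log j(n,X) \;\leq\; H^k,</math>

and letting ''k'' → ∞ squeezes both bounds to ''H'', which proves the theorem.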
 
== AEP for non-stationary discrete-time source producing independent symbols ==
The assumptions of stationarity, ergodicity, and identical distribution of the random variables are not essential for the AEP to hold. Indeed, as is intuitively clear, the AEP requires only some form of the law of large numbers to hold, which is fairly general. However, the expression needs to be suitably generalized, and the conditions need to be formulated precisely.
 
We assume that the source produces independent symbols, with possibly different output statistics at each instant. We assume that the statistics of the process are known completely, that is, the marginal distribution of the process seen at each time instant is known. The joint distribution is just the product of the marginals. Then, under the condition (which can be relaxed) that <math>\mathrm{Var}[\log p(X_i)]<M</math> for all ''i'', for some ''M'' > 0, the following holds (AEP):
 
:<math>\lim_{n\to\infty}\Pr\left[\,\left|-\frac{1}{n} \log p(X_1, X_2, ..., X_n) - \overline{H}_n(X)\right|< \epsilon\right]=1\qquad \forall \epsilon>0</math>
where
:<math>\overline{H}_n(X)=\frac{1}{n}H(X_1,X_2,\ldots,X_n)</math>
 
===Proof===
The proof follows from a simple application of [[Chebyshev's inequality]] (Markov's inequality applied to the second moment of <math>\log p(X_i)</math>): since the symbols are independent, the variance of the sum is the sum of the variances.
 
:<math>\begin{align}
\Pr \left[\left|-\frac{1}{n} \log p(X_1, X_2, ..., X_n) -\overline{H}_n(X)\right|> \epsilon\right] &\leq \frac{1}{n^2 \epsilon^2} \mathrm{Var}\left[\sum_{i=1}^n \log p(X_i) \right]\\
&= \frac{1}{n^2 \epsilon^2} \sum_{i=1}^n \mathrm{Var}[\log p(X_i)]\\
&\leq \frac{M}{n \epsilon^2} \to 0 \quad \mbox{as} \quad n\to \infty.
\end{align}</math>
 
The proof in fact holds whenever any moment <math>\mathrm{E} \left [|\log p(X_i)|^r \right ]</math> is uniformly bounded for some ''r'' > 1 (again by [[Markov's inequality]] applied to the ''r''-th moment). <math>\Box{}</math>
 
Even this condition is not necessary, but given a non-stationary random process, it should not be difficult to test whether the AEP holds using the above method.
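
As a rough numerical check of this non-stationary AEP, the following minimal sketch (assuming Python with NumPy; the time-varying parameters are an arbitrary choice that keeps <math>\mathrm{Var}[\log p(X_i)]</math> uniformly bounded) compares the normalized log-probability of independent Bernoulli(''p<sub>i</sub>'') symbols with <math>\overline{H}_n(X)</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
# Hypothetical time-varying parameters p_i in [0.2, 0.8]: bounded away from
# 0 and 1, so Var[log p(X_i)] is uniformly bounded as the theorem requires.
p_i = 0.2 + 0.6 * rng.random(n)
x = rng.random(n) < p_i                    # independent Bernoulli(p_i) samples
log_p = np.where(x, np.log2(p_i), np.log2(1 - p_i))
sample_entropy = -np.mean(log_p)           # -(1/n) log2 p(X_1, ..., X_n)
H_bar = -np.mean(p_i * np.log2(p_i) + (1 - p_i) * np.log2(1 - p_i))
print(f"-(1/n) log p = {sample_entropy:.4f}, H_bar_n = {H_bar:.4f}")
</syntaxhighlight>

The two printed values agree to within the fluctuations of order <math>1/\sqrt{n}</math> predicted by Chebyshev's inequality.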
 
=== Applications for AEP for non-stationary source producing independent symbols ===
The AEP for a non-stationary discrete-time process with independent symbols leads us to (among other results) the source coding theorem for non-stationary sources (with independent output symbols) and the channel coding theorem for non-stationary memoryless channels.
 
==== Source Coding Theorem ====
The source coding theorem for discrete-time non-stationary independent sources can be found here: [[source coding theorem]].
 
==== Channel Coding Theorem ====
The channel coding theorem for discrete-time non-stationary memoryless channels can be found here: [[noisy channel coding theorem]].
 
== AEP for certain continuous-time stationary ergodic sources ==
Discrete-time functions can be interpolated to continuous-time functions. If such an interpolation ''f'' is [[measurable]], we may define the continuous-time stationary process accordingly as <math>\tilde{X}:=f\circ X</math>. If the AEP holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by any measurable interpolation, i.e.
:<math>-\frac{1}{n} \log p(\tilde{X}_0^\tau) \to H(X)</math>
where ''n'' corresponds to the number of degrees of freedom in time τ. <math>nH(X)/\tau</math> and ''H''(''X'') are the entropy per unit time and the entropy per degree of freedom, respectively, as defined by [[Claude E. Shannon|Shannon]].
 
An important class of such continuous-time stationary processes is the bandlimited stationary ergodic processes with sample space a subset of the continuous <math>\mathcal{L}_2</math> functions. The AEP holds if the process is white, in which case the time samples are i.i.d., or if there exists ''T'' > 1/(2''W''), where ''W'' is the [[Bandwidth (signal processing)|nominal bandwidth]], such that the ''T''-spaced time samples take values in a finite set, in which case we have the discrete-time finite-valued stationary ergodic process.
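
For instance, for a white bandlimited process observed for τ seconds and sampled at the Nyquist rate (a standard illustrative case), there are ''n'' = 2''W''τ degrees of freedom, so the entropy per unit time is <math>nH(X)/\tau = 2WH(X)</math>.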
 
Any [[time-invariant]] operation also preserves the AEP, stationarity, and ergodicity, and we may easily turn a stationary process into a non-stationary one without losing the AEP by nulling out a finite number of time samples in the process.
 
==Category theory==
A [[category theoretic]] definition for the equipartition property is given by [[Gromov]].<ref>Misha Gromov, (2012) "[http://www.ihes.fr/~gromov/PDF/structres-entropy-june-2012.pdf In a Search for a Structure, Part 1: On Entropy]". ''(See page 5, where the equipartition property is called the 'Bernoulli approximation theorem'.)''</ref> Given a sequence of [[Product (category theory)|Cartesian powers]] <math>P^N=P\times \cdots \times P</math> of a measure space ''P'', this sequence admits an ''asymptotically equivalent'' sequence ''H<sub>N</sub>'' of homogeneous measure spaces (''i.e.'' all sets have the same measure; all morphisms are invariant under the group of automorphisms, and thus factor as a morphism to the [[terminal object]]).
 
The above requires a definition of ''asymptotic equivalence''. This is given in terms of a distance function, which measures how much an [[injective correspondence]] differs from an [[isomorphism]]. An injective correspondence <math>\pi: P\to Q</math> is a [[partially defined map]] that is a [[bijection]]; that is, it is a bijection between a subset <math>P'\subset P</math> and a subset <math>Q'\subset Q</math>.  Then define
 
:<math>|P-Q|_\pi = |P\smallsetminus P'| + |Q\smallsetminus Q'|</math>
where |''S''| denotes the measure of a set ''S''.  In what follows, the measures of ''P'' and ''Q'' are taken to be 1, so that the measure spaces are probability spaces. This distance <math>|P-Q|_\pi</math> is commonly known as the [[earth mover's distance]] or [[Wasserstein metric]].
 
Similarly, define
:<math>|\log P:Q|_\pi = \frac{\sup_{p\in P'}|\log p - \log \pi(p)|}{\log \min \left(|set(P)|,|set(Q)|\right)}</math>
 
with <math>|set(P)|</math> taken to be the counting measure on ''P''.  Thus, this definition requires that ''P'' be a finite measure space. Finally, let
:<math>\mbox{dist}_\pi(P,Q) = |P-Q|_\pi +|\log P:Q|_\pi</math>
 
A sequence of injective correspondences <math>\pi_N:P_N\to Q_N</math> is then '''asymptotically equivalent''' when
 
:<math>\mbox{dist}_{\pi_N}(P_N,Q_N) \to 0 \quad\mbox{ as }\quad N\to\infty</math>
 
Given a sequence ''H<sub>N</sub>'' that is asymptotically equivalent to ''P<sup>N</sup>'', the entropy ''H''(''P'') of ''P'' may be taken as
 
:<math>H(P)=\lim_{N\to\infty}\frac{1}{N} \log |set(H_N)|</math>
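
As a rough illustration (our gloss on why this recovers the usual entropy, rather than a statement from Gromov's text): for ''P'' a finite probability space, the AEP says that ''P<sup>N</sup>'' is nearly exhausted by a typical set of roughly <math>2^{NH(P)}</math> outcomes of nearly equal measure, and taking ''H<sub>N</sub>'' to be a homogeneous space on those atoms gives <math>\frac{1}{N}\log|set(H_N)|\to H(P)</math>.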
 
==See also==
* [[Cramér's large deviation theorem]]
* [[Typical set]]
* [[Source coding theorem]]
* [[Noisy-channel coding theorem]]
 
==References==
{{reflist}}
=== The Classic Paper ===
* Claude E. Shannon. "[[A Mathematical Theory of Communication]]". ''Bell System Technical Journal'', July/October 1948.
=== Other Journal Articles ===
* Paul H. Algoet and Thomas M. Cover. "[http://yreka.stanford.edu/~cover/papers/paper084.pdf A Sandwich Proof of the Shannon-McMillan-Breiman Theorem]". ''The Annals of Probability'', '''16'''(2): 899–909, 1988.
* Sergio Verdu and Te Sun Han. "The Role of the Asymptotic Equipartition Property in Noiseless Source Coding". ''IEEE Transactions on Information Theory'', '''43'''(3): 847–857, 1997.
 
=== Textbooks on Information Theory ===
* Thomas M. Cover and Joy A. Thomas. ''Elements of Information Theory''. New York: Wiley, 1991. ISBN 0-471-06259-6
* [[David J. C. MacKay]]. ''[http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Information Theory, Inference, and Learning Algorithms]''. Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1
 
[[Category:Information theory]]
[[Category:Statistical theorems]]
