| [[File:Largenumbers.svg|An illustration of the law of large numbers using a particular run of rolls of a single [[dice|die]]. As the number of rolls in this run increases, the average of the values of all the results approaches 3.5. While different runs would show a different shape over a small number of throws (at the left), over a large number of rolls (to the right) they would be extremely similar.|thumb|right|400 px]]
| |
| {{Probability fundamentals}}
| |
| In [[probability theory]], the '''law of large numbers''' ('''LLN''') is a [[theorem]] that describes the result of performing the same experiment a large number of times. According to the law, the [[average]] of the results obtained from a large number of trials should be close to the [[expected value]], and will tend to become closer as more trials are performed.
| |
| | |
| The LLN is important because it "guarantees" stable long-term results for the averages of random events. For example, while a casino may lose money in a single spin of the [[roulette]] wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the LLN only applies (as the name indicates) when a ''large number'' of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others. See the [[Gambler's fallacy]].
| |
| | |
| ==Examples==
| |
| For example, a single roll of a fair six-sided [[dice|die]] produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal [[probability]]. Therefore, the expected value of a single die roll is
| |
| : <math> \tfrac{1+2+3+4+5+6}{6} = 3.5.</math>
| |
| According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the [[sample mean]]) is likely to be close to 3.5, with the precision increasing as more dice are rolled.
| |
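The die example can be tried numerically. The following is a minimal sketch, assuming a fair die simulated with Python's standard <code>random</code> module; the function name <code>sample_mean_of_die_rolls</code> and the fixed seed are illustrative choices, not part of the statement of the law.

```python
import random

def sample_mean_of_die_rolls(num_rolls, seed=0):
    """Return the average of num_rolls simulated fair six-sided die rolls."""
    rng = random.Random(seed)  # fixed seed (an assumption) so a run is reproducible
    total = 0
    for _ in range(num_rolls):
        total += rng.randint(1, 6)  # each face 1..6 is equally likely
    return total / num_rolls

if __name__ == "__main__":
    # The printed averages should drift towards the expected value 3.5.
    for n in (10, 1_000, 100_000):
        print(n, sample_mean_of_die_rolls(n))
```

With few rolls the average can be far from 3.5; with 100,000 rolls it typically agrees with 3.5 to about two decimal places.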
| | |
| It follows from the law of large numbers that the [[empirical probability]] of success in a series of [[Bernoulli trials]] will converge to the theoretical probability. For a [[Bernoulli random variable]], the expected value is the theoretical probability of success, and the average of ''n'' such variables (assuming they are [[Independent and identically distributed random variables|independent and identically distributed (i.i.d.)]]) is precisely the relative frequency.
| |
| | |
| For example, a [[fair coin]] toss is a [[Bernoulli trial]]. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion of heads after ''n'' flips will [[almost surely]] [[limit of a sequence|converge]] to 1/2 as ''n'' approaches infinity.
| |
| | |
| Though the proportion of heads (and tails) approaches 1/2, the [[absolute difference|absolute (nominal) difference]] in the number of heads and tails will, with high probability, become large as the number of flips becomes large. That is, the probability that the absolute difference is smaller than any fixed number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected absolute difference grows, but at a slower rate than the number of flips.
| |
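The contrast between the proportion of heads and the absolute difference can be observed in a simulation. This is a rough sketch under the assumption of a fair coin simulated with Python's <code>random</code> module; <code>coin_flip_summary</code> is a hypothetical helper name.

```python
import random

def coin_flip_summary(num_flips, seed=0):
    """Return (proportion of heads, |heads - tails|, |heads - tails| / num_flips)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    tails = num_flips - heads
    difference = abs(heads - tails)
    return heads / num_flips, difference, difference / num_flips

if __name__ == "__main__":
    # The proportion settles near 1/2 and the ratio shrinks,
    # while the raw difference itself is free to grow.
    for n in (100, 10_000, 1_000_000):
        print(n, coin_flip_summary(n))
```

In a typical run the raw difference in the last row is in the hundreds or more, even though the proportion of heads is within a fraction of a percent of 1/2.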
| | |
| ==History==
| |
| [[File:DiffusionMicroMacro.gif|thumb|right|250px|[[Molecular diffusion|Diffusion]] is an example of the law of large numbers, applied to [[chemistry]]. Initially, there are [[solution|solute]] molecules on the left side of a barrier (purple line) and none on the right. The barrier is removed, and the solute diffuses to fill the whole container.<br> <u>Top:</u> With a single molecule, the motion appears to be quite random.<br>
| |
| <u>Middle:</u> With more molecules, there is clearly a trend where the solute fills the container more and more uniformly, but there are also random fluctuations.<br>
| |
| <u>Bottom:</u> With an enormous number of solute molecules (too many to see), the randomness is essentially gone: The solute appears to move smoothly and systematically from high-concentration areas to low-concentration areas. In realistic situations, chemists can describe diffusion as a deterministic macroscopic phenomenon (see [[Fick's law]]s), despite its underlying random nature.]]
| |
| | |
| The Italian mathematician [[Gerolamo Cardano]] (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials.<ref>Mlodinow, L. ''The Drunkard's Walk.'' New York: Random House, 2008. p. 50.</ref> This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by [[Jacob Bernoulli]].<ref>Jakob Bernoulli, ''Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis'', 1713, Chapter 4, (Translated into English by Oscar Sheynin)</ref> It took him over 20 years to develop a sufficiently rigorous mathematical proof, which was published in his ''[[Ars Conjectandi]]'' (The Art of Conjecturing) in 1713. He named this his "Golden Theorem", but it became generally known as "Bernoulli's Theorem". This should not be confused with the principle in physics with [[Bernoulli's principle|the same name]], named after Jacob Bernoulli's nephew [[Daniel Bernoulli]]. In 1837, [[Siméon Denis Poisson|S.D. Poisson]] further described it under the name "la loi des grands nombres" ("The law of large numbers").<ref>Poisson names the "law of large numbers" (la loi des grands nombres) in: S.D. Poisson, ''Probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités'' (Paris, France: Bachelier, 1837), [http://books.google.com/books?id=uovoFE3gt2EC&pg=PA7#v=onepage&q&f=false page 7]. He attempts a two-part proof of the law on pages 139-143 and pages 277 ff.</ref><ref>Hacking, Ian. (1983) "19th-century Cracks in the Concept of Determinism", ''Journal of the History of Ideas'', 44 (3), 455-475 {{jstor|2709176}}</ref> Thereafter, it was known under both names, but the "Law of large numbers" is most frequently used.
| |
| | |
| After Bernoulli and Poisson published their efforts, other mathematicians also contributed to the refinement of the law, including [[Pafnuty Chebyshev|Chebyshev]],<ref>{{cite doi|10.1515/crll.1846.33.259}}</ref> [[Andrey Markov|Markov]], [[Émile Borel|Borel]], [[Francesco Paolo Cantelli|Cantelli]], [[Andrey Kolmogorov|Kolmogorov]] and [[Aleksandr Khinchin|Khinchin]], who finally provided a complete proof of the LLN for arbitrary [[random variables]]. These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law. These forms do not describe different laws but instead refer to different ways of describing the mode of [[limit of a sequence|convergence]] of the cumulative sample means to the expected value; the strong form implies the weak.
| |
| | |
| ==Forms==
| |
| Two different versions of the '''Law of Large Numbers''' are described below; they are called the '' '''Strong Law''' of Large Numbers'', and the '' '''Weak Law''' of Large Numbers''.
| |
| Both versions of the law state that – with virtual certainty – the sample average
| |
| | |
| :<math>\overline{X}_n=\frac1n(X_1+\cdots+X_n) </math>
| |
| | |
| converges to the expected value
| |
| | |
| :<math>\overline{X}_n \, \to \, \mu \qquad\textrm{for}\qquad n \to \infty</math>
| |
| | |
| where ''X''<sub>1</sub>, ''X''<sub>2</sub>, ... is an infinite sequence of [[i.i.d.]] Lebesgue integrable random variables with expected value E(''X''<sub>1</sub>) = E(''X''<sub>2</sub>) = ... = ''µ''. Lebesgue integrability of ''X<sub>j</sub>'' means that the expected value E(''X<sub>j</sub>'') exists according to [[Lebesgue integration]] and is finite.
| |
| | |
| An assumption of finite [[variance]] Var(''X''<sub>1</sub>) = Var(''X''<sub>2</sub>) = ... = ''σ''<sup>2</sup> < ∞ is '''not necessary'''. Large or infinite variance will make the convergence slower, but the LLN holds anyway. This assumption is often used because it makes the proofs easier and shorter.
| |
| | |
| The difference between the strong and the weak version lies in the mode of convergence being asserted. For interpretation of these modes, see [[Convergence of random variables]].
| |
| | |
| ===Weak law===
| |
| [[File:Lawoflargenumbersanimation2.gif|thumb|Simulation illustrating the Law of Large Numbers. Each frame, you flip a coin that is red on one side and blue on the other, and put a dot in the corresponding column. A pie chart shows the proportion of red and blue so far. Notice that the proportion varies a lot at first, but gradually approaches 50%.]]
| |
| The '''weak law of large numbers''' (also called Khintchine's law) states that the sample average [[Convergence in probability|converges in probability]] towards the expected value<ref>{{harvnb|Loève|1977|loc=Chapter 1.4, page 14}}</ref><sup>[[Proof of the law of large numbers#Proof|[proof]]]</sup>
| |
| : <math>
| |
| \overline{X}_n\ \xrightarrow{P}\ \mu \qquad\textrm{when}\ n \to \infty.
| |
| </math>
| |
| | |
| That is to say that for any positive number ''ε'',
| |
| : <math>
| |
| \lim_{n\to\infty}\Pr\!\left(\,|\overline{X}_n-\mu| > \varepsilon\,\right) = 0.
| |
| </math>
| |
| | |
| Interpreting this result, the weak law essentially states that for any nonzero margin specified, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.
| |
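The probability in the weak law can be estimated by repeating the whole experiment many times. The sketch below assumes fair die rolls (so ''μ'' = 3.5) and uses a hypothetical helper, <code>estimate_deviation_probability</code>; it approximates Pr(|X̄<sub>''n''</sub> − ''μ''| > ''ε'') by a relative frequency, so it illustrates the statement rather than proving it.

```python
import random

def estimate_deviation_probability(n, epsilon, num_experiments=1000, seed=0):
    """Estimate Pr(|sample mean of n die rolls - 3.5| > epsilon) by simulation."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    mu = 3.5                   # expected value of one fair die roll
    exceed = 0
    for _ in range(num_experiments):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - mu) > epsilon:
            exceed += 1
    # Fraction of experiments whose sample mean fell outside the margin.
    return exceed / num_experiments

if __name__ == "__main__":
    # For a fixed margin, the estimated probability should shrink as n grows.
    for n in (10, 100, 1000):
        print(n, estimate_deviation_probability(n, epsilon=0.25))
```

For the fixed margin ''ε'' = 0.25 the estimate is large for ''n'' = 10 and close to zero for ''n'' = 1000, which is the behaviour the limit describes.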
| | |
| Convergence in probability is also called weak convergence of random variables. This version is called the weak law because random variables may converge weakly (in probability) as above without converging strongly (almost surely) as below.
| |
| | |
| ===Strong law===
| |
| The '''strong law of large numbers''' states that the sample average [[Almost sure convergence|converges almost surely]] to the expected value<ref>{{harvnb|Loève|1977|loc=Chapter 17.3, page 251}}</ref>
| |
| : <math>
| |
| \overline{X}_n\ \xrightarrow{a.s.}\ \mu \qquad\textrm{when}\ n \to \infty.
| |
| </math>
| |
| | |
| That is,
| |
| : <math>
| |
| \Pr\!\left( \lim_{n\to\infty}\overline{X}_n = \mu \right) = 1.
| |
| </math>
| |
| | |
| The proof is more complex than that of the weak law.<ref>{{cite web|url=http://terrytao.wordpress.com/2008/06/18/the-strong-law-of-large-numbers/ |title=The strong law of large numbers « What’s new |publisher=Terrytao.wordpress.com |date= |accessdate=2012-06-09}}</ref> This law justifies the intuitive interpretation of the expected value of a random variable when sampled repeatedly as the "long-term average".
| |
| | |
| Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
| |
| | |
| The strong law of large numbers can itself be seen as a special case of the [[Ergodic theory#Ergodic theorems|pointwise ergodic theorem]].
| |
| | |
| Moreover, if the summands are independent but not identically distributed, then
| |
| : <math>
| |
| \overline{X}_n - \operatorname{E}\big[\overline{X}_n\big]\ \xrightarrow{a.s.}\ 0
| |
| </math>
| |
| provided that each ''X''<sub>''k''</sub> has a finite second moment and
| |
| : <math>
| |
| \sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.
| |
| </math>
| |
| | |
| This statement is known as ''Kolmogorov's strong law''; see e.g. {{harvtxt|Sen|Singer|1993|loc=Theorem 2.3.10}}.
| |
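As a worked illustration of the variance condition (the polynomial growth rate below is an assumption chosen for the example, not part of the cited theorem), suppose Var[''X''<sub>''k''</sub>] = ''k''<sup>''α''</sup> for some ''α'' ≥ 0. Then

: <math>
\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] = \sum_{k=1}^{\infty} k^{\alpha-2},
</math>

which converges exactly when 2 − ''α'' > 1, that is, when ''α'' < 1. Variances growing like the square root of ''k'' (''α'' = 1/2) therefore satisfy the condition, whereas for linear growth (''α'' = 1) the series is the harmonic series and the condition fails, in which case this theorem gives no conclusion.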
| | |
| ===Differences between the weak law and the strong law===
| |
| The ''weak law'' states that for a specified large ''n'', the average <math style="vertical-align:-.35em">\overline{X}_n</math> is likely to be near ''μ''. Thus, it leaves open the possibility that <math style="vertical-align:-.4em">|\overline{X}_n -\mu| > \varepsilon</math> happens an infinite number of times, although at infrequent intervals.
| |
| | |
| The ''strong law'' shows that this [[almost surely]] will not occur. In particular, it implies that with probability 1, we have that for any {{nowrap|''ε'' > 0}} the inequality <math style="vertical-align:-.4em">|\overline{X}_n -\mu| < \varepsilon</math> holds for all large enough ''n''.<ref>{{harvtxt|Ross|2009}}</ref>
| |
| | |
| ===Uniform law of large numbers===
| |
| Suppose ''f''(''x'',''θ'') is some [[Function (mathematics)|function]] defined for ''θ'' ∈ Θ, and continuous in ''θ''. Then for any fixed ''θ'', the sequence {''f''(''X''<sub>1</sub>,''θ''), ''f''(''X''<sub>2</sub>,''θ''), …} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[''f''(''X'',''θ'')]. This is the ''pointwise'' (in ''θ'') convergence.
| |
| | |
| The '''uniform law of large numbers''' states the conditions under which the convergence happens ''uniformly'' in ''θ''. If<ref>{{harvnb|Newey|McFadden|1994|loc=Lemma 2.4}}</ref><ref>{{cite journal|doi=10.1214/aoms/1177697731|title=Asymptotic Properties of Non-Linear Least Squares Estimators|year=1969|last1=Jennrich|first1=Robert I.|journal=The Annals of Mathematical Statistics|volume=40|issue=2|pages=633–643}}</ref>
| |
| <ol>
| |
| <li> Θ is compact,
| |
<li> ''f''(''x'',''θ'') is continuous at each ''θ'' ∈ Θ for [[Almost everywhere|almost all]] values of ''x'', and is a measurable function of ''x'' at each ''θ'',
| |
| <li> there exists a dominating function ''d''(''x'') such that E[''d''(''X'')] < ∞, and
| |
| : <math> \left\| f(x,\theta) \right\| \leq d(x) \quad\text{for all}\ \theta\in\Theta.</math>
| |
| </ol>
| |
| Then E[''f''(''X'',''θ'')] is continuous in ''θ'', and
| |
| : <math>
| |
| \sup_{\theta\in\Theta} \left\| \frac1n\sum_{i=1}^n f(X_i,\theta) - \operatorname{E}[f(X,\theta)] \right\| \xrightarrow{\mathrm{a.s.}} \ 0.
| |
| </math>
| |
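A numerical illustration of the uniform statement can be sketched as follows. Everything specific here is an assumption made for the example: ''f''(''x'',''θ'') = cos(''θx''), ''X'' standard normal (so that E[''f''(''X'',''θ'')] = exp(−''θ''<sup>2</sup>/2)), Θ = [0, 2], and a finite grid of ''θ'' values standing in for the supremum. The function is bounded by ''d''(''x'') = 1, so the three conditions above hold.

```python
import math
import random

def sup_deviation(n, num_grid_points=41, seed=0):
    """Approximate sup over theta in [0, 2] of |sample mean - expectation|.

    Uses f(x, theta) = cos(theta * x) with X ~ N(0, 1), for which
    E[f(X, theta)] = exp(-theta**2 / 2). The supremum is approximated
    on an evenly spaced grid, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one shared sample for all theta
    worst = 0.0
    for j in range(num_grid_points):
        theta = 2.0 * j / (num_grid_points - 1)
        sample_mean = sum(math.cos(theta * x) for x in xs) / n
        expected = math.exp(-theta * theta / 2.0)
        worst = max(worst, abs(sample_mean - expected))
    return worst

if __name__ == "__main__":
    # The worst-case gap over the grid should shrink as n grows.
    for n in (100, 2_000, 20_000):
        print(n, sup_deviation(n))
```

The same sample is reused for every ''θ'', which is what distinguishes the uniform statement from applying the ordinary law separately at each ''θ''.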
| | |
| ===Borel's law of large numbers===
| |
| '''Borel's law of large numbers''', named after [[Émile Borel]], states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if ''E'' denotes the event in question, ''p'' its probability of occurrence, and ''N<sub>n</sub>''(''E'') the number of times ''E'' occurs in the first ''n'' trials, then with probability one,
| |
| | |
| : <math> \frac{N_n(E)}{n}\to p\text{ as }n\to\infty.\, </math>
| |
| | |
| '''Chebyshev's Lemma'''. Let ''X'' be a [[random variable]] with finite [[expected value]] ''μ'' and finite non-zero [[variance]] ''σ''<sup>2</sup>. Then for any [[real number]] {{nowrap|''k'' > 0}},
| |
| : <math>
| |
| \Pr(|X-\mu|\geq k\sigma) \leq \frac{1}{k^2}.
| |
| </math>
| |
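The inequality in the lemma can be checked exactly for a small distribution. The sketch below assumes ''X'' is one roll of a fair six-sided die (''μ'' = 7/2, ''σ''<sup>2</sup> = 35/12); <code>chebyshev_check</code> is a hypothetical helper that compares the exact tail probability with the bound 1/''k''<sup>2</sup>.

```python
from fractions import Fraction
import math

def chebyshev_check(k):
    """Return (Pr(|X - mu| >= k*sigma), 1/k**2) for one fair die roll X."""
    faces = range(1, 7)
    mu = Fraction(sum(faces), 6)                              # 7/2
    variance = sum((Fraction(x) - mu) ** 2 for x in faces) / 6  # 35/12
    sigma = math.sqrt(variance)                               # about 1.708
    # Count the faces at least k standard deviations away from the mean.
    hits = sum(1 for x in faces if abs(x - mu) >= k * sigma)
    tail = Fraction(hits, 6)
    return float(tail), 1 / k**2

if __name__ == "__main__":
    for k in (1, 1.2, 1.5, 2):
        tail, bound = chebyshev_check(k)
        print(k, tail, bound, tail <= bound)
```

For ''k'' = 1.2 only the faces 1 and 6 lie far enough from the mean, giving a tail probability of 1/3 against a bound of about 0.69; the bound is valid but usually not tight.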
| | |
| Borel's law makes rigorous the intuitive notion of probability as the long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.
| |
| | |
| ==Proof==
| |
| Given ''X''<sub>1</sub>, ''X''<sub>2</sub>, ..., an infinite sequence of [[i.i.d.]] random variables with finite expected value E(''X''<sub>1</sub>) = E(''X''<sub>2</sub>) = ... = ''µ'', we are interested in the convergence of the sample average
| |
| | |
| :<math>\overline{X}_n=\tfrac1n(X_1+\cdots+X_n). </math>
| |
| | |
| The weak law of large numbers states:
| |
| | |
| '''Theorem:''' <math>\overline{X}_n \, \xrightarrow{P} \, \mu \qquad\textrm{for}\qquad n \to \infty.</math>
| |
| | |
| ===Proof using Chebyshev's inequality===
| |
| This proof uses the assumption of finite [[variance]] <math> \operatorname{Var} (X_i)=\sigma^2 </math> (for all <math>i</math>). The independence of the random variables implies no correlation between them, and we have that
| |
| | |
| :<math>
| |
| \operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
| |
| </math>
| |
| | |
| The common mean μ of the sequence is the mean of the sample average:
| |
| | |
| :<math>
| |
| E(\overline{X}_n) = \mu.
| |
| </math>
| |
| | |
| Using [[Chebyshev's inequality]] on <math>\overline{X}_n </math> results in
| |
| | |
| :<math>
| |
| \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq \frac{\sigma^2}{n\varepsilon^2}.
| |
| </math>
| |
| | |
| This may be used to obtain the following:
| |
| | |
| :<math>
| |
| \operatorname{P}( \left| \overline{X}_n-\mu \right| < \varepsilon) = 1 - \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \geq 1 - \frac{\sigma^2}{n \varepsilon^2 }.
| |
| </math>
| |
| | |
| As ''n'' approaches infinity, the right-hand side approaches 1, so by the definition of [[convergence in probability]] we have obtained
| |
| | |
| :<math>\overline{X}_n \, \xrightarrow{P} \, \mu \qquad\textrm{for}\qquad n \to \infty.</math>
| |
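The Chebyshev bound used in this proof also gives a (usually conservative) sample size: requiring ''σ''<sup>2</sup>/(''nε''<sup>2</sup>) ≤ ''δ'' yields ''n'' ≥ ''σ''<sup>2</sup>/(''δε''<sup>2</sup>). The following sketch computes that threshold; the helper name <code>chebyshev_sample_size</code> and the tolerance ''δ'' are introduced here for illustration only.

```python
import math
from fractions import Fraction

def chebyshev_sample_size(variance, epsilon, delta):
    """Smallest n with variance / (n * epsilon**2) <= delta.

    Arguments are converted to exact fractions so that the ceiling is not
    disturbed by floating-point rounding; pass Fraction or int values
    (or strings such as "0.1") for exact results.
    """
    threshold = Fraction(variance) / (Fraction(delta) * Fraction(epsilon) ** 2)
    return math.ceil(threshold)

if __name__ == "__main__":
    # Fair die: variance 35/12, margin epsilon = 1/10, failure probability delta = 1/20.
    n = chebyshev_sample_size(Fraction(35, 12), Fraction(1, 10), Fraction(1, 20))
    print(n)  # 5834
```

For a fair die, 5834 rolls are enough for the bound to guarantee that the sample mean is within 0.1 of 3.5 with probability at least 0.95. The true probability at that sample size is much higher, since Chebyshev's inequality uses only the variance.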
| | |
| ===Proof using convergence of characteristic functions===
| |
| By [[Taylor's theorem]] for [[complex function]]s, the [[Characteristic function (probability theory)|characteristic function]] of any random variable, ''X'', with finite mean μ, can be written as
| |
| | |
| :<math>\varphi_X(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.</math>
| |
| | |
| All ''X''<sub>1</sub>, ''X''<sub>2</sub>, ... have the same characteristic function, so we will simply denote this ''φ''<sub>''X''</sub>.
| |
| | |
| Among the basic properties of characteristic functions there are
| |
| | |
| :<math>\varphi_{\frac 1 n X}(t)= \varphi_X(\tfrac t n) \quad \text{and} \quad
| |
| \varphi_{X+Y}(t)=\varphi_X(t) \varphi_Y(t) \quad </math> if ''X'' and ''Y'' are independent.
| |
| | |
| These rules can be used to calculate the characteristic function of <math>\scriptstyle\overline{X}_n</math> in terms of ''φ''<sub>''X''</sub>:
| |
| | |
| :<math>\varphi_{\overline{X}_n}(t)= \left[\varphi_X\left({t \over n}\right)\right]^n = \left[1 + i\mu{t \over n} + o\left({t \over n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \text{as} \quad n \rightarrow \infty.</math>
| |
| | |
| The limit ''e''<sup>''it''μ</sup> is the characteristic function of the constant random variable μ, and hence by the [[Lévy continuity theorem]], <math> \scriptstyle\overline{X}_n</math> [[Convergence in distribution|converges in distribution]] to μ:
| |
| | |
| :<math>\overline{X}_n \, \xrightarrow{\mathcal D} \, \mu \qquad\text{for}\qquad n \to \infty.</math>
| |
| | |
| Since μ is a constant, convergence in distribution to μ and convergence in probability to μ are equivalent (see [[Convergence of random variables]]). Therefore,
| |
| | |
| :<math>\overline{X}_n \, \xrightarrow{P} \, \mu \qquad\text{for}\qquad n \to \infty.</math>
| |
| | |
| This shows that the sample mean converges in probability to −''i'' times the derivative of the characteristic function at the origin (which equals μ), as long as that derivative exists.
| |
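The limit of characteristic functions used in this proof can be checked numerically for a concrete distribution. The sketch below assumes the exponential distribution with rate 1, whose characteristic function is 1/(1 − ''it'') and whose mean is μ = 1; the function names are illustrative.

```python
import cmath

def cf_exponential(t):
    """Characteristic function of the rate-1 exponential distribution."""
    return 1 / (1 - 1j * t)

def cf_of_sample_mean(t, n):
    """Characteristic function of the mean of n i.i.d. copies: phi(t/n)**n."""
    return cf_exponential(t / n) ** n

def gap_to_limit(t, n, mu=1.0):
    """Distance between phi(t/n)**n and the limit exp(i*t*mu)."""
    return abs(cf_of_sample_mean(t, n) - cmath.exp(1j * t * mu))

if __name__ == "__main__":
    # The gap at a fixed t should shrink roughly like 1/n.
    for n in (10, 100, 10_000):
        print(n, gap_to_limit(2.0, n))
```

At ''t'' = 2 the gap falls from roughly 0.2 for ''n'' = 10 to well below 0.001 for ''n'' = 10,000, in line with the pointwise convergence that the [[Lévy continuity theorem]] requires.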
| | |
| ==See also==
| |
| * [[Asymptotic equipartition property]]
| |
| * [[Central limit theorem]]
| |
| * [[Infinite monkey theorem]]
| |
| * [[Law of averages]]
| |
| * [[Law of the iterated logarithm]]
| |
| * [[Regression toward the mean]]
| |
| * [[Lindy Effect]]
| |
| | |
| ==Notes==
| |
| {{Reflist|2}}
| |
| | |
| ==References==
| |
| {{refbegin}}
| |
| *{{cite book | author=Grimmett, G. R. and Stirzaker, D. R. | title=Probability and Random Processes, 2nd Edition | publisher=Clarendon Press, Oxford | year=1992 | isbn=0-19-853665-8}}
| |
| *{{cite book | author=Richard Durrett | title=Probability: Theory and Examples, 2nd Edition | publisher=Duxbury Press | year=1995}}
| |
| *{{cite book | author=Martin Jacobsen | publisher= HCØ-tryk, Copenhagen | year=1992|title=Videregående Sandsynlighedsregning (Advanced Probability Theory) 3rd Edition| isbn=87-91180-71-6}}
| |
| * {{cite book
| |
| | last = Loève | first = Michel
| |
| | title = Probability theory 1
| |
| | year = 1977
| |
| | edition = 4th
| |
| | publisher = Springer Verlag
| |
| | ref = CITEREFLo.C3.A8ve1977
| |
| }}
| |
| * {{cite book
| |
| | last1 = Newey | first1 = Whitney K.
| |
| | last2 = McFadden | first2 = Daniel | authorlink2 = Daniel McFadden
| |
| | title = Large sample estimation and hypothesis testing
| |
| | series = Handbook of econometrics, vol.IV, Ch.36
| |
| | year = 1994
| |
| | publisher = Elsevier Science
| |
| | pages = 2111–2245
| |
| | ref = CITEREFNeweyMcFadden1994
| |
| }}
| |
| * {{cite book
| |
| | last = Ross | first = Sheldon
| |
| | title = A first course in probability
| |
| | year = 2009
| |
| | edition = 8th
| |
| | publisher = Prentice Hall press
| |
| | isbn = 978-0-13-603313-4
| |
| }}
| |
| * {{cite book
| |
| | last1 = Sen | first1 = P. K
| |
| | last2 = Singer | first2 = J. M.
| |
| | year = 1993
| |
| | title = Large sample methods in statistics
| |
| | publisher = Chapman & Hall, Inc
| |
| | ref = CITEREFSenSinger1993
| |
| }}
| |
| {{refend}}
| |
| | |
| ==External links==
| |
| * {{springer|title=Law of large numbers|id=p/l057720}}
| |
| * {{MathWorld|urlname=WeakLawofLargeNumbers|title=Weak Law of Large Numbers}}
| |
| * {{MathWorld|urlname=StrongLawofLargeNumbers|title=Strong Law of Large Numbers}}
| |
| * [http://animation.yihui.name/prob:law_of_large_numbers Animations for the Law of Large Numbers] by Yihui Xie using the [[R (programming language)|R]] package [http://cran.r-project.org/package=animation animation]
| |
| | |
| [[Category:Probability theorems]]
| |
| [[Category:Mathematical proofs]]
| |
| [[Category:Statistical terminology]]
| |
| [[Category:Statistical theorems]]
| |