# Talk:Probability theory


This page appears to be mostly or entirely redundant with the probability page. I'd suggest that we merge any differences into probability and remove this page. Comments? Wile E. Heresiarch 15:14, 27 Dec 2003 (UTC)

OK, on second thought it does make sense to have a separate page for probability theory. Other pages can refer specifically theoretical issues to the theory page. 128.138.85.48 02:25, 3 Jan 2004 (UTC)

The external link is broken.

Probability Theory shouldn't be included in the Discrete Math Category. Dennis 17:40, 16 Dec 2004 (UTC)

I disagree. First, probability in discrete spaces has peculiar features distinct from continuous spaces. In addition, many of the techniques of finite mathematics such as difference equations, generating functions, and the like, originated in probability theory (a classic survey is Laplace's *Analytical Theory of Probability*, written in 1812!!) and probability techniques are sometimes important in number theory. Finally, using his own version of nonstandard analysis, Edward Nelson reduced all of probability theory (including stochastic processes) to discrete probability spaces (this is in his book *Radically Elementary Probability Theory*). -- Miguel 04:52, 2004 Dec 19 (UTC)

Why is this page so biased towards Bayesian statistics? INic 12:08, 19 October 2005 (UTC)

## sequences?

In the explanation of sample space, shouldn't the word "sequence" be "combination", as the order of the Californian voters does not matter?

## important article

In response to the comment about whether this page is the same as the article on probability: this page is certainly not the same as the other article. Kolmogorov's axioms are stated clearly here, and it is important that they are in the Wikipedia. This article is quite good and very important, but it still would benefit from some editing. For example, Ω is the "probability space", while "sample space" S comes into play once a random variable X (or Y or Z or whatever you'd like to name it), which is a measurable function, maps X:Ω→S where S is the space from which we collect samples. What is the difference, one might ask, but truly, this is important mathematically. Ω could be anything. As examples, Ω could be {the people of earth}, {the atoms in the cosmos}, {all 18K gold jewelry or golden crowns}, or any such set of interest. But the sample space S would be something such as, respectively, the set of vectors (height, weight, age) of the people; the set of number of electrons of the atoms and their quantum energy levels; the set of all weights of the golden items. There are some other improvements that are warranted, too. I plan to begin editing the article soon, but will wait for further discussion, so that we can do the best job of doing so as a team. -- MathStatWoman

- With the most recent edits, I wonder whether this page continues to serve a useful purpose. Rjm at sleepers 09:24, 10 March 2007 (UTC)

## Don't use F to mean two completely different things

I'm going to remove all references in this page to F as an event. It will only confuse people later when they see that F is a sigma-algebra. E=>A and F=>B is my suggested universal fix.

The article on probability theory is superficial. It uses jargon, while being disconnected from real life. I believe that the best foundation to theory of probability is laid out here:

The article is accompanied by free software pertinent to probability (combinatorics and statistics as well).

Ion Saliu, Probably At-Large

## Proposed merge with probability axioms

This page currently contains two main sections:

- The Kolmogorov axioms (also repeated on the probability axioms page)
- A discussion of philosophy of probability (also repeated on probability interpretations page).

I propose that this page is merged with probability axioms and limits its discussion to the first bullet point. Perhaps it should be renamed 'Kolmogorov axioms' (a title which currently redirects to probability axioms)? Andeggs 16:18, 24 December 2006 (UTC)

I agree; there is a great deal of redundancy in having two different articles on the Kolmogorov axioms; this article should be merged with probability axioms and probability interpretations.

-Patrick N.R. Julius 03:00, 28 December 2006 (UTC)

- "Probability theory" should redirect to probability if the proposed move of this article's material to those other two articles gets done. Michael Hardy 00:13, 29 December 2006 (UTC)

- You're right there's tonnes of redundancy. I'm going to merge all of the information about the probability axioms along with probability space into this page, and then we can decide where that should go. I hope that I'm not stepping on anyone's toes :) MisterSheik 14:11, 28 February 2007 (UTC)

- Done! MisterSheik 17:59, 28 February 2007 (UTC)

- I moved this from the introduction because it wasn't clear to me what it actually meant. The definition in the article seems clear and rigorous by comparison. MisterSheik 18:18, 28 February 2007 (UTC)

More precisely, **probability** is used for modelling situations when the result of an experiment, realized under the same circumstances, produces different results (typically throwing a dice or a coin). Mathematicians and actuaries think of probabilities as numbers in the closed interval from 0 to 1 assigned to "events" whose occurrence or failure to occur is random. Probabilities are assigned to events according to the probability axioms.

## Stubbification

From looking at the history of this article, one infers that once upon a time, there was a lot of important content on this page, but most of it is gone in a direction unknown, and the current version is little better than a stub with a long list of links. What an extraordinary loss! I don't know which Einstein, or, given the context, Laplace came up with the idea of removing most content from this page to who-knows-where, but in my opinion, it amounts to an act of extraordinary vandalism. It may not all have happened at once, but I do not have sufficient time and resources to investigate. If any qualified editors are monitoring this page, can you, please, restore some reasonable version of it? For one possible model of what the *Probability theory* article can look like, you may refer to the article in the German edition of Wikipedia. Arcfrk 03:15, 21 March 2007 (UTC)

- Looking at the revision just before the stubification [2] it looks like it was a rather abstract page with formal definitions beyond non-specialists. These quite rightly seem to have gone to Probability space and Probability axioms. For the vital article we probably want Probability instead and I've changed WP:VA accordingly. --Salix alba (talk) 09:55, 21 March 2007 (UTC)

- See also Talk:Probability#Merge Probability theory to here. --Lambiam (Talk) 10:33, 21 March 2007 (UTC)

I strongly object both to the removal of *Probability theory* from the list of vital articles and to merging it with Probability. As an emergency measure, I will put links to Probability space and Probability axioms in the lead. However, *probability theory* is one of the most important mathematical disciplines, with enormous applications to exact sciences, social sciences, non-sciences such as economics, and through Statistics, for which it provides mathematical underpinning, to nearly every subject dealing with analysis of large amounts of data. To merge the article on *Probability theory* with the article on *Probability* is akin to merging Differential equations with Derivative. Needless to say, probability theory constitutes a lot more than a list of axioms for probability spaces. Now, to address your complaint about abstractness and being beyond non-specialists: you should expect an article in an encyclopaedia to reflect the current state of knowledge, which will be beyond most people's basic training. I am not advocating for making it incomprehensible on purpose, but merely pointing out that the popularity of a topic among the masses cannot be a valid criterion for including it in or excluding it from an encyclopaedia, or for designating it as a vital topic. If the article was incomplete or bad in some ways, it would have to be improved, not dissipated. I think that the section on probability in mathematics in *Probability* can form a useful core for this article. Of course, it will have to be eventually expanded and structured further. Arcfrk 00:25, 22 March 2007 (UTC)

- Arcfrk, instead of pontificating on the talk page, why don't you put an outline into the article? There's no need to call people vandals or "einsteins." It's my fault that article is blank, and the reason is that I didn't know what should go here, but I was pretty sure that what was here before didn't belong. You're right about the stuff in mathematical probability being a good start to this page, maybe you'd like to build a more complete outline? --MisterSheik 00:34, 22 March 2007 (UTC)
- Well, better to pontificate once and develop a good coordinated plan than engage in a meaningless game of tug-of-edit! For the future reference, you may consider writing up a motivated summary of your substantial edits, otherwise to us mere humans it may appear to be a work of an Einstein (meaning, a genius). Arcfrk 03:56, 22 March 2007 (UTC)

As I've said elsewhere, I think that it's the Probability page that should be the page without real contents. "Probability" and "probable" in natural languages can mean quite a lot of different things, only seldomly coinciding with the mathematical concept. For this reason I think that Probability should be nothing more than a disambiguation page for all the different flavors of the word. I agree that Probability theory deserves an article of its own, not only a stub. iNic 02:51, 22 March 2007 (UTC) And I saw now that MisterSheik already started to implement this idea. :-) iNic 02:57, 22 March 2007 (UTC)

- It wasn't my idea--it was yours! :) But, yeah, it's a start I agree... --MisterSheik 03:25, 22 March 2007 (UTC)

## New lead

I've written a new lead to this article. As you can see, it is quite different in intent and flavor from the lead in Probability. I invite experts in probability to develop the article to a level comparable with, say, Algebra or Geometry. I reiterate my belief that the mathematical theory of probability should be treated separately from a more leisurely article explaining the history and main concepts in lay terms. Arcfrk 04:01, 22 March 2007 (UTC)

- It's really well written :) Nice job. --MisterSheik 04:22, 22 March 2007 (UTC)

## Link to Britannica

I've restored the link to Britannica, for the following reason: the one sentence definition of probability theory is a direct quote from the Britannica article. It would seem unwise not to acknowledge this fact, and has nothing to do with being or not being freely available to everyone (think of references to books that are not "freely available" to anyone with internet access). Arcfrk 21:49, 23 March 2007 (UTC)

- OK, but to have a link that supposedly goes to a source article but is blocked for most users is annoying I think. To avoid this, and still keep the reference intact, I suggest that we remove the link part but leave the reference as it is. iNic 03:05, 25 March 2007 (UTC)

- Changed the Britannica link to free summary accessible by all users. Hirak 99 22:33, 28 March 2007 (UTC)

## veracity

Is this even true? "The general study of probability has two main flavors - discrete and continuous."

What about a random variable that's dirichlet distributed? It is not continuous, but has a pdf.

This statement needs to be more precise. For example, it could say: "There are three theories of probability: discrete, continuous, and ...?" But, I have no idea...

Also, we should be careful not to duplicate the information that is on other pages, or belongs on other pages. For example, probability space, probability axioms, and probability distribution should all be well-written clear pages that we can just refer to here. (And if they're not, they should be fixed.) There's no point in redefining what a probability distribution is in more than a sentence or two, imho.

MisterSheik 22:28, 28 March 2007 (UTC)

- Certainly the Dirichlet distribution is continuous.

- Yeah :) What I meant was that it doesn't fit under the definition of continuous distributions that is on the page right now, because a Dirichlet random variable is not a real number; you can't define Pr(X<4), etc. So, something is wrong with the definition on this page. (On second look, it's now dealt with by the "measure theoretic probability theory".)

- It is true that some probability distributions are neither discrete nor continuous (in the sense of having a pdf) nor a mixture of the two. But I'm not sure that's another "main flavor of the general study of probability". Also, I definitely would NOT say "three theories of probability" or "two theories of probability" or the like when writing about this sort of thing. They're really not separate theories. Michael Hardy 22:34, 28 March 2007 (UTC)

- If there is a dichotomy in probability theory, I feel that it's between the study of independent random variables and stochastic processes. A sequence of i.i.d. variables can of course be interpreted as a stochastic process with discrete time; however, in general theory of stochastic processes many types of "somewhat dependent" processes are considered. Arcfrk 23:14, 28 March 2007 (UTC)

- I have added measure theoretic probability, and changed the word "classification" to "treatment". Hopefully this will take care of all confusions. Cheers --Hirak 99 23:16, 28 March 2007 (UTC)

- Yeah, it looks really good right now. Very nicely presented...MisterSheik 01:07, 29 March 2007 (UTC)

- I have to confess that I'm very pleased with the article in its current state. :-) Well done! iNic 20:49, 29 March 2007 (UTC)

- Thank you :) Hirak 99 14:13, 30 March 2007 (UTC)

- Flavors? A measure-based approach covers both the discrete and the continuous cases, and everything in between. This fact isn't made clear in the article.
- I would say "most introductions to probability separate it into two main flavors: discrete and continuous. A more advanced approach using measure theory unites these flavors and covers everything in between and more." Aastrup 22:34, 19 August 2007 (UTC)

- True, but this article presents the stages preceding measure theoretic probability too. There are some advantages to doing this. (1) The article reflects (in a loose way) the historical development of the subject, and (2) it makes the article accessible to more readers. (I'd guess the latter reason is the main reason you'll find the older, simpler versions still in introductory text books.) The obvious disadvantage to this approach is that someone may get the wrong impression that there are several different probability theories and not one. But in the way the article is written now I think that risk is very low. iNic 21:02, 21 August 2007 (UTC)

- Well, my introduction to Probability didn't start with measure theory either, and the simpler continuous/discrete "two flavors" approach is a good way to get to know the subject. I'll expand the measure theory in probability paragraph, and I'll make it more clear that it covers the continuous and the discrete cases. Aastrup 01:55, 23 August 2007 (UTC)

- More explanation of the relative places of different parts of and approaches to probability theory is needed. See also the comments below left by Geometry Guy concerning filling in the patchy prose. They still apply. Arcfrk 23:52, 23 August 2007 (UTC)

## Generating functions

Would another section on generating functions make sense? Other than the pdf/pmf other ways to characterize a real-valued rv are the... moment-generating function, characteristic function, and cumulant generating function, and explain in a couple lines what is going on? MisterSheik 01:30, 29 March 2007 (UTC)

- Yes, the generating functions section seems to have a discontinuity after the Treatment section. I am thinking about bringing back a **distribution** section, with the mgf, cf etc in a subsection of it along with the laws like Levy continuity theorem. Also how about removing "Probability in Mathematics" and instead adding sections like law of large numbers? --Hirak 99 07:55, 29 March 2007 (UTC)
- PS. Thanks for all the editing to give the article a much higher quality :) --Hirak 99 07:56, 29 March 2007 (UTC)

- Good idea Hirak. I like the depth that you chose for the law of large numbers. Ideally, since the introduction also mentions the central limit theorem, that could have a similar overview-like section? But, on a completely selfish note, I don't understand how the mgf and cf and cgf relate to one other (if at all?), and I'd be thrilled to see an equally good overview here :) Maybe I should just read those articles though ;) Cheers. MisterSheik 16:19, 29 March 2007 (UTC)

## Now almost totally redundant, unless someone wants to merge something back in

To give a mathematical meaning to probability, consider flipping a "fair" coin. Intuitively, the probability that heads will come up on any given coin toss is "obviously" 50%; but this statement alone lacks mathematical rigor. Certainly, while we might *expect* that flipping such a coin 10 times will yield 5 heads and 5 tails, there is no *guarantee* that this will occur; it is possible, for example, to flip 10 heads in a row. What then does the number "50%" mean in this context?

One approach is to use the law of large numbers. In this case, we assume that we can perform any number of coin flips, with each coin flip being independent; that is to say, the outcome of each coin flip is unaffected by previous coin flips. If we perform *N* trials (coin flips), and let *N*_{H} be the number of times the coin lands heads, then we can, for any *N*, consider the ratio $N_H/N$.

As *N* gets larger and larger, we expect that in our example the ratio will get closer and closer to 1/2. This allows us to "define" the probability of flipping heads as the limit, as *N* approaches infinity, of this sequence of ratios:

$$\Pr(H) = \lim_{N \to \infty} \frac{N_H}{N}.$$

In actual practice, of course, we cannot flip a coin an infinite number of times; so in general, this formula most accurately applies to situations in which we have already assigned an *a priori* probability to a particular outcome (in this case, our *assumption* that the coin was a "fair" coin). The law of large numbers then says that, given Pr(*H*), and any arbitrarily small number ε, there exists some number *n* such that for all *N* > *n*,

$$\left| \Pr(H) - \frac{N_H}{N} \right| < \varepsilon.$$

In other words, by saying that "the probability of heads is 1/2", we mean that if we flip our coin often enough, *eventually* the number of heads over the number of total flips will become arbitrarily close to 1/2; and will then stay *at least* as close to 1/2 for as long as we keep performing additional coin flips.

Note that a proper definition requires measure theory, which provides means to cancel out those cases where the above limit does not provide the "right" result (or is even undefined) by showing that those cases have a measure of zero.
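The behaviour of the ratio of heads to flips described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name `running_head_ratio`, the seed, and the use of a pseudo-random generator as a stand-in for a "fair" coin are all choices made here for illustration, not part of the removed text.

```python
import random


def running_head_ratio(num_flips, seed=0):
    """Simulate num_flips fair coin flips and return the ratios N_H / N for N = 1..num_flips.

    The pseudo-random generator is an assumed stand-in for an ideal fair coin.
    """
    rng = random.Random(seed)
    heads = 0
    ratios = []
    for n in range(1, num_flips + 1):
        if rng.random() < 0.5:  # treat this branch as "heads"
            heads += 1
        ratios.append(heads / n)
    return ratios


ratios = running_head_ratio(100_000)

# Print the ratio at a few checkpoints; it should drift towards 1/2,
# although no single finite run is guaranteed to be close.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, ratios[n - 1])
```

A single run like this shows the ratio settling near 1/2, but it does not prove anything: as noted above, the exceptional sequences where the limit fails are handled only by the measure-theoretic statement.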

The *a priori* aspect of this approach to probability is sometimes troubling when applied to real world situations. For example, in the play *Rosencrantz & Guildenstern Are Dead* by Tom Stoppard, a character flips a coin which keeps coming up heads over and over again, a hundred times. He can't decide whether this is just a random event—after all, it is possible (although unlikely) that a fair coin would give this result—or whether his assumption that the coin is fair is at fault.

## I'm happy

I'm very pleased with the current version of this article now... it has become at least a good base to start with.

Things that remain to do:

- Cleanup of the see also (med-low priority)
- Assembling intros to other important areas in probability theory in this page (low priority - already comprehensive encyclopedia article, although surely more details can be added)

Cheers, Hirak 99 21:05, 30 March 2007 (UTC)

## P and Pr

- I just discovered that "*P*" and "Pr" aren't used consistently in the article. I would personally prefer to see *P* everywhere instead of Pr, but it seems that Pr is adopted as the standard notation in other articles at wikipedia. And standards are always good, aren't they? ;-) iNic 01:12, 31 March 2007 (UTC)

- I guess the reason people might prefer "Pr" is that the TeX markup adopted for Wikipedia has a standard function **\Pr** which comes in non-italics and looks like an operator when rendered. I see no other reason for preferring "Pr". I think it is alright if one wishes to replace all \Pr in the text with \operatorname{P}, but just to follow standards, we might keep it as Pr :-) --Hirak 99 09:29, 31 March 2007 (UTC)


- Yes, you might be right here. I bet there are some, maybe heated, discussions about this somewhere on wikipedia talkpages; it's the kind of minor issue that tends to start a debate. I would have voted for *P* but I apparently missed the voting opportunity. And if they ended up settling for Pr for some reason I will not object to that. :-) iNic 20:21, 31 March 2007 (UTC)

- The confusing thing about random variables is the meaning of operations on them: sometimes we mean to operate on the random variable's value and other times we are operating on its probability distribution in a more complicated way. For example, if X is a random variable, then 2X means doubling the values of the random variable, and similarly sin(X) means applying the sine function to the values. But E(X), for example, isn't a function on the values, but rather takes the whole distribution and returns a value. Similarly, P(X=2) doesn't seem like a function on the event (until you are doing measure theoretic probability theory, and you think of P as a measure on the space of events, right?) And so, I think that Pr makes more sense unless you specifically mean the measure-theoretic definition, and then you would ideally always pass it a set of events. Unfortunately, no one does that, and we all use P(X=2) as shorthand. I think this is why people write E[X] (with square brackets) and Pr(X=2): to distinguish, by means of notation, these operations from run-of-the-mill functions. (But I'm not opposed to P(X=2); I just wish that probability theorists had come up with nicer notation). MisterSheik 04:23, 1 April 2007 (UTC)

- Notation in mathematics (as well as in all natural languages) is a constant struggle between 'logic' and tradition. Almost without exception tradition wins over logic. However, in this case I think the traditional notation makes sense as it is. The trick is not to alter *P()* and *E()* to become operators but to make clear that *X* is not a number but a *random variable*. Usually some notational conventions are used to stress this: commonly the upper case letters *X*, *Y*, *Z*, ... are reserved to denote random variables and the lower case letters *x*, *y*, *z*, *a*, *b*, *c*, ... are used to denote ordinary numerical variables. I haven't seen Pr for probability in any modern text about probability, nor have I seen *E[]* as a way to make *E* an operator. iNic 14:06, 3 April 2007 (UTC)

- P(X=2) is really a shorthand for $P(\{\omega \in \Omega : X(\omega) = 2\})$, where $\Omega$ denotes the sample space on which the random variable X is defined (in other words, the domain of X). I prefer the notation P{X=2} to emphasize that P operates on a *set*. Other common shorthands are $P(X \in A) = P(X^{-1}(A))$, where $X^{-1}(A)$ is the preimage of A under X.
- The expectation E on the other hand operates on the random variable itself. To use MisterSheik's example above, I'd prefer [formula missing] to [formula missing], whereas to denote the preimage of A under [formula missing] I use [formula missing]. I believe this notation is unambiguous. --Drizzd 16:46, 10 April 2007 (UTC)


- So, we don't disagree with each other at all. As iNic said, there's a struggle between logic and tradition. What we need to do is decide on a few canonical texts, and do a survey of what notation they prefer. Who are the major *modern* players in probability theory? MisterSheik 17:13, 10 April 2007 (UTC)


- I actually did check some modern texts in a book store the other day and I found out that some books really use Pr a lot. So I guess I'm getting old and not following the latest notational trends anymore. ;-) iNic 00:37, 11 April 2007 (UTC)

- I didn't do the recent changes from Pr to *P* here! But I won't object to it either... ;-) iNic 09:42, 28 April 2007 (UTC)
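The distinction drawn in this thread, between P acting on a *set* of outcomes and E acting on the random variable as a whole, can be made concrete on a small finite sample space. The following is a sketch only: the two-dice sample space, the uniform measure, and the helper names `P`, `X`, `preimage` and `E` are assumptions chosen for illustration and do not come from the article.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space Omega: all ordered outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))


def P(event):
    """Uniform probability measure: takes a set of outcomes and returns its probability."""
    return Fraction(len(event), len(omega))


def X(w):
    """A random variable is a function on Omega; here, the sum of the two dice."""
    return w[0] + w[1]


def preimage(rv, values):
    """Return the event {w in Omega : rv(w) in values}."""
    return {w for w in omega if rv(w) in values}


def E(rv):
    """Expectation: takes the whole random variable (a function), not a single value."""
    return sum(Fraction(rv(w)) for w in omega) / len(omega)


# "P(X = 2)" is shorthand for P applied to the preimage of {2} under X.
p_x_eq_2 = P(preimage(X, {2}))

# "2X" is a new random variable obtained by doubling the values of X.
e_x = E(X)
e_2x = E(lambda w: 2 * X(w))

print(p_x_eq_2, e_x, e_2x)
```

Here `P` never sees the symbol X at all; it only receives the set of outcomes, which is the point of the P{X=2} notation suggested above.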

## Suggestion on images

Is there any image/diagram we can create to put on this page to improve clarity of the article? --Hirak 99 12:31, 3 April 2007 (UTC)

## Link to Italian text

The external link added by 87.19.208.113 today points to a text written in Italian language. Maybe this would be more appropriate for the Italian version of Probability theory? --Drizzd 09:06, 25 April 2007 (UTC)

- I removed it. I've deleted it from this page before. Seems to be a case of self-promotion. iNic 11:46, 3 May 2007 (UTC)

- reply 82.61.68.72 05:42, 6 May 2007 (UTC)

The full text is in Italian but the scientific progress is of global interest.

- end reply 82.61.68.72 05:42, 6 May 2007 (UTC)

## Good Article review

I've reviewed this article and it appears to meet the Good Article criteria. I find the lead to be well-written and accessible to laypeople, along with a more detailed description for mathy persons. I think the History section could use a little bit of fleshing out, but it covers the necessary ground. Good references and bibliography. I think kudos are due to those involved with this article. I don't think there is much question in my mind of the subject of this article vs. the more general Probability; this article clearly shows the theoretical and formal side, where Probability is more general (as it should be). The "See also" section could maybe use a scrub, but it's not in enough disarray to distract significantly from the article. Nice work! Ryanjunk 15:50, 27 April 2007 (UTC)

- It is good work indeed, but I don't think this article would survive a Good Article Review (see WP:WIAGA). The lead may be accessible, but the rest isn't, and the lead is a bit too short. The history section does not have adequate coverage, and there isn't enough motivation. If this article can only be understood in concert with Probability then this should be made more clear in the text (although I think it is a pity to rely on that article so much: surely this is a subarticle, and so should expand on the treatment there).
- However, the main reason it wouldn't survive is a lack of references. Using EB as a source is not ideal (what sources does it use?), and there is no publication information for the other reference. The history section at least needs a source. I am tempted to delist it myself to spare editors here from the occasionally dismissive approach of some GA/R reviewers, and provide a thorough review (see e.g. Klee's measure problem for my approach). Comments? Geometry guy 17:35, 9 June 2007 (UTC)

### Delisting

As there have been no comments for over a month and essentially no changes to the article I am going to delist it.

- It is **reasonably well written**.
  - a *(prose)*; b *(MoS)*
- It is **factually accurate** and **verifiable**.
  - a *(references)*; b *(citations to reliable sources)*; c *(OR)*
- It is **broad in its coverage**.
  - a *(major aspects)*; b *(focused)*
- It follows the **neutral point of view policy**.
  - a *(fair representation)*; b *(all significant views)*
- It is **stable**.
- It **contains images**, where possible, to illustrate the topic.
  - a *(tagged and captioned)*; b *(lack of images does not in itself exclude GA)*; c *(non-free images have fair use rationales)*
- **Overall**:
  - *Pass/Fail*

Essentially all the comments I made previously still apply. Furthermore, a glance at Probability (two sentences of lead, and three sentences on "theory") reveals that it is inadequate as an introduction to this article, so there really is no excuse for not making this article more accessible. I have attempted to improve the prose style in places, but it is still patchy, and uncompromising to the reader: furthermore, there is not enough prose in the technical sections. The article fails the good article criteria for lead sections (although I have added to the lead), explaining jargon, and probably also layout and list incorporation. The history section is lacking in coverage, references and citation. The referencing is poor in general, and there are several places where citations are needed. This article probably does need some images, e.g. to illustrate the notion of a sample space or a probability distribution. Geometry guy 15:09, 12 July 2007 (UTC)

## rv and prob dist

Congratulations on the GA review guys. Do you think that "random variable" and "probability distribution" should be touched on in the article? Particularly, since there are random variables in the article being used without explanation. MisterSheik 21:22, 7 May 2007 (UTC)

- Good point! Please add that. iNic 21:33, 7 May 2007 (UTC)

- What's the best way? It seems that we will have a definition of both for each of the types of probability theory? I know that in measure theoretic probability theory, we have that a random variable is a measurable function from the sample space to a measurable space, and that a probability distribution is any probability measure (which is already defined). Is that right? What about discrete and continuous probability theory? MisterSheik 01:58, 9 May 2007 (UTC)

I think it's enough if we turn these concepts to links to the pages explaining them better. We can't explain everything here from scratch; this is more of an overview article. What do you think? iNic 09:31, 9 May 2007 (UTC)

I agree. I was thinking more of one-line explanations anchored into the right places in this article, the idea being that this page could serve as a map of probability theory. MisterSheik 13:58, 9 May 2007 (UTC)

## discrete+continuous=?

"An example of such distributions could be a mix of discrete and continuous distributions, e.g., a sum of a discrete and a continuous random variable will neither have a pmf nor a pdf."

Is that correct? A discrete r.v. + an independent continuous r.v. will give rise to a continuous r.v., e.g. Bernoulli(1/2) + Std. Normal should follow the density $(\varphi(x) + \varphi(x-1))/2$. I replaced the example with

"... a random variable which is 0 with probability 1/2, and takes a value from random normal distribution with probability 1/2. It can still be studied to some extent by considering it to have a pdf of $(\delta[x] + \varphi(x))/2$, where $\delta[x]$ is the Kronecker delta function."

--Hirak 99 (talk) 18:22, 8 January 2008 (UTC)
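The two cases discussed here can be compared numerically. The sketch below assumes a Bernoulli(1/2) variable B taking the values 0 and 1 and an independent standard normal Z; in that case B + Z has distribution function $(\Phi(x) + \Phi(x-1))/2$, which is the integral of the density $(\varphi(x) + \varphi(x-1))/2$. The names `phi_cdf`, `mixture_cdf`, `sum_samples` and `mixed_samples`, the seed, and the sample size are illustrative choices, not anything taken from the article.

```python
import math
import random


def phi_cdf(x):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mixture_cdf(x):
    """Distribution function of B + Z, with B ~ Bernoulli(1/2) on {0, 1} and Z ~ N(0, 1)."""
    return 0.5 * (phi_cdf(x) + phi_cdf(x - 1.0))


rng = random.Random(0)
n = 200_000

# Case 1: the SUM of a discrete and an independent continuous variable.
sum_samples = [(1.0 if rng.random() < 0.5 else 0.0) + rng.gauss(0.0, 1.0) for _ in range(n)]

# Case 2: the MIXTURE from the replaced example: 0 with probability 1/2, else a normal draw.
mixed_samples = [0.0 if rng.random() < 0.5 else rng.gauss(0.0, 1.0) for _ in range(n)]


def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(s <= x for s in samples) / len(samples)


# The sum should track the continuous mixture distribution function closely.
max_gap = max(
    abs(empirical_cdf(sum_samples, x) - mixture_cdf(x))
    for x in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0)
)

# The mixture should place roughly half of its mass exactly on the point 0.
atom_mass = sum(s == 0.0 for s in mixed_samples) / n

print(max_gap, atom_mass)
```

The point mass at 0 in the second case is what prevents that variable from having an ordinary pdf, whereas the sum in the first case has none.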

## History section

As noted before, the history section lacks coverage. For instance, Russell and Norvig mention the Indian mathematician Mahaviracarya as one precursor of modern probability theory in chapter 13 of Artificial Intelligence: A Modern Approach. The chapter also contains more references about the history of probability theory. --zeno (talk) 15:35, 15 January 2008 (UTC)

## Central Limit Theorem

*Moved here from Talk:Probability.*

The section on the central limit theorem is wrong: if then the expression in question converges to a degenerate random variable. Either a simple case of this theorem should be stated (the case when all variances are equal and greater than 0) or the Lindeberg-Feller version of the result should be stated. —Preceding unsigned comment added by 69.134.214.62 (talk) 00:00, 24 January 2008 (UTC)

- Template:Done by stating the simple case with iid rv's. --Lambiam 05:36, 24 January 2008 (UTC)

About its importance: I have added a source, removing "citation needed" in the "CLT" and "References" sections.Boris Tsirelson (talk) 12:13, 17 October 2008 (UTC)

## Weak convergence

In the article it was stated that weak convergence is convergence in probability. This is not true. Weak convergence is convergence in distribution. The main article about convergence of rv's states this correctly. I changed it.

I also reformulated the statement that strong convergence is a stronger version of convergence in probability. This does not make sense. One is not a version of the other. They are both different forms of convergence. 86.87.139.31 (talk) 22:00, 13 November 2008 (UTC)
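A small exact computation may make the difference concrete. The following is a minimal sketch with a toy example chosen for illustration (it is not taken from the article): X is a fair coin on {0, 1} and X_n = 1 - X for every n. Each X_n has the same distribution as X, so X_n converges to X in distribution, yet |X_n - X| = 1 always, so there is no convergence in probability.

```python
from fractions import Fraction

# Sample space of a single fair coin flip, with its probabilities.
omega = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def X(w: int) -> int:
    """The limit variable: the outcome of the flip itself."""
    return w


def X_n(w: int) -> int:
    """Every term of the sequence is the flipped outcome (same for all n)."""
    return 1 - w


def distribution(rv) -> dict:
    """Law of a random variable on omega, as a dict value -> probability."""
    dist = {}
    for w, p in omega.items():
        dist[rv(w)] = dist.get(rv(w), Fraction(0)) + p
    return dist


# Same law for every n, hence convergence in distribution is trivial.
same_law = distribution(X) == distribution(X_n)

# P(|X_n - X| > 1/2) does not shrink with n; it equals 1 for every n.
prob_far = sum(p for w, p in omega.items() if abs(X_n(w) - X(w)) > Fraction(1, 2))

print(same_law, prob_far)
```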

## Is a measure always positive?

As noted by 80.221.23.99, "Measure is already non-negative by definition (according to article measure)". Yes, but see also Measure (mathematics)#Generalizations. Boris Tsirelson (talk) 22:09, 12 February 2009 (UTC)

- In my experience, whenever the term measure on its own is used, it always means non-negative (the first link). It is true that there are useful generalizations of measures such as signed and complex measures, but whenever talking about these, they are always prefixed by the word complex or signed (unless it is obvious from the context). In the context of probability theory, you can be pretty much 100% certain that probability measure means a finite positive measure. --67.193.128.233 (talk) 17:38, 8 March 2009 (UTC)

## A good article became muddled

This article has got muddled. The material about discrete distributions and continuous distributions belongs under the heading of probability distributions. What is now totally missing is the general description of a probability space, i.e. (in present-day probability theory) a set endowed with a sigma algebra of subsets, which are called events, and which have probabilities, which form a (nonnegative) measure on the sigma algebra, of total mass one. Random variables are measurable functions from the sample space to the real line. Random variables have distributions etc etc...

This general approach covers discrete probability, which one could say is the special case when the sample space is countable, and the sigma algebra consists of all its subsets. It covers also the common case when the probability space is R^n, the sigma algebra is its Borel sigma field, and the n coordinates can be thought of as n random variables. Gill110951 (talk) 14:55, 14 March 2009 (UTC)
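For the discrete special case described above, the whole structure fits in a few lines. This is only a sketch under assumptions picked for illustration: the sample space is a fair six-sided die, the sigma algebra is the full power set, and the random variable is the parity of the outcome. None of these choices come from the article.

```python
from fractions import Fraction
from itertools import chain, combinations

# Countable (here finite) sample space: a fair six-sided die.
omega = (1, 2, 3, 4, 5, 6)
mass = {w: Fraction(1, 6) for w in omega}


def power_set(points):
    """All subsets of the sample space; in the discrete case this is the sigma algebra."""
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )
    return [frozenset(s) for s in subsets]


events = power_set(omega)


def P(event) -> Fraction:
    """Probability measure: add up the point masses of the outcomes in the event."""
    return sum((mass[w] for w in event), Fraction(0))


def X(w: int) -> int:
    """A random variable, i.e. a function on the sample space (here: parity)."""
    return w % 2


def law(rv) -> dict:
    """Distribution of a random variable: the measure it induces on its values."""
    dist = {}
    for w in omega:
        dist[rv(w)] = dist.get(rv(w), Fraction(0)) + mass[w]
    return dist


print(len(events), P(frozenset(omega)), law(X))
```

Measurability is automatic here because every subset is an event; in the R^n case with the Borel sigma field that is no longer true for arbitrary functions.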

Engel's law is an observation in economics stating that, with a given set of tastes and preferences, as income rises, the proportion of income spent on food falls, even if actual expenditure on food rises. In other words, the income elasticity of demand of food is less than 1.

The law was named after the statistician Ernst Engel (1821–1896).

Engel's Law doesn't imply that food spending remains unchanged as income increases: It suggests that consumers increase their expenditures for food products (in % terms) less than their increases in income.[1]

Engel's Law states that household expenditures on food in the aggregate decline as income rises; in other words, the income elasticity of demand for food in the aggregate is less than one and declines toward zero with income growth. [2]

One application of this statistic is treating it as a reflection of the living standard of a country. As this proportion or "Engel coefficient" increases, the country is by nature poorer, conversely a low Engel coefficient indicates a higher standard of living. —Preceding unsigned comment added by 122.161.114.232 (talk) 09:10, 12 November 2009 (UTC)

- What do you hint at, sir "122.161.114.232"? Boris Tsirelson (talk) 12:25, 12 November 2009 (UTC)

## Relevance of Bertrand's paradox

The present version says (under "Treatment" and "Continuous probability distributions") "Classical definition: The classical definition breaks down when confronted with the continuous case. See Bertrand's paradox." There is an immediate problem because the earlier structure of the article only defines (not really a definition, but still...) "classical definition" for the discrete case. A guess is that the supposed relevance of Bertrand's paradox here could be summarised as saying that an argument involving a transformation of variables shows that imposing a (multivariate) uniform distribution in one set of variables leads to a different result from imposing a uniform distribution in another transformed set. But the same argument would apply for (multivariate) discrete distributions. Hence at a minimum, the structure of the discussion is wrong. It is not actually very clear what thing the "classical definition" is supposed to be a definition of, and it may not be a term that is used much anyway in "the literature" so it may be possible to rearrange things so that the term is not used. I think there is a confusion between (1) assigning probabilities to things, and (2) rules for manipulating probabilities. It may be that "classical definition" relates to (1), while "probability theory" relates to (2). Melcombe (talk) 09:43, 12 January 2010 (UTC)

- "But the same argument would apply for (multivariate) discrete distributions." Really? On every finite set, the uniform distribution is well-defined, and it is INVARIANT UNDER EVERY BIJECTIVE MAP from one finite set to another (in particular, to the same set, which is permutation invariance). As for me, the point of Bertrand's paradox is: NOTHING LIKE THAT IS POSSIBLE ON INFINITE SETS. In particular, the uniform distribution on (say) the interval (0,1) is not invariant under (say) the bijective map x -> x^2. Maybe the structure of the article can be improved, but anyway, in my experience, many people are inclined to believe (often sub-consciously) that "the truly random choice" is possible from every set, and is invariant under all bijections. It is important to say "no". Boris Tsirelson (talk) 12:45, 12 January 2010 (UTC)

- But the article doesn't say "NOTHING LIKE THAT IS POSSIBLE ON INFINITE SETS" ...the distinction is between discrete (but one might guess finitely discrete) and continuous. And the article doesn't say what it thinks the point of referring to Bertrand's paradox is. It certainly doesn't talk about invariance or permutation invariance, and certainly not about bijective maps. A discrete version of Bertrand's paradox can be constructed by considering a square divided into square sub-blocks. One way of assigning "uniform probabilities" is by selecting a row and a column independently and with equal probabilities along each dimension. A second way is to consider turning the square through 45 degrees and using distance along the new x-axis to define a distance along the diagonal which can be divided into a discrete set of values, each determining a unique column of diamonds in the new orientation. Then a "uniform assignment" can be set up by choosing one of these new columns with equal probabilities and then independently choosing one of the diamonds in the column, again with equal probabilities within each column. The results are of course different, in much the same way as the different ways of defining a "random line" are for Bertrand's paradox. This seems to me to be as much a dismissal of whatever "classical definition" was meant to mean as whatever interpretation of referring to Bertrand's paradox one is supposed to be able to guess at. Melcombe (talk) 15:59, 12 January 2010 (UTC)

- I see. Such an example could also be instructive, to some extent; but it is relatively simple, since its deviation from the "classical definition" is rather evident. The classical definition is, "number of favorable cases over the total number of outcomes"; and "your" procedure is different, and nonequivalent. Yes, I know that the article is written differently from my words above. But still, "my" point is that, CONTRARY TO THE NAIVE INTUITION, the classical definition cannot be generalized to infinite sets. Different parameterizations of these lines in Bertrand's paradox lead to different subsets of the plane; these sets are in bijective correspondences to each other; and still, they lead to different results.

- Once again, I do not say that the article is good. If you can improve it, please do. I only say that some lesson out of Bertrand's paradox is relevant (and important). By the way, even some young mathematicians (non-probabilists) sometimes ask me something like that: "If a function is chosen at random, what is the probability that it is continuous, differentiable etc?" Their intuition tells them wrongly that the classical definition has some reasonable generalization to arbitrary sets. This is the problem. But of course, other problems exist, too. Boris Tsirelson (talk) 18:46, 12 January 2010 (UTC)

- One of the most exciting things I learned as a young mathematician (non-probabilist) was that a random continuous function would be differentiable with probability 0 and Hölder with exponent 1/2 - ε with probability 1, for any positive ε. Needless to say, one can't learn a thing like that from Bourbaki. Arcfrk (talk) 10:57, 16 January 2010 (UTC)

- Yes, Brownian motion is like that. However, asking "If a function is chosen at random, what is the probability that it is continuous, differentiable etc?" people usually assume (implicitly) that the answer is well-defined by the very idea of "random choice", and by random choice they mean (some generalization of) the classical definition; that is, all functions are (somehow) equiprobable. They will say: well, some random process has non-differentiable sample paths, another one has smooth sample paths, etc, but THE QUESTION is, what happens for a TRULY RANDOM function... Boris Tsirelson (talk) 17:45, 16 January 2010 (UTC)
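Both sides of this thread can be illustrated with exact arithmetic. The sketch below is an illustration under stated assumptions, not a summary of the article. First, it pushes the uniform distribution on a five-point set through a bijection (an arbitrary one chosen here) and compares that with the map x -> x^2 on (0,1), for which P(U^2 <= t) = sqrt(t). Second, it models the grid construction described above on an n-by-n grid, reading a "column of diamonds" as the set of cells with constant i + j; that reading is an assumption about what was meant.

```python
import math
from fractions import Fraction


def pushforward(dist: dict, f) -> dict:
    """Distribution of f(X) when X has the finite distribution `dist`."""
    out = {}
    for x, p in dist.items():
        out[f(x)] = out.get(f(x), Fraction(0)) + p
    return out


# Finite case: the uniform distribution survives any bijection.
n = 5
uniform = {k: Fraction(1, n) for k in range(n)}
permuted = pushforward(uniform, lambda k: (2 * k + 1) % n)  # a bijection of {0,...,4}
finite_invariant = permuted == uniform


def cdf_of_square(t: float) -> float:
    """P(U^2 <= t) for U uniform on (0, 1); a uniform variable would give t."""
    return math.sqrt(t)


def grid_independent(size: int) -> dict:
    """Row and column chosen independently and uniformly."""
    return {(i, j): Fraction(1, size * size) for i in range(size) for j in range(size)}


def grid_diagonal(size: int) -> dict:
    """Diagonal i + j chosen uniformly, then a cell uniformly within that diagonal."""
    diagonals = {}
    for i in range(size):
        for j in range(size):
            diagonals.setdefault(i + j, []).append((i, j))
    probs = {}
    for cells in diagonals.values():
        for cell in cells:
            probs[cell] = Fraction(1, len(diagonals) * len(cells))
    return probs


by_rows = grid_independent(3)
by_diagonals = grid_diagonal(3)
print(finite_invariant, cdf_of_square(0.25))
print(by_rows[(0, 0)], by_diagonals[(0, 0)], by_diagonals[(1, 1)])
```

On the 3-by-3 grid the two schemes disagree already at the corner cell, but the second scheme is not "favorable cases over total cases" on the set of cells, which is the objection raised in the reply.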

## Absolutely continuous

Hi, could we discuss your undoing my change to the article Probability theory? My opinion still stands, I did not understand your comment. Here is, again, my reasoning:

"The existence of a derivative almost everywhere is a property of absolutely continuous functions, not its definition." Quiet photon (talk) 09:16, 3 March 2010 (UTC)

- Well, if you insist, I shall not object to your change. On the other hand, the phrase "its derivative exists and integrating the derivative gives us the cdf back again" is one of several equivalent definitions of absolute continuity. True, it is not the definition given in "absolutely continuous". However, this definition is more relevant to "Probability theory". This is why I did not see anything wrong with the old phrase in the article. Boris Tsirelson (talk) 09:58, 3 March 2010 (UTC)

## huh?

I really enjoyed learning about probability in high school. Despite taking a couple calculus classes at university, my math education since that time has been quite limited. And so, this article is very much over my head. It seems more appropriate to a specialized encyclopedia for mathematicians. Or, a nice introductory summary would be really nice. If I could be reminded of the difference between a combination and a permutation, that might actually interest me in the more complicated topics that might follow. —Preceding unsigned comment added by Briancooper45 (talk • contribs) 17:46, 12 October 2010 (UTC)

## D and D'

In the convergence of random variables section I've changed the symbol D to D': usually D is the test function space and D' is its dual, i.e. the space of distributions. Random values are seen as distributions and not as test functions! (E.g. a discrete uniform distribution could be seen as a sum of Dirac delta distributions.) — Preceding unsigned comment added by 95.225.60.175 (talk) 11:47, 22 December 2011 (UTC)

- I understand your argument, but "D" is indeed "the most common notation", while "D'" is not. In fact, "D" means just "in Distribution" (irrespective of the space of test functions). Boris Tsirelson (talk) 12:51, 22 December 2011 (UTC)

## Copied from Wikibooks

The new section "Types of probability" is copied from Wikibooks [3], see also Wikilabs [4]. Is it a good idea? Boris Tsirelson (talk) 17:22, 9 January 2013 (UTC)