# Talk:Probability distribution


## Lead is confusing

"random variable may be attributed to a function defined on a state space equipped with a probability distribution that assigns a probability to every subset ... of its state space..." A function defined on a state space should assign something to points of the state space. If it assigns something to sets, it is rather a SET FUNCTION. Also, if the state space is already equipped with a probability distribution, what is the role of the random variable?

"A random variable then defines a probability measure on the sample space by assigning a subset of the sample space the probability of its inverse image in the state space." Really? The two states are swapped here. On the sample space a measure is given from the beginning; on the state space it appears due to the random variable.

"In other words the probability distribution of a random variable is the push forward measure of the probability distribution on the state space." I do not understand this phrase; once again, on which space the measure is given from the beginning? Boris Tsirelson (talk) 16:03, 12 December 2008 (UTC)

## Organization of the article: subsection Terminology etc

Perhaps the Terminology subsection, with its current content, is superfluous. Repeating the discrete vs continuous distinction seems confusing (the reader could think he overlooked something he didn't). The notion of support should appear in the article on measures. Measure, probability and possibly a few other concepts should be referenced at the bottom of the article. More thought is needed about how to organize the vast legion of important links to related articles.

As the continuous counterpart of the formula for a probability distribution, the one with an integral of the density function should be used, since it resembles the discrete version very well (just think of the density as a discrete histogram with a dense set of values). After presenting discrete and absolutely continuous distributions in separate sections, there should be a third section unifying both into the general theory, with a formula using the Lebesgue integral w.r.t. the axiomatic probability measure P. (That formula now horribly serves as the definition in the special case of an absolutely continuous random variable, which is unacceptable for a general-purpose encyclopedia.) Then some properties, examples and graphs should follow (independently of the famous and important distributions described in dedicated articles). Compare e.g. with some good stylistic and logistic ideas applied to the article on expected value. --Megaloxantha (talk) 02:37, 31 December 2008 (UTC)
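
The three formulas proposed above could be laid out roughly as follows. This is only a sketch; the symbols $p$ (mass function), $f$ (density), $\Omega$ (sample space) and $\mathbf{1}_A$ (indicator of the set $A$) are notation assumed here rather than taken from the article.

$$
\begin{aligned}
\text{discrete:}\quad & P(X \in A) = \sum_{x \in A} p(x), \\
\text{absolutely continuous:}\quad & P(X \in A) = \int_A f(x)\,dx, \\
\text{general:}\quad & P(X \in A) = \int_\Omega \mathbf{1}_A\bigl(X(\omega)\bigr)\,dP(\omega).
\end{aligned}
$$

The first two are special cases of the third: the distribution of $X$ has density $p$ with respect to counting measure in the discrete case and density $f$ with respect to Lebesgue measure in the absolutely continuous case.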

## New "Some Properties" section

Can we do something sensible about the stuff presently in this section, which says:

- The probability density function of the sum of two independent random variables is the convolution of each of their density functions.
- The probability density function of the difference of two random variables is the cross-correlation of each of their density functions.

In the first bullet point, this needs to be said for general distributions, not just those which have densities; unfortunately, the article on convolution does not seem to give a sensible formula in terms of cumulative distribution functions. In the second bullet point, there is again the problem of dealing with general distributions but, in addition, the word "cross-correlation" would need to be given the interpretation in the article pointed to, which is very different from the common one in statistics.

Melcombe (talk) 10:38, 31 December 2008 (UTC)
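
For discrete distributions the two bullet points can be checked without any densities. Below is a minimal sketch in Python, assuming independence in both cases (the second bullet point does not state it, but the formula needs it). The helper names `pmf_of_sum` and `pmf_of_difference` are made up for this illustration.

```python
from fractions import Fraction


def pmf_of_sum(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (dicts: value -> probability).

    This is the discrete convolution of p and q.
    """
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


def pmf_of_difference(p, q):
    """PMF of X - Y for independent X ~ p and Y ~ q.

    This is the cross-correlation of p with q in the signal-processing sense:
    P(X - Y = k) = sum over y of p(k + y) * q(y).
    """
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x - y] = out.get(x - y, 0) + px * qy
    return out


# Two independent fair six-sided dice, with exact rational arithmetic.
die = {k: Fraction(1, 6) for k in range(1, 7)}
total = pmf_of_sum(die, die)
diff = pmf_of_difference(die, die)

print(total[7])  # probability that the two dice sum to 7
print(diff[0])   # probability that the two dice show the same face
```

For general distributions of independent X and Y, the analogous statement in terms of cumulative distribution functions is the Stieltjes convolution $F_{X+Y}(z) = \int F_X(z - y)\,dF_Y(y)$.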

## Proposal to archive discussion

Given that this article has had a major revamp, much of the old discussion is irrelevant to the present content. I have reordered the threads a little to put newer stuff towards the end, but I am suggesting that all the stuff now before the section headed "Lead is confusing" be archived. Any thoughts? Melcombe (talk) 12:15, 31 December 2008 (UTC)

## Observation Space

In the formal definition, we have:

"A random variable is defined as a measurable function *X* from a probability space to its observation space ."

Can we have someone put up a definition of the "observation space"? I'd do it myself but I'm not able enough. --WestwoodMatt (talk) 15:41, 20 March 2010 (UTC)

- Formally, it is just a measurable space. Informally, it is the set of all possible values (or a larger set, if more convenient). I also wonder, is "observation space" a standard word, a neologism, or what? Boris Tsirelson (talk) 16:30, 20 March 2010 (UTC)

- I'm assuming it's the same thing as the image of *X*. But I'm not used to coming at this from the direction of defining the image as a measurable space - the only treatments I'm familiar with regard the image of *X* as just being a subset of . Hence I'm slightly out of my depth, and although I could work through it myself step by step, I'd be unsure as to whether I'd done it right - I'm new to measure theory. --WestwoodMatt (talk) 23:33, 20 March 2010 (UTC)

- "Observation space" is used sometimes, see for example here. I did not find the definition, but probably it means the codomain, not just the image. Boris Tsirelson (talk) 07:01, 21 March 2010 (UTC)
- An explanation added; please look now. Boris Tsirelson (talk) 10:13, 21 March 2010 (UTC)

- I suspect "Observation space" is not standard in the literature, and should be omitted. At least as a PhD student in Stochastic Analysis with a decent background in probability and measure theory, I've never seen it used before. -- Some random passerby
- Also I, an old professor-probabilist, did not see it before (that is before 21 March 2010). Probably, for now it is used mostly by non-mathematicians. But does it mean that it should be omitted? Boris Tsirelson (talk)
- I think that the use of non-standard terminology is unfortunate because it makes it harder to read and understand. Especially so in definitions. I also think measure-theoretic definitions are mostly of interest to mathematicians, and thus that terminology should be taken from there. -- The same random passerby.

See also Wikipedia talk:WikiProject Mathematics#Codomain of a random variable: observation space?. Boris Tsirelson (talk) 16:50, 27 March 2010 (UTC)

The term seems self-explanatory when used the way it's used in the article. Someone used the word "unfortunate" above without rigorously defining it. I don't have a problem with that. Michael Hardy (talk) 18:50, 27 March 2010 (UTC)

- When you're in a section called "formal definition" it *is* necessary to define stuff down to this level. Okay then, so although I *can* make a stab at defining "unfortunate" to a non-mathematically-inclined person (it describes an observation of a random variable whose outcome is contrary to the desires of one's motivational consciousness), I would not be able (in this context) to determine **rigorously** what an "observation space" is. When something is described as "self-explanatory" I always suspect that this is because the person so describing it lacks the ability to define it. As a logician I cannot accept this as an answer. And as a statistician (i.e. I'm not one, I'm just learning this stuff as I go along) I *don't* understand what an "observation space" is in terms of the mathematical objects that a probability space is defined in. --WestwoodMatt (talk) 21:35, 27 March 2010 (UTC)
- Formal definitions are supposed to be formal. Also, adding definitions not used in the literature is probably a breach of the "no original research" policy for Wikipedia articles.

Enough is enough. Observation space is gone.

## Add "mode", "tail", "inflection", etc. to terminology?

Shouldn't the basic terminology used to discuss/describe a distribution be included here? It would not only fill out the basic presentation of the material but also provide an anchor for references to such terms in other articles. Jojalozzo 03:02, 25 July 2011 (UTC)

## Strange terminology

As far as I know, "probability distribution" is, in general, a probability measure (rather than this or that function). In some (but not all) cases it can be described by a cumulative distribution function. Sometimes also by the probability mass function; sometimes also by the probability density function. But the lead says "a probability mass, probability density, or probability distribution is a function..." Or is it meant that a measure is also a kind of function (namely, a set function)? But no, this is not written in the sections. Boris Tsirelson (talk) 05:57, 2 July 2012 (UTC)

- I have rewritten the lead to try to overcome this problem. But the rest of the article is extremely short of anything understandable about probability measure. Melcombe (talk) 19:20, 3 July 2012 (UTC)
- Nice! Much better than before. (But it would be enough to link "probability measure" in the lead only once.) Boris Tsirelson (talk) 21:09, 3 July 2012 (UTC)

## Normal distribution

Why does this article refer to the Gaussian distribution as "the most important distribution"? Isn't this kind of arbitrary? — Preceding unsigned comment added by 2607:4000:200:13:1A03:73FF:FEB3:B07C (talk) 05:46, 27 August 2012 (UTC)

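- Good question, but no, it's not arbitrary: the central limit theorem says that, under mild conditions (independence and finite variance, for instance), the suitably normalized sum of a large number of random variables tends towards the Gaussian in the limit, whatever the distribution of the individual terms. --Matt Westwood 09:46, 27 August 2012 (UTC)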
- A quote:
- "Gaussian random variables and processes always played a central role in the probability theory and statistics. The modern theory of Gaussian measures combines methods from probability theory, analysis, geometry and topology and is closely connected with diverse applications in functional analysis, statistical physics, quantum field theory, financial mathematics and other areas."
- R. Latala, "On some inequalities for Gaussian measures". Proceedings of the International Congress of Mathematicians (2002), 813-822. arXiv:math.PR/0304343.

- Boris Tsirelson (talk) 12:41, 27 August 2012 (UTC)
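
The theorem mentioned above is presumably the central limit theorem. In its classical form it can be sketched as follows, assuming $X_1, X_2, \dots$ are independent and identically distributed with mean $\mu$ and finite variance $\sigma^2 > 0$ (this notation is an assumption for the sketch, not something used in the article):

$$
\frac{X_1 + \dots + X_n - n\mu}{\sigma\sqrt{n}} \;\xrightarrow{\;d\;}\; N(0, 1) \qquad (n \to \infty).
$$

So it is the normalized sum, rather than the distribution of the individual observations, that becomes Gaussian in the limit.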

- An example: The normal distribution in $\mathbb{R}^n$ is the unique (up to scaling) rotation-invariant probability measure with independent components. This purely mathematical result is fundamental to the kinetic theory of gases. --Rainald62 (talk) 21:36, 29 April 2013 (UTC)
- Yes... You mean Maxwell–Boltzmann distribution. Boris Tsirelson (talk) 05:49, 30 April 2013 (UTC)

## generic statistical distributions (especially sample distributions)

Right now there seem to be somewhat strange redirects. What article should I link to for a plain old distribution of occurrences as observed in a finite sample?

- I suspect that statistical distribution ought to disambiguate between sample distributions (that vary from sample to sample) and the ideal probability distributions. At the moment it redirects to just probability distribution.
- Sample distribution redirects to empirical distribution function (sample cdf); what about the actual, non-integrated sample distribution? There are also count data and frequency distribution articles, but neither quite seems to be appropriate (or maybe they are just badly written).