# Statistical significance

**Statistical significance** refers to the low probability that an observed effect is due to chance alone.^{[1]}^{[2]} It is an integral part of statistical hypothesis testing, where it informs the decision to reject or retain a null hypothesis. In statistics, a result is considered significant not because it is important or meaningful, but because it is judged unlikely to have occurred by chance alone.^{[3]}

The present-day concept of statistical significance originated with Ronald Fisher when he developed statistical hypothesis testing in the early 20^{th} century.^{[4]}^{[5]}^{[6]} These tests are used to determine which outcomes of a study would lead to a rejection of the null hypothesis: the *p*-value computed for a result, compared against a pre-specified low probability threshold, helps an investigator decide whether the result contains sufficient information to cast doubt on the null hypothesis.^{[7]}

*P*-values are often compared against a significance or alpha (α) level, which is also set ahead of time, usually at 0.05 (5%).^{[7]} Thus, if a *p*-value is found to be less than 0.05, the result is considered statistically significant and the null hypothesis is rejected.

## History

The present-day concept of statistical significance originated with Ronald Fisher when he developed statistical hypothesis testing, which he described as "tests of significance," in his 1925 publication, *Statistical Methods for Research Workers*.^{[4]}^{[5]}^{[6]} Fisher suggested a probability of one-in-twenty (0.05) as a convenient cutoff level for rejecting the null hypothesis.^{[8]} In their 1933 paper, Jerzy Neyman and Egon Pearson recommended that the significance level (e.g., 0.05), which they called α, be set ahead of time, prior to any data collection.^{[8]}^{[9]}

Despite his suggestion of 0.05 as a significance level, Fisher did not intend this cutoff value to be fixed; in his 1956 publication, *Statistical Methods and Scientific Inference*, he even recommended that significance levels be set according to specific circumstances.^{[8]}

## Role in statistical hypothesis testing

Statistical significance plays a pivotal role in statistical hypothesis testing where it is used to determine if a null hypothesis can be rejected or retained. A null hypothesis is the general or default statement that nothing happened or changed.^{[10]} For a null hypothesis to be rejected as false, the result has to be identified as being statistically significant, i.e., unlikely to have occurred by chance alone.

To determine if a result is statistically significant, a researcher calculates a *p*-value, which is the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true.^{[11]} The null hypothesis is rejected if the *p*-value is less than the significance or α level. The α level is the probability of rejecting the null hypothesis when it is true (type I error) and is conventionally set at 0.05 (5%), the most widely used value.^{[2]} If the α level is 0.05, then the probability of committing a type I error is 5%.^{[12]} Thus, a statistically significant result is one in which the *p*-value for obtaining that result is less than 5%, which is formally written as *p* < 0.05.^{[12]}
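As a sketch of this decision rule, the following Python snippet (standard library only; the z-statistic of 2.1 is an invented example value, not from the article) converts a z-statistic into a two-sided *p*-value via the normal tail probability and compares it with α = 0.05:

```python
import math

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z-statistic under a standard normal null.

    P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    """
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05  # significance level, fixed before data collection
z = 2.1       # hypothetical observed z-statistic

p = two_sided_p_value(z)
print(f"p = {p:.4f}")  # ≈ 0.0357
print("reject H0" if p < alpha else "retain H0")  # prints "reject H0"
```

Here the *p*-value (about 0.036) falls below the pre-specified α of 0.05, so the null hypothesis would be rejected.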

If the α level is set at 0.05, it means that the rejection region comprises 5% of the sampling distribution.^{[13]} This 5% can be allocated to one side of the sampling distribution as in a one-tailed test or partitioned to both sides of the distribution as in a two-tailed test, with each tail (or rejection region) comprising 2.5%. One-tailed tests are more powerful than two-tailed tests, as a null hypothesis can be rejected with a less extreme result.
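To illustrate why a one-tailed test can reject with a less extreme result, the sketch below (Python standard library; z = 1.7 is an invented example value) computes both *p*-values for the same z-statistic:

```python
import math

def one_tailed_p(z: float) -> float:
    """Upper-tail p-value: P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(z: float) -> float:
    """Two-sided p-value: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.7  # same hypothetical observed statistic for both tests

# One-tailed: the entire 5% rejection region sits in a single tail.
print(f"one-tailed p = {one_tailed_p(z):.4f}")  # ≈ 0.0446 -> reject H0
# Two-tailed: each tail holds only 2.5%, so the same z no longer rejects.
print(f"two-tailed p = {two_tailed_p(z):.4f}")  # ≈ 0.0891 -> retain H0
```

The same observed statistic is significant at α = 0.05 under the one-tailed test but not under the two-tailed test, which is the sense in which the one-tailed test is more powerful.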

## Defining significance in terms of sigma (σ)

In specific fields such as particle physics and manufacturing, statistical significance is often expressed in units of standard deviation or sigma (σ) of a normal distribution, with significance thresholds set at a much stricter level (e.g., 5 sigma).^{[14]}^{[15]} For instance, the certainty of the Higgs boson particle's existence was based on the 5-sigma criterion, which corresponds to a *p*-value of about 1 in 3.5 million.^{[15]}^{[16]}
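The correspondence between a 5-sigma threshold and a *p*-value of roughly 1 in 3.5 million can be checked with the normal tail probability (Python standard library; the one-tailed convention used in particle physics is assumed):

```python
import math

def sigma_to_p(n_sigma: float) -> float:
    """One-tailed p-value for an n-sigma excess under a normal distribution."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

p5 = sigma_to_p(5)
print(f"5-sigma p-value: {p5:.3e}")  # ≈ 2.87e-07
print(f"about 1 in {1 / p5:,.0f}")   # ≈ 1 in 3.5 million
```

By comparison, the conventional α = 0.05 corresponds to only about 1.64 sigma one-tailed, which shows how much stricter the 5-sigma criterion is.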

## Criticisms

Researchers focusing solely on whether their results are statistically significant might report findings that are not necessarily substantive.^{[17]} To gauge the research significance of their results, researchers are also encouraged to report the effect size along with *p*-values, as the former describes the strength of an effect, such as the distance between two means or the correlation between two variables.^{[18]} *P*-values have also been criticized for violating the likelihood principle.^{[19]}

## See also

- A/B testing
- ABX test
- Fisher's method for combining independent tests of significance
- Look-elsewhere effect
- Reasonable doubt
- Statistical hypothesis testing

## References

1. {{#invoke:citation/CS1|citation|CitationClass=book}}
2. {{#invoke:citation/CS1|citation|CitationClass=book}}
3. {{#invoke:citation/CS1|citation|CitationClass=book}}
4. {{#invoke:citation/CS1|citation|CitationClass=book}}
5. {{#invoke:citation/CS1|citation|CitationClass=book}}
6. {{#invoke:citation/CS1|citation|CitationClass=book}}
7. {{#invoke:citation/CS1|citation|CitationClass=book}}
8. {{#invoke:citation/CS1|citation|CitationClass=book}}
9. {{#invoke:Citation/CS1|citation|CitationClass=journal}}
10. {{#invoke:citation/CS1|citation|CitationClass=book}}
11. {{#invoke:citation/CS1|citation|CitationClass=book}}
12. {{#invoke:citation/CS1|citation|CitationClass=book}}
13. {{#invoke:citation/CS1|citation|CitationClass=book}}
14. {{#invoke:citation/CS1|citation|CitationClass=book}}
15. {{#invoke:citation/CS1|citation|CitationClass=book}}
16. {{#invoke:citation/CS1|citation|CitationClass=book}}
17. {{#invoke:Citation/CS1|citation|CitationClass=journal}}
18. {{#invoke:citation/CS1|citation|CitationClass=book}}
19. {{#invoke:Citation/CS1|citation|CitationClass=journal}}

## Further reading

- Ziliak, Stephen, and McCloskey, Deirdre (2008). *The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives*. Ann Arbor: University of Michigan Press.
- Thompson, Bruce (2004). "The 'significance' crisis in psychology and education." *Journal of Socio-Economics*, 33, pp. 607–613.
- Chow, Siu L. (1996). *Statistical Significance: Rationale, Validity and Utility*, Volume 1 of the series *Introducing Statistical Methods*. Sage Publications Ltd. ISBN 978-0-7619-5205-3. Argues that statistical significance is useful in certain circumstances.
- Kline, Rex (2004). *Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research*. Washington, DC: American Psychological Association.

## External links

- The article "Earliest Known Uses of Some of the Words of Mathematics (S)" contains an entry on Significance that provides some historical information.
- "The Concept of Statistical Significance Testing" (February 1994): article by Bruce Thompson hosted by the ERIC Clearinghouse on Assessment and Evaluation, Washington, D.C.
- "What does it mean for a result to be 'statistically significant'?" (no date): an article from the Statistical Assessment Service at George Mason University, Washington, D.C.