Digital differential analyzer: Difference between revisions
en>Solarra m Reverted edits by 1.38.18.140 to last version by KATANAGOD (GLOO) |
en>Yobot m WP:CHECKWIKI error fixes using AWB (9075) |
'''Grubbs' test''' (named after Frank E. Grubbs), also known as the '''maximum normed [[errors and residuals in statistics|residual]] test''' or '''extreme studentized deviate test''', is a [[Statistical hypothesis testing|statistical test]] used to detect [[outlier]]s in a [[univariate]] data set assumed to come from a [[normal distribution|normally distributed]] population.
==Definition==
Grubbs' test is based on the assumption of [[normal distribution|normality]]. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs' test.<ref>Quoted from the ''Engineering and Statistics Handbook'', paragraph 1.3.5.17, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm</ref>
Grubbs' test detects one outlier at a time. The outlier is removed from the data set and the test is repeated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer, since it frequently tags most of the points as outliers.
Grubbs' test is defined for the following [[statistical hypothesis|hypotheses]]:
:H<sub>0</sub>: There are no outliers in the data set
:H<sub>a</sub>: There is at least one outlier in the data set
The Grubbs' test statistic is defined as:
:<math>
G = \frac{\displaystyle\max_{i=1,\ldots, N}\left \vert Y_i - \bar{Y}\right\vert}{s}
</math>
with <math>\bar{Y}</math> and ''s'' denoting the [[sample mean]] and [[standard deviation]], respectively. The Grubbs' test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
This is the two-sided version of the test. Grubbs' test can also be defined as a one-sided test. To test whether the minimum value is an outlier, the test statistic is
:<math>
G = \frac{\bar{Y}-Y_\min}{s}
</math>
with ''Y''<sub>min</sub> denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is
:<math>
G = \frac{Y_\max - \bar{Y}}{s}
</math>
with ''Y''<sub>max</sub> denoting the maximum value.
For the [[two-sided test]], the hypothesis of no outliers is rejected at [[significance level]] α if
:<math>
G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),N-2}^2}{N - 2 + t_{\alpha/(2N),N-2}^2}}
</math>
with ''t''<sub>α/(2''N''),''N''−2</sub> denoting the upper [[critical value]] of the [[t-distribution]] with ''N'' − 2 [[Degrees of freedom (statistics)|degrees of freedom]] and a significance level of α/(2''N''). For the one-sided tests, replace α/(2''N'') with α/''N''.
==Related techniques==
Several [[graphical technique]]s can, and should, be used to detect outliers. A simple [[run sequence plot]], a [[box plot]], or a [[histogram]] should show any obviously outlying points. A [[normal probability plot]] may also be useful.
==See also==
* [[Chauvenet's criterion]]
* [[Peirce's criterion]]
* [[Q test]]
==References==
<references/>
* {{cite journal|last=Grubbs|first=Frank|month=February|year=1969|title=Procedures for Detecting Outlying Observations in Samples|journal=Technometrics|volume=11|issue=1|pages=1–21|doi=10.2307/1266761|jstor=1266761}}
* {{cite journal|last=Stefansky|first=W.|year=1972|title=Rejecting Outliers in Factorial Designs|journal=Technometrics|volume=14|issue=2|pages=469–479|doi=10.2307/1267436|jstor=1267436}}
{{NIST-PD}}
==External links==
* [http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm Grubbs' Test for Outliers]
* [http://www.graphpad.com/quickcalcs/Grubbs1.cfm Grubbs' Test online calculator]
[[Category:Statistical tests]]
[[Category:Statistical outliers]]
Revision as of 17:01, 12 April 2013
Grubbs' test (named after Frank E. Grubbs), also known as the maximum normed residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population.
Definition
Grubbs' test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs' test.[1]
Grubbs' test detects one outlier at a time. The outlier is removed from the data set and the test is repeated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer, since it frequently tags most of the points as outliers.
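The remove-and-repeat procedure can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the function name `iterate_grubbs`, the `min_size` parameter, and the caller-supplied `critical_value(n)` callback are all assumptions introduced here. The statistic is taken to be the largest absolute deviation from the sample mean divided by the sample standard deviation, and the loop stops once seven or fewer than seven points would be too few to test, reflecting the caution about samples of six or fewer.

```python
import statistics


def iterate_grubbs(values, critical_value, min_size=7):
    """Repeatedly remove the most extreme point while it exceeds the threshold.

    critical_value is a hypothetical callback: given the current sample
    size n, it returns the rejection threshold for the statistic G.
    Returns (remaining_values, removed_values).
    """
    remaining = list(values)
    removed = []
    # Stop when the sample has six or fewer points.
    while len(remaining) >= min_size:
        mean = statistics.mean(remaining)
        s = statistics.stdev(remaining)  # sample standard deviation (n - 1)
        if s == 0:
            break  # all points identical, nothing can be an outlier
        # Index of the point farthest from the mean.
        index = max(range(len(remaining)),
                    key=lambda i: abs(remaining[i] - mean))
        g = abs(remaining[index] - mean) / s
        if g <= critical_value(len(remaining)):
            break  # the most extreme point is not flagged; stop iterating
        removed.append(remaining.pop(index))
    return remaining, removed
```

Because each pass changes the sample, the significance level of the overall procedure is not the nominal level of a single test, which is the caveat stated above.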
Grubbs' test is defined for the following hypotheses:
- H0: There are no outliers in the data set
- Ha: There is at least one outlier in the data set
The Grubbs' test statistic is defined as:
$$G = \frac{\displaystyle\max_{i=1,\ldots,N}\left\vert Y_i - \bar{Y}\right\vert}{s}$$
with $\bar{Y}$ and s denoting the sample mean and standard deviation, respectively. The Grubbs' test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
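As an illustration, the two-sided statistic (largest absolute deviation from the sample mean, divided by the sample standard deviation) might be computed as follows. The function name `grubbs_statistic` is an assumption made for this sketch; only the Python standard library is used.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) where G is the two-sided Grubbs' statistic and
    index is the position of the most extreme observation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("all values are identical; G is undefined")
    deviations = [abs(y - mean) for y in values]
    index = max(range(len(values)), key=deviations.__getitem__)
    return deviations[index] / s, index
```

For the sample 1, 2, 3, 4, 10 the mean is 4 and the sample standard deviation is √12.5 ≈ 3.536, so the function returns G ≈ 1.697 for the last observation.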
This is the two-sided version of the test. Grubbs' test can also be defined as a one-sided test. To test whether the minimum value is an outlier, the test statistic is
$$G = \frac{\bar{Y} - Y_\min}{s}$$
with Ymin denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is
$$G = \frac{Y_\max - \bar{Y}}{s}$$
with Ymax denoting the maximum value.
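The one-sided statistics differ from the two-sided one only in which deviation is taken: the mean minus the minimum, or the maximum minus the mean, each divided by the sample standard deviation. A small sketch, in which the function name `grubbs_one_sided` and the `tail` argument are assumptions of this example:

```python
import statistics


def grubbs_one_sided(values, tail):
    """Return the one-sided Grubbs' statistic.

    tail="min" tests the smallest value, tail="max" tests the largest.
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("all values are identical; G is undefined")
    if tail == "min":
        return (mean - min(values)) / s
    if tail == "max":
        return (max(values) - mean) / s
    raise ValueError("tail must be 'min' or 'max'")
```

For the sample 1, 2, 3, 4, 10 this gives roughly 0.849 for the minimum and 1.697 for the maximum.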
For the two-sided test, the hypothesis of no outliers is rejected at significance level α if
$$G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),N-2}^2}{N - 2 + t_{\alpha/(2N),N-2}^2}}$$
with $t_{\alpha/(2N),N-2}$ denoting the upper critical value of the t-distribution with N − 2 degrees of freedom and a significance level of α/(2N). For the one-sided tests, replace α/(2N) with α/N.
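The rejection rule, comparing G with (N − 1)/√N times the square root of t² / (N − 2 + t²), can be sketched as follows. The function names are assumptions of this example, and the t critical value is deliberately left as an input: the caller is assumed to look it up in a table or obtain it from a statistics library (for instance, SciPy's `scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2)` for the two-sided case), using α/(2N) for the two-sided test and α/N for the one-sided tests.

```python
import math


def grubbs_critical_value(n, t_crit):
    """Threshold for G given the sample size n and the upper critical value
    t_crit of the t-distribution with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("the test needs at least three observations")
    t2 = t_crit ** 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t2 / (n - 2 + t2))


def reject_no_outliers(g, n, t_crit):
    """True if the hypothesis of no outliers is rejected for statistic g."""
    return g > grubbs_critical_value(n, t_crit)
```

Keeping the t quantile as a parameter means the same function serves the two-sided and one-sided tests; only the tail probability used to obtain `t_crit` changes.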
Related techniques
Several graphical techniques can, and should, be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. A normal probability plot may also be useful.
See also
- Chauvenet's criterion
- Peirce's criterion
- Q test
References
- [1] Quoted from the Engineering and Statistics Handbook, paragraph 1.3.5.17, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm