# False discovery rate

False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of findings (i.e. studies where the null hypotheses are rejected), FDR procedures are designed to control the expected proportion of incorrectly rejected null hypotheses ("false discoveries").[1] FDR-controlling procedures exert a less stringent control over false discovery than familywise error rate (FWER) procedures (such as the Bonferroni correction), which seek to reduce the probability of even one false discovery rather than the expected proportion of false discoveries. Thus FDR procedures have greater power at the cost of an increased rate of type I errors, i.e., rejections of null hypotheses of no effect that should have been retained.[2]

## History

### Technological motivations

The modern widespread use of the FDR is believed to stem from, and be motivated by, the development of technologies that allowed the collection and analysis of a large number of distinct variables in several individuals (e.g., the expression level of each of 10,000 different genes in 100 different persons).[3] By the late 1980s and 1990s, the development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. This, coupled with the growth in computing power, made it possible to seamlessly perform hundreds or even thousands of statistical tests on a given data set. The technology of microarrays was a prototypical example, as it enabled thousands of genes to be tested simultaneously for differential expression between two biological conditions.[4]

As high-throughput technologies became common, technological and/or financial constraints led researchers to collect datasets with relatively small sample sizes (e.g. few individuals being tested) and large numbers of variables being measured per sample (e.g. thousands of gene expression levels). In these datasets, too few of the measured variables showed statistical significance after classic correction for multiple tests with standard multiple comparison procedures. This created a need within many scientific communities to abandon FWER and unadjusted multiple hypothesis testing in favor of other ways to highlight and rank, in publications, those variables showing marked effects across individuals or treatments that would otherwise be dismissed as non-significant after standard correction for multiple tests. In response, a variety of error rates have been proposed, and have become commonly used in publications, that are less conservative than FWER in flagging possibly noteworthy observations. As a side effect, standard correction for multiple tests has largely disappeared from all but those publications which present results with huge sample sizes.

The false discovery rate concept was formally described by Yoav Benjamini and Yosef Hochberg in 1995[1] as a less conservative and arguably more appropriate approach for identifying the important few from the trivial many effects tested. The FDR has been particularly influential, as it was the first alternative to the FWER to gain broad acceptance in many scientific fields (especially in the life sciences, from genetics to biochemistry, oncology and plant sciences).[3] In 2005, the Benjamini and Hochberg paper from 1995 was identified as one of the 25 most-cited statistical papers.[5]

### Related statistical concepts

Prior to the 1995 introduction of the FDR concept, various precursor ideas had been considered in the statistics literature. In 1979, Holm proposed the Holm procedure,[6] a stepwise algorithm for controlling the FWER that is at least as powerful as the well-known Bonferroni adjustment. This stepwise algorithm sorts the p-values and sequentially rejects the hypotheses starting from the smallest p-value.
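To make the step-down algorithm concrete, here is a minimal sketch in Python; the function name `holm`, the significance level, and the example p-values below are illustrative choices, not taken from the cited sources:

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm step-down procedure: returns a boolean rejection mask
    controlling the familywise error rate at level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)                 # ranks of p-values, smallest first
    reject = np.zeros(m, dtype=bool)
    for step, idx in enumerate(order):    # step = 0, 1, ..., m-1
        if p[idx] <= alpha / (m - step):  # compare p_(k) to alpha/(m-k+1)
            reject[idx] = True
        else:
            break                         # stop at the first failure
    return reject
```

Applied to the illustrative p-values `[0.005, 0.01, 0.03, 0.04]` at $\alpha = 0.05$, this sketch rejects only the two smallest hypotheses, since $0.03 > 0.05/2 = 0.025$.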

According to Benjamini (2010),[3] the false discovery rate, and the paper Benjamini and Hochberg (1995), had their origins in two papers concerned with multiple testing:

• The first paper is by Schweder and Spjøtvoll (1982),[7] who suggested plotting the ranked p-values and assessing the number of true null hypotheses ($m_0$) via an eye-fitted line starting from the largest p-values. The p-values that deviate from this straight line should then correspond to the false null hypotheses. This idea was later developed into an algorithm, and the estimation of $m_0$ was incorporated into procedures such as Bonferroni, Holm or Hochberg.[8] It is closely related to the graphical interpretation of the BH procedure (a numerical sketch of the idea follows this list).
• The second paper is by Branko Sorić (1989),[9] which introduced the terminology of "discovery" in the multiple hypothesis testing context. Sorić used the expected number of false discoveries divided by the number of discoveries $\left(\frac{E[V]}{R}\right)$ as a warning that "a large part of statistical discoveries may be wrong". This led Benjamini and Hochberg to the idea that a similar error rate, rather than being merely a warning, can serve as a worthy goal to control.
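The Schweder–Spjøtvoll idea admits a simple numerical version. Under true null hypotheses the p-values are uniform on [0, 1], so the tail count $N(t) = \#\{p_i > t\}$ behaves like $m_0(1-t)$, and the slope of a line fitted through the origin at large thresholds estimates $m_0$. A minimal sketch; the threshold grid and fitting range are arbitrary illustrative choices, not part of the original paper:

```python
import numpy as np

def estimate_m0(pvals, thresholds=np.linspace(0.5, 0.95, 10)):
    """Estimate the number of true null hypotheses m0 from the
    empirical tail counts N(t) = #{p > t}, which behave like
    m0 * (1 - t) when the null p-values are uniform on [0, 1]."""
    p = np.asarray(pvals)
    counts = np.array([(p > t).sum() for t in thresholds])
    x = 1.0 - thresholds
    # Least-squares slope of N(t) against (1 - t), through the origin:
    m0_hat = (x @ counts) / (x @ x)
    return min(len(p), max(0.0, m0_hat))
```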

The q-value quantity (defined below) was first proposed by John Storey.[10]

## Definitions

### Classification of $m$ hypothesis tests

The following table describes the possible outcomes when testing $m$ null hypotheses; it defines random variables related to the $m$ hypothesis tests.

|  | Null hypothesis is true (H0) | Alternative hypothesis is true (H1) | Total |
| --- | --- | --- | --- |
| Declared significant | $V$ | $S$ | $R$ |
| Declared non-significant | $U$ | $T$ | $m-R$ |
| Total | $m_0$ | $m-m_0$ | $m$ |
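To make these random variables concrete, here is a small illustrative simulation in Python; the mixture of uniform null p-values and Beta(0.1, 1) alternative p-values, and the unadjusted 0.05 cutoff, are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
m, m0 = 1000, 900                 # total tests, true nulls
null = np.arange(m) < m0          # first m0 hypotheses are true nulls

# p-values: uniform under H0, concentrated near 0 under H1
pvals = np.where(null, rng.uniform(size=m), rng.beta(0.1, 1.0, size=m))

sig = pvals <= 0.05               # "declared significant"
V = int((sig & null).sum())       # false discoveries
S = int((sig & ~null).sum())      # true discoveries
U = int((~sig & null).sum())      # true non-discoveries
T = int((~sig & ~null).sum())     # false non-discoveries
R = V + S                         # total discoveries
print(f"V={V} S={S} U={U} T={T} R={R} Q={V / max(R, 1):.3f}")
```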

### The FDR

Based on the previous definitions, we can define $Q$ as the proportion of false discoveries among the discoveries: $Q = V/R$ when $R > 0$, and $Q = 0$ when $R = 0$. The false discovery rate is then given by:[1]

$$\mathrm{FDR} = Q_e = \mathrm{E}[Q] = \mathrm{E}\left[\frac{V}{R}\right],$$

where $V/R$ is set to 0 when $R = 0$. One wants to keep this value below a prespecified threshold $\alpha$ (also denoted $q$).
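The best-known procedure that controls the FDR at level $\alpha$ (for independent test statistics) is the Benjamini–Hochberg (BH) step-up procedure of [1]: reject the hypotheses with the $k$ smallest p-values, where $k$ is the largest $i$ with $p_{(i)} \leq \frac{i}{m}\alpha$. A minimal sketch in Python (the function name is an illustrative choice):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure: reject the hypotheses with the k smallest
    p-values, where k is the largest i with p_(i) <= (i/m) * alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True     # reject everything up to that rank
    return reject
```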

### q-value

The q-value is defined to be the FDR analogue of the p-value. The q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to directly estimate q-values rather than fixing a level at which to control the FDR.[10]
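A common way to estimate q-values follows directly from the BH machinery: the q-value of the test with the $i$-th smallest p-value can be taken as $\min_{j \geq i} \frac{m\,p_{(j)}}{j}$, which implicitly treats all hypotheses as potentially null; Storey's approach[10] refines this by plugging in an estimate of the proportion of true nulls. A minimal sketch (function name illustrative):

```python
import numpy as np

def q_values(pvals):
    """BH-style q-values: q_(i) = min over j >= i of m * p_(j) / j,
    the smallest FDR level at which test i would be rejected."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest rank downwards:
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```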

## Properties

The FDR is the expected proportion of false positives among all discoveries (rejected null hypotheses); for example, if the null hypotheses of 1000 hypothesis tests were rejected, and the FDR for these tests was controlled at level (q-value) 0.10, then at most 100 of these rejections would be expected to be false positives.

Using a multiplicity procedure that controls the FDR criterion is adaptive and scalable, meaning that controlling the FDR can be very permissive (if the data justify it) or conservative (acting close to FWER control for sparse problems), depending on the number of hypotheses tested and the level of significance.[3]

The FDR criterion adapts so that the same number of false discoveries (V) will mean different things, depending on the total number of discoveries (R). This contrasts with the familywise error rate criterion. For example, if inspecting 100 hypotheses (say, 100 genetic mutations or SNPs for association with some phenotype in some population):

• If we make 4 discoveries (R), having 2 of them be false discoveries (V) is often unbearable. Whereas,
• If we make 50 discoveries (R), having 2 of them be false discoveries (V) is often bearable.

The FDR criterion is scalable in that the same proportion of false discoveries out of the total number of discoveries (Q) remains sensible for different numbers of total discoveries (R). For example:

• If we make 100 discoveries (R), having 5 of them be false discoveries (Q = 5%) may well be bearable.
• If we make 1000 discoveries (R), having 50 of them be false discoveries (Q is still 5%) remains just as bearable.

## Related error rates

The introduction of the FDR was preceded and followed by the proposal of many other types of error rates. These include:

• The k-FWER, a generalization of the FWER that tolerates up to $k-1$ false discoveries, is defined as: $k\text{-}FWER = P(V \geq k) \leq q$.

• The k-FDR, the analogous generalization of the FDR, is defined as: $k\text{-}FDR = E\left(\frac{V}{R}\,1_{(V>k)}\right) \leq q$.

• The false non-discovery rate (FNR), the expected proportion of false non-discoveries among the non-discoveries, is defined as: $FNR = E\left(\frac{T}{m-R}\right) = E\left(\frac{m-m_{0}-\left(R-V\right)}{m-R}\right)$.
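Because these quantities are probabilities and expectations over the random variables of the classification table, they can be approximated by simulation. A hedged sketch, reusing the illustrative data-generating assumptions from the classification-table example above (uniform null p-values, Beta-distributed alternative p-values, an unadjusted 0.05 cutoff):

```python
import numpy as np

rng = np.random.default_rng(1)
m, m0, k, reps = 1000, 900, 5, 2000
V_ge_k, fnr = 0, 0.0
for _ in range(reps):
    null = np.arange(m) < m0
    pvals = np.where(null, rng.uniform(size=m), rng.beta(0.1, 1.0, size=m))
    sig = pvals <= 0.05
    V = (sig & null).sum()        # false discoveries this repetition
    T = (~sig & ~null).sum()      # false non-discoveries this repetition
    R = sig.sum()
    V_ge_k += (V >= k)            # event {V >= k} for the k-FWER
    fnr += T / max(m - R, 1)      # proportion T/(m-R) for the FNR
print(f"k-FWER ~= {V_ge_k / reps:.3f}, FNR ~= {fnr / reps:.3f}")
```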

## References

1. Benjamini, Y.; Hochberg, Y. (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". Journal of the Royal Statistical Society, Series B. 57 (1): 289–300.
2. Shaffer, J. P. (1995). "Multiple hypothesis testing". Annual Review of Psychology. 46: 561–584.
3. Benjamini, Y. (2010). "Discovering the false discovery rate". Journal of the Royal Statistical Society, Series B. 72 (4): 405–416.
4. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
5. Ryan, T. P.; Woodall, W. H. (2005). "The most-cited statistical papers". Journal of Applied Statistics. 32 (5): 461–474.
6. Holm, S. (1979). "A simple sequentially rejective multiple test procedure". Scandinavian Journal of Statistics. 6 (2): 65–70.
7. Schweder, T.; Spjøtvoll, E. (1982). "Plots of P-values to evaluate many tests simultaneously". Biometrika. 69 (3): 493–502.
8. Hochberg, Y.; Benjamini, Y. (1990). "More powerful procedures for multiple significance testing". Statistics in Medicine. 9 (7): 811–818.
9. Sorić, B. (1989). "Statistical "discoveries" and effect-size estimation". Journal of the American Statistical Association. 84 (406): 608–610.
10. Storey, J. D. (2002). "A direct approach to false discovery rates". Journal of the Royal Statistical Society, Series B. 64 (3): 479–498.