[[File:Rozklad benforda.svg|thumb|alt=A sequence of decreasing blue bars against a light gray grid background|The distribution of first digits, according to Benford's law. Each bar represents a digit, and the height of the bar is the percentage of numbers that start with that digit.]]
[[File:Benford-physical.svg|thumb|Frequency of first significant digit of physical constants plotted against Benford's law]]
{{Use dmy dates|date=June 2013}}
{{For|the tongue-in-cheek "law" about controversy|Benford's law of controversy}}

'''Benford's Law''', also called the '''First-Digit Law''', refers to the frequency distribution of digits in many (but not all) real-life sources of [[data]]. In this distribution, the number {{formatnum:1}} occurs as the leading digit about 30% of the time, while larger digits occur in that position less frequently: {{formatnum:9}} appears as the first digit less than 5% of the time. Benford's Law also concerns the expected distribution for digits beyond the first, which approach a uniform distribution.

This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, [[physical constant|physical]] and [[mathematical constant]]s, and processes described by [[power law]]s (which are very common in nature). It tends to be most accurate when values are distributed across multiple [[Order of magnitude|orders of magnitude]].

The graph below shows Benford's Law for [[radix|base 10]]. There is a generalization of the law to numbers expressed in other bases (for example, [[hexadecimal|base 16]]), and also a generalization to second digits and later digits.

It is named after physicist [[Frank Benford]], who stated it in 1938,<ref name=Benford>{{Cite journal
| author = [[Frank Benford]]
| title = The law of anomalous numbers
| journal = [[Proceedings of the American Philosophical Society]]
| volume = 78
| issue = 4
| date = March 1938
| pages = 551–572
| jstor = 984802
}} (subscription required)</ref>
although it had been previously stated by [[Simon Newcomb]] in 1881.<ref name=Newcomb>{{Cite journal
| author = [[Simon Newcomb]]
| title = Note on the frequency of use of the different digits in natural numbers
| journal = [[American Journal of Mathematics]]
| volume = 4
| issue = 1/4
| year = 1881
| pages = 39–40
| doi = 10.2307/2369148
| jstor = 2369148
}} (subscription required)</ref>

== Mathematical statement ==
[[File:Logarithmic scale.png|thumb|upright=1.5|alt=Rectangle with offset bolded axis in lower left, and light gray lines representing logarithms|A [[logarithmic scale]] bar. Picking a random ''x'' position [[Uniform distribution (continuous)|uniformly]] on this number line, roughly 30% of the time the first digit of the number will be 1.]]
A set of numbers is said to satisfy Benford's Law if the leading digit ''d'' (''d'' ∈ {1, ..., 9}) occurs with [[probability]]

: <math>P(d)=\log_{10}(d+1)-\log_{10}(d)=\log_{10} \left(\frac{d+1}{d}\right)=\log_{10} \left(1+\frac{1}{d}\right).</math>

Numerically, the leading digits have the following distribution in Benford's Law, where ''d'' is the leading digit and ''P''(''d'') the probability:

{| class="wikitable"
|-
! ''d'' !! ''P''(''d'') !! Relative size of ''P''(''d'')
|-
| 1 || style="text-align:right;"| {{bartable|30.1|%|10}}
|-
| 2 || style="text-align:right;"| {{bartable|17.6|%|10}}
|-
| 3 || style="text-align:right;"| {{bartable|12.5|%|10}}
|-
| 4 || style="text-align:right;"| {{bartable| 9.7|%|10}}
|-
| 5 || style="text-align:right;"| {{bartable| 7.9|%|10}}
|-
| 6 || style="text-align:right;"| {{bartable| 6.7|%|10}}
|-
| 7 || style="text-align:right;"| {{bartable| 5.8|%|10}}
|-
| 8 || style="text-align:right;"| {{bartable| 5.1|%|10}}
|-
| 9 || style="text-align:right;"| {{bartable| 4.6|%|10}}
|}

The quantity ''P''(''d'') is proportional to the space between ''d'' and ''d'' + 1 on a [[logarithmic scale]]. Therefore, this is the distribution expected if the [[significand|mantissae]] of the ''logarithms'' of the numbers (but not the numbers themselves) are [[Uniform distribution (continuous)|uniformly and randomly distributed]]. For example, a number ''x'', constrained to lie between 1 and 10, starts with the digit 1 if 1 ≤ ''x'' < 2, and starts with the digit 9 if 9 ≤ ''x'' < 10. Therefore, ''x'' starts with the digit 1 if log 1 ≤ log ''x'' < log 2, or starts with 9 if log 9 ≤ log ''x'' < log 10. The interval [log 1, log 2] is much wider than the interval [log 9, log 10] (0.30 and 0.05 respectively); therefore if log ''x'' is uniformly and randomly distributed, it is much more likely to fall into the wider interval than the narrower interval, i.e. more likely to start with 1 than with 9. The probabilities are proportional to the interval widths, and this gives the equation above. (The above discussion assumed ''x'' is between 1 and 10, but the result is the same no matter how many digits ''x'' has before the decimal point.)
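
The relationship between the formula and the table can be checked with a short numerical sketch. The following Python snippet (an illustration, not part of any cited source) computes the exact probabilities log<sub>10</sub>(1 + 1/''d'') and compares them with the first-digit frequencies of simulated numbers whose base-10 logarithms are uniformly distributed:

<syntaxhighlight lang="python">
import math
import random

def benford_prob(d):
    """Exact Benford probability of leading digit d (base 10)."""
    return math.log10(1 + 1 / d)

# Simulate numbers whose log10 mantissa is uniform on [0, 1):
# x = 10 ** u lies in [1, 10), so its first digit is int(x).
random.seed(0)
N = 100_000
counts = [0] * 10
for _ in range(N):
    x = 10 ** random.random()
    counts[int(x)] += 1

for d in range(1, 10):
    print(f"{d}: Benford {benford_prob(d):.3f}   simulated {counts[d] / N:.3f}")
</syntaxhighlight>

With 100,000 samples the simulated frequencies typically match the tabulated values to within a few tenths of a percentage point.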

An extension of Benford's Law predicts the distribution of first digits in other [[radix|bases]] besides [[decimal]]; in fact, any base ''b'' ≥ 2. The general form is:
: <math>P(d)=\log_{b}(d+1)-\log_{b}(d)=\log_{b} \left(1+\frac{1}{d}\right).</math>
For ''b'' = 2 (the [[Binary numeral system|binary number system]]), Benford's Law is true but trivial: All binary numbers (except for 0) start with the digit 1. (On the other hand, the [[#Generalization to digits beyond the first|generalization of Benford's law to second and later digits]] is not trivial, even for binary numbers.) Also, Benford's Law does not apply to [[Unary numeral system|unary systems]] such as [[tally marks]].
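
The general form is a one-line computation in any programming language; the sketch below (illustrative only, the helper name is arbitrary) also shows the trivial base-2 case, where the probability of a leading 1 is exactly 1:

<syntaxhighlight lang="python">
import math

def benford_prob_base(d, b=10):
    """Benford probability of leading digit d (1 <= d < b) in base b."""
    return math.log(1 + 1 / d, b)

print(benford_prob_base(1, 10))  # ~0.301 (decimal)
print(benford_prob_base(1, 16))  # 0.25 (hexadecimal)
print(benford_prob_base(1, 2))   # 1.0 -- every nonzero binary number starts with 1
</syntaxhighlight>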

==Example==
[[File:Benfords law illustrated by world's countries population.png|thumb|right|Distribution of first digits (in %, red bars) in the [[List of countries by population|population of the 237 countries]] of the world. Black dots indicate the distribution predicted by Benford's Law.]]
Examining a list of the heights of the [[List of tallest buildings and structures in the world#Tallest structure by category|60 tallest structures in the world by category]] shows that 1 is by far the most common leading digit, ''irrespective of the unit of measurement'':
{| class="wikitable"
|-
! rowspan="2" style="width:17%;"| Leading digit
! colspan=2 | meters
! colspan=2 | feet
! rowspan="2" style="width:17%;"| In Benford's law
|-
! style="width:16%;"| Count
! style="width:17%;"| %
! style="width:16%;"| Count
! style="width:17%;"| %
|-
| 1
| 26
| 43.3%
| 18
| 30.0%
| 30.1%
|-
| 2
| 7
| 11.7%
| 8
| 13.3%
| 17.6%
|-
| 3
| 9
| 15.0%
| 8
| 13.3%
| 12.5%
|-
| 4
| 6
| 10.0%
| 6
| 10.0%
| 9.7%
|-
| 5
| 4
| 6.7%
| 10
| 16.7%
| 7.9%
|-
| 6
| 1
| 1.7%
| 5
| 8.3%
| 6.7%
|-
| 7
| 2
| 3.3%
| 2
| 3.3%
| 5.8%
|-
| 8
| 5
| 8.3%
| 1
| 1.7%
| 5.1%
|-
| 9
| 0
| 0.0%
| 2
| 3.3%
| 4.6%
|}
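
Tallies like those in the table can be reproduced with a short script that extracts leading digits from the same list expressed in different units. The sketch below is purely illustrative: the sample heights are hypothetical stand-ins, not the actual structure data, and the conversion factor is the usual 3.28084 feet per metre.

<syntaxhighlight lang="python">
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(f"{abs(x):e}"[0])   # scientific notation, e.g. '8.28e+02' -> 8

# Hypothetical sample heights in metres (not the real data set).
heights_m = [828, 632, 601, 555, 541, 530, 509, 452, 450, 442]
heights_ft = [h * 3.28084 for h in heights_m]

print(Counter(leading_digit(h) for h in heights_m))
print(Counter(leading_digit(h) for h in heights_ft))
</syntaxhighlight>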

==History==
The discovery of Benford's Law goes back to 1881, when the American astronomer [[Simon Newcomb]] noticed that in [[logarithm]] tables (used at that time to perform calculations) the earlier pages (which contained numbers that started with 1) were much more worn than the other pages.<ref name=Newcomb /> Newcomb's published result is the first known instance of this observation and includes a distribution on the second digit as well. Newcomb proposed a law that the probability of a single number ''N'' being the first digit of a number was equal to log(''N'' + 1) − log(''N'').

The phenomenon was again noted in 1938 by the physicist [[Frank Benford]],<ref name=Benford/> who tested it on data from 20 different domains. His data set included the surface areas of 335 rivers, the sizes of 3259 US populations, 104 [[physical constant]]s, 1800 [[molecular weight]]s, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of ''[[Reader's Digest]]'', the street addresses of the first 342 persons listed in ''American Men of Science'' and 418 death rates. The total number of observations used in the paper was 20,229. This discovery was later named after Benford (making it an example of [[Stigler's Law]]).

In 1995, [[Ted Hill (mathematician)|Ted Hill]] proved the result about mixed distributions mentioned [[#Multiple probability distributions|below]].<ref name=Hill1995/>

==Explanations==
Benford's Law has been explained in various ways.

===Outcomes of exponential growth processes===
The precise form of Benford's Law can be explained if one assumes that the [[Significand|mantissae]] of the ''logarithms'' of the numbers are uniformly distributed; this is likely to be approximately true if the numbers range over several orders of magnitude. For many sets of numbers, especially sets that [[exponential growth|grow exponentially]], such as incomes and stock prices, this is a reasonable assumption.

For example, if a quantity increases continuously and doubles every year, then it will be twice its original value after one year, four times its original value after two years, eight times its original value after three years, and so on. When this quantity reaches a value of 100, the value will have a leading digit of 1 for a year, reaching 200 at the end of the year. Over the course of the next year, the value increases from 200 to 400; it will have a leading digit of 2 for a little over seven months, and 3 for the remaining five months. In the third year, the leading digit will pass through 4, 5, 6, and 7, spending less and less time with each succeeding digit, reaching 800 at the end of the year. Early in the fourth year, the leading digit will pass through 8 and 9. The leading digit returns to 1 when the value reaches 1000, and the process starts again, taking a year to double from 1000 to 2000.
From this example, it can be seen that if the value is sampled at uniformly distributed random times throughout those years, it is more likely to be measured when the leading digit is 1, and successively less likely to be measured with higher leading digits.
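
A short simulation makes the doubling example concrete: sample an exponentially growing quantity at uniformly distributed random times and tally the leading digits. The sketch below is illustrative only; the growth rate and time span are arbitrary choices.

<syntaxhighlight lang="python">
import math
import random
from collections import Counter

random.seed(1)
start, years = 100.0, 30          # quantity doubles once per year for 30 years
values = (start * 2 ** random.uniform(0, years) for _ in range(100_000))

counts = Counter(int(f"{x:e}"[0]) for x in values)
total = sum(counts.values())
for d in range(1, 10):
    print(d, round(counts[d] / total, 3), round(math.log10(1 + 1 / d), 3))
</syntaxhighlight>

The simulated frequencies fall close to the Benford values printed alongside them.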

This example makes it plausible that data tables that involve measurements of exponentially growing quantities will agree with Benford's Law. But the law also appears to hold for many cases where an exponential growth pattern is not obvious.

===Scale invariance===
If there is a list of lengths, the distribution of first digits of numbers in the list may be generally similar regardless of whether all the lengths are expressed in metres, or yards, or feet, or inches, etc. To the extent that the distribution of first digits of a data set is [[scale invariant]], the distribution of first digits is the same regardless of the units that the data are expressed in. This implies that the distribution of first digits is given by Benford's Law.<ref>Roger S. Pinkham, [http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177704862 On the Distribution of First Significant Digits], Ann. Math. Statist. Volume 32, Number 4 (1961), 1223–1230.</ref><ref name=wolfram>[http://mathworld.wolfram.com/BenfordsLaw.html MathWorld – Benford's Law]</ref> To be sure of approximate agreement with Benford's Law, the data has to be approximately invariant when scaled up by any factor up to 10. A [[lognormal]]ly distributed data set with wide dispersion has this approximate property, as do some of the examples mentioned above.

This means that if one converts from feet to [[yard]]s (multiplication by a constant), for example, the distribution of first digits must be unchanged — it is [[scale invariant]], and the only continuous distribution that fits this is one whose logarithm is uniformly distributed. For example, the first (non-zero) digit of the [[length]]s or [[distance]]s of objects should have the same distribution whether the unit of measurement is feet or yards, or anything else. But there are three feet in a yard, so the probability that the first digit of a length in yards is 1 must be the same as the probability that the first digit of a length in feet is 3, 4, or 5. Applying this to all possible measurement scales gives a logarithmic distribution, and combined with the fact that log<sub>10</sub>(1) = 0 and log<sub>10</sub>(10) = 1 gives Benford's Law. That is, if there is a scale invariant distribution of first digits, it must apply to a set of data regardless of what measuring units are used, and the only distribution of first digits that fits that is Benford's Law.
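
Scale invariance can also be checked numerically: rescaling a sample that already follows the law by an arbitrary conversion factor leaves the first-digit frequencies essentially unchanged. A minimal illustrative sketch, using a synthetic Benford-like sample rather than real measurement data:

<syntaxhighlight lang="python">
import random
from collections import Counter

def first_digit(x):
    return int(f"{x:e}"[0])

random.seed(2)
# Synthetic sample spanning five orders of magnitude with uniform log mantissae.
lengths_ft = [10 ** random.uniform(0, 5) for _ in range(100_000)]
lengths_yd = [x / 3 for x in lengths_ft]          # feet converted to yards

c_ft = Counter(map(first_digit, lengths_ft))
c_yd = Counter(map(first_digit, lengths_yd))
n = len(lengths_ft)
for d in range(1, 10):
    print(d, round(c_ft[d] / n, 3), round(c_yd[d] / n, 3))
</syntaxhighlight>

Both columns stay close to the Benford proportions, illustrating that multiplication by a constant does not disturb the distribution.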

===Multiple probability distributions===
[[File:BenfordDensities.png|thumb|For each positive integer ''n'', this graph shows the probability that a random integer between 1 and ''n'' starts with each of the nine possible digits. For any particular value of ''n'', the probabilities do not precisely satisfy Benford's Law; however, looking at a variety of different values of ''n'' and averaging the probabilities for each, the resulting probabilities ''do'' exactly satisfy Benford's Law.{{citation needed|date=October 2012}}]]

For numbers drawn from certain distributions (IQ scores, human heights) the Law fails to hold because these variates obey a normal distribution, which is known not to satisfy Benford's Law,<ref name=Formann2010>{{Cite journal
| last1 = Formann | first1 = A. K.
| title = The Newcomb-Benford Law in Its Relation to Some Common Distributions
| doi = 10.1371/journal.pone.0010541
| journal = PLoS ONE
| volume = 5
| issue = 5
| pages = e10541
| year = 2010
| pmid = 20479878
| pmc = 2866333
| editor1-last = Morris
| editor1-first = Richard James
| bibcode = 2010PLoSO...510541F }}</ref> since normal distributions cannot span several orders of magnitude and the mantissae of their logarithms will not be (even approximately) uniformly distributed.

However, if one "mixes" numbers from those distributions, for example by taking numbers from newspaper articles, Benford's Law reappears. This can also be proven mathematically: if one repeatedly "randomly" chooses a [[probability distribution]] (from an uncorrelated set) and then randomly chooses a number according to that distribution, the resulting list of numbers will obey Benford's Law.<ref name=Hill1995>{{Cite journal
| author = [[Theodore P. Hill]]
| title = A Statistical Derivation of the Significant-Digit Law
| journal = Statistical Science
| volume = 10
| pages = 354–363
| year = 1995
| url = http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1042&context=rgp_rsr
| format = PDF}}</ref><ref name=Hill1998>{{Cite journal
| author = [[Theodore P. Hill]]
| title = The first digit phenomenon
| journal = [[American Scientist]]
| volume = 86
| date = July–August 1998
| page = 358
| url = http://people.math.gatech.edu/~hill/publications/PAPER%20PDFS/TheFirstDigitPhenomenonAmericanScientist1996.pdf
| format = PDF
| bibcode = 1998AmSci..86..358H
| doi = 10.1511/1998.4.358
| issue = 4}}</ref> A similar probabilistic explanation for the appearance of Benford's Law in everyday-life numbers has been advanced by showing that it arises naturally when one considers mixtures of uniform distributions.<ref>Élise Janvresse and Thierry de la Rue (2004), "From Uniform Distributions to Benford's Law", ''Journal of Applied Probability'', 41 1203–1210 {{doi|10.1239/jap/1101840566}} {{MR|2122815}} [https://www.univ-rouen.fr/LMRS/Persopage/Delarue/Publis/PDF/uniform_distribution_to_Benford_law.pdf preprint]</ref>
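
Hill's mixing result can be illustrated by simulation: repeatedly pick a distribution at random, with randomly chosen parameters, draw one value from it, and tally the leading digits of the pooled sample. The sketch below is illustrative only; the particular families and parameter ranges are arbitrary choices, and the agreement it shows is approximate.

<syntaxhighlight lang="python">
import math
import random
from collections import Counter

random.seed(3)

def random_value():
    """Draw one number from a randomly chosen distribution with random parameters."""
    kind = random.choice(["uniform", "exponential", "lognormal"])
    if kind == "uniform":
        return random.uniform(0, random.uniform(1, 10_000))
    if kind == "exponential":
        return random.expovariate(1 / random.uniform(1, 1_000))
    return random.lognormvariate(random.uniform(0, 5), random.uniform(0.5, 2))

sample = [random_value() for _ in range(200_000)]
counts = Counter(int(f"{x:e}"[0]) for x in sample if x > 0)
total = sum(counts.values())
for d in range(1, 10):
    print(d, round(counts[d] / total, 3), round(math.log10(1 + 1 / d), 3))
</syntaxhighlight>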

==Applications==

===Accounting fraud detection===
In 1972, [[Hal Varian]] suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's Law ought to show up any anomalous results.<ref>{{Cite journal|first=Hal |last=Varian |authorlink=Hal Varian |title=Benford's Law (Letters to the Editor) |journal=[[The American Statistician]]|year=1972 |issue=3 |volume=26 |page=65 |doi=10.1080/00031305.1972.10478934 |url=http://www.tandfonline.com/doi/pdf/10.1080/00031305.1972.10478934 }}</ref> Following this idea, [[Mark_Nigrini|Mark Nigrini]] showed that Benford's Law could be used in [[forensic accounting]] and [[audit]]ing as an indicator of accounting and expenses fraud.<ref name=Nigrini>{{Cite journal
| first = Mark J. |last=Nigrini
| title = I've Got Your Number: How a mathematical phenomenon can help CPAs uncover fraud and other irregularities
| journal = Journal of Accountancy
| date = May 1999
| url = http://www.journalofaccountancy.com/Issues/1999/May/nigrini
}}</ref>
In practice, applications of Benford's Law for fraud detection routinely use more than the first digit.<ref name=Nigrini/>

===Legal status===
In the United States, evidence based on Benford's Law has been admitted in criminal cases at the federal, state, and local levels.<ref>{{cite episode
| url = http://www.wnyc.org/shows/radiolab/episodes/2009/10/09/segments/137643
| title = From Benford to Erdös
| series = Radio Lab
| serieslink = Radio Lab
| airdate = 2009-09-30
| number = 2009-10-09 }}</ref>

===Election data===
Benford's Law has been invoked as evidence of fraud in the [[Iranian presidential election, 2009|2009 Iranian elections]],<ref>Stephen Battersby [http://www.newscientist.com/article/mg20227144.000-statistics-hint-at-fraud-in-iranian-election.html Statistics hint at fraud in Iranian election] ''New Scientist'' 24 June 2009</ref> and also used to analyze other election results. However, other experts consider Benford's Law essentially useless as a statistical indicator of election fraud in general.<ref>Joseph Deckert, Mikhail Myagkov and Peter C. Ordeshook, (2010) ''[http://www.vote.caltech.edu/drupal/node/327 The Irrelevance of Benford’s Law for Detecting Fraud in Elections]'', Caltech/MIT Voting Technology Project Working Paper No. 9</ref><ref>Charles R. Tolle, Joanne L. Budzien, and Randall A. LaViolette (2000) ''[http://dx.doi.org/10.1063/1.166498 Do dynamical systems follow Benford's Law?]'', Chaos 10, 2, pp.331–336 (2000); {{doi|10.1063/1.166498}}</ref>

===Macroeconomic data===
Similarly, the macroeconomic data the Greek government reported to the European Union before entering the Euro Zone was shown to be probably fraudulent using Benford's Law, albeit years after the country joined.<ref>Müller, Hans Christian: ''[http://www.forbes.com/sites/timworstall/2011/09/12/greece-was-lying-about-its-budget-numbers/ Greece Was Lying About Its Budget Numbers]''. ''[[Forbes]]''. 12 September 2011.</ref>

===Genome data===
The number of [[open reading frame]]s and their relationship to genome size differs between [[eukaryote]]s and [[prokaryote]]s, with the former showing a log-linear relationship and the latter a linear relationship. Benford's Law has been used to test this observation with an excellent fit to the data in both cases.<ref name=Friar2012>Friar JL, Goldman T, Pérez-Mercader J (2012) Genome sizes and the Benford distribution. PLoS One 7(5):e36624. {{doi|10.1371/journal.pone.0036624}}</ref>

===Scientific fraud detection===
A test of regression coefficients in published papers showed agreement with Benford's law.<ref name=Diekmann2007>Diekmann A (2007) Not the First Digit! Using Benford's Law to detect fraudulent scientific data. J Appl Stat 34 (3) 321–329, {{doi|10.1080/02664760601004940}}</ref> As a comparison group, subjects were asked to fabricate statistical estimates. The fabricated results failed to obey Benford's law.

==Limitations==
Benford's law can only be applied to data that are distributed across multiple orders of magnitude. For instance, one might expect that Benford's law would apply to a list of numbers representing the populations of UK villages beginning with 'A', or representing the values of small insurance claims. But if a "village" is defined as a settlement with population between 300 and 999, or a "small insurance claim" is defined as a claim between $50 and $100, then Benford's law will not apply.<ref name=dspguide>{{cite web|url=http://www.dspguide.com/ch34.htm |title=The Scientist and Engineer's Guide to Digital Signal Processing, chapter 34, Explaining Benford's Law |author=Steven W. Smith |accessdate=15 December 2012}} (especially [http://www.dspguide.com/ch34/10.htm section 10]).</ref><ref name=fewster>{{Cite journal|first=R. M. |last=Fewster |title=A simple explanation of Benford's Law |journal=The American Statistician |year=2009 |volume=63 |issue=1 |pages=26–32 |doi=10.1198/tast.2009.0005 |postscript=<!--None--> }}</ref> More generally, if there is ''any'' cut-off which excludes a portion of the underlying data above a maximum value or below a minimum value, then the law will not apply.
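
The effect of such a cut-off is easy to demonstrate numerically. In this illustrative sketch, values confined to a single narrow band (between 300 and 999) produce a first-digit distribution nothing like Benford's:

<syntaxhighlight lang="python">
import random
from collections import Counter

random.seed(4)
claims = [random.uniform(300, 999) for _ in range(100_000)]   # one narrow band
counts = Counter(int(f"{x:e}"[0]) for x in claims)
total = sum(counts.values())
print({d: round(counts[d] / total, 3) for d in range(1, 10)})
# Digits 1 and 2 never occur; digits 3 to 9 appear roughly equally often.
</syntaxhighlight>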

Consider the probability distributions shown below, plotted on a [[log scale]].<ref>Note that if you have a regular probability distribution (on a linear scale), you have to multiply it by a certain function to get a proper probability distribution on a log scale: The log scale distorts the horizontal distances, so the height has to be changed also, in order for the area under each section of the curve to remain true to the original distribution. See, for example, [http://www.dspguide.com/ch34/4.htm]</ref>
In each case, the total area in red is the relative probability that the first digit is 1, and the total area in blue is the relative probability that the first digit is 8.

{|
| [[File:BenfordBroad.gif|thumb|left|300px|A broad probability distribution on a log scale]]
| [[File:BenfordNarrow.gif|thumb|left|300px|A narrow probability distribution on a log scale]]
|}

For the left distribution, the sizes of the red and blue ''areas'' are approximately proportional to the ''widths'' of the red and blue bars. Therefore the numbers drawn from this distribution will approximately follow Benford's law. On the other hand, for the right distribution, the ratio of the areas of red and blue is very different from the ratio of the widths of each red and blue bar. Rather, the relative areas of red and blue are determined more by the ''heights'' of the bars than the widths. The heights, unlike the widths, do not satisfy the universal relationship of Benford's law; instead, they are determined entirely by the shape of the distribution in question. Accordingly, the first digits in this distribution do not satisfy Benford's law at all.<ref name=fewster />

Thus, real-world distributions that span several [[orders of magnitude]] rather smoothly (e.g. populations of settlements, provided that there is no lower limit) are likely to satisfy Benford's law to a very good approximation. On the other hand, a distribution that covers only one or two orders of magnitude (e.g. heights of human adults, or IQ scores) is unlikely to satisfy Benford's law well.<ref name=dspguide /><ref name=fewster />

==Statistical tests==
Statistical tests examining the fit of Benford's law to data have more power when the data values span several orders of magnitude. Since many data samples typically do not have this range, numerical transformation of the data to a base other than 10 may be useful before testing.{{citation needed|date=June 2012}}

Although the [[chi square]]d test has been used to test for compliance with Benford's law, it has low statistical power when used with small samples.

The [[Kolmogorov–Smirnov test]] and the [[Kuiper test]] are more powerful when the sample size is small, particularly when Stephens's corrective factor is used.<ref name=Stephens1970>{{cite journal |last=Stephens |first=M. A. |year=1970 |title=Use of the Kolmogorov–Smirnov, Cramér–Von Mises and Related Statistics without Extensive Tables |journal=[[Journal of the Royal Statistical Society, Series B]] |volume=32 |issue=1 |pages=115–122 |url=http://ebookbrowse.com/stephens-1970-use-of-the-kolmogorov-smirnov-cramer-von-mises-and-related-statistics-without-extensive-tables-pdf-d15049209 |accessdate=2013-03-09}}</ref> These tests may be overly conservative when applied to discrete distributions. Values for the Benford test have been generated by Morrow.<ref name=Morrow2010>Morrow, J. (2010) [http://www.johnmorrow.info/projects/benford/benfordMain.pdf "Benford’s Law, Families of Distributions and a test basis"], UW-Madison</ref> The critical values of the test statistics are shown below:

{| class="wikitable"
|-
! Test
! α = 0.10
! α = 0.05
! α = 0.01
|-
| Kuiper test
| 1.191
| 1.321
| 1.579
|-
| Kolmogorov–Smirnov
| 1.012
| 1.148
| 1.420
|}

Two alternative tests specific to this law have been published: first, the max (''m'') statistic<ref name=Leemis2000>{{cite journal |last1=Leemis |first1=L. M. |last2=Schmeiser |first2=B. W. |last3=Evans |first3=D. L. |year=2000 |title=Survival distributions satisfying Benford's Law |journal=The American Statistician |volume=54 |issue=4|pages=236–241 |doi=10.1080/00031305.2000.10474554}}</ref> is given by
:<math>m = \operatorname*{max}_{i=1}^{9} \Big\{\Pr (X \text{ has FSD}=i)-\log_{10}(1+1/i) \Big\},</math>
and secondly, the distance (''d'') statistic<ref name=Cho2007>{{cite journal |last1=Cho |first1=W. K. T. |last2=Gaines |first2=B. J. |year=2007 |title=Breaking the (Benford) law: Statistical fraud detection in campaign finance |journal=The American Statistician |volume=61 |pages=218–223 |doi=10.1198/000313007X223496 |issue=3}}</ref> is given by
:<math>d=\sqrt{\sum_{i=1}^{9}\Big[\Pr ( X \text{ has FSD}=i ) - \log_{10}(1+1/i) \Big]^{2}},</math>
where FSD is the First Significant Digit. Morrow has determined the critical values for both these statistics, which are shown below:<ref name="Morrow2010" />

{| class="wikitable"
|-
! Statistic
! α = 0.10
! α = 0.05
! α = 0.01
|-
| Leemis' ''m''
| 0.851
| 0.967
| 1.212
|-
| Cho–Gaines' ''d''
| 1.212
| 1.330
| 1.569
|}

Nigrini<ref name=Nigrini1996>{{cite journal |last=Nigrini |first=M. |year=1996 |title=A taxpayer compliance application of Benford's Law |journal=J Amer Tax Assoc |volume=18 |pages=72–91}}</ref> has suggested the use of a ''z'' statistic

: <math> z = \frac { \, | p_o - p_e | - \frac{1}{2n} \, } { s_i } </math>
with
: <math> s_i = \left[ \frac { p_e ( 1 - p_e ) } { n } \right]^{1/2}, </math>

where |''x''| is the absolute value of ''x'', ''n'' is the sample size, 1/(2''n'') is a continuity correction factor, ''p''<sub>e</sub> is the proportion expected from Benford's law and ''p''<sub>o</sub> is the observed proportion in the sample.
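
For illustration, the statistics above can be computed directly from a vector of observed first-digit proportions. The sketch below is a plain transcription of the displayed formulas (not code from the cited papers); the observed proportions are hypothetical:

<syntaxhighlight lang="python">
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def leemis_m(p_obs):
    """Maximum deviation of observed first-digit proportions from Benford."""
    return max(p - q for p, q in zip(p_obs, BENFORD))

def cho_gaines_d(p_obs):
    """Euclidean distance between observed proportions and Benford."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(p_obs, BENFORD)))

def nigrini_z(p_o, p_e, n):
    """Nigrini's z statistic for one digit, with continuity correction."""
    s = math.sqrt(p_e * (1 - p_e) / n)
    return (abs(p_o - p_e) - 1 / (2 * n)) / s

# Hypothetical observed proportions for digits 1..9 from a sample of n = 1000.
p_obs = [0.28, 0.19, 0.13, 0.09, 0.08, 0.07, 0.06, 0.05, 0.05]
print(leemis_m(p_obs), cho_gaines_d(p_obs), nigrini_z(p_obs[0], BENFORD[0], 1000))
</syntaxhighlight>

Depending on the source, the ''m'' and ''d'' statistics may be scaled by a function of the sample size before comparison with critical values; the sketch only transcribes the formulas as displayed above.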

Morrow has also shown that for any random variable ''X'' (with a continuous pdf) divided by its standard deviation (''σ''), a value ''A'' can be found such that the distribution of the first significant digit of the random variable (''X''/''σ'')<sup>''A''</sup> will differ from Benford's Law by less than ''ε'', for any ''ε'' > 0.<ref name="Morrow2010" /> The value of ''A'' depends on the value of ''ε'' and the distribution of the random variable.

A method of accounting fraud detection based on bootstrapping and regression has been proposed.<ref name=Suh2011>{{cite journal |last1=Suh |first1=I. S. |last2=Headrick |first2=T. C. |last3=Minaburo |first3=S. |year=2011 |title=An effective and efficient analytic technique: A bootstrap regression procedure and Benford's Law |journal=J Forensic & Investigative Accounting |volume=3 |issue=3}}</ref>

==Generalization to digits beyond the first==
It is possible to extend the law to digits beyond the first.<ref name=Hill1995sigdig>[[Theodore P. Hill]], "The Significant-Digit Phenomenon", The American Mathematical Monthly, Vol. 102, No. 4, (Apr., 1995), pp. 322–327. [http://www.jstor.org/stable/2974952 Official web link (subscription required)]. [http://www.math.gatech.edu/~hill/publications/cv.dir/digit.pdf Alternate, free web link].</ref> In particular, the probability of encountering a number starting with the string of digits ''n'' is given by:

:<math> \log_{ 10 } \left( n + 1 \right )- \log_{ 10 } \left( n \right ) = \log_{ 10 } \left( 1 + \frac{ 1 }{ n } \right)</math>

(For example, the probability that a number starts with the digits 3, 1, 4 is log<sub>10</sub>(1 + 1/314) ≈ 0.0014.) This result can be used to find the probability that a particular digit occurs at a given position within a number. For instance, the probability that a "2" is encountered as the second digit is<ref name=Hill1995sigdig />

:<math> \log_{ 10 } \left( 1 + \frac{ 1 }{ 12 } \right ) + \log_{ 10 } \left( 1 + \frac{ 1 }{ 22 } \right )+ \cdots + \log_{ 10 } \left( 1 + \frac{ 1 }{ 92 } \right) \approx 0.109 </math>

More generally, the probability that ''d'' (''d'' = 0, 1, ..., 9) is encountered as the ''n''-th (''n'' > 1) digit is

:<math> \sum_{ k = 10^{ n - 2 } }^{ 10^{ n - 1 } - 1 } \log_{ 10 } \left( 1 + \frac{ 1 }{ 10k + d } \right )</math>

The distribution of the ''n''-th digit, as ''n'' increases, rapidly approaches a uniform distribution with 10% for each of the ten digits.<ref name=Hill1995sigdig /> Four digits is often enough to assume a uniform distribution of 10%, as '0' appears 10.0176% of the time as the fourth digit and '9' appears 9.9824% of the time.
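
These expressions translate directly into code. The following sketch (illustrative only) computes the probability of a given leading digit string and of a digit appearing in the ''n''-th position, reproducing the 0.109 and near-uniform fourth-digit values quoted above:

<syntaxhighlight lang="python">
import math

def prob_leading_string(s):
    """Probability that a number starts with the digit string s, e.g. '314'."""
    n = int(s)
    return math.log10(1 + 1 / n)

def prob_digit_at_position(d, pos):
    """Probability that digit d (0-9) occurs at position pos (pos >= 2)."""
    lo, hi = 10 ** (pos - 2), 10 ** (pos - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))

print(prob_leading_string("314"))    # ~0.0014
print(prob_digit_at_position(2, 2))  # ~0.109
print(prob_digit_at_position(0, 4))  # ~0.100176
</syntaxhighlight>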

==Tests of Benford's Law with common distributions==
Benford's Law was empirically tested against the numbers (up to the 10th digit) generated by a number of important distributions, including the [[uniform distribution (discrete)|uniform distribution]], the [[exponential distribution]], the [[half-normal distribution]], the [[Truncated normal distribution|right-truncated normal]], the [[normal distribution]], the [[chi square distribution]] and the [[log normal distribution]].<ref name=Formann2010/> In addition, the [[ratio distribution]] of two uniform distributions, the ratio distribution of two exponential distributions, the ratio distribution of two half-normal distributions, the ratio distribution of two right-truncated normal distributions, and the ratio distribution of two chi-square distributions (the [[F distribution]]) were tested.

The uniform distribution, as might be expected, does not obey Benford's Law. In contrast, the ratio distribution of two uniform distributions is well described by Benford's Law. Benford's Law also describes the exponential distribution and the ratio distribution of two exponential distributions well. Although the half-normal distribution does not obey Benford's Law, the ratio distribution of two half-normal distributions does. Neither the right-truncated normal distribution nor the ratio distribution of two right-truncated normal distributions is well described by Benford's Law. This is not surprising, as this distribution is weighted towards larger numbers. Neither the normal distribution nor the ratio distribution of two normal distributions (the [[Cauchy distribution]]) obeys Benford's Law. The fit of the chi square distribution depends on the [[degrees of freedom (statistics)|degrees of freedom]] (df), with good agreement for df = 1 and decreasing agreement as the df increases. The F distribution is fitted well for low degrees of freedom; as the dfs increase the fit decreases, but much more slowly than for the chi square distribution. The fit of the log-normal distribution depends on the [[mean]] and the [[variance]] of the distribution. The variance has a much greater effect on the fit than does the mean; larger values of both parameters result in better agreement with the law. The ratio of two log normal distributions is itself log normal, so this distribution was not examined.
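
The qualitative pattern described above is easy to reproduce by simulation. The sketch below (illustrative only, with arbitrary parameter choices) compares the first-digit frequencies of an exponential sample and a normal sample with the Benford proportions, using the Euclidean distance from the previous section as a rough summary:

<syntaxhighlight lang="python">
import math
import random
from collections import Counter

random.seed(5)
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit_freqs(sample):
    counts = Counter(int(f"{abs(x):e}"[0]) for x in sample if x != 0)
    total = sum(counts.values())
    return [counts[d] / total for d in range(1, 10)]

def distance(freqs):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(freqs, BENFORD)))

exponential = [random.expovariate(1.0) for _ in range(100_000)]
normal = [random.gauss(50, 10) for _ in range(100_000)]

print("exponential:", round(distance(first_digit_freqs(exponential)), 3))  # small
print("normal:     ", round(distance(first_digit_freqs(normal)), 3))       # much larger
</syntaxhighlight>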

Other distributions that have been examined include the [[Muth distribution]], [[Gompertz distribution]], [[Weibull distribution]], [[gamma distribution]], [[log-logistic distribution]] and the [[exponential power distribution]], all of which show reasonable agreement with the law.<ref name=Leemis2000/><ref name="Dümbgen2008">Dümbgen L, Leuenberger C (2008) "Explicit bounds for the approximation error in Benford's Law". ''Elect Comm in Probab'', 13: 99–112 {{doi|10.1214/ECP.v13-1358}}</ref> The [[Gumbel distribution]] – whose density increases with increasing value of the random variable – does not show agreement with this law.<ref name="Dümbgen2008"/>

==Distributions known to obey Benford's Law==
Some well-known infinite [[integer sequence]]s {{not a typo|provably}} satisfy Benford's Law exactly (in the [[asymptotic limit]] as more and more terms of the sequence are included). Among these are the [[Fibonacci number]]s,<ref>L. C. Washington, "Benford's Law for Fibonacci and Lucas Numbers", ''[[The Fibonacci Quarterly]]'', '''19.2''', (1981), 175–177</ref><ref>R. L. Duncan, "An Application of Uniform Distribution to the Fibonacci Numbers", ''[[The Fibonacci Quarterly]]'', '''5''', (1967), 137–140</ref> the [[factorial]]s,<ref>P. B. Sarkar, "An Observation on the Significant Digits of Binomial Coefficients and Factorials", ''[[Sankhya]]'' B, '''35''', (1973), 363–364</ref> the powers of 2,<ref name=powers>In general, the sequence ''k''<sup>1</sup>, ''k''<sup>2</sup>, ''k''<sup>3</sup>, etc., satisfies Benford's Law exactly, under the condition that log<sub>10</sub> ''k'' is an [[irrational number]]. This is a straightforward consequence of the [[equidistribution theorem]].</ref><ref>That the first 100 powers of 2 approximately satisfy Benford's Law is mentioned by Ralph Raimi. Ralph A. Raimi (1976) "The First Digit Problem", ''[[American Mathematical Monthly]]'', '''83''' (7), 521–538</ref> and the powers of almost any other number.<ref name=powers />

Likewise, some continuous processes satisfy Benford's Law exactly (in the asymptotic limit as the process continues longer and longer). One is an [[exponential growth]] or [[exponential decay|decay]] process: If a quantity is exponentially increasing or decreasing in time, then the percentage of time that it has each first digit satisfies Benford's Law asymptotically (i.e., more and more accurately as the process continues for more and more time).

==Distributions known to not obey Benford's law==
The square roots and reciprocals of successive natural numbers do not obey this law.<ref name=Raimi1976>Raimi RA (1976) "The first digit problem". ''[[American Mathematical Monthly]]'', 83: 521–538</ref> Other specific collections of numbers found not to obey this law include the 1974 [[Vancouver]], [[Canada]] telephone book, where no number began with the digit 1, the populations of all places with population at least 2500 from five US states according to the 1960 and 1970 censuses, where only 19% began with digit 1 but 20% began with digit 2,<ref name=Raimi1976/> and the terminal digits in pathology reports.<ref name=Beer2009>Beer TW (2009) "Terminal digit preference: beware of Benford's Law", ''J Clin Pathol'' 62: 192</ref>

The lack of fit in these cases has known explanations:<ref name=Raimi1976/><ref name="Beer2009"/> the assignment of telephone numbers in an arbitrary manner, truncation of population size at 2500 inhabitants, and rounding of data.

==Criteria for distributions expected and not expected to obey Benford's Law==
A number of criteria—applicable particularly to accounting data—have been suggested where Benford's Law can be expected to apply and not to apply.<ref name=Durtschi2004>Durtschi C, Hillison W, Pacini C (2004) "The effective use of Benford’s Law to assist in detecting fraud in accounting data". ''J Forensic Accounting'' 5: 17–34</ref>

===Distributions that can be expected to obey Benford's Law===
* When the mean is greater than the median and the skew is positive
* Numbers that result from mathematical combination of numbers: e.g., quantity × price
* Transaction level data: e.g., disbursements, sales

===Distributions that would not be expected to obey Benford's Law===
* Where numbers are assigned: e.g., check numbers, invoice numbers
* Where numbers are influenced by human thought: e.g., prices set by psychological thresholds ($1.99)
* Accounts with a large number of firm-specific numbers: e.g., accounts set up to record $100 refunds
* Accounts with a built-in minimum or maximum
* Where no transaction is recorded

==Moments==
Moments of random variables for the digits 1 to 9 following this law have been calculated (a short numerical check appears after the list):<ref name=Scott2001>Scott, P.D.; Fasli, M. (2001) [http://www.essex.ac.uk/csee/research/publications/technicalreports/2001/CSM-349.pdf "Benford’s Law: An empirical investigation and a novel explanation"]. ''CSM Technical Report'' 349, Department of Computer Science, Univ. Essex</ref>
* [[mean]] 3.440
* [[variance]] 6.057
* [[skewness]] 0.796
* [[kurtosis]] −0.548
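
A minimal sketch (illustrative only) that recomputes these central moments directly from the first-digit probabilities:

<syntaxhighlight lang="python">
import math

p = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

mean = sum(d * p[d] for d in p)
var = sum((d - mean) ** 2 * p[d] for d in p)
skew = sum((d - mean) ** 3 * p[d] for d in p) / var ** 1.5
kurt = sum((d - mean) ** 4 * p[d] for d in p) / var ** 2 - 3   # excess kurtosis

print(round(mean, 3), round(var, 3), round(skew, 3), round(kurt, 3))
# 3.44 6.057 0.796 -0.548 (approximately)
</syntaxhighlight>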

For the joint distribution of the first and second digits (treated as a single two-digit number from 10 to 99), these values are also known:<ref name=Suh2010>Suh I.S., Headrick T.C. (2010) [http://www.bus.lsu.edu/accounting/faculty/lcrumbley/jfia/Articles/Abstracts/abs_2010v2n2a7.pdf "A comparative analysis of the bootstrap versus traditional statistical procedures applied to digital analysis based on Benford’s Law"], ''Journal of Forensic and Investigative Accounting'' 2(2) 144–175</ref>
* [[mean]] 38.590
* [[variance]] 621.832
* [[skewness]] 0.772
* [[kurtosis]] −0.547

A table of the expected values of the first two digits according to Benford's law is available,<ref name=Suh2010/> as is the population correlation between the first and second digits:<ref name=Suh2010/> {{nowrap|1=''ρ''<sup>2</sup> = 0.0561 }}.

==See also==
* [[Predictive analytics#Fraud detection|Fraud detection in predictive analytics]]

==References==
{{Reflist|colwidth=30em}}

==Further reading==
{{Refbegin}}
* {{cite web|url=http://mathworld.wolfram.com/BenfordsLaw.html |title=Benford's Law – from Wolfram MathWorld |publisher=Mathworld.wolfram.com |date=14 June 2012 |accessdate=2012-06-26}}
* {{Cite book
| author = Mark J. Nigrini
| title = Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection
| publisher = John Wiley & Sons
| year = 2012
| pages = 330
| isbn = 978-1-118-15285-0
}}
* {{Cite journal
| author = Alessandro Gambini, Giovanni Mingari Scarpello, Daniele Ritelli
| title = Probability of digits by dividing random numbers: A ψ and ζ functions approach
| journal = Expositiones Mathematicae
| volume = 30
| issue = 4
| year = 2012
| pages = 223–238
| doi = 10.1016/j.exmath.2012.03.001
}}
* {{Cite journal
| author = Sehity
| last2 = Hoelzl
| first2 = Erik
| last3 = Kirchler
| first3 = Erich
| title = Price developments after a nominal shock: Benford's Law and psychological pricing after the euro introduction
| journal = International Journal of Research in Marketing
| volume = 22
| issue = 4
| year = 2005
| pages = 471–480
| doi = 10.1016/j.ijresmar.2005.09.002
}}
* {{Cite journal
| author = Nicolas Gauvrit, [[Jean-Paul Delahaye]]
| year = 2011
| title = Scatter and regularity implies Benford's Law...and more
| journal = Zenil: Randomness through computation: some answers, more questions
| isbn = 9814327751
| pages = 58–69
}}
* {{Cite journal
| author = Bernhard Rauch, Max Göttsche, Gernot Brähler, Stefan Engel
| year = 2011
| title = Fact and Fiction in EU-Governmental Economic Data
| journal = [[German Economic Review]]
| volume = 12
| issue = 3
| pages = 243–255
| month = August
| doi = 10.1111/j.1468-0475.2011.00542.x
}}
* {{Cite journal
| author = Wendy Cho and Brian Gaines
| year = 2007
| title = Breaking the (Benford) Law: statistical fraud detection in campaign finance
| journal = [[The American Statistician]]
| volume = 61
| issue = 3
| pages = 218–223
| month = August
| doi = 10.1198/000313007X223496
}}
* {{cite journal | title = The Law of Harmony in Statistics: An Investigation of the Metrical Interdependence of Social Phenomena. by L. V. Furlan | last1 = [[Hilda Geiringer|Geiringer]] | first1 = Hilda | journal = Journal of the American Statistical Association | year = 1948 | volume = 43 | pages = 325–328 | publisher = American Statistical Association | jstor = 2280379 | doi = 10.2307/2280379 | last2 = Furlan | first2 = L. V. | issue = 242 }}
{{Refend}}

==External links==
{{commons category}}

===General audience===
* [http://www.benfordonline.net Benford Online Bibliography], an online bibliographic database on Benford's Law.
* [http://www.nigrini.com/benfordslaw.htm Companion website for Benford's Law by Mark Nigrini] Website includes 15 data sets, 10 Excel templates, photos, documents, and other miscellaneous items related to Benford's Law
* [http://www.rexswain.com/benford.html Following Benford's Law, or Looking Out for No. 1], 1998 article from ''[[The New York Times]]''.
* [http://www.bbc.co.uk/radio4/science/further5.shtml A further five numbers: number 1 and Benford's law], [[BBC]] radio segment by [[Simon Singh]]
* [http://www.wnyc.org/shows/radiolab/episodes/2009/10/09/segments/137643 From Benford to Erdös], Radio segment from the [[Radiolab]] program
* [http://plus.maths.org/issue9/features/benford/index-gifd.html Looking out for number one] by Jon Walthoe, Robert Hunt and Mike Pearson, ''Plus Magazine'', September 1999
* [http://www.kirix.com/blog/2008/07/22/fun-and-fraud-detection-with-benfords-law/ Video showing Benford's Law applied to Web Data (incl. Minnesota Lakes, US Census Data and Digg Statistics)]
* [http://www.mpi-inf.mpg.de/~fietzke/benford.html An illustration of Benford's Law], showing first-digit distributions of various sequences evolve over time, interactive.
* [http://blog.iharder.net/2010/11/10/benford-how-to-generate-your-own-benfords-law-numbers/ Generate your own Benford numbers] A script for generating random numbers compliant with Benford's Law.
* [http://testingbenfordslaw.com Testing Benford's Law] An open source project showing Benford's Law in action against publicly available datasets.
* {{cite web|last=Mould|first=Steve|title=Number 1 and Benford's Law|url=http://www.numberphile.com/videos/benfords_law.html|work=Numberphile|publisher=[[Brady Haran]]}}

===More mathematical===
* {{MathWorld | urlname=BenfordsLaw | title=Benford's Law}}
* [http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/ Benford’s law, Zipf’s law, and the Pareto distribution] by [[Terence Tao]]
* [http://demonstrations.wolfram.com/CountryDataAndBenfordsLaw/ Country Data and Benford's Law], [http://demonstrations.wolfram.com/BenfordsLawFromRatiosOfRandomNumbers/ Benford's Law from Ratios of Random Numbers] at [[Wolfram Demonstrations Project]].
* [http://www.dspguide.com/CH34.PDF Benford's Law Solved with Digital Signal Processing]
* Interactive graphic: [http://www.math.wm.edu/~leemis/chart/UDR/UDR.html Univariate Distribution Relationships]

{{ProbDistributions|discrete-finite}}

{{DEFAULTSORT:Benford's Law}}
[[Category:Statistical laws]]
[[Category:Discrete distributions]]
[[Category:Probability distributions]]
{{Link GA|de}}