Cevian: Difference between revisions

Revision as of 13:01, 9 July 2013

DFFITS is a diagnostic meant to show how influential a point is in a statistical regression. It was proposed in 1980.^[1] It is defined as the change ("DFFIT") in the predicted value for a point when that point is left out of the regression, "Studentized" by dividing by the estimated standard deviation of the fit at that point:

$$\text{DFFITS} = \frac{\hat{y_{i}} - \hat{y_{i (i)}}}{s_{(i)} \sqrt{h_{i i}}}$$

where $\hat{y_{i}}$ and $\hat{y_{i (i)}}$ are the predictions for point i with and without point i included in the regression, $s_{(i)}$ is the residual standard error estimated without the point in question, and $h_{i i}$ is the leverage for the point.
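The definition above can be sketched directly as a leave-one-out loop. This is a minimal illustration, assuming ordinary least squares and NumPy; the function name `dffits_leave_one_out` and the convention that `X` already contains an intercept column are assumptions made for the example, not part of the source.

```python
import numpy as np

def dffits_leave_one_out(X, y):
    """DFFITS for every point, computed by literally refitting without it.

    X is an (n, p) design matrix (include a column of ones for an
    intercept); y is the length-n response. Hypothetical helper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Fit on all points and record the fitted values y_hat_i.
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_full

    # Leverage h_ii is the diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))

    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without point i.
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid_i = y[keep] - X[keep] @ beta_i
        # s_(i): residual standard error from the fit without point i.
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))
        # Prediction for point i from the fit that excluded it.
        y_hat_i = X[i] @ beta_i
        out[i] = (y_hat[i] - y_hat_i) / (s_i * np.sqrt(h[i]))
    return out
```

The loop refits the model n times, which is only practical for small datasets, but it mirrors the definition term by term.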

DFFITS is very similar to the externally Studentized residual, and is in fact equal to the latter times $\sqrt{h_{i i} / (1 - h_{i i})}$ .^[2]
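Because of this relation, DFFITS can be obtained from a single fit without any refitting. The sketch below assumes ordinary least squares and uses the standard deletion identity for $s_{(i)}^2$, which is not stated in the text above; the function name `dffits_closed_form` is hypothetical.

```python
import numpy as np

def dffits_closed_form(X, y):
    """DFFITS as externally Studentized residual times sqrt(h / (1 - h)).

    X is an (n, p) design matrix including any intercept column.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    H = X @ np.linalg.pinv(X)      # hat matrix
    h = np.diag(H)                 # leverages h_ii
    e = y - H @ y                  # ordinary residuals
    s2 = e @ e / (n - p)           # residual variance from the full fit

    # Assumed standard identity: residual variance with point i deleted.
    s2_loo = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)

    # Externally Studentized residual.
    t = e / np.sqrt(s2_loo * (1 - h))
    return t * np.sqrt(h / (1 - h))
```

On a small dataset the result should agree, up to floating-point error, with the value obtained by refitting the regression without each point in turn.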

When the errors are Gaussian, the externally Studentized residual follows Student's t distribution (with a number of degrees of freedom equal to the number of residual degrees of freedom minus one), so DFFITS for a particular point follows this same Student's t distribution multiplied by the leverage factor $\sqrt{h_{i i} / (1 - h_{i i})}$ for that particular point. Thus, for low-leverage points, DFFITS is expected to be small, whereas as the leverage approaches 1 the distribution of the DFFITS value widens without bound.
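As a rough numerical illustration of this widening (the leverage values are chosen arbitrarily and do not come from the source), the leverage factor evaluates to:

$$\sqrt{\frac{0.1}{0.9}} \approx 0.33, \qquad \sqrt{\frac{0.5}{0.5}} = 1, \qquad \sqrt{\frac{0.9}{0.1}} = 3, \qquad \sqrt{\frac{0.99}{0.01}} \approx 9.95$$

For the same t variate, a point with leverage 0.99 therefore has a DFFITS spread roughly thirty times that of a point with leverage 0.1.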

For a perfectly balanced experimental design (such as a factorial design or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points. This means that the DFFITS values will be distributed (in the Gaussian case) as $\sqrt{\frac{p}{n - p}} \approx \sqrt{\frac{p}{n}}$ times a t variate. Therefore, the authors suggest investigating those points whose DFFITS exceeds $2 \sqrt{\frac{p}{n}}$ in absolute value.
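The cutoff rule can be applied mechanically once DFFITS values are available. The sketch below is one possible reading: it compares the absolute value against $2\sqrt{p/n}$ so that large negative values are also flagged, which is an assumption about the intended rule. The function name `flag_influential` is hypothetical.

```python
import numpy as np

def flag_influential(dffits, p):
    """Return indices of points whose |DFFITS| exceeds 2 * sqrt(p / n).

    dffits: one DFFITS value per observation; p: number of parameters.
    Using the absolute value is an assumption made for this sketch.
    """
    dffits = np.asarray(dffits, dtype=float)
    n = dffits.size
    cutoff = 2.0 * np.sqrt(p / n)
    flagged = np.flatnonzero(np.abs(dffits) > cutoff)
    return flagged, cutoff
```

For example, with four observations and one parameter the cutoff is 1.0, so only values larger than 1 in magnitude are returned for closer inspection. The flag is a prompt to investigate a point, not a test that it is erroneous.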

Although the raw values resulting from the equations are different, Cook's distance and DFFITS are conceptually identical and there is a closed-form formula to convert one value to the other (Cohen, Cohen, West & Aiken, 2003).
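The conversion itself is not spelled out above. A minimal sketch, assuming the standard textbook identity $D_i = \frac{\text{DFFITS}_i^2}{p} \cdot \frac{s_{(i)}^2}{s^2}$ (where $s$ is the residual standard error from the full fit), is shown below; the function name and argument names are hypothetical.

```python
import numpy as np

def cooks_distance_from_dffits(dffits, s_loo, s, p):
    """Convert DFFITS to Cook's distance.

    dffits: DFFITS value(s); s_loo: residual standard error(s) with the
    point deleted, s_(i); s: residual standard error of the full fit;
    p: number of parameters. Assumes the identity
    D_i = DFFITS_i**2 * s_(i)**2 / (p * s**2).
    """
    dffits = np.asarray(dffits, dtype=float)
    s_loo = np.asarray(s_loo, dtype=float)
    return dffits**2 * s_loo**2 / (p * s**2)
```

The two measures differ only in which variance estimate is used for scaling and in the division by p, which is why points ranked as influential by one are generally ranked similarly by the other.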

Development

Previously, when assessing a dataset before running a linear regression, the possibility of outliers would be assessed using histograms and scatterplots. Both methods were subjective, and there was little way of knowing how much leverage each potential outlier had on the results. This led to a variety of quantitative measures, including DFFIT and DFBETA.

References

[1] Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons. ISBN 0-471-05856-4.

[2] Montgomery, Douglas C.; Peck, Elizabeth A.; Vining, G. Geoffrey (2012). Introduction to Linear Regression Analysis (5th ed.). Wiley. p. 218. ISBN 978-0-470-54281-1.

@@ Line 1: / Line 1: @@
+'''DFFITS''' is a diagnostic meant to show how influential a point is in a [[statistical regression]]. It was proposed in 1980.<ref>{{cite book |last=Belsley |first=David A. |last2=Kuh |first2=Edwin |last3=Welsh |first3=Roy E. | year=1980 |title=Regression diagnostics: identifying influential data and sources of collinearity |publisher=[[John Wiley & Sons]] |location=New York |series=Wiley series in probability and mathematical statistics |isbn=0-471-05856-4}}</ref> It is defined as the change ("DFFIT"), in the predicted value for a point, obtained when that point is left out of the regression, "Studentized" by dividing by the estimated standard deviation of the fit at that point:
+:<math>\text{DFFITS} = {\widehat{y_i} - \widehat{y_{i(i)}} \over s_{(i)} \sqrt{h_{ii}}}</math>
+where <math>\widehat{y_i}</math> and <math>\widehat{y_{i(i)}}</math> are the prediction for point i with and without point i included in the regression,
+<math>s_{(i)}</math> is the standard error estimated without the point in question, and <math>h_{ii}</math> is the [[leverage (statistics)|leverage]] for the point.
+DFFITS is very similar to the externally [[Studentized residual]], and is in fact equal to the latter times <math>\sqrt{h_{ii}/(1-h_{ii})}</math>.<ref>{{cite book |last=Montogomery |first=Douglas C. |last2=Peck |first2=Elizabeth A. |last3=Vining |first3=G. Geoffrey |title=Introduction to Linear Regression Analysis |edition=5th |year=2012 |publisher=Wiley |isbn=978-0-470-54281-1 |page=[http://books.google.com/books?id=0yR4KUL4VDkC&lpg=PP1&dq=Introduction%20to%20Linear%20Regression%20Analysis%202nd%20edition&pg=PA218#v=onepage&q&f=false 218] |quote=Thus, ''DFFITS<sub>i</sub>'' is the value of ''R''-student multiplied by the leverage of the ''i''th observation [h<sub>ii</sub>/(1-h<sub>ii</sub>)]<sup>1/2</sup>. |url=http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470542810.html |accessdate=22 February 2013}}</ref>
+Since when the errors are [[Gaussian]] the externally Studentized residual is distributed as [[Student's t]] (with a number of [[Degrees of freedom (statistics)|degrees of freedom]] equal to the number of residual degrees of freedom minus one), DFFITS for a particular point will be distributed according to this same Student's t distribution multiplied by the [[Leverage (statistics)|leverage factor]] <math>\sqrt{h_{ii}/(1-h_{ii})}</math> for that particular point. Thus, for low leverage points, DFFITS is expected to be small, whereas as the leverage goes to 1 the distribution of the DFFITS value widens infinitely.
+For a perfectly balanced experimental design (such as a [[factorial design]] or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points. This means that the DFFITS values will be distributed (in the Gaussian case) as <math>\sqrt{p \over n-p} \approx \sqrt{p \over n}</math> times a t variate. Therefore, the authors suggest investigating those points with DFFITS greater than <math>2\sqrt{p \over n}</math>.
+Although the raw values resulting from the equations are different, [[Cook's distance]] and DFFITS are conceptually identical and there is a closed-form formula to convert one value to the other (Cohen, Cohen, West & Aiken, 2003).
+== Development ==
+Previously when assessing a dataset before running a linear regression, the possibility of outliers would be assessed using histograms and scatterplots. Both methods of assessing data points were subjective and there was little way of knowing how much leverage each potential outlier had on the results data. This led to a variety of quantitative measures, including DFFIT, DFBETA.
+==References==
+<references/>
+[[Category:Regression diagnostics]]
