{{Regression bar}}
In [[statistics]], '''logistic regression''' or '''logit regression''' is a type of probabilistic [[statistical classification]] model.<ref>{{cite book |author=Christopher M. Bishop |year=2006 |title=Pattern Recognition and Machine Learning |publisher=Springer |quote=In the terminology of statistics, this model is known as ''logistic regression'', although it should be emphasized that this is a model for classification rather than regression. |page=205}}</ref> It is used for predicting the outcome of a [[categorical variable|categorical]] [[dependent and independent variables|dependent variable]] (i.e., a class label) based on one or more predictor variables (features); that is, it is used in estimating the parameters of a [[Qualitative response models|qualitative response model]]. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory (predictor) variables, using a [[logistic function]]. Frequently (and subsequently in this article) "logistic regression" is used to refer specifically to the problem in which the dependent variable is [[Binary variable|binary]]&mdash;that is, the number of available categories is two&mdash;and problems with more than two categories are referred to as [[multinomial logistic regression]] or, if the multiple categories are [[Level of measurement#Ordinal type|ordered]], as [[ordered logistic regression]].


Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually (but not necessarily) [[Level of measurement#Interval scale|continuous]], by using probability scores as the predicted values of the dependent variable.<ref>Mohit Bhandari and Anders Joensson, ''Clinical Research for Surgeons'', p. 293.</ref> As such it treats the same set of problems as does [[probit regression]] using similar techniques.


==Fields and examples of applications==
The name " ugg boots" refers to a specific design of boot that is made from authentic sheepskin. The boots were at first made in Australia. Now, an American company owns the Ugg trademark ("Ugg Australia"), but they are nonetheless made from the same Australian Merino sheepskin, only in Australia they are sold as "Australian Sheepskin Boots." A edition of Ugg Boots, the "Fug" (Traveling Ugg) very first made an appearance outside the house of the shepherd community for the duration of World War I. Pilots wore the cozy boots for the duration of chilly days. Afterwards, in the nineteen sixties, sensible surfers figured that Ugg Boots would be great for holding their feet heat in involving catching the waves or following coming out of the chilly ocean water.Regardless of what you connect with them, Ugg Boots are very a strike now, in all places. Trendsetters are tripping above by themselves to get a pair. <br><br>Ugg Boots arrive in so numerous variations and colors, it is tough to preserve monitor of them all. Men's Uggs come in far more basic, typical colors and models, but for women of all ages, it can be "just about anything goes" in conditions of hues and styles. Soon after all, ladies have to have a pair of footwear for every single outfit, and, with Ugg Boots, you can just about do that. Get some Ugg Boots in ankle-substantial, mid-calf, and tall variations. You can obtain Uggs that have flat heels or more chunky and textured soles. Typically, Ugg Boots are regarded as to be a relaxed footwear, which means they go wonderful with your preferred jeans. Pair them with a short skirt and opaque tights or a long skirt for some relaxed still fashionable wardrobe decisions.Wintertime, spring, summer season, or tumble, Discounted Ugg Boots get the job done. Being exceptionally heat, your feet remain cozy in the chilly temperature. Mainly because they are manufactured from purely natural sheepskin products, the fibers breath and your feet will not likely get sweaty or smelly in the warm weather conditions, either. Just about the only time you won't be able to have on Ugg Boots is through significantly moist or muddy temperature, the suede won't keep up as properly to these situations. Guys can search just a minimal additional rugged putting on Ugg Boots, you can find generally a thing beautiful about that glance!<br><br>Preserving your Ugg Boots seeking terrific is straightforward. They cleanse up really well with some cold h2o and a soft cloth. You may well have to have some authorized leather or suede cleaner on occasion. If they do occur to get damp (or immediately after cleansing), be thorough to retain them away from any direct warmth supply. Never set them in a washer or dryer. Try to remember that wool sweater you place in the dryer after? Sheepskin can shrink, so allowing your Ugg Boots airdry is important. To pace matters together you can stuff some newspapers or paper towels inside of to soak up any extra moisture.<br><br>UGG boots rule the fashion sneakers market as they are the two adaptable and fashionable. Lavish Australian twin-faced sheepskin is adopted to keep your ft cozy in wintertime and magnificent in summer, generating these footwear great to put on all yr spherical. like a distinctive style and design assertion, this style of shoes undoubtedly is right below to remain and you also can pair them with virtually any outfit. created to get cozy and strong, UGG boots are promoted usually are marketed at dear charges. 
when you desire to get hold of a wonderful pair with no breaking your money establishment account, ideal here are some important ideas that you simply will have to retain in mind.<br><br>Acquire as a result of the dropshippers who offer their services at expenditures just a little fewer high-priced than persons from UGG boots merchants. You also should enable preserve up ample dollars as these footwear also price a entire ton as a result of their luxurious products high-quality. getting from these methods makes it possible for you have authentic sheepskin boots.<br>Advertisements by Google<br><br><br>Buy them all through the away period although you could possibly get real pairs at lower expense selling prices. though this sort of boots may perhaps be donned at any period, they are generally witnessed donned in chilly winter months. Consequently, when you need a significant quality pair, be sure to create a acquire prior to the snow begins to tumble. continue being inform and appearance out for on the web and offline merchandise revenue and promotions. as rapidly as you find a pair you desire, commence swiftly otherwise other males and ladies will swoop in that great suede boots.<br><br>Store your pair from website suppliers. this definitely is actually fairly handy as there is no have to have to match your needs to verify out group stores and shops a one by a single. no matter what you only will have to do would be to simply click your mouth as effectively as your chosen pair shall be shipped for you before long. Besides, you can take gratification in a wide assortment if acquiring UGG Boots 5854 Typical Mini. world-wide-web suppliers give these boots inside of a broad assortment of designs, colours and measurements. one particular of the most required of all, on the world-wide-web sellers market place these boots at lessened charges than men and women equipped at unique merchants. Having said that, paying for on the net also may well be just a small frustrating as chances are that you basically will get fakes. So bear in views to undertake some investigation and learn the distinctive attributes in the authentic ugg boots for revenue prior to rushing to suppliers. at all moments recall to verify the site' s suggestions and tips pretty carefully prior to creating your previous determination. It is advisable to deliver your obtain from the trustworthy vendor providing rapid return and refund procedures.<br><br>If you cherished this article so you would like to obtain more info relating to [http://tinyurl.com/k7shbtq cheap uggs] generously visit the web page.
Logistic regression was put forth in the 1940s as an alternative to Fisher's 1936 classification method, [[linear discriminant analysis]].<ref>{{cite book |author1=Gareth James |author2=Daniela Witten |author3=Trevor Hastie |author4=Robert Tibshirani |title=An Introduction to Statistical Learning |publisher=Springer |year=2013 |url=http://www-bcf.usc.edu/~gareth/ISL/ |page=6}}</ref>
It is used extensively in numerous disciplines, including the medical and social science fields. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression.<ref>{{cite pmid|3106646}}</ref>  Logistic regression might be used to predict whether a patient has a given disease (e.g. [[Diabetes mellitus|diabetes]]), based on observed characteristics of the patient (age, gender, [[body mass index]], results of various [[blood test]]s, etc.).  Another example might be to predict whether an American voter will vote Democratic or Republican, based on age, income, gender, race, state of residence, votes in previous elections, etc.<ref name = Harrell /> The technique can also be used in [[engineering]], especially for predicting the probability of failure of a given process, system or product.<ref name=strano05>{{Cite doi|10.1016/j.ijmachtools.2005.07.005}}</ref><ref name=safety>{{Cite doi|10.1016/j.ssci.2008.01.002}}</ref> It is also used in [[marketing]] applications such as prediction of a customer's propensity to purchase a product or cease a subscription, etc.{{citation needed|date=July 2012}} In [[economics]] it can be used to predict the likelihood of a person's choosing to be in the labor force, and a business application would be to predict the likelihood of a homeowner defaulting on a [[mortgage]]. [[Conditional random field]]s, an extension of logistic regression to sequential data, are used in [[natural language processing]].
 
==Basics==
Logistic regression can be binomial or multinomial. Binomial or binary logistic regression deals with situations in which the observed outcome for a [[dependent variable]] can have only two possible types (for example, "dead" vs. "alive"). [[multinomial logit|Multinomial logistic regression]] deals with situations where the outcome can have three or more possible types (e.g., "disease A" vs. "disease B" vs. "disease C"). In binary logistic regression, the outcome is usually coded as "0" or "1", as this leads to the most straightforward interpretation.<ref name=Hosmer/> If a particular observed outcome for the dependent variable is the noteworthy possible outcome (referred to as a "success" or a "case") it is usually coded as "1" and the contrary outcome (referred to as a "failure" or a "noncase") as "0". Logistic regression is used to predict the [[odds]] of being a case based on the values of the [[independent variable]]s (predictors). The odds are defined as the probability that a particular outcome is a case divided by the probability that it is a noncase.
 
Like other forms of [[regression analysis]], logistic regression makes use of one or more predictor variables that may be either continuous or [[categorical data]]. Unlike ordinary linear regression, however, logistic regression is used for predicting binary outcomes of the dependent variable (treating the dependent variable as the outcome of a [[Bernoulli trial]]) rather than continuous outcomes. Given this difference, it is necessary that logistic regression take the [[natural logarithm]] of the odds of the dependent variable being a case (referred to as the [[logit]] or log-odds) to create a continuous criterion as a transformed version of the dependent variable. Thus the logit transformation is referred to as the ''link function'' in logistic regression&mdash;although the dependent variable in logistic regression is binomial, the logit is the continuous criterion upon which linear regression is conducted.<ref name=Hosmer/>
 
The logit of success is then fit to the predictors using [[linear regression]] analysis. The predicted value of the logit is converted back into predicted odds via the inverse of the natural logarithm, namely the [[exponential function]]. Therefore, although the observed dependent variable in logistic regression is a zero-or-one variable, the logistic regression estimates the odds, as a continuous variable, that the dependent variable is a success (a case). In some applications the odds are all that is needed. In others, a specific yes-or-no prediction is needed for whether the dependent variable is or is not a case; this categorical prediction can be based on the computed odds of a success, with predicted odds above some chosen cut-off value being translated into a prediction of a success.
 
== Logistic function, odds ratio, and logit ==
[[Image:Logistic-curve.svg|thumb|320px|right|Figure 1. The logistic function, with <math>\beta_0 + \beta_1 x</math> on the horizontal axis and <math>\pi(x)</math> on the vertical axis]]
An explanation of logistic regression begins with an explanation of the [[logistic function]], which  always takes on values between zero and one:<ref name=Hosmer/>
 
:<math>F(t) = \frac{e^t}{e^t+1} = \frac{1}{1+e^{-t}},</math>
 
and viewing ''t'' as a linear function of an [[dependent and independent variables|explanatory variable]] ''x'' (or of a linear combination of explanatory variables), the logistic function can be written as:
 
:<math>\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{e^{\beta_0 + \beta_1 x} + 1} = \frac {1}{1+e^{-(\beta_0 + \beta_1 x)}}.</math>
 
This will be interpreted as the probability of the dependent variable equaling a "success" or "case" rather than a failure or non-case. We also define the inverse of the logistic function, the [[logit]]:
 
:<math>g(x) = \ln \frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x ,</math>
 
and equivalently:
 
:<math>\frac{\pi(x)}{1 - \pi(x)} = e^{\beta_0 + \beta_1 x}.</math>
 
A graph of the logistic function <math>\pi(x)</math> is shown in Figure 1.  The input is the value of <math>\beta_0 + \beta_1 x</math> and the output is <math>\pi(x)</math>. The logistic function is useful because it can take an input with any value from negative infinity to positive infinity, whereas the output <math>\pi(x)</math> is confined to values between 0 and 1 and hence is interpretable as a probability. In the above equations, ''g''(''x'') refers to the logit function of some given linear combination ''x'' of the predictors, 'ln' denotes the natural logarithm, <math>\pi(x)</math> is the probability that the dependent variable equals a case, <math>\beta_0</math> is the [[Y-intercept|intercept]] from the linear regression equation (the value of the criterion when the predictor is equal to zero), <math>\beta_1 x</math> is the regression coefficient multiplied by some value of the predictor, and ''e'' denotes the base of the natural logarithm.
 
The formula for <math>\pi(x)</math> illustrates that the probability of the dependent variable equaling a case is equal to the value of the logistic function of the linear regression expression. This is important in that it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting expression for the probability <math>\pi(x)</math> ranges between 0 and 1. The equation for <math>g(x)</math> illustrates that the [[logit]] (i.e., log-odds or natural logarithm of the odds) is equivalent to the linear regression expression. Likewise, the next equation illustrates that the odds of the dependent variable equaling a case is equivalent to the exponential function of the linear regression expression. This illustrates how the [[logit]] serves as a link function between the probability and the linear regression expression. Given that the logit ranges between minus infinity and infinity, it provides an adequate criterion upon which to conduct linear regression and the logit is easily converted back into the odds.<ref name=Hosmer/>
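
The inverse relationship between the logistic function and the logit is easy to verify numerically. The following is a minimal illustrative sketch in Python (assuming [[NumPy]] is available; the function names here are ours, not from any particular package):

<syntaxhighlight lang="python">
import numpy as np

def logistic(t):
    """F(t) = 1 / (1 + exp(-t)): maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def logit(p):
    """g(p) = ln(p / (1 - p)), the log-odds: the inverse of the logistic function."""
    return np.log(p / (1.0 - p))

t = np.linspace(-6.0, 6.0, 5)
p = logistic(t)
print(p)                          # all values lie strictly between 0 and 1
print(np.allclose(logit(p), t))   # True: the logit undoes the logistic function
</syntaxhighlight>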
 
===Multiple explanatory variables===
 
If there are multiple explanatory variables, then the above expression <math>\beta_0+\beta_1x</math> can be revised to <math>\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_mx_m.</math> Then when this is used in the equation relating the logged odds of a success to the values of the predictors, the linear regression will be a [[multiple regression]] with ''m'' explanators; the parameters <math>\beta_j</math> for all ''j'' = 0, 1, 2, ..., ''m'' are all estimated.
 
== Model fitting ==
 
===Estimation===
 
====Maximum likelihood estimation====
The regression coefficients are usually estimated using [[maximum likelihood]] estimation.<ref name=Menard/> Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximizes the likelihood function, so an iterative process must be used instead, for example [[Newton's method]]. This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until improvement is minute, at which point the process is said to have converged.<ref name=Menard/>
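
As a concrete illustration of this iterative process, here is a minimal [[Newton's method]] sketch in Python (assuming NumPy; <code>fit_logistic_newton</code> is an illustrative name, not a library function). Each pass revises the tentative coefficients until the revision becomes minute:

<syntaxhighlight lang="python">
import numpy as np

def fit_logistic_newton(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton's method (IRLS).

    X: (n, m+1) design matrix whose first column is all ones (intercept).
    y: (n,) array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])                # tentative solution: all zeros
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # current predicted probabilities
        W = p * (1.0 - p)                      # Bernoulli variances (diagonal of W)
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        hess = X.T @ (X * W[:, None])          # X' W X, the negative Hessian
        step = np.linalg.solve(hess, grad)     # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:         # improvement is minute: converged
            break
    return beta

# Usage: simulate data from known coefficients and (roughly) recover them.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
true_beta = np.array([-0.5, 2.0])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
print(fit_logistic_newton(X, y))               # approximately [-0.5, 2.0]
</syntaxhighlight>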
 
In some instances the model may not reach convergence. When a model does not converge, this indicates that the coefficients are not meaningful because the iterative process was unable to find appropriate solutions. A failure to converge may occur for a number of reasons: having a large proportion of predictors to cases, [[multicollinearity]], [[sparse matrix|sparseness]], or complete separation.
* Having a large proportion of variables to cases results in an overly conservative Wald statistic (discussed below) and can lead to nonconvergence.
* Multicollinearity refers to unacceptably high correlations between predictors. As multicollinearity increases, coefficients remain unbiased but standard errors increase and the likelihood of model convergence decreases.<ref name=Menard/> To detect multicollinearity amongst the predictors, one can conduct a linear regression analysis with the predictors of interest for the sole purpose of examining the tolerance statistic<ref name=Menard/> used to assess whether multicollinearity is unacceptably high.
* Sparseness in the data refers to having a large proportion of empty cells (cells with zero counts). Zero cell counts are particularly problematic with categorical predictors. With continuous predictors, the model can infer values for the zero cell counts, but this is not the case with categorical predictors. The reason the model will not converge with zero cell counts for categorical predictors is because the natural logarithm of zero is an undefined value, so final solutions to the model cannot be reached. To remedy this problem, researchers may collapse categories in a theoretically meaningful way or may consider adding a constant to all cells.<ref name=Menard/>
* Another numerical problem that may lead to a lack of convergence is complete separation, which refers to the instance in which the predictors perfectly predict the criterion – all cases are accurately classified. In such instances, one should reexamine the data, as there is likely some kind of error.<ref name=Hosmer/>
 
Although not a precise number, as a general rule of thumb, logistic regression models require a minimum of 10 events per explanatory variable (where ''event'' denotes the cases belonging to the less frequent category in the dependent variable).<ref>{{cite journal|last=Peduzzi|first=P|coauthors=Concato, J, Kemper, E, Holford, TR, Feinstein, AR|title=A simulation study of the number of events per variable in logistic regression analysis.|journal=[[Journal of Clinical Epidemiology]]|date=December 1996|volume=49|issue=12|pages=1373–9|pmid=8970487}}</ref>
 
====Minimum chi-squared estimator for grouped data====
 
While individual data will have a dependent variable with a value of zero or one for every observation, with [[grouped data]] one observation is on a group of people who all share the same characteristics (e.g., demographic characteristics); in this case the researcher observes the proportion of people in the group for whom the response variable falls into one category or the other. If this proportion is neither zero nor one for any group, the minimum chi-squared estimator involves using [[weighted least squares]] to estimate a linear model in which the dependent variable is the logit of the proportion: that is, the log of the ratio of the fraction in one group to the fraction in the other group.<ref name=Greene/>{{rp|pp.686–9}}
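
A sketch of this estimator in Python (assuming NumPy; the function name is hypothetical). The empirical logit of each group's proportion is regressed on the explanatory variables by [[weighted least squares]]:

<syntaxhighlight lang="python">
import numpy as np

def min_chi_squared_grouped(X, p_hat, n):
    """Minimum chi-squared estimate from grouped data.

    X:     (G, m+1) design matrix, one row per group, first column all ones.
    p_hat: (G,) observed proportion of successes per group, strictly in (0, 1).
    n:     (G,) number of observations per group.
    """
    z = np.log(p_hat / (1.0 - p_hat))    # empirical logit of each proportion
    w = n * p_hat * (1.0 - p_hat)        # weighted-least-squares weights
    XtW = X.T * w                        # X' W  (W diagonal)
    return np.linalg.solve(XtW @ X, XtW @ z)
</syntaxhighlight>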
 
===Evaluating goodness of fit===
 
[[Goodness of fit]] in linear regression models is generally measured using the [[R square|R<sup>2</sup>]]. Since this has no direct analog in logistic regression, various methods<ref name=Greene>{{cite book |last=Greene |first=William N. |title=Econometric Analysis |edition=Fifth |publisher=Prentice-Hall |location= |year=2003 |isbn=0-13-066189-9 }}</ref>{{rp|ch.21}} including the following can be used instead.
 
====Deviance and likelihood ratio tests====
 
In linear regression analysis, one is concerned with partitioning variance via the [[Partition of sums of squares|sum of squares]] calculations – variance in the criterion is essentially divided into variance accounted for by the predictors and residual variance. In logistic regression analysis, [[Deviance (statistics)|deviance]] is used in lieu of sum of squares calculations.<ref name=Cohen/> Deviance is analogous to the sum of squares calculations in linear regression<ref name=Hosmer/>  and is a measure of the lack of fit to the data in a logistic regression model.<ref name=Cohen/> Deviance is calculated by comparing a given model with the saturated model – a model with a theoretically perfect fit.<ref name=Hosmer/>  This computation is called the [[likelihood-ratio test]]:<ref name=Hosmer/>
 
:<math> D = -2\ln \frac{\text{likelihood of the fitted model}} {\text{likelihood of the saturated model}}.</math>
 
In the above equation ''D'' represents the deviance and ln represents the natural logarithm. The likelihood ratio (the likelihood of the fitted model divided by the likelihood of the saturated model) is at most one, so its natural logarithm is non-positive; multiplying it by negative two therefore produces a non-negative value with an approximate [[chi-squared distribution]].<ref name=Hosmer/>  Smaller values indicate better fit as the fitted model deviates less from the saturated model. When assessed upon a chi-square distribution, nonsignificant chi-square values indicate very little unexplained variance and thus, good model fit. Conversely, a significant chi-square value indicates that a significant amount of the variance is unexplained.
 
Two measures of deviance are particularly important in logistic regression: null deviance and model deviance. The null deviance represents the difference between a model with only the intercept (which means "no predictors") and the saturated model. The model deviance represents the difference between a model with at least one predictor and the saturated model.<ref name=Cohen/> In this respect, the null model provides a baseline upon which to compare predictor models. Given that deviance is a measure of the difference between a given model and the saturated model, smaller values indicate better fit. Therefore, to assess the contribution of a predictor or set of predictors, one can subtract the model deviance from the null deviance and assess the difference on a chi-square distribution <math>\chi^2_{s-p}</math> with [[Degrees of freedom (statistics)|degrees of freedom]]<ref name=Hosmer/> equal to the difference in the number of parameters estimated.
 
Let
:<math>
\begin{align}
  D_\text{null} &=-2\ln \frac{\text{likelihood of null model}} {\text{likelihood of the saturated model}}  \quad \text{ and } \quad
D_\text{fitted} &=-2\ln \frac{\text{likelihood of fitted model}} {\text{likelihood of the saturated model}}. \\
\end{align}
</math>
Then
:<math>
\begin{align}
  D_\text{null} - D_\text{fitted} &=-2\ln \frac{\text{likelihood of null model}} {\text{likelihood of the saturated model}} -\left(-2\ln \frac{\text{likelihood of fitted model}} {\text{likelihood of the saturated model}} \right)\\
  &=-2 \left(\ln \frac{\text{likelihood of null model}} {\text{likelihood of the saturated model}}-\ln \frac{\text{likelihood of fitted model}} {\text{likelihood of the saturated model}}\right)\\
  &= -2 \ln \frac{ \left( \frac{\text{likelihood of null model}}{\text{likelihood of the saturated model}}\right)}{ \left( \frac{\text{likelihood of fitted model}}{\text{likelihood of the saturated model}}\right)}\\
  &= -2 \ln \frac{\text{likelihood of null model}}{\text{likelihood of the fitted model}}.
\end{align}
</math>
 
If the model deviance is significantly smaller than the null deviance then one can conclude that the predictor or set of predictors significantly improved model fit. This is analogous to the ''F''-test used in linear regression analysis to assess the significance of prediction.<ref name=Cohen/>
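
For ungrouped binary data the saturated model assigns each observation its observed outcome, so its likelihood is 1 and each deviance reduces to &minus;2 times the model's log-likelihood. A minimal sketch of the likelihood-ratio comparison in Python (assuming NumPy and SciPy; function names are illustrative):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import chi2

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def likelihood_ratio_test(y, p_fitted, n_predictors):
    """D_null - D_fitted, referred to a chi-squared distribution whose
    degrees of freedom equal the number of added parameters."""
    p_null = np.full(len(y), y.mean())         # intercept-only model: constant p
    d_null = -2.0 * log_likelihood(y, p_null)
    d_fitted = -2.0 * log_likelihood(y, p_fitted)
    g = d_null - d_fitted                      # test statistic
    return g, chi2.sf(g, df=n_predictors)      # statistic and p-value
</syntaxhighlight>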
 
====Pseudo-R<sup>2</sup>s====
 
In linear regression the squared multiple correlation, ''R''<sup>2</sup> is used to assess goodness of fit as it represents the proportion of variance in the criterion that is explained by the predictors.<ref name=Cohen/> In logistic regression analysis, there is no agreed upon analogous measure, but there are several competing measures each with limitations.<ref name=Cohen/> Three of the most commonly used indices are examined on this page beginning with the likelihood ratio ''R''<sup>2</sup>, ''R''<sup>2</sup><sub>L</sub>:<ref name=Cohen/>
 
:<math>R^2_\text{L} = \frac{D_\text{null} - D_\text{model}} {D_\text{null}} .</math>
 
This is the most analogous index to the squared multiple correlation in linear regression.<ref name=Menard/> It represents the proportional reduction in the deviance wherein the deviance is treated as a measure of variation analogous but not identical to the [[variance]] in [[linear regression]] analysis.<ref name=Menard/> One limitation of the likelihood ratio ''R''<sup>2</sup> is that it is not monotonically related to the odds ratio,<ref name=Cohen/> meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.
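
Computed from the two deviances defined above, this index is a one-line calculation; a sketch (the deviance values are made up for illustration):

<syntaxhighlight lang="python">
def likelihood_ratio_r2(d_null, d_fitted):
    """R^2_L: proportional reduction in deviance relative to the null model."""
    return (d_null - d_fitted) / d_null

print(likelihood_ratio_r2(138.6, 92.4))   # about 0.333
</syntaxhighlight>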
 
The Cox and Snell ''R''<sup>2</sup> is an alternative index of goodness of fit related to the ''R''<sup>2</sup> value from linear regression.{{citation needed|date=July 2012}} The Cox and Snell index is problematic as its maximum value is .75, when the [[variance]] is at its maximum (.25). The Nagelkerke ''R''<sup>2</sup> provides a correction to the Cox and Snell ''R''<sup>2</sup> so that the maximum value is equal to one. Nevertheless, the Cox and Snell and likelihood ratio ''R''<sup>2</sup>s show greater agreement with each other than either does with the Nagelkerke ''R''<sup>2</sup>.<ref name=Cohen/> Of course, this might not be the case for values exceeding .75 as the Cox and Snell index is capped at this value. The likelihood ratio ''R''<sup>2</sup> is often preferred to the alternatives as it is most analogous to ''R''<sup>2</sup> in [[linear regression]], is independent of the base rate (both Cox and Snell and Nagelkerke ''R''<sup>2</sup>s increase as the proportion of cases increase from 0 to .5) and varies between 0 and 1.
 
A word of caution is in order when interpreting pseudo-''R''<sup>2</sup> statistics. The reason these indices of fit are referred to as ''pseudo'' ''R''<sup>2</sup> is because they do not represent the proportionate reduction in error as the ''R''<sup>2</sup> in [[linear regression]] does.<ref name=Cohen/> Linear regression assumes [[homoscedasticity]], that the error variance is the same for all values of the criterion. Logistic regression will always be [[heteroscedastic]] – the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of ''R''<sup>2</sup> as a proportionate reduction in error in a universal sense in logistic regression.<ref name=Cohen/>
 
====Hosmer–Lemeshow test====
 
The [[Hosmer–Lemeshow test]] uses a test statistic that asymptotically follows a [[chi-squared distribution|<math>\chi^2</math> distribution]] to assess whether or not the observed event rates match expected event rates in subgroups of the model population.
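
A sketch of the statistic in Python (assuming NumPy and SciPy; the function is illustrative). The conventional choice groups observations into deciles of predicted risk and compares observed with expected event counts in each group:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic over `groups` near-equal-size risk groups."""
    order = np.argsort(p)
    y_sorted, p_sorted = y[order], p[order]
    h = 0.0
    for idx in np.array_split(np.arange(len(y)), groups):
        observed = y_sorted[idx].sum()         # observed events in the group
        expected = p_sorted[idx].sum()         # expected events in the group
        n_g = len(idx)
        h += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return h, chi2.sf(h, df=groups - 2)        # asymptotic chi-squared, g - 2 df
</syntaxhighlight>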
 
====Evaluating binary classification performance====
 
If the estimated probabilities are used to [[binary classification|classify]] each observation (i.e., to predict the category in which the dependent variable falls), the various [[#Model suitability|methods below]] for judging the model's suitability in out-of-sample forecasting can also be applied to the data used for estimation: [[Accuracy and precision#In binary classification|accuracy, precision]] (also called [[positive predictive value]]), [[Precision and recall|recall]] (also called [[Sensitivity and specificity|sensitivity]]), [[Sensitivity and specificity|specificity]] and [[negative predictive value]]. Each of these evaluative methods measures an aspect of the model's effectiveness in assigning instances to the correct categories.
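
A sketch of these evaluative measures in Python (assuming NumPy), computed from the four cells of the confusion matrix at a chosen probability cut-off:

<syntaxhighlight lang="python">
import numpy as np

def binary_classification_metrics(y_true, p, cutoff=0.5):
    """Accuracy, precision (PPV), recall (sensitivity), specificity and NPV."""
    y_pred = (p >= cutoff).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    tn = np.sum((y_pred == 0) & (y_true == 0))   # true negatives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),           # positive predictive value
        "recall":      tp / (tp + fn),           # sensitivity
        "specificity": tn / (tn + fp),
        "npv":         tn / (tn + fn),           # negative predictive value
    }
</syntaxhighlight>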
 
== Coefficients ==
 
After fitting the model, it is likely that researchers will want to examine the contribution of individual predictors. To do so, they will want to examine the regression coefficients. In linear regression, the regression coefficients represent the change in the criterion for each unit change in the predictor.<ref name=Cohen/> In logistic regression, however, the regression coefficients represent the change in the logit for each unit change in the predictor. Given that the logit is not intuitive, researchers are likely to focus on a predictor's effect on the exponential function of the regression coefficient – the odds ratio (see [[logistic regression#Definition|definition]]). In linear regression, the significance of a regression coefficient is assessed by computing a ''t''-test. In logistic regression, there are several different tests designed to assess the significance of an individual predictor, most notably the likelihood ratio test and the Wald statistic.
 
=== Likelihood ratio test ===
 
The [[likelihood-ratio test]] discussed above to assess model fit is also the recommended procedure to assess the contribution of individual "predictors" to a given model.<ref name=Hosmer/><ref name=Menard/><ref name=Cohen/> In the case of a single predictor model, one simply compares the deviance of the predictor model with that of the null model on a chi-square distribution with a single degree of freedom. If the predictor model has a significantly smaller deviance (cf. chi-square using the difference in degrees of freedom of the two models), then one can conclude that there is a significant association between the "predictor" and the outcome. Given that some common statistical packages (e.g., SAS, SPSS) do not provide likelihood ratio test statistics, it can be more difficult to assess the contribution of individual predictors in the multiple logistic regression case. To assess the contribution of individual predictors one can enter the predictors hierarchically, comparing each new model with the previous to determine the contribution of each predictor.<ref name=Cohen/> (There is considerable debate among statisticians regarding the appropriateness of so-called "stepwise" procedures; they do not preserve the nominal statistical properties and can be very misleading.[http://www.amazon.com/Regression-Modeling-Strategies-Applications-Statistics/dp/1441929185/ref=sr_1_2?ie=UTF8&qid=1339171287&sr=8-2])
 
=== Wald statistic ===
 
Alternatively, when assessing the contribution of individual predictors in a given model, one may examine the significance of the [[Wald test|Wald statistic]]. The Wald statistic, analogous to the ''t''-test in linear regression, is used to assess the significance of coefficients. The Wald statistic is the ratio of the square of the regression coefficient to the square of the standard error of the coefficient and is asymptotically distributed as a chi-square distribution.<ref name=Menard/>
 
<math>W_j = \frac{B^2_j} {SE^2_{B_j}}</math>
 
Although several statistical packages (e.g., SPSS, SAS) report the Wald statistic to assess the contribution of individual predictors, the Wald statistic has limitations. When the regression coefficient is large, the standard error of the regression coefficient also tends to be large, increasing the probability of [[Type I and Type II errors|Type-II error]]. The Wald statistic also tends to be biased when data are sparse.<ref name=Cohen/>
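
Given a coefficient estimate and its standard error, the statistic and its p-value are immediate; a sketch (assuming SciPy; the numbers are illustrative):

<syntaxhighlight lang="python">
from scipy.stats import chi2

def wald_test(b_j, se_j):
    """W = B_j^2 / SE(B_j)^2, referred to chi-squared with 1 degree of freedom."""
    w = (b_j / se_j) ** 2
    return w, chi2.sf(w, df=1)

print(wald_test(1.2, 0.4))   # W = 9.0, p ~ 0.0027
</syntaxhighlight>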
 
== Formal mathematical specification ==
 
There are various equivalent specifications of logistic regression, which fit into different types of more general models.  These different specifications allow for different sorts of useful generalizations.
 
===Setup===
The basic setup of logistic regression is the same as for standard [[linear regression]].
 
It is assumed that we have a series of ''N'' observed data points.  Each data point ''i'' consists of a set of ''m'' explanatory variables ''x''<sub>1,''i''</sub> ... ''x''<sub>''m,i''</sub> (also called [[independent variable]]s, predictor variables, input variables, features, or attributes), and an associated [[binary-valued]] outcome variable ''Y''<sub>''i''</sub> (also known as a [[dependent variable]], response variable, output variable, outcome variable or class variable), i.e. it can assume only the two possible values 0 (often meaning "no" or "failure") or 1 (often meaning "yes" or "success").  The goal of logistic regression is to explain the relationship between the explanatory variables and the outcome, so that an outcome can be predicted for a new set of explanatory variables.
 
Some examples:
* The observed outcomes are the presence or absence of a given disease (e.g. diabetes) in a set of patients, and the explanatory variables might be characteristics of the patients thought to be pertinent (sex, race, age, [[blood pressure]], [[body-mass index]], etc.).
* The observed outcomes are the votes (e.g. [[Democratic Party (United States)|Democratic]] or [[Republican Party (United States)|Republican]]) of a set of people in an election, and the explanatory variables are the demographic characteristics of each person (e.g. sex, race, age, income, etc.).  In such a case, one of the two outcomes is arbitrarily coded as 1, and the other as 0.
 
As in linear regression, the outcome variables ''Y''<sub>''i''</sub> are assumed to depend on the explanatory variables ''x''<sub>1,''i''</sub> ... ''x''<sub>''m,i''</sub>.
 
; Explanatory variables
As shown in the above examples, the explanatory variables may be of any [[statistical data type|type]]: [[real-valued]], [[binary variable|binary]], [[categorical variable|categorical]], etc.  The main distinction is between [[continuous variable]]s (such as income, age and [[blood pressure]]) and [[discrete variable]]s (such as sex or race).  Discrete variables referring to more than two possible choices are typically coded using [[Dummy variable (statistics)|dummy variable]]s (or [[indicator variable]]s), that is, separate explanatory variables taking the value 0 or 1 are created for each possible value of the discrete variable, with a 1 meaning "variable does have the given value" and a 0 meaning "variable does not have that value".  For example, a four-way discrete variable of [[blood type]] with the possible values "A, B, AB, O" can be converted to four separate two-way dummy variables, "is-A, is-B, is-AB, is-O", where only one of them has the value 1 and all the rest have the value 0.  This allows for separate regression coefficients to be matched for each possible value of the discrete variable. (In a case like this, only three of the four dummy variables are independent of each other, in the sense that once the values of three of the variables are known, the fourth is automatically determined.  Thus, it is only necessary to encode three of the four possibilities as dummy variables.  This also means that when all four possibilities are encoded, the overall model is not [[identifiable]] in the absence of additional constraints such as a regularization constraint.  Theoretically, this could cause problems, but in reality almost all logistic regression models are fit with regularization constraints.)
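
A sketch of this coding in Python (the helper function is hypothetical, not from a library). One level is held out as the reference category so that the dummies remain independent, as discussed above:

<syntaxhighlight lang="python">
import numpy as np

def dummy_code(values, reference):
    """One 0/1 dummy column per non-reference level; the reference level
    is represented by a row of all zeros, keeping the model identifiable."""
    levels = [v for v in sorted(set(values)) if v != reference]
    matrix = np.array([[int(v == lev) for lev in levels] for v in values])
    return levels, matrix

levels, D = dummy_code(["A", "B", "AB", "O", "A"], reference="O")
print(levels)   # ['A', 'AB', 'B']  ->  columns is-A, is-AB, is-B
print(D)        # the "O" row is all zeros
</syntaxhighlight>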
 
; Outcome variables
Formally, the outcomes ''Y''<sub>''i''</sub> are described as being [[Bernoulli distribution|Bernoulli-distributed]] data, where each outcome is determined by an unobserved probability ''p''<sub>''i''</sub> that is specific to the outcome at hand, but related to the explanatory variables.  This can be expressed in any of the following equivalent forms:
 
:<math>
\begin{align}
Y_i\mid x_{1,i},\ldots,x_{m,i} \ & \sim  \operatorname{Bernoulli}(p_i) \\
\mathbb{E}[Y_i\mid x_{1,i},\ldots,x_{m,i}] &= p_i  \\
\Pr(Y_i=y_i\mid x_{1,i},\ldots,x_{m,i}) &=
\begin{cases}
p_i & \text{if }y_i=1 \\
1-p_i & \text{if }y_i=0
\end{cases}
\\
\Pr(Y_i=y_i\mid x_{1,i},\ldots,x_{m,i}) &= p_i^{y_i} (1-p_i)^{(1-y_i)}
\end{align}
</math>
 
The meanings of these four lines are:
# The first line expresses the [[probability distribution]] of each ''Y''<sub>''i''</sub>: Conditioned on the explanatory variables, it follows a [[Bernoulli distribution]] with parameters ''p''<sub>''i''</sub>, the probability of the outcome of 1 for trial ''i''. As noted above, each separate trial has its own probability of success, just as each trial has its own explanatory variables.  The probability of success ''p''<sub>''i''</sub> is not observed, only the outcome of an individual Bernoulli trial using that probability.
# The second line expresses the fact that the [[expected value]] of each ''Y''<sub>''i''</sub> is equal to the probability of success ''p''<sub>''i''</sub>, which is a general property of the Bernoulli distribution.  In other words, if we run a large number of Bernoulli trials using the same probability of success ''p''<sub>''i''</sub>, then take the average of all the 1 and 0 outcomes, then the result would be close to ''p''<sub>''i''</sub>.  This is because doing an average this way simply computes the proportion of successes seen, which we expect to converge to the underlying probability of success.
# The third line writes out the [[probability mass function]] of the Bernoulli distribution, specifying the probability of seeing each of the two possible outcomes.
# The fourth line is another way of writing the probability mass function, which avoids having to write separate cases and is more convenient for certain types of calculations.  This relies on the fact that ''Y''<sub>''i''</sub> can take only the value 0 or 1.  In each case, one of the exponents will be 1, "choosing" the value under it, while the other is 0, "canceling out" the value under it.  Hence, the outcome is either ''p''<sub>''i''</sub> or 1&nbsp;&minus;&nbsp;''p''<sub>''i''</sub>, as in the previous line.
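
The fourth line is easy to check directly; a minimal sketch:

<syntaxhighlight lang="python">
def bernoulli_pmf(y, p):
    """p**y * (1 - p)**(1 - y): evaluates to p when y = 1 and to 1 - p when y = 0."""
    return p ** y * (1.0 - p) ** (1 - y)

print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))   # 0.3 0.7
</syntaxhighlight>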
 
; Linear predictor function
The basic idea of logistic regression is to use the mechanism already developed for [[linear regression]] by modeling the probability ''p''<sub>''i''</sub> using a [[linear predictor function]], i.e. a [[linear combination]] of the explanatory variables and a set of [[regression coefficient]]s that are specific to the model at hand but the same for all trials.  The linear predictor function <math>f(i)</math> for a particular data point ''i'' is written as:
 
:<math>f(i) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i},</math>
 
where <math>\beta_0, \ldots, \beta_m</math> are [[regression coefficient]]s indicating the relative effect of a particular explanatory variable on the outcome.
 
The model is usually put into a more compact form as follows:
* The regression coefficients ''β''<sub>0</sub>, ''β''<sub>1</sub>, ..., ''β''<sub>''m''</sub> are grouped into a single vector '''''β''''' of size ''m''&nbsp;+&nbsp;1.
* For each data point ''i'', an additional explanatory pseudo-variable ''x''<sub>0,''i''</sub> is added, with a fixed value of 1, corresponding to the [[Y-intercept|intercept]] coefficient ''β''<sub>0</sub>.
* The resulting explanatory variables ''x''<sub>0,''i''</sub>, ''x''<sub>1,''i''</sub>, ..., ''x''<sub>''m,i''</sub> are then grouped into a single vector '''''X<sub>i</sub>''''' of size ''m''&nbsp;+&nbsp;1.
 
This makes it possible to write the linear predictor function as follows:
 
:<math>f(i)= \boldsymbol\beta \cdot \mathbf{X}_i,</math>
 
using the notation for a [[dot product]] between two vectors.
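
In code the compact form is a single dot product once the pseudo-variable ''x''<sub>0,''i''</sub>&nbsp;=&nbsp;1 is prepended; a sketch (assuming NumPy):

<syntaxhighlight lang="python">
import numpy as np

def linear_predictor(beta, x):
    """f(i) = beta . X_i, with the pseudo-variable x_0 = 1 prepended to x."""
    X_i = np.concatenate(([1.0], x))     # X_i = (1, x_1, ..., x_m)
    return float(np.dot(beta, X_i))

beta = np.array([-0.5, 2.0, 0.3])        # beta_0 (intercept), beta_1, beta_2
print(linear_predictor(beta, np.array([1.2, -0.7])))   # -0.5 + 2.4 - 0.21 = 1.69
</syntaxhighlight>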
 
===As a generalized linear model===
 
The particular model used by logistic regression, which distinguishes it from standard [[linear regression]] and from other types of [[regression analysis]] used for [[binary-valued]] outcomes, is the way the probability of a particular outcome is linked to the linear predictor function:
 
:<math>\operatorname{logit}(\mathbb{E}[Y_i\mid x_{1,i},\ldots,x_{m,i}]) = \operatorname{logit}(p_i)=\ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i}</math>
 
Written using the more compact notation described above, this is:
 
:<math>\operatorname{logit}(\mathbb{E}[Y_i\mid \mathbf{X}_i]) = \operatorname{logit}(p_i)=\ln\left(\frac{p_i}{1-p_i}\right) = \boldsymbol\beta \cdot \mathbf{X}_i</math>
 
This formulation expresses logistic regression as a type of [[generalized linear model]], which predicts variables with various types of [[probability distribution]]s by fitting a linear predictor function of the above form to some sort of arbitrary transformation of the expected value of the variable.
 
The intuition for transforming using the logit function (the natural log of the odds) was explained above.  It also has the practical effect of converting the probability (which is bounded to be between 0 and 1) to a variable that ranges over <math>(-\infty,+\infty)</math> — thereby matching the potential range of the linear prediction function on the right side of the equation.
 
Note that both the probabilities ''p''<sub>''i''</sub> and the regression coefficients are unobserved, and the means of determining them is not part of the model itself.  They are typically determined by some sort of optimization procedure, e.g. [[maximum likelihood estimation]], that finds values that best fit the observed data (i.e. that give the most accurate predictions for the data already observed), usually subject to [[regularization (mathematics)|regularization]] conditions that seek to exclude unlikely values, e.g. extremely large values for any of the regression coefficients.  The use of a regularization condition is equivalent to doing [[maximum a posteriori]] (MAP) estimation, an extension of maximum likelihood.  (Regularization is most commonly done using [[Ridge regression|a squared regularizing function]], which is equivalent to placing a zero-mean [[Gaussian distribution|Gaussian]] [[prior distribution]] on the coefficients, but other regularizers are also possible.)  Whether or not regularization is used, it is usually not possible to find a closed-form solution; instead, an iterative numerical method must be used, such as [[iteratively reweighted least squares]] (IRLS) or, more commonly these days, a [[quasi-Newton method]] such as the [[L-BFGS|L-BFGS method]].
 
The interpretation of the ''β''<sub>''j''</sub> parameter estimates is as the additive effect on the log of the [[odds]] for a unit change in the ''j''th explanatory variable.  In the case of a dichotomous explanatory variable, for instance gender, <math>e^\beta</math> is the estimate of the odds of having the outcome for, say, males compared with females.
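
For instance, a hypothetical coefficient of 0.69 for a "male" indicator corresponds to an odds ratio of about 2, i.e. roughly doubled odds of the outcome:

<syntaxhighlight lang="python">
import numpy as np

beta_male = 0.69                 # hypothetical coefficient for the indicator
print(np.exp(beta_male))         # about 2.0: the odds ratio for males vs. females
</syntaxhighlight>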
 
An equivalent formula uses the inverse of the logit function, which is the [[logistic function]], i.e.:
 
:<math>\mathbb{E}[Y_i\mid \mathbf{X}_i] = p_i = \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = \frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}</math>
 
The formula can also be written (somewhat awkwardly) as a [[probability distribution]] (specifically, using a [[probability mass function]]):
 
:<math>\operatorname{Pr}(Y_i=y_i\mid \mathbf{X}_i) = {p_i}^{y_i}(1-p_i)^{1-y_i} =\left(\frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{y_i} \left(1-\frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{1-y_i}</math>
 
===As a latent-variable model===
 
The above model has an equivalent formulation as a [[latent-variable model]].  This formulation is common in the theory of [[discrete choice]] models, and makes it easier to extend to certain more complicated models with multiple, correlated choices, as well as to compare logistic regression to the closely related [[probit model]].
 
Imagine that, for each trial ''i'', there is a continuous [[latent variable]] ''Y''<sub>''i''</sub><sup>''*''</sup> (i.e. an unobserved [[random variable]]) that is distributed as follows:
 
: <math> Y_i^\ast = \boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon \, </math>
where
: <math>\varepsilon \sim \operatorname{Logistic}(0,1) \, </math>
i.e. the latent variable can be written directly in terms of the linear predictor function and an additive random [[error variable]] that is distributed according to a standard [[logistic distribution]].
 
Then ''Y''<sub>''i''</sub> can be viewed as an indicator for whether this latent variable is positive:
: <math> Y_i = \begin{cases} 1 & \text{if }Y_i^\ast > 0 \ \text{ i.e. } - \varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i, \\
0 &\text{otherwise.} \end{cases} </math>
 
The choice of modeling the error variable specifically with a standard logistic distribution, rather than a general logistic distribution with the location and scale set to arbitrary values, seems restrictive, but in fact it is not.  It must be kept in mind that we can choose the regression coefficients ourselves, and very often can use them to offset changes in the parameters of the error variable's distribution.  For example, a logistic error-variable distribution with a non-zero location parameter ''&mu;'' (which sets the mean) is equivalent to a distribution with a zero location parameter, where ''&mu;'' has been added to the intercept coefficient.  Both situations produce the same value for ''Y''<sub>''i''</sub><sup>''*''</sup> regardless of settings of explanatory variables.  Similarly, an arbitrary scale parameter ''s'' is equivalent to setting the scale parameter to 1 and then dividing all regression coefficients by ''s''.  In the latter case, the resulting value of ''Y''<sub>''i''</sub><sup>''*''</sup> will be smaller by a factor of ''s'' than in the former case, for all sets of explanatory variables — but critically, it will always remain on the same side of 0, and hence lead to the same ''Y''<sub>''i''</sub> choice.
 
(Note, however, that this argument relies on there being only two choices; the irrelevance of the scale parameter may not carry over into more complex models where more than two choices are available.)
 
It turns out that this formulation is exactly equivalent to the preceding one, phrased in terms of the [[generalized linear model]] and without any [[latent variable]]s.  This can be shown as follows, using the fact that the [[cumulative distribution function]] (CDF) of the standard [[logistic distribution]] is the [[logistic function]], which is the inverse of the [[logit function]], i.e.
 
:<math>\Pr(\varepsilon < x) = \operatorname{logit}^{-1}(x)</math>
 
Then:
 
:<math>
\begin{align}
\Pr(Y_i=1\mid\mathbf{X}_i) &= \Pr(Y_i^\ast > 0\mid\mathbf{X}_i) & \\
&= \Pr(\boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon > 0) & \\
&= \Pr(\varepsilon > -\boldsymbol\beta \cdot \mathbf{X}_i) &\\
&= \Pr(\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i) & \text{(because the logistic distribution is symmetric)} \\
&= \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) & \\
&= p_i & \text{(see above)}
\end{align}
</math>
 
This formulation — which is standard in [[discrete choice]] models — makes clear the relationship between logistic regression (the "logit model") and the [[probit model]], which uses an error variable distributed according to a standard [[normal distribution]] instead of a standard logistic distribution.  Both the logistic and normal distributions are symmetric with a basic unimodal, "bell curve" shape.  The only difference is that the logistic distribution has somewhat [[heavy-tailed distribution|heavier tails]], which means that it is less sensitive to outlying data (and hence somewhat more [[robust statistics|robust]] to model mis-specifications or erroneous data).
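
The equivalence can also be checked by simulation: drawing the latent error from a standard logistic distribution and thresholding at zero reproduces the logistic probabilities. A sketch (assuming NumPy):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = np.array([-0.5, 1.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Latent-variable formulation: Y = 1 exactly when beta . X + eps > 0
eps = rng.logistic(loc=0.0, scale=1.0, size=n)
y = (X @ beta + eps > 0).astype(float)

# Generalized-linear-model formulation: Pr(Y = 1) = logistic(beta . X)
p = 1.0 / (1.0 + np.exp(-X @ beta))

print(y.mean(), p.mean())   # the two means agree up to simulation noise
</syntaxhighlight>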
 
===As a two-way latent-variable model===
 
Yet another formulation uses two separate latent variables:
 
: <math>
\begin{align}
Y_i^{0\ast} &= \boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0 \, \\
Y_i^{1\ast} &= \boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1 \,
\end{align}
</math>
 
where
 
: <math>
\begin{align}
\varepsilon_0 & \sim \operatorname{EV}_1(0,1) \\
\varepsilon_1 & \sim \operatorname{EV}_1(0,1)
\end{align}
</math>
 
where ''EV<sub>1</sub>(0,1)'' is a standard type-1 [[extreme value distribution]]: i.e.
 
:<math>\Pr(\varepsilon_0=x) = \Pr(\varepsilon_1=x) = e^{-x} e^{-e^{-x}}</math>
 
Then
 
: <math> Y_i = \begin{cases} 1 & \text{if }Y_i^{1\ast} > Y_i^{0\ast}, \\
0 &\text{otherwise.} \end{cases} </math>
 
This model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable.  The reason for this separation is that it makes it easy to extend logistic regression to multi-outcome categorical variables, as in the [[multinomial logit]] model. In such a model, it is natural to model each possible outcome using a different set of regression coefficients.  It is also possible to motivate each of the separate latent variables as the theoretical [[utility]] associated with making the associated choice, and thus motivate logistic regression in terms of [[utility theory]]. (In terms of utility theory, a rational actor always chooses the choice with the greatest associated utility.) This is in fact the approach taken by economists when formulating [[discrete choice]] models, because it both provides a theoretically strong foundation and facilitates intuitions about the model, which in turn makes it easy to consider various sorts of extensions. (See the example below.)
 
The choice of the type-1 [[extreme value distribution]] seems fairly arbitrary, but it makes the mathematics work out, and it may be possible to justify its use through [[rational choice theory]].
 
It turns out that this model is equivalent to the previous model, although this seems non-obvious, since there are now two sets of regression coefficients and error variables, and the error variables have a different distribution.  In fact, this model reduces directly to the previous one with the following substitutions:
:<math>\boldsymbol\beta = \boldsymbol\beta_1 - \boldsymbol\beta_0</math>
:<math>\varepsilon = \varepsilon_1 - \varepsilon_0</math>
An intuition for this comes from the fact that, since we choose based on the maximum of two values, only their difference matters, not the exact values — and this effectively removes one [[Degrees of freedom (statistics)|degree of freedom]]. Another critical fact is that the difference of two type-1 extreme-value-distributed variables is a logistic distribution, i.e. <math>\varepsilon = \varepsilon_1 - \varepsilon_0 \sim \operatorname{Logistic}(0,1) .</math>
 
We can demonstrate the equivalence as follows:
 
:<math>
\begin{align}
\Pr(Y_i=1\mid\mathbf{X}_i) &= \Pr(Y_i^{1\ast} > Y_i^{0\ast}\mid\mathbf{X}_i) & \\
&= \Pr(Y_i^{1\ast} - Y_i^{0\ast} > 0\mid\mathbf{X}_i) & \\
&= \Pr(\boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1 - (\boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0) > 0) & \\
&= \Pr((\boldsymbol\beta_1 \cdot \mathbf{X}_i - \boldsymbol\beta_0 \cdot \mathbf{X}_i) + (\varepsilon_1 - \varepsilon_0) > 0) & \\
&= \Pr((\boldsymbol\beta_1 - \boldsymbol\beta_0) \cdot \mathbf{X}_i + (\varepsilon_1 - \varepsilon_0) > 0) & \\
&= \Pr((\boldsymbol\beta_1 - \boldsymbol\beta_0) \cdot \mathbf{X}_i + \varepsilon > 0) & \text{(substitute }\varepsilon\text{ as above)} \\
&= \Pr(\boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon > 0) & \text{(substitute }\boldsymbol\beta\text{ as above)} \\
&= \Pr(\varepsilon > -\boldsymbol\beta \cdot \mathbf{X}_i) & \text{(now, same as above model)}\\
&= \Pr(\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i) & \\
&= \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) & \\
&= p_i &
\end{align}
</math>
 
====Example====
As an example, consider a province-level election where the choice is between a right-of-center party, a left-of-center party, and a secessionist party (e.g. the [[Parti Québécois]], which wants [[Quebec]] to secede from [[Canada]]).  We would then use three latent variables, one for each choice.  Then, in accordance with [[utility theory]], we can then interpret the latent variables as expressing the [[utility]] that results from making each of the choices.  We can also interpret the regression coefficients as indicating the strength that the associated factor (i.e. explanatory variable) has in contributing to the utility — or more correctly, the amount by which a unit change in an explanatory variable changes the utility of a given choice.  A voter might expect that the right-of-center party would lower taxes, especially on rich people.  This would give low-income people no benefit, i.e. no change in utility (since they usually don't pay taxes); would cause moderate benefit (i.e. somewhat more money, or moderate utility increase) for middle-incoming people; and would cause significant benefits for high-income people.  On the other hand, the left-of-center party might be expected to raise taxes and offset it with increased welfare and other assistance for the lower and middle classes.  This would cause significant positive benefit to low-income people, perhaps weak benefit to middle-income people, and significant negative benefit to high-income people.  Finally, the secessionist party would take no direct actions on the economy, but simply secede. A low-income or middle-income voter might expect basically no clear utility gain or loss from this, but a high-income voter might expect negative utility, since he/she is likely to own companies, which will have a harder time doing business in such an environment and probably lose money.
 
These intuitions can be expressed as follows:
 
{|class="wikitable"
|+Estimated strength of regression coefficient for different outcomes (party choices) and different values of explanatory variables
|-
! !! Center-right !! Center-left !! Secessionist
|-
! High-income
| strong + || strong − || strong −
|-
! Middle-income
| moderate + || weak + || none
|-
! Low-income
| none || strong + || none
|-
|}
 
This clearly shows that
#Separate sets of regression coefficients need to exist for each choice.  When phrased in terms of utility, this can be seen very easily. Different choices have different effects on net utility; furthermore, the effects vary in complex ways that depend on the characteristics of each individual, so there need to be separate sets of coefficients for each characteristic, not simply a single extra per-choice characteristic.
#Even though income is a continuous variable, its effect on utility is too complex for it to be treated as a single variable.  Either it needs to be directly split up into ranges, or higher powers of income need to be added so that [[polynomial regression]] on income is effectively done.
 
==={{anchor|log-linear model}}As a "log-linear" model===
Yet another formulation combines the two-way latent variable formulation above with the original formulation higher up without latent variables, and in the process provides a link to one of the standard formulations of the [[multinomial logit]].
 
Here, instead of writing the [[logit]] of the probabilities ''p''<sub>''i''</sub> as a linear predictor, we separate the linear predictor into two, one for each of the two outcomes:
 
: <math>
\begin{align}
\ln \Pr(Y_i=0) &= \boldsymbol\beta_0 \cdot \mathbf{X}_i - \ln Z \, \\
\ln \Pr(Y_i=1) &= \boldsymbol\beta_1 \cdot \mathbf{X}_i - \ln Z \, \\
\end{align}
</math>
 
Note that two separate sets of regression coefficients have been introduced, just as in the two-way latent variable model, and the two equations take a form that writes the [[logarithm]] of the associated probability as a linear predictor, with an extra term <math>-\ln Z</math> at the end.  This term, as it turns out, serves as the [[normalizing factor]] ensuring that the result is a distribution.  This can be seen by exponentiating both sides:
 
: <math>
\begin{align}
\Pr(Y_i=0) &= \frac{1}{Z} e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} \, \\
\Pr(Y_i=1) &= \frac{1}{Z} e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i} \, \\
\end{align}
</math>
 
In this form it is clear that the purpose of ''Z'' is to ensure that the resulting distribution over ''Y''<sub>''i''</sub> is in fact a [[probability distribution]], i.e. it sums to 1.  This means that ''Z'' is simply the sum of all un-normalized probabilities, and by dividing each probability by ''Z'', the probabilities become "[[normalizing constant|normalized]]".  That is:
 
:<math> Z = e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}</math>
 
and the resulting equations are
 
:<math>
\begin{align}
\Pr(Y_i=0) &= \frac{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} \, \\
\Pr(Y_i=1) &= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} \,
\end{align}
</math>
 
Or generally:
 
:<math>\Pr(Y_i=c) = \frac{e^{\boldsymbol\beta_c \cdot \mathbf{X}_i}}{\sum_h e^{\boldsymbol\beta_h \cdot \mathbf{X}_i}}</math>
 
This shows clearly how to generalize this formulation to more than two outcomes, as in [[multinomial logit]].
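
As a minimal numerical sketch of this formula in the two-outcome case (the coefficient vectors and observation below are arbitrary; Python with NumPy is assumed):

<syntaxhighlight lang="python">
import numpy as np

beta_0 = np.array([0.2, -1.0])    # coefficients for outcome 0
beta_1 = np.array([0.5,  0.7])    # coefficients for outcome 1
x = np.array([1.0, 2.0])          # X_i (the first entry acts as the intercept)

scores = np.array([beta_0 @ x, beta_1 @ x])   # beta_c . X_i for each outcome
Z = np.exp(scores).sum()                      # normalizing factor
probs = np.exp(scores) / Z                    # Pr(Y_i = c)

print(probs, probs.sum())                     # the probabilities sum to 1
</syntaxhighlight>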
 
In order to prove that this is equivalent to the previous model, note that the above model is overspecified, in that <math>\Pr(Y_i=0)</math> and <math>\Pr(Y_i=1)</math> cannot be independently specified: rather <math>\Pr(Y_i=0) + \Pr(Y_i=1) = 1</math> so knowing one automatically determines the other.  As a result, the model is [[nonidentifiable]], in that multiple combinations of '''''β'''''<sub>0</sub> and '''''β'''''<sub>1</sub> will produce the same probabilities for all possible explanatory variables.  In fact, it can be seen that adding any constant vector to both of them will produce the same probabilities:
 
:<math>
\begin{align}
\Pr(Y_i=1) &= \frac{e^{(\boldsymbol\beta_1 +\mathbf{C}) \cdot \mathbf{X}_i}}{e^{(\boldsymbol\beta_0  +\mathbf{C})\cdot \mathbf{X}_i} + e^{(\boldsymbol\beta_1 +\mathbf{C}) \cdot \mathbf{X}_i}} \, \\
&= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i}} \, \\
&= \frac{e^{\mathbf{C} \cdot \mathbf{X}_i}e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{e^{\mathbf{C} \cdot \mathbf{X}_i}(e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i})} \, \\
&= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} \, \\
\end{align}
</math>
 
As a result, we can simplify matters, and restore identifiability, by picking an arbitrary value for one of the two vectors.  We choose to set <math>\boldsymbol\beta_0 = \mathbf{0} .</math>  Then,
 
:<math>e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} = e^{\mathbf{0} \cdot \mathbf{X}_i} = 1</math>
 
and so
 
:<math>
\Pr(Y_i=1) = \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{1 + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} = \frac{1}{1+e^{-\boldsymbol\beta_1 \cdot \mathbf{X}_i}} = p_i</math>
 
which shows that this formulation is indeed equivalent to the previous formulation. (As in the two-way latent variable formulation, any setting of the two vectors for which <math>\boldsymbol\beta = \boldsymbol\beta_1 - \boldsymbol\beta_0</math> will produce equivalent results.)
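
Both the shift invariance and the equivalence after setting <math>\boldsymbol\beta_0 = \mathbf{0}</math> are easy to verify numerically; a minimal sketch with arbitrary values:

<syntaxhighlight lang="python">
import numpy as np

def probs(beta_0, beta_1, x):
    scores = np.array([beta_0 @ x, beta_1 @ x])
    e = np.exp(scores - scores.max())          # stable exponentiation
    return e / e.sum()

x = np.array([1.0, -0.5, 2.0])
beta_0 = np.array([0.1, 0.2, 0.3])
beta_1 = np.array([1.0, -1.0, 0.5])
C = np.array([5.0, -3.0, 2.0])                 # arbitrary constant vector

print(probs(beta_0, beta_1, x))                # original parameters
print(probs(beta_0 + C, beta_1 + C, x))        # shifted: identical probabilities
print(probs(np.zeros(3), beta_1 - beta_0, x))  # identified form with beta_0 = 0
</syntaxhighlight>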
 
Note that most treatments of the [[multinomial logit]] model start out either by extending the "log-linear" formulation presented here or the two-way latent variable formulation presented above, since both clearly show the way that the model could be extended to multi-way outcomes.  In general, the presentation with latent variables is more common in [[econometrics]] and [[political science]], where [[discrete choice]] models and [[utility theory]] reign, while the "log-linear" formulation here is more common in [[computer science]], e.g. [[machine learning]] and [[natural language processing]].
 
===As a single-layer perceptron===
 
The model has an equivalent formulation
 
:<math>p_i = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}. \, </math>
 
This functional form is commonly called a single-layer [[perceptron]] or single-layer [[artificial neural network]]. A single-layer neural network computes a continuous output instead of a [[step function]]. The derivative of ''p<sub>i</sub>'' with respect to  ''X''&nbsp;=&nbsp;(''x''<sub>1</sub>, ..., ''x''<sub>''k''</sub>) is computed from the general form:
 
: <math>y = \frac{1}{1+e^{-f(X)}}</math>
 
where ''f''(''X'') is an [[analytic function]] in ''X''. With this choice, the single-layer neural network is identical to the logistic regression model. This function has a continuous derivative, which allows it to be used in [[backpropagation]]. This function is also preferred because its derivative is easily calculated:
 
: <math>\frac{\mathrm{d}y}{\mathrm{d}X} = y(1-y)\frac{\mathrm{d}f}{\mathrm{d}X}. \, </math>
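 
As a minimal sketch of this derivative in use (the data and learning rate below are illustrative): with <math>f(X) = \boldsymbol\beta \cdot \mathbf{X}</math>, combining the chain rule with the log-likelihood reduces each observation's gradient contribution to <math>(y_i - p_i)\mathbf{x}_i</math>, giving one gradient-ascent step as follows.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: 4 observations, intercept column plus 2 features.
X = np.array([[1.0,  0.5,  1.2],
              [1.0, -1.0,  0.3],
              [1.0,  2.0, -0.5],
              [1.0,  0.0,  0.8]])
y = np.array([1.0, 0.0, 1.0, 1.0])
beta = np.zeros(3)

# One gradient-ascent step on the log-likelihood.  Via dy/df = y(1 - y),
# the gradient for each observation simplifies to (y_i - p_i) x_i.
p = sigmoid(X @ beta)
beta += 0.1 * (X.T @ (y - p))
print(beta)
</syntaxhighlight>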
 
===In terms of binomial data===
A closely related model assumes that each ''i'' is associated not with a single Bernoulli trial but with ''n''<sub>''i''</sub> [[independent identically distributed]] trials, where the observation ''Y''<sub>''i''</sub> is the number of successes observed (the sum of the individual Bernoulli-distributed random variables), and hence follows a [[binomial distribution]]:
 
:<math>Y_i \ \sim  \operatorname{Bin}(n_i,p_i),\text{ for }i = 1, \dots , n</math>
 
An example of this distribution is the fraction of seeds (''p''<sub>''i''</sub>) that germinate after ''n''<sub>''i''</sub> are planted.
 
In terms of [[expected value]]s, this model is expressed as follows:
 
:<math>p_i = \mathbb{E}\left[\left.\frac{Y_i}{n_{i}}\,\right|\,\mathbf{X}_i \right], </math>
 
so that
 
:<math>\operatorname{logit}\left(\mathbb{E}\left[\left.\frac{Y_i}{n_{i}}\,\right|\,\mathbf{X}_i \right]\right) = \operatorname{logit}(p_i)=\ln\left(\frac{p_i}{1-p_i}\right) = \boldsymbol\beta \cdot \mathbf{X}_i,</math>
 
Or equivalently:
 
:<math>\operatorname{Pr}(Y_i=y_i\mid \mathbf{X}_i) = {n_i \choose y_i} p_i^{y_i}(1-p_i)^{n_i-y_i} ={n_i \choose y_i} \left(\frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{y_i} \left(1-\frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{n_i-y_i}</math>
 
This model can be fitted using the same sorts of methods as the more basic model above.
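
For instance, a minimal sketch of evaluating this binomial likelihood for the seed-germination example (the counts and coefficients are made up; SciPy is assumed):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom

# Made-up grouped data: n_i seeds planted and y_i germinated per batch,
# with an intercept and one covariate per batch.
n = np.array([10, 20, 15])
y = np.array([3, 14, 12])
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

def log_likelihood(beta):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))     # p_i from the logit model
    return binom.logpmf(y, n, p).sum()

print(log_likelihood(np.array([0.0, 1.0])))   # evaluated at an arbitrary beta
</syntaxhighlight>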
 
== Bayesian logistic regression ==
[[File:Logistic-sigmoid-vs-scaled-probit.svg|right|300px|thumb|Comparison of [[logistic function]] with a scaled inverse [[probit function]] (i.e. the [[cumulative distribution function|CDF]] of the [[normal distribution]]), comparing <math>\sigma(x)</math> vs. <math>\Phi(\sqrt{\frac{\pi}{8}}x)</math>, which makes the slopes the same at the origin.  This shows the [[heavy-tailed distribution|heavier tails]] of the logistic distribution.]]
 
In a [[Bayesian statistics]] context, [[prior distribution]]s are normally placed on the regression coefficients, usually in the form of [[Gaussian distribution]]s.  Unfortunately, the Gaussian distribution is not the [[conjugate prior]] of the [[likelihood function]] in logistic regression; in fact, the likelihood function is not an [[exponential family]] and thus does not have a conjugate prior at all.  As a result, the [[posterior distribution]] is difficult to calculate, even using standard simulation algorithms (e.g. [[Gibbs sampling]]).
 
There are various possibilities:
*Don't do a proper Bayesian analysis, but simply compute a [[maximum a posteriori]] point estimate of the parameters.  This is common, for example, in "maximum entropy" classifiers in [[machine learning]] (a minimal sketch of this option appears after this list).
*Use a more general approximation method such as the [[Metropolis–Hastings algorithm]].
*Draw a [[Markov chain Monte Carlo]] sample from the exact posterior using the independent Metropolis–Hastings algorithm, with a heavy-tailed multivariate candidate distribution found by matching the mode and curvature at the mode of the normal approximation to the posterior, and then using a Student's t shape with low degrees of freedom.<ref name=Bolstad/>  This approach has been shown to have excellent convergence properties.
*Use a [[latent variable model]] and approximate the logistic distribution using a more tractable distribution, e.g. a [[Student's t-distribution]] or a [[mixture density|mixture]] of [[normal distribution]]s.
*Do [[probit regression]] instead of logistic regression.  This is actually a special case of the previous situation, using a [[normal distribution]] in place of a Student's t, mixture of normals, etc.  This will be less accurate but has the advantage that probit regression is extremely common, and a ready-made Bayesian implementation may already be available.
*Use the [[Laplace approximation]] of the posterior distribution.<ref>{{cite book |last=Bishop |first=Christopher M |title=Pattern Recognition and Machine Learning |publisher=Springer Science+Business Media, LLC |isbn=978-0387-31073-2 |chapter=Chapter 4. Linear Models for Classification |pages=217–218}}</ref>  This approximates the posterior with a Gaussian distribution.  This is not a terribly good approximation, but it suffices if all that is desired is an estimate of the posterior mean and variance.  In such a case, an approximation scheme such as [[variational Bayes]] can be used.<ref>{{cite book |last=Bishop |first=Christopher M |title=Pattern Recognition and Machine Learning |publisher=Springer Science+Business Media, LLC |isbn=978-0387-31073-2 |chapter=Chapter 10. Approximate Inference |pages=498–505}}</ref>
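
As a minimal sketch of the first option: with a Gaussian prior <math>\boldsymbol\beta \sim N(\mathbf{0}, \tau^2 I)</math>, the MAP estimate maximizes the log-likelihood plus the log-prior, which is equivalent to L2-regularized ("ridge") logistic regression.  The simulated data, prior variance, and step size below are all illustrative.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative simulated data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_beta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(100) < sigmoid(X @ true_beta)).astype(float)

tau2 = 10.0            # prior variance: beta ~ N(0, tau2 * I)
beta = np.zeros(3)
for _ in range(1000):
    p = sigmoid(X @ beta)
    # Gradient of log-likelihood plus log-prior (the L2 penalty term).
    beta += 0.01 * (X.T @ (y - p) - beta / tau2)
print(beta)            # MAP point estimate
</syntaxhighlight>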
 
===Gibbs sampling with an approximating distribution===
As shown above, logistic regression is equivalent to a [[latent variable model]] with an [[error variable]] distributed according to a standard [[logistic distribution]].  The overall distribution of the latent variable <math>Y_i^\ast</math> is also a logistic distribution, with the mean equal to <math>\boldsymbol\beta \cdot \mathbf{X}_i</math> (i.e. the fixed quantity added to the error variable).  This model considerably simplifies the application of techniques such as [[Gibbs sampling]].  However, sampling the regression coefficients is still difficult, because of the lack of [[conjugate prior|conjugacy]] between the normal and logistic distributions.  Changing the prior distribution over the regression coefficients is of no help, because the logistic distribution is not in the [[exponential family]] and thus has no [[conjugate prior]].
 
One possibility is to use a more general [[Markov chain Monte Carlo]] technique, such as the [[Metropolis–Hastings algorithm]], which can sample arbitrary distributions.  Another possibility, however, is to replace the logistic distribution with a similar-shaped distribution that is easier to work with using Gibbs sampling.  In fact, the logistic and normal distributions have a similar shape, and thus one possibility is simply to have normally distributed errors.  Because the normal distribution is conjugate to itself, sampling the regression coefficients becomes easy.  In fact, this model is exactly the model used in [[probit regression]].
 
However, the normal and logistic distributions differ in that the logistic has [[heavy-tailed distribution|heavier tails]].  As a result, it is more [[robust statistics|robust]] to inaccuracies in the underlying model (which are inevitable, in that the model is essentially always an approximation) or to errors in the data.  Probit regression loses some of this robustness.
 
Another alternative is to use errors distributed as a [[Student's t-distribution]].  The Student's t-distribution has heavy tails, and is easy to sample from because it is the [[compound distribution]] of a normal distribution with variance distributed as an [[inverse gamma distribution]].  In other words, if a normal distribution is used for the error variable, and another [[latent variable]], following an inverse gamma distribution, is added corresponding to the variance of this error variable, the [[marginal distribution]] of the error variable will follow a Student's t-distribution.  Because of the various conjugacy relationships, all variables in this model are easy to sample from.
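
A minimal sketch of this compound construction (SciPy is assumed): drawing a variance from an inverse gamma distribution with shape and scale both <math>\nu/2</math>, and then a normal with that variance, yields draws whose marginal is Student's t with <math>\nu</math> degrees of freedom.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 9                              # degrees of freedom
n = 100_000

# sigma^2 ~ InvGamma(nu/2, nu/2), then x | sigma^2 ~ N(0, sigma^2).
sigma2 = stats.invgamma.rvs(nu / 2, scale=nu / 2, size=n, random_state=rng)
x = rng.normal(0.0, np.sqrt(sigma2))

# The marginal of x is Student's t with nu degrees of freedom.
print(np.var(x), nu / (nu - 2))             # sample vs. theoretical variance
print(stats.kstest(x, stats.t(df=nu).cdf))  # should not reject the t marginal
</syntaxhighlight>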
 
The Student's t-distribution that best approximates a standard logistic distribution can be determined by [[method of moments (statistics)|matching the moments]] of the two distributions.  The Student's t-distribution has three parameters, and since the [[skewness]] of both distributions is always 0, the first four moments can all be matched, using the following equations:
 
:<math>
\begin{align}
\mu &= 0 \\
\frac{\nu}{\nu-2} s^2 &= \frac{\pi^2}{3} \\
\frac{6}{\nu-4} &= \frac{6}{5}
\end{align}
</math>
 
This yields the following values:
 
:<math>
\begin{align}
\mu &= 0 \\
s &= \sqrt{\frac{7}{9} \frac{\pi^2}{3}} \\
\nu &= 9
\end{align}
</math>
 
The following graphs compare the standard logistic distribution with the Student's t-distribution that matches its first four moments using the above-determined values, as well as the normal distribution that matches its first two moments.  Note how much more closely the Student's t-distribution agrees, especially in the tails.  Beyond about two standard deviations from the mean, the logistic and normal distributions diverge rapidly, but the logistic and Student's t-distributions do not begin to diverge significantly until more than five standard deviations away.
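
This tail behavior can also be checked numerically; a minimal sketch using the moment-matched parameters above (SciPy is assumed):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

s = np.sqrt((7.0 / 9.0) * np.pi**2 / 3.0)   # matched t scale from above
sd = np.pi / np.sqrt(3.0)                   # std. dev. of the standard logistic

for k in (2, 5, 7):                         # tail points, in standard deviations
    z = k * sd
    print(k,
          stats.logistic.sf(z),             # standard logistic tail probability
          stats.t.sf(z, df=9, scale=s),     # moment-matched Student's t tail
          stats.norm.sf(z, scale=sd))       # moment-matched normal tail
</syntaxhighlight>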
 
(Another possibility, also amenable to Gibbs sampling, is to approximate the logistic distribution using a [[mixture density]] of [[normal distribution]]s.)
 
{|
| [[File:Logistic-t-normal.svg|400px|thumb|Comparison of logistic and approximating distributions (t, normal).]]
| [[File:Logistic-t-normal-tails.svg|400px|thumb|Tails of distributions.]]
|-
| [[File:Logistic-t-normal-further-tails.svg|400px|thumb|Further tails of distributions.]]
| [[File:Logistic-t-normal-extreme-tails.svg|400px|thumb|Extreme tails of distributions.]]
|}
 
== Extensions ==
 
There are large numbers of extensions:
*[[Multinomial logistic regression]] (or '''multinomial logit''') handles the case of a multi-way [[categorical variable|categorical]] dependent variable (with unordered values, also called "classification").  Note that the general case of having dependent variables with more than two values is termed ''polytomous regression''.
*[[Ordered logistic regression]] (or '''ordered logit''') handles [[Levels of measurement#Ordinal measurement|ordinal]] dependent variables (ordered values).
*[[Mixed logit]] is an extension of multinomial logit that allows for correlations among the choices of the dependent variable.
*An extension of the logistic model to sets of interdependent variables is the [[conditional random field]].
 
== Model suitability ==
A way to measure a model's suitability is to assess the model against a set of data that was not used to create the model.<ref>Jonathan Mark and Michael A. Goldberg (2001). Multiple Regression Analysis and Mass Assessment: A Review of the Issues. The Appraisal Journal, Jan. pp. 89–109</ref> This class of techniques is called [[cross-validation (statistics)|cross-validation]]. Such holdout assessment is particularly valuable when data are collected in different settings (e.g., at different times or places) or when models are assumed to be generalizable.
 
To measure the suitability of a binary regression model, one can classify both the actual value and the predicted value of each observation as either 0 or 1.<ref>{{cite journal |last=Myers |first=J. H. |last2=Forgy |first2=E. W. |year=1963 |title=The Development of Numerical Credit Evaluation Systems |journal=[[Journal of the American Statistical Association|J. Amer. Statist. Assoc.]] |volume=58 |issue=303 |pages=799–806 |doi=10.1080/01621459.1963.10500889 }}</ref> The predicted value of an observation can be set equal to 1 if the estimated probability that the observation equals 1 is above <math>\frac{1}{2}</math>, and set equal to 0 if the estimated probability is below <math>\frac{1}{2}</math>. Here logistic regression is being used as a [[binary classification]] model. There are four possible combined classifications:
# prediction of 0 when the holdout sample has a 0 (True Negatives, the number of which is TN)
# prediction of 0 when the holdout sample has a 1 (False Negatives, the number of which is FN)
# prediction of 1 when the holdout sample has a 0 (False Positives, the number of which is FP)
# prediction of 1 when the holdout sample has a 1 (True Positives, the number of which is TP)
These classifications are used to calculate [[Accuracy and precision#In binary classification|accuracy, precision]] (also called [[positive predictive value]]), [[Precision and recall|recall]] (also called [[Sensitivity and specificity|sensitivity]]), [[Sensitivity and specificity|specificity]] and [[negative predictive value]]:
 
: <math>\text{Accuracy}=\frac{TP+TN}{TP+FP+FN+TN}</math> = fraction of observations with correct predicted classification
 
: <math>\text{Precision} = \text{PositivePredictiveValue} =\frac{TP}{TP+FP} \, </math> = fraction of predicted positives that are correct
 
:<math>\text{NegativePredictiveValue} = \frac{TN}{TN+FN}</math> = fraction of predicted negatives that are correct
 
: <math>\text{Recall} = \text{Sensitivity} = \frac{TP}{TP+FN} \, </math> = fraction of observations that are actually 1 with a correct predicted classification
 
:<math>\text{Specificity} = \frac{TN}{TN+FP}</math> = fraction of observations that are actually 0 with a correct predicted classification
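
A minimal sketch computing these quantities from a holdout sample (the labels and predicted probabilities below are illustrative):

<syntaxhighlight lang="python">
import numpy as np

# Illustrative holdout labels and model-estimated probabilities.
actual = np.array([1, 0, 1, 1, 0, 0, 1, 0])
prob = np.array([0.9, 0.4, 0.3, 0.8, 0.6, 0.1, 0.7, 0.2])
pred = (prob > 0.5).astype(int)   # classify at the 1/2 threshold

TP = np.sum((pred == 1) & (actual == 1))   # true positives
TN = np.sum((pred == 0) & (actual == 0))   # true negatives
FP = np.sum((pred == 1) & (actual == 0))   # false positives
FN = np.sum((pred == 0) & (actual == 1))   # false negatives

print("accuracy   ", (TP + TN) / (TP + FP + FN + TN))
print("precision  ", TP / (TP + FP))
print("NPV        ", TN / (TN + FN))
print("recall     ", TP / (TP + FN))
print("specificity", TN / (TN + FP))
</syntaxhighlight>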
 
== See also ==
{{Portal|Statistics}}
* [[Logistic function]]
* [[Discrete choice]]
* [[Jarrow–Turnbull model]]
* [[Limited dependent variable]]
* [[Multinomial logit|Multinomial logit model]]
* [[Ordered logit]]
* [[Hosmer–Lemeshow test]]
* [[Brier score]]
* [[MLPACK (C++ library)|MLPACK]] – contains a [[C++]] implementation of logistic regression
 
==References==
{{Reflist|refs=
 
<ref name=Hosmer>{{cite book  | last1 = Hosmer  | first1 = David W.
  | first2= Stanley |last2=Lemeshow
  | title = Applied Logistic Regression |edition= 2nd
  | publisher = Wiley
  | year = 2000  | isbn = 0-471-35632-8 }}  {{page needed|date=May 2012}}</ref>
 
<ref name=Menard>{{cite book | last = Menard
  | first = Scott W.
  | title = Applied Logistic Regression |edition= 2nd
  | publisher = SAGE
  | year = 2002  | isbn = 978-0-7619-2208-7 }} {{page needed|date=May 2012}}</ref>
 
<ref name=Cohen>{{cite book | last1 = Cohen  | first1 = Jacob
  | first2= Patricia |last2=Cohen |first3= Steven G. |last3= West |first4= Leona S. |last4= Aiken
  | title = Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences |edition= 3rd
  | publisher = Routledge
  | year = 2002 | isbn = 978-0-8058-2223-6 }} {{page needed|date=May 2012}}
</ref>
 
<ref name=Bolstad>{{cite book | last = Bolstad
| first = William M.
| title = Understanding Computational Bayesian Statistics
| publisher = Wiley
| year=2010 | isbn = 978-0-470-04609-8 }}  {{page needed|date=December 2010}}
</ref>
 
<ref name=Harrell>{{cite book | last = Harrell
| first = Frank E.
| title = Regression Modeling Strategies
| publisher = Springer-Verlag
| year=2001
| isbn = 0-387-95232-2 }}</ref>
}}
 
==Further reading==
*{{cite book
  | last = Agresti
  | first = Alan.
  | title = Categorical Data Analysis
  | publisher = New York: Wiley-Interscience
  | year = 2002
  | isbn = 0-471-36093-7 }}
*{{cite book
  | last = Amemiya
  | first = T.
  | title = Advanced Econometrics
  | publisher = Harvard University Press
  | year = 1985
  | isbn = 0-674-00560-0 }}
*{{cite book
  | last = Balakrishnan
  | first = N.
  | title = Handbook of the Logistic Distribution
  | publisher = Marcel Dekker, Inc.
  | year = 1991
  | isbn = 978-0-8247-8587-1 }}
*{{cite book
  | last = Greene
  | first = William H.
  | title = Econometric Analysis, fifth edition
  | publisher = Prentice Hall
  | year = 2003
  | isbn = 0-13-066189-9 }}
*{{cite book
  | last = Hilbe
  | first = Joseph M.
  | title = Logistic Regression Models
  | publisher = Chapman & Hall/CRC Press
  | year = 2009
  | isbn = 978-1-4200-7575-5}}
*{{cite book
  | last = Howell
  | first = David C.
  | title = Statistical Methods for Psychology, 7th ed.
  | publisher = Belmont, CA; Thomson Wadsworth
  | year = 2010
  | isbn = 978-0-495-59786-5 }}
*{{cite journal
  | last = Peduzzi
  | first = P.
  | coauthors = J. Concato, E. Kemper, T.R. Holford, A.R. Feinstein
  | title = A simulation study of the number of events per variable in logistic regression analysis
  | journal = [[Journal of Clinical Epidemiology]]
  | volume = 49
  | issue = 12
  | pages = 1373–1379
  | year = 1996
  | pmid = 8970487 }}
 
==External links==
{{wikiversity}}
* [http://www.appricon.com/index.php/logistic-regression-analysis.html Logistic Regression Interpretation]
* [http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm Logistic Regression tutorial]
* [http://www.simafore.com/blog/?Tag=logistic+regression Using open source software for building Logistic Regression models]
 
{{Statistics|correlation}}
 
[[Category:Categorical data]]
[[Category:Log-linear models]]
