{{machine learning bar}}
'''Statistical learning theory''' is a framework for [[machine learning]]
drawing from the fields of [[statistics]] and [[functional analysis]].<ref>[[Mehryar Mohri]], Afshin Rostamizadeh, Ameet Talwalkar (2012) ''Foundations of Machine Learning'', The MIT Press. ISBN 9780262018258.</ref>
Statistical learning theory deals with the problem of finding a
predictive function based on data. Statistical learning
theory has led to successful applications in fields such as [[computer vision]], [[speech recognition]], [[bioinformatics]], and [[baseball]].<ref>Gagan Sidhu, Brian Caffo. Exploiting pitcher decision-making using Reinforcement Learning. ''Annals of Applied Statistics''</ref> It is the theoretical
framework underlying [[support vector machines]].
 
==Introduction==
The goal of learning is prediction. Learning falls into many
categories, including [[supervised learning]], [[unsupervised learning]],
[[Online machine learning|online learning]], and [[reinforcement learning]]. From the perspective of
statistical learning theory, supervised learning is best understood.<ref>Tomaso Poggio, Lorenzo Rosasco, et al. ''Statistical Learning Theory and Applications'', 2012, Class 1 [http://www.mit.edu/~9.520/spring12/slides/class01/class01.pdf]</ref>
Supervised learning involves learning from a [[training set]] of data.
Every point in the training set is an input-output pair, where the input
maps to an output. The learning problem consists of inferring the
function that maps between the input and the output in a predictive fashion,
such that the learned function can be used to predict output from
future input.
 
Depending on the type of output, supervised learning problems are
either problems of [[regression analysis|regression]] or problems of [[Statistical classification|classification]]. If the
output takes a continuous range of values, it is a regression problem.
Using [[Ohm's Law]] as an example, a regression could be performed with
voltage as input and current as output. The regression would find the
functional relationship between voltage and current to be linear, with
slope {{nowrap|<math>\frac{1}{R}</math>}}, such that
:<math>
I = \frac{1}{R} V
</math>
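As an illustrative sketch of this regression in Python: the measurements, the noise level, and the true resistance of 2 ohms below are all assumptions made for the example, not part of the standard presentation.
<syntaxhighlight lang="python">
import numpy as np

# Synthetic (voltage, current) measurements for an assumed 2-ohm resistor,
# with a little Gaussian measurement noise added.
rng = np.random.default_rng(0)
V = np.linspace(0.0, 10.0, 50)                # inputs: voltage
I = V / 2.0 + rng.normal(0.0, 0.05, size=50)  # outputs: current, true slope 1/R = 0.5

# Least-squares estimate of the slope 1/R (no intercept term, per Ohm's law).
slope = V @ I / (V @ V)
print(f"estimated 1/R = {slope:.3f}, so R is about {1.0 / slope:.2f} ohms")
</syntaxhighlight>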
Classification problems are those for which the output will be an
element from a discrete set of labels. Classification is very common
for machine learning applications. In [[facial recognition system|facial recognition]], for
instance, a picture of a person's face would be the input, and the
output label would be that person's name. The input would be
represented by a large multidimensional vector, in which each
dimension represents the value of one of the pixels.
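A minimal sketch of this input representation (the image size is an assumption for illustration):
<syntaxhighlight lang="python">
import numpy as np

# A grayscale face image as a 2-D grid of pixel intensities (size assumed).
image = np.zeros((64, 64))

# The input vector: one dimension per pixel, here 64 * 64 = 4096 dimensions.
x = image.reshape(-1)
</syntaxhighlight>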
 
After learning a function based on the training set data, that
function is validated on a test set of data that did not appear
in the training set. Classification functions can use the percentage
of inputs that are correctly classified as a metric for how predictive the learned
function is, while regression functions must use some distance metric,
called a [[loss function]], for how accurate the predicted value is. A
familiar example of a loss function is the square of the difference
between the actual value and the predicted value; this is the loss
function used in ordinary least squares regression.
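A minimal sketch of these two validation metrics in Python, assuming the learned function is a plain callable and the test set is held in NumPy arrays (both assumptions made for illustration):
<syntaxhighlight lang="python">
import numpy as np

def classification_accuracy(f, x_test, y_test):
    """Fraction of test inputs that f classifies correctly."""
    predictions = np.array([f(x) for x in x_test])
    return np.mean(predictions == y_test)

def regression_squared_loss(f, x_test, y_test):
    """Mean squared difference between predicted and actual values,
    the loss function used in ordinary least squares regression."""
    predictions = np.array([f(x) for x in x_test])
    return np.mean((y_test - predictions) ** 2)
</syntaxhighlight>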
 
==Formal Description==
Take <math>X</math> to be the vector space of all possible inputs, and <math>Y</math> to be
the vector space of all possible outputs. Statistical learning theory
takes the perspective that there is some unknown probability
distribution over the product space <math>Z = X \times Y</math>, i.e. there
exists some unknown <math>p(z) = p(\vec{x},y)</math>. The training
set is made up of <math>n</math> samples from this probability distribution, and is notated
:<math>S = \{(\vec{x}_1,y_1), \dots ,(\vec{x}_n,y_n)\} = \{\vec{z}_1, \dots ,\vec{z}_n\}</math>
Every <math>\vec{x}_i</math> is an input vector from the training data, and <math>y_i</math>
is the output that corresponds to it.
 
In this formalism, the inference problem consists of finding a
function <math>f: X \to Y</math> such that <math>f(\vec{x}) \sim y</math>. Let
<math>\mathcal{H}</math> be a space of functions <math>f: X \to Y</math> called the
hypothesis space. The hypothesis space is the space of functions the
algorithm will search through. Let <math>V(f(\vec{x}),y)</math> be the [[loss functional]], a metric for the difference between the predicted value
<math>f(\vec{x})</math> and the actual value <math>y</math>. The [[expected risk]] is defined to
be
:<math>I[f] = \displaystyle \int_{X \times Y} V(f(\vec{x}),y)\, p(\vec{x},y)\, d\vec{x}\, dy</math>
The target function, the best possible function <math>f</math> that can be
chosen, is given by the <math>f</math> that attains
:<math>\inf_{f \in \mathcal{H}} I[f]</math>
 
Because the probability distribution <math>p(\vec{x},y)</math> is unknown, a
proxy measure for the expected risk must be used. This measure is based on the
training set, a sample from this unknown probability distribution. It
is called the [[empirical risk]]
:<math>I_S[f] = \frac{1}{n} \displaystyle \sum_{i=1}^n V( f(\vec{x}_i),y_i)</math>
A learning algorithm that chooses the function <math>f_S</math> which minimizes
the empirical risk is said to perform [[empirical risk minimization]].
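A minimal sketch of empirical risk minimization in Python: the training set, the square loss, and the hypothesis space (a grid of linear functions <math>x \mapsto ax</math>) are all toy assumptions chosen for illustration.
<syntaxhighlight lang="python">
import numpy as np

def empirical_risk(f, S, V):
    """I_S[f] = (1/n) * sum over the training set of V(f(x_i), y_i)."""
    return np.mean([V(f(x), y) for x, y in S])

# Toy training set of (x, y) pairs and the square loss.
S = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
square_loss = lambda prediction, y: (y - prediction) ** 2

# Hypothesis space: linear functions x -> a*x over a grid of slopes a.
hypotheses = [lambda x, a=a: a * x for a in np.linspace(0.0, 4.0, 401)]

# Empirical risk minimization picks the hypothesis with the lowest I_S[f].
f_S = min(hypotheses, key=lambda f: empirical_risk(f, S, square_loss))
print("learned slope:", f_S(1.0))  # close to 2, the trend in the toy data
</syntaxhighlight>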
 
==Loss Functions==
The choice of loss function is a determining factor on the function
<math>f_S</math> that will be chosen by the learning algorithm. The loss function
also affects the convergence rate for an algorithm. It is important
for the loss function to be convex.<ref>Rosasco, L., De Vito, E., Caponnetto, A., Piana, M., and Verri, A. 2004. Are Loss Functions All the Same? ''Neural Computation'' Vol 16, pp 1063-1076.</ref>
 
Different loss functions are used depending on whether the problem is
one of regression or one of classification.
 
===Regression===
The most common loss function for regression is the square loss
function. This familiar loss function is used in ordinary least
squares regression.  The form is:
:<math>V(f(\vec{x}),y) = (y - f(\vec{x}))^2</math>
 
The absolute value loss is also sometimes used:
:<math>V(f(\vec{x}),y) = |y - f(\vec{x})|</math>
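Both regression losses are straightforward to express in code; a sketch:
<syntaxhighlight lang="python">
def square_loss(prediction, y):
    """V(f(x), y) = (y - f(x))^2, the loss of ordinary least squares."""
    return (y - prediction) ** 2

def absolute_loss(prediction, y):
    """V(f(x), y) = |y - f(x)|, less sensitive to outliers than square loss."""
    return abs(y - prediction)
</syntaxhighlight>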
 
===Classification===
In some sense the 0-1 loss is the most natural loss function for
classification. It takes the value 0 if the predicted output is the
same as the actual output, and it takes the value 1 if the predicted output
is different from the actual output. For binary classification, this is:
:<math>V(f(\vec{x}),y) = \theta(- y f(\vec{x}))</math>
where <math>\theta</math> is the [[Heaviside step function]].
 
The 0-1 loss function, however, is not convex. The hinge loss is thus
often used:
:<math>V(f(\vec{x}),y) = (1 - y f(\vec{x}))_+</math>
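A sketch of both classification losses in Python, assuming binary labels <math>y \in \{-1, +1\}</math> as the formulas above do:
<syntaxhighlight lang="python">
def zero_one_loss(prediction, y):
    """theta(-y * f(x)): 1 when the signs disagree (a misclassification),
    0 otherwise; here a prediction of exactly 0 is counted as an error."""
    return 1.0 if y * prediction <= 0 else 0.0

def hinge_loss(prediction, y):
    """(1 - y * f(x))_+: a convex surrogate that upper-bounds the 0-1 loss."""
    return max(0.0, 1.0 - y * prediction)
</syntaxhighlight>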
 
==Regularization==
[[File:Overfitting on Training Set Data.pdf|thumb|This image represents an example of overfitting in machine learning. The red dots represent training set data. The green line represents the true functional relationship, while the blue line shows the learned function, which has fallen victim to overfitting.]]
 
A major problem that arises in machine learning is that of
overfitting. Because learning is a prediction problem, the goal is
not to find a function that most closely fits the data, but to find one
that will most accurately predict output from future input.
Empirical risk minimization runs the risk of overfitting: finding a
function that matches the data exactly but does not predict future output well.
 
Overfitting is symptomatic of unstable solutions; a small perturbation
in the training set data would cause a large variation in the learned
function. It can be shown that if the stability of the solution can
be guaranteed, generalization and consistency are guaranteed as well.<ref>Vapnik, V.N. and Chervonenkis, A.Y. 1971. On the uniform convergence of relative frequencies of events to their probabilities. ''Theory of Probability and its Applications'' Vol 16, pp 264-280.</ref><ref>Mukherjee, S., Niyogi, P. Poggio, T., and Rifkin, R. 2006. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. ''Advances in Computational Mathematics''. Vol 25, pp 161-193.</ref> [[Regularization (mathematics)|Regularization]] can solve the overfitting problem and give
the problem stability.
 
Regularization can be accomplished by restricting the hypothesis space
<math>\mathcal{H}</math>. A common example would be restricting <math>\mathcal{H}</math> to
linear functions: this can be seen as a reduction to the standard problem of
[[linear regression]]. <math>\mathcal{H}</math> could also be restricted to
polynomials of degree <math>p</math>, exponentials, or bounded functions on
[[Lp space|L1]]. Restriction of the hypothesis space avoids overfitting because
the form of the potential functions is limited, and so does not allow
for the choice of a function that gives empirical risk arbitrarily
close to zero.
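As a sketch of this idea, the following Python code (using NumPy's polynomial fitting; the data are assumed for illustration) contrasts an unrestricted fit with one whose hypothesis space is restricted to cubic polynomials:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=20)  # noisy smooth curve

# Unrestricted: with 20 points, a degree-19 polynomial can drive the
# empirical risk to essentially zero -- the overfitting described above.
overfit = np.polynomial.Polynomial.fit(x, y, deg=19)

# Restricted hypothesis space: polynomials of degree at most 3.
# The empirical risk is higher, but the learned function is far smoother.
restricted = np.polynomial.Polynomial.fit(x, y, deg=3)

for name, f in [("degree 19", overfit), ("degree 3", restricted)]:
    print(name, "training MSE:", np.mean((y - f(x)) ** 2))
</syntaxhighlight>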
 
Regularization can also be accomplished through [[Tikhonov regularization]]. This
consists of minimizing
:<math>\frac{1}{n} \displaystyle \sum_{i=1}^n V(f(\vec{x}_i),y_i) + \gamma
\|f\|_{\mathcal{H}}^2</math>
where <math>\gamma</math> is a fixed and positive parameter, the regularization
parameter. Tikhonov regularization ensures existence, uniqueness, and
stability of the solution.<ref>Tomaso Poggio, Lorenzo Rosasco, et al. ''Statistical Learning Theory and Applications'', 2012, Class 2 [http://www.mit.edu/~9.520/spring12/slides/class02/class02.pdf]</ref>
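For linear functions with the square loss, the Tikhonov functional above is minimized by ridge regression, which has a closed-form solution; a sketch in Python (the factor of <math>n</math> comes from the <math>\frac{1}{n}</math> in the empirical term):
<syntaxhighlight lang="python">
import numpy as np

def tikhonov_linear(X, y, gamma):
    """Ridge regression: minimize (1/n) * ||y - X w||^2 + gamma * ||w||^2.
    Setting the gradient to zero gives (X^T X + n*gamma*I) w = X^T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * gamma * np.eye(d), X.T @ y)
</syntaxhighlight>
Larger values of the regularization parameter <math>\gamma</math> pull the solution toward smaller norm, trading fit to the training data for stability.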
 
{{clear}}
 
==See also==
* [[Reproducing kernel Hilbert spaces]] are a useful choice for <math>\mathcal{H}</math>.
* [[Proximal gradient methods for learning]]
 
==References==
{{reflist}}
 
[[Category:Machine learning]]
