{{expert-subject|date=January 2014}}
 
The '''softmax activation function''' is a neural [[transfer function]]. In [[neural network]]s, transfer functions calculate a layer's output from its net input.
 
A paper by Cadieu et al. argues that the softmax operation is a biologically plausible approximation to the [[Maxima and minima|maximum]] operation.<ref>Cadieu C, Kouh M, Pasupathy A, Connor CE, Riesenhuber M, and Poggio T. A Model of V4 Shape Selectivity and Invariance. ''J Neurophysiol'' 98: 1733–1750, 2007.</ref> The authors have used it to simulate an invariance operation of [[complex cells]],<ref name="Serre2005">Serre T, Kouh M, Cadieu C, Knoblich U, Kreiman G, and Poggio T. [http://people.csail.mit.edu/knoblich/papers/MIT-CSAIL-TR-2005-082.pdf A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex.] ''CBCL Paper 259/AI Memo 2005-036.'' Cambridge, MA: MIT, 2005.</ref> where they defined it as
 
:<math>
y=g \left(
\frac{\sum_{j=1}^n x_j^{q+1}}
{k+\left( \sum_{j=1}^n x_j^q \right)}
\right) \text{,}
</math>
 
where ''g'' is a [[sigmoid function]], ''x''<sub>''j''</sub> are the values of the input nodes, ''k'' is a small constant that avoids division by zero, and the exponent ''q'' is a parameter that controls the non-linearity.
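
The operation can be written directly from the formula above. The following is a minimal sketch in Python/NumPy; the particular choices of the logistic function for ''g'' and the values of ''q'' and ''k'' are illustrative assumptions, not taken from the cited papers.

<syntaxhighlight lang="python">
import numpy as np

def complex_cell_softmax(x, q=2.0, k=1e-6):
    """Softmax-like invariance operation: g(sum x^(q+1) / (k + sum x^q)).

    q and k are illustrative values; g is assumed to be the logistic sigmoid.
    """
    x = np.asarray(x, dtype=float)
    ratio = np.sum(x ** (q + 1)) / (k + np.sum(x ** q))
    return 1.0 / (1.0 + np.exp(-ratio))   # logistic sigmoid g

# For non-negative inputs and large q, the ratio approaches the largest input.
print(complex_cell_softmax([0.2, 0.9, 0.5], q=10.0))
</syntaxhighlight>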
 
== Artificial neural networks ==
In neural network simulations, the term '''softmax activation function''' refers to a similar function defined by<ref>ai-faq [http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-12.html What is a softmax activation function?]</ref>
:<math> \sigma \colon \mathbb{R}^n\times\mathbb{N} \to \mathbb{R} </math>
:<math> \sigma(\textbf{q}, i) = \frac{\exp(q_i)}{\sum_{j=1}^n\exp(q_j)} \text{,} </math>
 
where the vector ''q'' is the net input to a softmax node and ''n'' is the number of nodes in the softmax layer. The function ensures that all of the output values are between 0 and 1 and that their sum is 1. It is a generalization of the [[logistic function]] to multiple variables.
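
As a concrete illustration, a minimal NumPy sketch of this definition (returning all components σ('''q''', ''i'') at once) could look as follows; subtracting the maximum entry before exponentiating is a standard trick that avoids overflow and does not change the result.

<syntaxhighlight lang="python">
import numpy as np

def softmax(q):
    """Return sigma(q, i) = exp(q_i) / sum_j exp(q_j) for every index i."""
    q = np.asarray(q, dtype=float)
    z = np.exp(q - np.max(q))   # shift by max(q) for numerical stability
    return z / np.sum(z)

p = softmax([1.0, 2.0, 3.0])
print(p)         # each value lies in (0, 1)
print(p.sum())   # the values sum to 1
</syntaxhighlight>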
 
Since the function maps a vector and a specific index ''i'' to a real value, the derivative needs to take the index into account:
 
:<math> \frac{\partial}{\partial q_k}\sigma(\textbf{q}, i) = \sigma(\textbf{q}, i)(\delta_{ik} - \sigma(\textbf{q}, k))</math>
 
Here, the [[Kronecker delta]] is used for brevity (compare the derivative of a [[sigmoid function]], which is likewise expressed via the function itself).
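
A short sketch of the full Jacobian matrix ''J''<sub>''ik''</sub> = σ('''q''', ''i'')(δ<sub>''ik''</sub> − σ('''q''', ''k'')), together with a finite-difference check, might look like this (the test vector is arbitrary):

<syntaxhighlight lang="python">
import numpy as np

def softmax(q):
    z = np.exp(q - np.max(q))
    return z / np.sum(z)

def softmax_jacobian(q):
    """J[i, k] = sigma(q, i) * (delta_ik - sigma(q, k))."""
    s = softmax(np.asarray(q, dtype=float))
    return np.diag(s) - np.outer(s, s)

q = np.array([0.5, -1.0, 2.0])   # arbitrary test input
J = softmax_jacobian(q)

# Finite-difference check of d sigma(q, 0) / d q_2.
eps = 1e-6
e2 = np.array([0.0, 0.0, eps])
numeric = (softmax(q + e2)[0] - softmax(q - e2)[0]) / (2 * eps)
print(J[0, 2], numeric)   # the two values should agree closely
</syntaxhighlight>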
 
See [[Multinomial logit]] for a probability model which uses the softmax activation function.
 
==Reinforcement learning==
In the field of [[reinforcement learning]], a softmax function can be used to convert values into action probabilities. The function commonly used is:<ref>Sutton, R. S. and Barto A. G. ''Reinforcement Learning: An Introduction''. The MIT Press, Cambridge, MA, 1998.[http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node17.html Softmax Action Selection]</ref>
:<math>
P_t(a) = \frac{\exp(q_t(a)/\tau)}{\sum_{i=1}^n\exp(q_t(i)/\tau)} \text{,}
</math>
 
where the action value <math>q_t(a)</math> corresponds to the expected reward for taking action ''a'', and <math>\tau</math> is called a temperature parameter (in allusion to [[chemical kinetics]]). For high temperatures (<math>\tau\to \infty</math>), all actions have nearly the same probability; the lower the temperature, the more the expected rewards affect the probabilities. For a low temperature (<math>\tau\to 0^+</math>), the probability of the action with the highest expected reward tends to 1.
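
The effect of the temperature can be illustrated with a small sketch; the action values below are hypothetical, and the sampling step simply draws an action from the resulting distribution.

<syntaxhighlight lang="python">
import numpy as np

def action_probabilities(q_values, tau):
    """P(a) = exp(q(a)/tau) / sum_i exp(q(i)/tau), with temperature tau."""
    q = np.asarray(q_values, dtype=float) / tau
    z = np.exp(q - np.max(q))   # shift for numerical stability
    return z / np.sum(z)

q_values = [1.0, 2.0, 1.5]      # hypothetical action-value estimates
for tau in (100.0, 1.0, 0.01):
    print(tau, action_probabilities(q_values, tau))
# High tau: nearly uniform; low tau: mass concentrates on the greedy action.

rng = np.random.default_rng(0)
action = rng.choice(len(q_values), p=action_probabilities(q_values, 1.0))
print("sampled action:", action)
</syntaxhighlight>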
 
==Smooth approximation of maximum==
When parameterized by a constant <math>\alpha > 0</math>, the following formulation is a smooth, differentiable approximation of the maximum function:
 
:<math>
\mathcal{S}_{\alpha}\left(\left\{x_i\right\}_{i=1}^{n}\right) = \frac{\sum_{i=1}^{n}x_i e^{\alpha x_i}}{\sum_{i=1}^{n}e^{\alpha x_i}}
</math>
 
<math>\mathcal{S}_{\alpha}</math> has the following properties:
#<math>\mathcal{S}_{\alpha}\to \max</math> as <math>\alpha\to\infty</math>
#<math>\mathcal{S}_{0}</math> is the average of its inputs
#<math>\mathcal{S}_{\alpha}\to \min</math> as <math>\alpha\to -\infty</math>
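
These limiting behaviours can be checked numerically; the sketch below (with arbitrary example inputs) shifts the exponents by their maximum to keep the weights finite, which does not change the value of <math>\mathcal{S}_{\alpha}</math>.

<syntaxhighlight lang="python">
import numpy as np

def smooth_max(x, alpha):
    """S_alpha(x) = sum_i x_i * exp(alpha*x_i) / sum_i exp(alpha*x_i)."""
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * x - np.max(alpha * x))   # stabilised weights
    return np.sum(x * w) / np.sum(w)

x = [1.0, 4.0, 2.5]
print(smooth_max(x, 50.0))    # close to max(x) = 4.0
print(smooth_max(x, 0.0))     # exactly the mean = 2.5
print(smooth_max(x, -50.0))   # close to min(x) = 1.0
</syntaxhighlight>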
 
The gradient of <math>\mathcal{S}_{\alpha}</math> is given by:
 
:<math>
\nabla_{x_i}\mathcal{S}_{\alpha}\left(\left\{x_i\right\}_{i=1}^{n}\right) = \frac{e^{\alpha x_i}}{\sum_{j=1}^{n}e^{\alpha x_j}}\left[1 + \alpha\left(x_i - \mathcal{S}_{\alpha}\left(\left\{x_i\right\}_{i=1}^{n}\right)\right)\right] \text{,}
</math>
 
which makes this smooth approximation of the maximum useful in optimization techniques that use [[gradient descent]].
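
A minimal sketch of this gradient, verified against a finite-difference approximation (the inputs and the value of <math>\alpha</math> are arbitrary):

<syntaxhighlight lang="python">
import numpy as np

def smooth_max(x, alpha):
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * x - np.max(alpha * x))
    return np.sum(x * w) / np.sum(w)

def smooth_max_grad(x, alpha):
    """dS_alpha/dx_i = w_i * (1 + alpha*(x_i - S_alpha)), w_i the softmax weight."""
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * x - np.max(alpha * x))
    w = w / np.sum(w)
    return w * (1.0 + alpha * (x - np.sum(x * w)))

x = np.array([1.0, 4.0, 2.5])
alpha = 2.0
g = smooth_max_grad(x, alpha)

# Finite-difference check of the first component.
eps = 1e-6
e0 = np.array([eps, 0.0, 0.0])
print(g[0], (smooth_max(x + e0, alpha) - smooth_max(x - e0, alpha)) / (2 * eps))
</syntaxhighlight>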
 
==Softmax transformation==
{{Section OR|date=June 2013}}
The softmax function is also used to standardize data that is positively skewed and includes many values around zero.{{Citation needed| date=June 2013}} It takes a variable such as revenue or age and transforms the values to a scale from zero to one.<ref>{{cite book|title=Data Preparation for Data Mining|first=Dorian|last=Pyle|publisher=Morgan Kaufmann|year=1999|pages=271–274, 355–359}}</ref> This type of data transformation is needed especially when the data spans many orders of magnitude.
 
For example, customer revenues could span anywhere from 0 to 300,000. Suppose the revenue values range between 3 and 300,000. Expressed as powers of 10, 3 becomes <math>3 \times 10^0</math> (since <math>10^0 = 1</math>) and 300,000 becomes <math>3 \times 10^5</math>, so the two values span five orders of magnitude. Range scaling across such spans is a typical reason to use a function such as softmax.{{Citation needed| date=June 2013}}
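
The article does not spell out the exact transformation; a common logistic-based variant of softmax scaling standardizes each value by the sample mean and standard deviation and then squashes it into (0, 1). The sketch below follows that approach, with the width parameter ''λ'' and the example revenues chosen purely for illustration.

<syntaxhighlight lang="python">
import numpy as np

def softmax_scale(x, lam=2.0):
    """Squash a skewed variable into (0, 1) with a logistic transform.

    lam (assumed tuning parameter) sets how many standard deviations around
    the mean map roughly linearly; values far outside saturate towards 0 or 1.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / (lam * x.std() / (2.0 * np.pi))
    return 1.0 / (1.0 + np.exp(-z))

revenue = np.array([3.0, 120.0, 4500.0, 90000.0, 300000.0])   # example data
print(softmax_scale(revenue))   # every value now lies strictly in (0, 1)
</syntaxhighlight>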
 
==References==
<references/>
 
==External links==
* [http://www.johndcook.com/blog/2010/01/13/soft-maximum/ description of softmax as a substitute for maximum by John D. Cook]
 
{{DEFAULTSORT:Softmax Activation Function}}
[[Category:Computational neuroscience]]
[[Category:Log-linear models]]
[[Category:Neural networks]]
