Indecomposable distribution - Revision history

en>Michael Hardy: /* Related concepts */

2014-06-29T18:47:53Z

Related concepts

174.53.163.119: /* Indecomposable */ spacing

2011-08-19T17:47:16Z

Indecomposable: spacing

New page

{{no footnotes|date=September 2013}}
{{ref improve|date=September 2013}}
The '''cross-entropy (CE) method''', attributed to [[Reuven Rubinstein]], is a general [[Monte Carlo method|Monte Carlo]] approach to
[[Combinatorial optimization|combinatorial]] and [[Continuous optimization|continuous]] multi-extremal [[Optimization (mathematics)|optimization]] and [[importance sampling]].
The method originated in the field of ''rare event simulation'', where
very small probabilities need to be estimated accurately, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems.
The CE method can be applied to static and noisy combinatorial optimization problems such as the [[traveling salesman problem]], the [[quadratic assignment problem]], [[Sequence_alignment|DNA sequence alignment]], the [[Maxcut|max-cut]] problem and the buffer allocation problem, as well as continuous [[global optimization]] problems with many local [[extremum|extrema]].

In a nutshell, the CE method alternates between two phases:

#Generate a random data sample (trajectories, vectors, etc.) according to a specified mechanism.
#Update the parameters of the random mechanism based on the data to produce a "better" sample in the next iteration. This step involves minimizing the [[cross entropy|''cross-entropy'']] or [[Kullback–Leibler divergence]].

==Estimation via importance sampling==
Consider the general problem of estimating the quantity <math>\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \textrm{d}\mathbf{x}</math>, where <math>H</math> is some ''performance function'' and <math>f(\mathbf{x};\mathbf{u})</math> is a member of some [[parametric family]] of distributions. Using [[importance sampling]], this quantity can be estimated as <math>\hat{\ell} = \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)}</math>, where <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> is a random sample from a density <math>g\,</math>. For positive <math>H</math>, the theoretically ''optimal'' importance sampling [[probability density function|density]] (pdf) is given by
<math> g^*(\mathbf{x}) = H(\mathbf{x}) f(\mathbf{x};\mathbf{u})/\ell</math>. This, however, depends on the unknown <math>\ell</math>. The CE method aims to approximate the optimal pdf by adaptively selecting members of the parametric family that are closest (in the [[Kullback–Leibler divergence|Kullback–Leibler]] sense) to the optimal pdf <math>g^*</math>.
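The importance sampling estimator above can be illustrated with a short Python sketch. The target (the probability that a standard normal variable exceeds 4), the shifted normal sampling density, and all function names are assumptions chosen for illustration; they do not come from the references.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(H, f_pdf, g_pdf, g_sampler, n, rng):
    """Average H(X_i) * f(X_i) / g(X_i) over n draws X_i from g."""
    total = 0.0
    for _ in range(n):
        x = g_sampler(rng)
        total += H(x) * f_pdf(x) / g_pdf(x)
    return total / n


# Illustrative problem (an assumption): l = P(X >= 4) for X ~ N(0, 1),
# sampled from g = N(4, 1) so that the rare event is hit often.
threshold = 4.0
rng = random.Random(0)
estimate = importance_sampling_estimate(
    H=lambda x: 1.0 if x >= threshold else 0.0,
    f_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    g_pdf=lambda x: normal_pdf(x, threshold, 1.0),
    g_sampler=lambda r: r.gauss(threshold, 1.0),
    n=100_000,
    rng=rng,
)

# Closed-form reference value for comparison.
exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))
print(estimate, exact)
```

Sampling directly from the nominal density would see the event only a few times in 100,000 draws, whereas the shifted density places about half of the sample inside the event and corrects for the shift through the likelihood ratio.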

==Generic CE algorithm==
# Choose initial parameter vector <math>\mathbf{v}^{(0)}</math>; set t = 1.
# Generate a random sample <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> from <math>f(\cdot;\mathbf{v}^{(t-1)})</math>.
# Solve for <math>\mathbf{v}^{(t)}</math>, where<br><math>\mathbf{v}^{(t)} = \mathop{\textrm{argmax}}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i)\frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})} \log f(\mathbf{X}_i;\mathbf{v})</math>
# If convergence is reached then '''stop'''; otherwise, increase t by 1 and reiterate from step 2.

In several cases, the solution to step 3 can be found ''analytically''. Situations in which this occurs are:
* When <math>f\,</math> belongs to the [[Exponential_family|natural exponential family]]
* When <math>f\,</math> is [[discrete space|discrete]] with finite [[Support (mathematics)|support]]
* When <math>H(\mathbf{x}) = \mathrm{I}_{\{\mathbf{x}\in A\}}</math> and <math>f(\mathbf{X}_i;\mathbf{u}) = f(\mathbf{X}_i;\mathbf{v}^{(t-1)})</math>, in which case <math>\mathbf{v}^{(t)}</math> corresponds to the [[Maximum likelihood|maximum likelihood estimator]] based on those <math>\mathbf{X}_i \in A</math>.
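As a minimal sketch of the generic algorithm with an analytic step 3, consider estimating <math>\ell = \mathbb{P}(X \geq \gamma)</math> for an exponentially distributed <math>X</math> with mean <math>u</math>, using exponential sampling densities with mean <math>v</math>. For this family, the maximizer in step 3 is the weighted sample mean. The problem instance, the fixed number of iterations used in place of a convergence test, and the function name are assumptions made for illustration.

```python
import math
import random


def ce_rare_event_exponential(u, gamma, n, iterations, rng):
    """CE iterations for l = P(X >= gamma), where X is exponential with mean u.

    The sampling family is f(x; v) = exp(-x / v) / v (exponential with mean v).
    Returns the final parameter v and an importance sampling estimate of l.
    """

    def pdf(x, mean):
        return math.exp(-x / mean) / mean

    v = u  # Step 1: start from the nominal parameter.
    for _ in range(iterations):
        # Step 2: sample from the current density (expovariate takes a rate).
        xs = [rng.expovariate(1.0 / v) for _ in range(n)]
        # H(X_i) times the likelihood ratio f(X_i; u) / f(X_i; v^(t-1)).
        ws = [pdf(x, u) / pdf(x, v) if x >= gamma else 0.0 for x in xs]
        total = sum(ws)
        if total == 0.0:
            break  # No sample hit the event; keep the current parameter.
        # Step 3 in closed form: the weighted sample mean.
        v = sum(w * x for w, x in zip(ws, xs)) / total

    # Final importance sampling estimate using the tuned density.
    xs = [rng.expovariate(1.0 / v) for _ in range(n)]
    estimate = sum(pdf(x, u) / pdf(x, v) for x in xs if x >= gamma) / n
    return v, estimate


rng = random.Random(1)
v_final, l_hat = ce_rare_event_exponential(
    u=1.0, gamma=5.0, n=20_000, iterations=3, rng=rng
)
print(v_final, l_hat, math.exp(-5.0))  # exact value is exp(-gamma / u)
```

In this instance the event is common enough that the first sample already contains hits. For much rarer events the sketch would stop at the nominal parameter, which is why practical variants raise the level gradually, as in the optimization example.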

==Continuous optimization: example==
The same CE algorithm can be used for optimization, rather than estimation.
Suppose the problem is to maximize some function <math>S(x)</math>, for example,
<math>S(x) = \textrm{e}^{-(x-2)^2} + 0.8\,\textrm{e}^{-(x+2)^2}</math>.
To apply CE, one considers first the ''associated stochastic problem'' of estimating
<math>\mathbb{P}_{\boldsymbol{\theta}}(S(X)\geq\gamma)</math>
for a given ''level'' <math>\gamma\,</math>, and parametric family <math>\left\{f(\cdot;\boldsymbol{\theta})\right\}</math>, for example the 1-dimensional
[[Gaussian distribution]],
parameterized by its mean <math>\mu_t\,</math> and variance <math>\sigma_t^2</math> (so <math>\boldsymbol{\theta} = (\mu_t,\sigma_t^2)</math> here).
Hence, for a given <math>\gamma\,</math>, the goal is to find <math>\boldsymbol{\theta}</math> so that
<math>D_{\mathrm{KL}}(\textrm{I}_{\{S(x)\geq\gamma\}}\|f_{\boldsymbol{\theta}})</math>
is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above.
It turns out that the parameters that minimize the stochastic counterpart for this choice of target distribution and
parametric family are the sample mean and sample variance of the ''elite samples'', which are those samples that have objective function value <math>\geq\gamma</math>.
The worst of the elite samples is then used as the level parameter for the next iteration.
This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an [[estimation of distribution algorithm]].

===Pseudo-code===
 mu := -6; sigma2 := 100; t := 0; maxits := 100;    // Initialize parameters
 epsilon := 1e-8;                                   // Convergence tolerance (illustrative value; not fixed by the method)
 N := 100; Ne := 10;                                // Sample size and number of elite samples
 while t < maxits and sigma2 > epsilon              // While maxits not exceeded and not converged
     X := SampleGaussian(mu, sigma2, N);            // Obtain N samples from current sampling distribution
     S := exp(-(X-2)^2) + 0.8*exp(-(X+2)^2);        // Evaluate objective function at sampled points
     X := sort(X, S);                               // Sort X by objective function values (in descending order)
     mu := mean(X(1:Ne)); sigma2 := var(X(1:Ne));   // Update parameters of sampling distribution from the elite samples
     t := t + 1;                                    // Increment iteration counter
 end
 return mu                                          // Return mean of final sampling distribution as solution
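
The pseudo-code can be turned into a runnable program, for instance in Python as sketched below. The tolerance <code>epsilon = 1e-8</code>, the fixed random seed, the function names, and the use of the maximum-likelihood variance (dividing by the number of elite samples) are assumptions of this sketch rather than part of the method as described.

```python
import math
import random
import statistics


def objective(x):
    """The example function S(x), with maxima near x = 2 and x = -2."""
    return math.exp(-(x - 2.0) ** 2) + 0.8 * math.exp(-(x + 2.0) ** 2)


def cross_entropy_maximize(mu=-6.0, sigma2=100.0, n=100, n_elite=10,
                           max_iterations=100, epsilon=1e-8, seed=0):
    """Maximize `objective` with a 1-dimensional Gaussian sampling density."""
    rng = random.Random(seed)
    t = 0
    while t < max_iterations and sigma2 > epsilon:
        # Obtain n samples from the current sampling distribution.
        samples = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
        # Sort by objective value in descending order and keep the elite.
        samples.sort(key=objective, reverse=True)
        elite = samples[:n_elite]
        # Update the sampling parameters from the elite samples.
        mu = statistics.fmean(elite)
        sigma2 = statistics.pvariance(elite, mu)
        t += 1
    return mu


result = cross_entropy_maximize()
print(result)
```

The returned value is expected to lie close to one of the two maximizers. Because the method is randomized, a run may settle at the local maximum near <math>x = -2</math> instead of the global maximum near <math>x = 2</math>, depending on the seed and on the sample and elite sizes.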

==Related methods==
*[[Simulated annealing]]
*[[Genetic algorithms]]
*[[Harmony search]]
*[[Estimation of distribution algorithm]]
*[[Tabu search]]

==See also==
*[[Cross entropy]]
*[[Kullback–Leibler divergence]]
*[[Randomized algorithm]]
*[[Importance sampling]]

==References==
*De Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. ''Annals of Operations Research'', '''134''' (1), 19–67. [http://www.maths.uq.edu.au/~kroese/ps/aortut.pdf]
*Rubinstein, R.Y. (1997). Optimization of Computer Simulation Models with Rare Events. ''European Journal of Operational Research'', '''99''', 89–112.
*Rubinstein, R.Y., Kroese, D.P. (2004). ''The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning'', Springer-Verlag, New York.

==External links==
*[http://iew3.technion.ac.il/CE/ Homepage for the CE method]

[[Category:Heuristics]]
[[Category:Optimization algorithms and methods]]
[[Category:Monte Carlo methods]]
[[Category:Machine learning]]