Hahn–Banach theorem: Difference between revisions

From formulasearchengine
Jump to navigation Jump to search
en>Mgkrupa
No edit summary
Line 1: Line 1:
{{more footnotes|date=January 2011}}
Got nothing to tell about me I think.<br>Hurrey Im here and a member of wmflabs.org.<br>I just wish I'm useful in some way .<br><br>My homepage [http://hemorrhoidtreatmentfix.com/hemorrhoid-relief hemorrhoid relief]
[[Image:Huffman tree 2.svg|thumb|Huffman tree generated from the exact frequencies of the text "this is an example of a huffman tree". The frequencies and codes of each character are below. Encoding the sentence with this code requires 135 bits, as opposed to 288 bits if 36 characters of 8 bits were used. (This assumes that the code tree structure is known to the decoder and thus does not need to be counted as part of the transmitted information.)]]
{| class="wikitable sortable" style="float:right; clear:right;"
!Char!!Freq!!Code
|-
|space||7||111
|-
|a    ||4||010
|-
|e    ||4||000
|-
|f    ||3||1101
|-
|h    ||2||1010
|-
|i    ||2||1000
|-
|m    ||2||0111
|-
|n    ||2||0010
|-
|s    ||2||1011
|-
|t    ||2||0110
|-
|l    ||1||11001
|-
|o    ||1||00110
|-
|p    ||1||10011
|-
|r    ||1||11000
|-
|u    ||1||00111
|-
|x    ||1||10010
|}
In [[computer science]] and [[information theory]], '''Huffman coding''' is an [[entropy encoding]] [[algorithm]] used for [[lossless data compression]]. The term refers to the use of a [[variable-length code]] table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by [[David A. Huffman]] while he was a [[Doctor of Philosophy|Ph.D.]] student at [[Massachusetts Institute of Technology|MIT]], and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
 
Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a [[prefix code]] (sometimes called "prefix-free codes", that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method ''of this type'': no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code.{{citation needed|date=July 2013}} The running time of Huffman's method is fairly efficient, it takes <math> O(n \log n) </math> operations to construct it. A method was later found to design a Huffman code in [[linear time]] if input probabilities (also known as ''weights'') are sorted.<ref>Jan van Leeuwen, On the construction of Huffman trees, ICALP 1976, 382-410</ref>
 
For a set of symbols with a uniform probability distribution and a number of members which is a [[power of two]], Huffman coding is equivalent to simple binary [[Block code|block encoding]], e.g., [[ASCII]] coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.
 
Although Huffman's original algorithm is optimal for a symbol-by-symbol coding (i.e. a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbol-by-symbol restriction is dropped, or when the [[probability mass function]]s are unknown, not [[independent and identically-distributed random variables|identically distributed]], or not [[independence (probability theory)|independent]] (e.g., "cat" is more common than "cta").{{citation needed|date=July 2013}}  Other methods such as [[arithmetic coding]] and [[LZW]] coding often have better compression capability:  both of these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, the latter of which is useful when input probabilities are not precisely known or vary significantly within the stream.  However, the limitations of Huffman coding should not be overstated; it can be used adaptively, accommodating unknown, changing, or context-dependent probabilities.  In the case of known [[independent and identically distributed random variables]], combining symbols reduces inefficiency in a way that approaches optimality as the number of symbols combined increases.
 
== History ==
 
In 1951, [[David A. Huffman]] and his [[MIT]] [[information theory]] classmates were given the choice of a term paper or a final [[exam]]. The professor, [[Robert M. Fano]], assigned a [[term paper]] on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted [[binary tree]] and quickly proved this method the most efficient.<ref>see Ken Huffman (1991)</ref>
 
In doing so, the student outdid his professor, who had worked with [[information theory]] inventor [[Claude Shannon]] to develop a similar code.  By building the tree from the bottom up instead of the top down, Huffman avoided the major flaw of the suboptimal [[Shannon-Fano coding]].
 
== Problem definition ==
 
=== Informal description ===
;Given: A set of symbols and their weights (usually [[Proportionality (mathematics)|proportional]] to probabilities).
;Find: A [[Prefix code|prefix-free binary code]] (a set of codewords) with minimum [[Expected value|expected]] codeword length (equivalently, a tree with minimum [[weighted path length from the root]]).
 
=== Formalized description ===
'''Input'''.<br>
Alphabet <math>A = \left\{a_{1},a_{2},\cdots,a_{n}\right\}</math>, which is the symbol alphabet of size <math>n</math>. <br>
Set <math>W = \left\{w_{1},w_{2},\cdots,w_{n}\right\}</math>, which is the set of the (positive) symbol weights (usually proportional to probabilities), i.e. <math>w_{i} = \mathrm{weight}\left(a_{i}\right), 1\leq i \leq n</math>. <br>
<br>
'''Output'''.<br>
Code <math>C \left(A,W\right) = \left\{c_{1},c_{2},\cdots,c_{n}\right\}</math>, which is the set of (binary) codewords, where <math>c_{i}</math> is the codeword for <math>a_{i}, 1 \leq i \leq n</math>.<br>
<br>
'''Goal'''.<br>
Let <math>L\left(C\right) = \sum_{i=1}^{n}{w_{i}\times\mathrm{length}\left(c_{i}\right)}</math> be the weighted path length of code <math>C</math>. Condition: <math>L\left(C\right) \leq L\left(T\right)</math> for any code <math>T\left(A,W\right)</math>.
 
=== Samples ===
{|class="wikitable"
!rowspan="2" style="background:#efefef"| Input (''A'', ''W'')
!style="background:#efefef;font-weight:normal"| Symbol (''a''<sub>''i''</sub>)
|align="center" style="background:#efefef"| a
|align="center" style="background:#efefef"| b
|align="center" style="background:#efefef"| c
|align="center" style="background:#efefef"| d
|align="center" style="background:#efefef"| e
!style="background:#efefef"| Sum
|-
!style="background:#efefef;font-weight:normal"| Weights (''w''<sub>''i''</sub>)
|align="center"| 0.10
|align="center"| 0.15
|align="center"| 0.30
|align="center"| 0.16
|align="center"| 0.29
|align="center"| = 1
|-
!rowspan="3" style="background:#efefef"| Output ''C''
!style="background:#efefef;font-weight:normal"| Codewords (''c''<sub>''i''</sub>)
|align="center"| <tt>010</tt>
|align="center"| <tt>011</tt>
|align="center"| <tt>11</tt>
|align="center"| <tt>00</tt>
|align="center"| <tt>10</tt>
|rowspan="2"|&nbsp;
|-
!style="background:#efefef;font-weight:normal"| Codeword length (in bits)<br />(''l''<sub>''i''</sub>)
|align="center"| 3
|align="center"| 3
|align="center"| 2
|align="center"| 2
|align="center"| 2
|-
!style="background:#efefef;font-weight:normal"| Weighted path length<br />(''l''<sub>''i''</sub> ''w''<sub>''i''</sub> )
|align="center"| 0.30
|align="center"| 0.45
|align="center"| 0.60
|align="center"| 0.32
|align="center"| 0.58
|align="center"| ''L''(''C'') = 2.25
|-
!rowspan="3" style="background:#efefef"| Optimality
!style="background:#efefef;font-weight:normal"| Probability budget<br />(2<sup>-''l''<sub>''i''</sub></sup>)
| align="center" | 1/8
| align="center" | 1/8
| align="center" | 1/4
| align="center" | 1/4
| align="center" | 1/4
| align="center" | = 1.00
|-
! style="background: #efefef; font-weight: normal;" | Information content (in bits)<br />(−'''log'''<sub>2</sub> ''w''<sub>''i''</sub>) ≈
|align="center"| 3.32
|align="center"| 2.74
|align="center"| 1.74
|align="center"| 2.64
|align="center"| 1.79
|align="center"| &nbsp;
|-
! style="background: #efefef; font-weight: normal;" | Entropy<br />(−''w''<sub>''i''</sub> '''log'''<sub>2</sub> ''w''<sub>''i''</sub>)
|align="center"| 0.332
|align="center"| 0.411
|align="center"| 0.521
|align="center"| 0.423
|align="center"| 0.518
|align="center"| ''H''(''A'') = 2.205
|}
 
For any code that is ''biunique'', meaning that the code is ''uniquely decodeable'', the sum of the probability budgets across all symbols is always less than or equal to one. In this example, the sum is strictly equal to one; as a result, the code is termed a ''complete'' code. If this is not the case, you can always derive an equivalent code by adding extra symbols (with associated null probabilities), to make the code complete while keeping it ''biunique''.
 
As defined by [[A Mathematical Theory of Communication|Shannon (1948)]], the information content ''h'' (in bits) of each symbol ''a''<sub>i</sub> with non-null probability is
 
:<math>h(a_i) = \log_2{1 \over w_i}. </math>
 
The [[information entropy|entropy]] ''H'' (in bits) is the weighted sum, across all symbols ''a''<sub>''i''</sub> with non-zero probability ''w''<sub>''i''</sub>, of the information content of each symbol:
 
:<math> H(A) = \sum_{w_i > 0} w_i h(a_i) = \sum_{w_i > 0} w_i \log_2{1 \over w_i} = - \sum_{w_i > 0} w_i \log_2{w_i}. </math>
 
(Note: A symbol with zero probability has zero contribution to the entropy, since <math>\lim_{w \to 0^+} w \log_2 w = 0</math> So for simplicity, symbols with zero probability can be left out of the formula above.)
 
As a consequence of [[Shannon's source coding theorem]], the entropy is a measure of the smallest codeword length that is theoretically possible for the given alphabet with associated weights. In this example, the weighted average codeword length is 2.25 bits per symbol, only slightly larger than the calculated entropy of 2.205 bits per symbol. So not only is this code optimal in the sense that no other feasible code performs better, but it is very close to the theoretical limit established by Shannon.
 
Note that, in general, a Huffman code need not be unique, but it is always one of the codes minimizing <math>L(C)</math>.
 
== Basic technique ==
 
===Compression===
[[Image:Huffman coding example.svg|thumb|A source generates 4 different symbols <math>\{a_1 , a_2 , a_3 , a_4 \}</math> with probability <math>\{0.4 ; 0.35 ; 0.2 ; 0.05 \}</math>. A binary tree is generated from left to right taking the two least probable symbols and putting them together to form another equivalent symbol having a probability that equals the sum of the two symbols. The process is repeated until there is just one symbol. The tree can then be read backwards, from right to left, assigning different bits to different branches. The final Huffman code is:
{|class="wikitable"
! Symbol !! Code
|-
|a1 || 0
|-
|a2 || 10
|-
|a3 || 110
|-
|a4 || 111
|-
|}
The standard way to represent a signal made of 4 symbols is by using 2 bits/symbol, but the [[Information entropy|entropy]] of the source is 1.74 bits/symbol. If this Huffman code is used to represent the signal, then the average length is lowered to 1.85 bits/symbol; it is still far from the theoretical limit because the probabilities of the symbols are different from negative powers of two.]]
 
The technique works by creating a [[binary tree]] of nodes. These can be stored in a regular [[Array data type|array]], the size of which depends on the number of symbols, <math>n</math>. A node can be either a [[leaf node]] or an [[internal node]]. Initially, all nodes are leaf nodes, which contain the '''symbol''' itself, the '''weight''' (frequency of appearance) of the symbol and optionally, a link to a '''parent''' node which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes contain symbol '''weight''', links to '''two child nodes''' and the optional link to a '''parent''' node. As a common convention, bit '0' represents following the left child and bit '1' represents following the right child. A finished tree has up to <math>n</math> leaf nodes and <math>n-1</math> internal nodes. A Huffman tree that omits unused symbols produces optimal code lengths.
 
The process essentially begins with the leaf nodes containing the probabilities of the symbol they represent, then a new node whose children are the 2 nodes with smallest probability is created, such that the new node's probability is equal to the sum of the children's probability. With the previous 2 nodes merged into one node (thus not considering them anymore), and with the new node being now considered, the procedure is repeated until only one node remains, the Huffman tree.
 
The simplest construction algorithm uses a [[priority queue]] where the node with lowest probability is given highest priority:
 
# Create a leaf node for each symbol and add it to the priority queue.
# While there is more than one node in the queue:
## Remove the two nodes of highest priority (lowest probability) from the queue
## Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
## Add the new node to the queue.
# The remaining node is the root node and the tree is complete.
 
Since efficient priority queue data structures require O(log ''n'') time per insertion, and a tree with ''n'' leaves has 2''n''−1 nodes, this algorithm operates in O(''n'' log ''n'') time, where ''n'' is the number of symbols.
 
If the symbols are sorted by probability, there is a [[linear-time]] (O(''n'')) method to create a Huffman tree using two [[Queue (data structure)|queues]], the first one containing the initial weights (along with pointers to the associated leaves), and combined weights (along with pointers to the trees) being put in the back of the second queue. This assures that the lowest weight is always kept at the front of one of the two queues:
 
#Start with as many leaves as there are symbols.
#Enqueue all leaf nodes into the first queue (by probability in increasing order so that the least likely item is in the head of the queue).
#While there is more than one node in the queues:
##Dequeue the two nodes with the lowest weight by examining the fronts of both queues.
##Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight.
##Enqueue the new node into the rear of the second queue.
#The remaining node is the root node; the tree has now been generated.
 
Although this algorithm may appear "faster" complexity-wise than the previous algorithm using a priority queue, this is not actually the case because the symbols need to be sorted by probability before-hand, a process that takes O(''n'' log ''n'') time in itself.
 
In many cases, time complexity is not very important in the choice of algorithm here, since ''n'' here is the number of symbols in the alphabet, which is typically a very small number (compared to the length of the message to be encoded); whereas complexity analysis concerns the behavior when ''n'' grows to be very large.
 
It is generally beneficial to minimize the variance of codeword length. For example, a communication buffer receiving Huffman-encoded data may need to be larger to deal with especially long symbols if the tree is especially unbalanced. To minimize variance, simply break ties between queues by choosing the item in the first queue.  This modification will retain the mathematical optimality of the Huffman coding while both minimizing variance and minimizing the length of the longest character code.
 
Here's an example of optimized Huffman coding using the French subject string "j'aime aller sur le bord de l'eau les jeudis ou les jours impairs". Note that original Huffman coding tree structure would be different from the given example:
 
[[Image:Huffman huff demo.gif|center]]
 
===Decompression===
Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value). Before this can take place, however, the Huffman tree must be somehow reconstructed. In the simplest case, where character frequencies are fairly predictable, the tree can be preconstructed (and even statistically adjusted on each compression cycle) and thus reused every time, at the expense of at least some measure of compression efficiency. Otherwise, the information to reconstruct the tree must be sent a priori. A naive approach might be to prepend the frequency count of each character to the compression stream. Unfortunately, the overhead in such a case could amount to several kilobytes, so this method has little practical use. If the data is compressed using [[canonical Huffman code|canonical encoding]], the compression model can be precisely reconstructed with just <math>B2^B</math> bits of information (where <math>B</math> is the number of bits per symbol). Another method is to simply prepend the Huffman tree, bit by bit, to the output stream. For example, assuming that the value of 0 represents a parent node and 1 a leaf node, whenever the latter is encountered the tree building routine simply reads the next 8 bits to determine the character value of that particular leaf. The process continues recursively until the last leaf node is reached; at that point, the Huffman tree will thus be faithfully reconstructed. The overhead using such a method ranges from roughly 2 to 320 bytes (assuming an 8-bit alphabet). Many other techniques are possible as well. In any case, since the compressed data can include unused "trailing bits" the decompressor must be able to determine when to stop producing output. This can be accomplished by either transmitting the length of the decompressed data along with the compression model or by defining a special code symbol to signify the end of input (the latter method can adversely affect code length optimality, however).
 
== Main properties ==
The probabilities used can be generic ones for the application domain that are based on average experience, or they can be the actual frequencies found in the text being compressed.
This requires that a [[frequency table]] must be stored with the compressed text. See the Decompression section above for more information about the various techniques employed for this purpose.
 
Huffman coding is optimal when the probability of each input symbol is the inverse of a power of two. Prefix codes tend to have inefficiency on small alphabets, where probabilities often fall between these optimal points. "Blocking", or expanding the alphabet size by grouping multiple symbols into "words" of fixed or variable-length before Huffman coding helps both to reduce that inefficiency and to take advantage of statistical dependencies between input symbols within the group (as in the case of natural language text). The worst case for Huffman coding can happen when the probability of a symbol exceeds 2<sup>−1</sup> = 0.5, making the upper limit of inefficiency unbounded. These situations often respond well to a form of blocking called [[run-length encoding]]; for the simple case of [[Bernoulli process]]es, [[Golomb coding]] is a provably optimal run-length code.
 
[[Arithmetic coding]] produces some gains over Huffman coding, although arithmetic coding has higher computational complexity. Also, arithmetic coding was historically a subject of some concern over [[patent]] issues. However, as of mid-2010, various well-known effective techniques for arithmetic coding have passed into the public domain as the early patents have expired.
 
== Variations ==
Many variations of Huffman coding exist, some of which use a Huffman-like algorithm, and others of which find optimal prefix codes (while, for example, putting different restrictions on the output). Note that, in the latter case, the method need not be Huffman-like, and, indeed, need not even be [[polynomial time]]. An exhaustive list of papers on Huffman coding and its variations is given by "Code and Parse Trees for Lossless Source Encoding"[http://scholar.google.com/scholar?hl=en&lr=&cluster=6556734736002074338].
 
=== ''n''-ary Huffman coding ===
The '''''n''-ary Huffman''' algorithm uses the {0, 1, ... , ''n'' − 1} alphabet to encode message and build an ''n''-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (''n'' equals 2) codes, except that the ''n'' least probable symbols are taken together, instead of just the 2 least probable. Note that for ''n'' greater than 2, not all sets of source words can properly form an ''n''-ary tree for Huffman coding. In this case, additional 0-probability place holders must be added. This is because the tree must form an ''n'' to 1 contractor; for binary coding, this is a 2 to 1 contractor, and any sized set can form such a contractor.  If the number of source words is congruent to 1 modulo ''n''-1, then the set of source words will form a proper Huffman tree.
 
=== Adaptive Huffman coding ===
A variation called '''[[adaptive Huffman coding]]''' involves calculating the probabilities dynamically based on recent actual frequencies in the sequence of source symbols, and changing the coding tree structure to match the updated probability estimates. It is used rarely in practice, since the cost of updating the tree makes it slower than optimized [[Arithmetic_coding#Adaptive_arithmetic_coding|adaptive arithmetic coding]], that is more flexible and has a better compression.
 
=== Huffman template algorithm ===
Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only that the weights form a [[total order|totally ordered]] [[Monoid#Commutative monoid|commutative monoid]], meaning a way to order weights and to add them. The '''Huffman template algorithm''' enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition). Such algorithms can solve other minimization problems, such as minimizing <math>\max_i\left[w_{i}+\mathrm{length}\left(c_{i}\right)\right]</math>, a problem first applied to circuit design.
 
=== Length-limited Huffman coding/minimum variance huffman coding ===
'''Length-limited Huffman coding''' is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The [[package-merge algorithm]] solves this problem with a simple [[Greedy algorithm|greedy]] approach very similar to that used by Huffman's algorithm. Its time complexity is <math>O(nL)</math>, where <math>L</math> is the maximum length of a codeword. No algorithm is known to solve this problem in [[Big O notation#Orders of common functions|linear or linearithmic]] time, unlike the presorted and unsorted conventional Huffman problems, respectively.
 
=== Huffman coding with unequal letter costs ===
In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is ''N'' digits will always have a cost of ''N'', no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing.
 
''Huffman coding with unequal letter costs'' is the generalization without this assumption: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of [[Morse code]], where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this in the same manner or with the same efficiency as conventional Huffman coding.
 
=== Optimal alphabetic binary trees (Hu-Tucker coding) ===
In the standard Huffman coding problem, it is assumed that any codeword can correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical. Thus, for example, <math>A = \left\{a,b,c\right\}</math> could not be assigned code <math>H\left(A,C\right) = \left\{00,1,01\right\}</math>, but instead should be assigned either <math>H\left(A,C\right) =\left\{00,01,1\right\}</math> or <math>H\left(A,C\right) = \left\{0,10,11\right\}</math>. This is also known as the '''Hu-Tucker''' problem, after the authors of the paper presenting the first [[linearithmic]] solution to this optimal binary alphabetic problem,<ref>T.C. Hu and A.C. Tucker, ''Optimal computer search trees and variable length alphabetical codes'', Journal of SIAM on Applied Mathematics, vol. 21, no. 4, December 1971, pp. 514-532.</ref> which has some similarities to Huffman algorithm, but is not a variation of this algorithm. These optimal alphabetic binary trees are often used as [[binary search tree]]s.
 
=== The canonical Huffman code ===
 
If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found from calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the ''[[canonical Huffman code]]'' and is often the code used in practice, due to ease of encoding/decoding. The technique for finding this code is sometimes called '''Huffman-Shannon-Fano coding''', since it is optimal like Huffman coding, but alphabetic in weight probability, like [[Shannon-Fano coding]]. The Huffman-Shannon-Fano code corresponding to the example is <math>\{000,001,01,10,11\}</math>, which, having the same codeword lengths as the original solution, is also optimal.
 
== Applications ==
[[Arithmetic coding]] can be viewed as a generalization of Huffman coding, in the sense that they produce the same output when every symbol has a probability of the form 1/2<sup>''k''</sup>; in particular it tends to offer significantly better compression for small alphabet sizes. Huffman coding nevertheless remains in wide use because of its simplicity and high speed. Intuitively, arithmetic coding can offer better compression than Huffman coding because its "code words" can have effectively non-integer bit lengths, whereas code words in Huffman coding can only have an integer number of bits. Therefore, there is an inefficiency in Huffman coding where a code word of length ''k'' only optimally matches a symbol of probability 1/2<sup>''k''</sup> and other probabilities are not represented as optimally; whereas the code word length in arithmetic coding can be made to exactly match the true probability of the symbol.
 
Huffman coding today is often used as a "back-end" to some other compression methods.
[[DEFLATE (algorithm)|DEFLATE]] ([[PKZIP]]'s algorithm) and multimedia [[codec]]s such as [[JPEG]] and [[MP3]] have a front-end model and [[quantization (signal processing)|quantization]] followed by Huffman coding (or variable-length prefix-free codes with a similar structure, although perhaps not necessarily designed by using Huffman's algorithm{{clarify|date=February 2012}}).
 
==See also==
*[[Adaptive Huffman coding]]
*[[Canonical Huffman code]]
*[[Data compression]]
*[[Huffyuv]]
*[[Lempel–Ziv–Welch]]
*[[Modified Huffman coding]] - used in [[fax machines]]
*[[Shannon-Fano coding]]
*[[Varicode]]
 
== Notes ==
{{Reflist}}
 
== References==
* For Java Implementation see: [https://github.com/Glank/Huffman-Compression GitHub:Glank]
* D.A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the I.R.E., September 1952, pp 1098–1102. Huffman's original article.
* Ken Huffman. [http://www.huffmancoding.com/my-uncle/scientific-american Profile: David A. Huffman], [[Scientific American]], September 1991, pp.&nbsp;54–58
* [[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 16.3, pp.&nbsp;385–392.
 
== External links ==
{{external links|date=January 2014}}
{{Commons category|Huffman coding}}
* [http://scanftree.com/Data_Structure/huffman-code Huffman Coding with c Algorithm ]
* [http://demo.tinyray.com/huffman Huffman Encoding process animation]
* [http://www.cs.pitt.edu/~kirk/cs1501/animations/Huffman.html Huffman Encoding & Decoding Animation]
* [http://alexvn.freeservers.com/s1/huffman_template_algorithm.html n-ary Huffman Template Algorithm]
* [http://huffman.ooz.ie/ Huffman Tree visual graph generator]
* [http://www.research.att.com/projects/OEIS?Anum=A098950 Sloane A098950] Minimizing k-ordered sequences of maximum height Huffman tree
* [http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html A quick tutorial on generating a Huffman tree]
* Pointers to [http://web-cat.cs.vt.edu/AlgovizWiki/HuffmanCodingTrees Huffman coding visualizations]
* [http://rosettacode.org/wiki/Huffman_codes Explanation of Huffman coding with examples in several languages]
* [http://www.hightechdreams.com/weaver.php?topic=huffmancoding Interactive Huffman Tree Construction]
* [http://github.com/elijahbal/huffman-coding/ A C program doing basic Huffman coding on binary and text files]
* [http://www.reznik.org/software.html#ABC Efficient implementation of Huffman codes for blocks of binary sequences]
{{Compression Methods}}
 
{{DEFAULTSORT:Huffman Coding}}
[[Category:1952 in computer science]]
[[Category:Lossless compression algorithms]]
[[Category:Binary trees]]

Revision as of 17:18, 19 February 2014

Got nothing to tell about me I think.
Hurrey Im here and a member of wmflabs.org.
I just wish I'm useful in some way .

My homepage hemorrhoid relief