Huffman coding is an entropy encoding algorithm developed by David A. Huffman that is widely used as a lossless data compression technique. The Huffman coding algorithm uses a variable length code table to encode a source character where the variable-length code table is derived on the basis of the estimated probability of occurrence of the source character
The key idea behind Huffman algorithm is that it encodes the most common characters using shorter strings of bits than those used for less common source characters.
The algorithm works by creating a binary tree of nodes that are stored in an array. A node can be either a leaf node or an internal node. Initially, all the nodes in the tree are at the leaf level and store the source character and its frequency of occurrence (also known as weight).
While the internal node is used to store the weight and contains links to its child nodes, the external node contains the actual character. Conventionally, a ‘0’ represents following the left child and a ‘1’ represents following the right child. A finished tree that has n leaf nodes will have n – 1 internal nodes
The running time of the algorithm depends on the length of the paths in the tree. So, before going into further details of Huffman coding, let us first learn how to calculate the length of the paths in the tree. The external path length of a binary tree is defined as the sum of all path lengths summed over each path from the root to an external node. The internal path length is also defined in the same manner. The internal path length of a binary tree is defined as the sum of all path lengths summed over each path from the root to an internal node. Look at the binary tree given in Fig. 9.22.
The internal path length, Li = 0 + 1 + 2 + 1 + 2 + 3 + 3 = 12
The external path length, LE = 2 + 3 + 3 + 2 + 4 + 4 + 4 + 4 = 26
Note that, LI + 2 * n = 12 + 2 * 7 = 12 + 14 = 26 = LE
Thus, Li + 2n = LE , where n is the number of internal nodes. Now if the tree has n external nodes and each external node is assigned a weight, then the weighted path length P is defined as the sum of the weighted path lengths
Therefore, P = W1 L1 + W2 L2 + …. + Wn Ln
where Wi and Li are the weight and path length of an external node Ni
Example Consider the trees T1 , T2 , and T3 given below, calculate their weighted external path lengths.
Weighted external path length of T1 can be given as,
P1 = 2.3 + 3.3 + 5.2 + 11.3 + 5.3 + 2.2 = 6 + 9 + 10 + 33 + 15 + 4 = 77
Weighted external path length of T2 can be given as
P2 = 5.2 + 7.2 + 3.3 + 4.3 + 2.2 = 10 + 14 + 9 + 12 + 4 = 49
Weighted external path length of T3 can be given as,
P3 = 2.3 + 3.3 + 5.2 + 11.1 = 6 + 9 + 10 + 11 = 36