Tunstall Codes in data compression

Most of the variable-length codes that we look at in this book encode letters from the source alphabet using codewords with varying numbers of bits: codewords with fewer bits for letters that occur more frequently and codewords with more bits for letters that occur less frequently. The Tunstall code is an important exception. In the Tunstall code, all codewords are of equal length. However, each codeword represents a different number of letters.

An example of a 2-bit Tunstall code for an alphabet = B is shown in Table 3.18. The main advantage of a Tunstall code is that errors in codewords do not propagate, unlike other variable-length codes, such as Huffman codes, in which an error in one codeword will cause a series of errors to occur.

T A B L E 3 . 18A 2-bit Tunstall code.
SequenceCodeword
AAA00
AAB01
AB10
B11
T A B L E 3 . 19A 2-bit (non-Tunstall) code
SequenceCodeword
AAA00
ABA01
AB10
B11

The design of a code that has a fixed codeword length but a variable number of symbols per codeword should satisfy the following conditions:

  • We should be able to parse a source output sequence into sequences of symbols that appear in the codebook.
  • We should maximize the average number of source symbols represented by each codeword.

Leave a Comment