Diagram Coding in data compression

One of the more common forms of static dictionary coding is diagram coding. In this form of coding, the dictionary consists of all letters of the source alphabet followed by as many pairs of letters, called diagrams, as can be accommodated by the dictionary. For example, suppose we were to construct a dictionary of size 256 for diagram coding of all printable ASCII characters. The first 95 entries of the dictionary would be the 95 printable ASCII characters. The remaining 161 entries would be the most frequently used pairs of characters

The diagram encoder reads a two-character input and searches the dictionary to see if this input exists in the dictionary. If it does, the corresponding index is encoded and transmitted. If it does not, the first character of the pair is encoded. The second character in the pair then becomes the first character of the next diagram. The encoder reads another character to complete the diagram, and the search procedure is repeated

Example

Suppose we have a source with a five-letter alphabet = b c d r. Based on knowledge about the source, we build the dictionary

CodeEntry CodeEntry
000a 100r
 
001b 101ab
010c 110ac
011d 111ad
     
A sample dictionary

Suppose we wish to encode the sequence

abracadabra

The encoder reads the first two characters ab and checks to see if this pair of letters exists in the dictionary. It does and is encoded using the codeword 101. The encoder then reads he next two characters ra and checks to see if this pair occurs in the dictionary. It does not, so the encoder sends out the code for r, which is 100, then reads in one more character, c, to make the two-character pattern ac. This does exist in the dictionary and is encoded as 110. Continuing in this fashion, the remainder of the sequence is coded. The output string for the given input sequence is 101100110111101100000

Leave a Comment