Text Compression

Text compression seems natural for Huffman coding. For example, the probability model for a particular novel will not differ significantly from the probability model for another novel. Similarly, the probability model for one set of FORTRAN programs is not going to be much different from the probability model for a different set of FORTRAN programs. The probabilities in Table 3.26 are the probabilities of the 26 letters (upper- and lowercase counted together) obtained for the U.S. Constitution and are representative of English text. The probabilities in Table 3.27 were obtained by counting the frequency of occurrence of letters in an earlier version of this chapter. While the two documents are substantially different, the two sets of probabilities are very much alike.
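Probability models like those in Tables 3.26 and 3.27 are obtained simply by counting letter frequencies in a document. A minimal Python sketch (the function name and the choice to fold upper- and lowercase together are illustrative, matching how the tables are presented):

```python
from collections import Counter

def letter_probabilities(text):
    # Count only ASCII letters, folding case so 'a' and 'A' share one entry,
    # as in Tables 3.26 and 3.27.
    letters = [c.upper() for c in text if c.isascii() and c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    return {ch: counts[ch] / total for ch in sorted(counts)}
```

Running this over two different English documents should produce two similar probability tables, which is what makes a single fixed Huffman code usable across documents.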

TABLE 3.26

  Letter  Probability     Letter  Probability
  A       0.057305        N       0.056035
  B       0.014876        O       0.058215
  C       0.025775        P       0.021034
  D       0.026811        Q       0.000973
  E       0.112578        R       0.048819
  F       0.022875        S       0.060289
  G       0.009523        T       0.078085
  H       0.042915        U       0.018474
  I       0.053475        V       0.009882
  J       0.002031        W       0.007576
  K       0.001016        X       0.002264
  L       0.031403        Y       0.011702
  M       0.015892        Z       0.001502
TABLE 3.27

  Letter  Probability     Letter  Probability
  A       0.049855        N       0.048039
  B       0.016100        O       0.050642
  C       0.025835        P       0.015007
  D       0.030232        Q       0.001509
  E       0.097434        R       0.040492
  F       0.019754        S       0.042657
  G       0.012053        T       0.061142
  H       0.035723        U       0.015794
  I       0.048783        V       0.004988
  J       0.000394        W       0.012207
  K       0.002450        X       0.003413
  L       0.025835        Y       0.008466
  M       0.016494        Z       0.001050
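Given such a probability table, a Huffman code is built by repeatedly merging the two least probable entries. The following is a minimal sketch using Python's `heapq`, not the book's own implementation; the dictionary-of-codewords representation is an illustrative choice:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code from a {symbol: probability} map."""
    tiebreak = count()  # makes heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pop the two least probable subtrees and merge them,
        # prefixing their codewords with 0 and 1 respectively.
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        merged = {sym: "0" + word for sym, word in code1.items()}
        merged.update({sym: "1" + word for sym, word in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]
```

Feeding in the probabilities of Table 3.26 would assign the shortest codewords to frequent letters such as E and T, and the longest to rare letters such as Q and Z.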
