
Applying Information Theory

Consider Methanococcus jannaschii, an organism that lives in hot vents in the ocean. Life in these conditions is tough, so it is not surprising that this organism has some unusual features. For instance, its base composition, the distribution p over {A,C,G,T}, is highly non-uniform. The relative entropy of p with respect to the uniform distribution is 0.103 bits per base pair (or about 10 bits per 100 bp). To interpret this: a typical string of 100 bases in M. jannaschii is about $2^{10} \approx 1000$ times more probable under the correct model than the same sequence under the uniform model.
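The interpretation above can be checked numerically. The following is a minimal sketch, not the computation used in the source: the base frequencies in `p` are hypothetical AT-rich values chosen only for illustration, and the function name `relative_entropy` is an assumption of this example.

```python
import math

def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; p and q are dicts over the same keys."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)

# Hypothetical AT-rich base composition (illustrative, not the source's figures).
p = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}
uniform = {base: 0.25 for base in "ACGT"}

# Bits per base pair by which p beats the uniform model on data drawn from p.
bits_per_bp = relative_entropy(p, uniform)

# Over n independent bases the expected log2 likelihood ratio is n * D,
# so a typical sequence is about 2**(n * D) times more probable under p.
ratio_100bp = 2 ** (100 * bits_per_bp)

# The same conversion applied to the figure quoted in the text (0.103 bits/bp).
ratio_from_text = 2 ** (100 * 0.103)

print(f"{bits_per_bp:.3f} bits/bp, ratio over 100 bp about {ratio_100bp:.0f}")
print(f"ratio implied by 0.103 bits/bp: about {ratio_from_text:.0f}")
```

The factor is an average statement: it is 2 raised to the expected log-likelihood ratio, so individual 100-base strings may give a larger or smaller ratio.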

Now look at the relative entropy $D(r\|p)$ of the frame-dependent base composition model (call it r) with respect to the average frequency model (the distribution p above). It is measured in bits per codon, which means that a run of 20 codons generated by the frame-dependent model is about $2^{20 D(r\|p)}$ times more likely under the correct model r than under p.
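As a sketch of how a per-codon figure of this kind can be computed: if the frame-dependent model treats the three codon positions as independent, its relative entropy with respect to p is the sum of the three position-wise relative entropies. The position-specific frequencies below are hypothetical, and p is assumed here to be the average of the three position-specific distributions.

```python
import math

BASES = "ACGT"

def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; p and q are dicts over the same keys."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in p if p[b] > 0)

# Hypothetical base compositions at codon positions 1, 2 and 3 (illustrative only).
r = [
    {"A": 0.32, "C": 0.13, "G": 0.33, "T": 0.22},
    {"A": 0.38, "C": 0.17, "G": 0.12, "T": 0.33},
    {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40},
]

# Average frequency model: assumed to be the mean over the three positions.
p = {b: sum(pos[b] for pos in r) / 3 for b in BASES}

# Positions are independent under both models, so the divergences add up.
bits_per_codon = sum(relative_entropy(pos, p) for pos in r)

# Likelihood ratio for a typical run of 20 codons drawn from r.
ratio_20_codons = 2 ** (20 * bits_per_codon)

print(f"{bits_per_codon:.3f} bits per codon, ratio over 20 codons about {ratio_20_codons:.1f}")
```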

Similarly, the relative entropy of the independent codon model with respect to the frame independent model is about 0.22 bits per codon, and that of a stationary first-order Markov model (for codons) with respect to the independent codon model is about 0.12 bits per codon. So we have three levels of improvement, each of them significant. These figures are taken from draft notes for Phil Green's course "Genome Sequence Analysis", which were at one time available at http://www.genome.washington.edu/MBT599C but as of 05-01-98 are no longer available.
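For the Markov comparison, the quantity per symbol is a relative entropy rate: the stationary average, over the current state, of the relative entropy between the transition row and the independent model. The sketch below uses a toy two-state chain in place of the codon alphabet; the transition matrix `P` is hypothetical, and the independent model is assumed to use the chain's stationary distribution.

```python
import math

# Hypothetical two-state transition matrix (rows sum to 1); a real codon model
# would have one state per codon instead.
P = [
    [0.9, 0.1],
    [0.2, 0.8],
]
n = len(P)

# Stationary distribution by power iteration: repeatedly apply pi <- pi P.
pi = [1.0 / n] * n
for _ in range(1000):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Relative entropy rate (bits per symbol) of the Markov model with respect to
# the independent model that emits each symbol with probability pi[j].
rate = sum(
    pi[i] * P[i][j] * math.log2(P[i][j] / pi[j])
    for i in range(n)
    for j in range(n)
    if P[i][j] > 0
)

print(f"stationary distribution: {pi}")
print(f"relative entropy rate: {rate:.3f} bits per symbol")
```

With codons as states, the same sum would give a figure in bits per codon, comparable to the values quoted above.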



Simon Cawley
Fri May 1 15:50:13 PDT 1998