
Background

Understanding changes in nucleotides over time is essential for understanding the evolution of DNA sequences, as well as for estimating the rate of evolution and reconstructing the evolutionary history of organisms (Li 1997). To study this process, we must make assumptions regarding the probability of substitution of one nucleotide by another. A variety of models have been proposed (Jukes and Cantor 1969; Kimura 1980, 1981; Hasegawa et al. 1985; Tamura and Nei 1993).

For DNA sequences, the rate of substitution per site can be expressed as a 4 x 4 instantaneous rate matrix, $Q$, in which each element $Q_{ij}$ represents the rate of change from base $i$ to base $j$ during an infinitesimal time period $dt$. Models may assume a single constant rate of change regardless of the nucleotide, allow different rates for different nucleotides, or take into account the base composition of the sequences being studied.
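As a rough illustration of the rate matrix described above, the sketch below builds a 4 x 4 matrix from a function giving the rate of change from base i to base j. The helper name `make_rate_matrix`, the base ordering A, C, G, T, and the example rate of 0.01 are all hypothetical. The diagonal is filled using the usual continuous-time Markov chain convention (each row sums to zero), which the text above does not state and is assumed here.

```python
BASES = "ACGT"  # assumed ordering of rows and columns


def make_rate_matrix(rate):
    """Build a 4 x 4 instantaneous rate matrix from rate(i, j).

    rate(i, j) is the rate of change from base i to base j (i != j).
    """
    q = [[0.0] * 4 for _ in range(4)]
    for a, i in enumerate(BASES):
        for b, j in enumerate(BASES):
            if a != b:
                q[a][b] = rate(i, j)
        # Assumed convention: the diagonal entry is minus the total
        # rate of leaving base i, so that each row sums to zero.
        q[a][a] = -sum(q[a])
    return q


# Hypothetical example: the same rate for every change.
Q = make_rate_matrix(lambda i, j: 0.01)
for base, row in zip(BASES, Q):
    print(base, row)
```

Passing a different `rate` function is enough to describe models in which the rate depends on which nucleotides are involved.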

In the simplest case, there is one constant rate, $\alpha$, of substitution to any nucleotide. Substitutions occur randomly among the four nucleotides. The rate of substitution for each nucleotide is $3\alpha$ per unit time, and the rate of substitution in each of the three possible directions of change is $\alpha$. This one-parameter model was proposed by Jukes and Cantor (1969). Assume that the nucleotide at a specific site is a C at time 0, and consider the probability $P_{C(t)}$ that this site will have a C at time $t$. Since the initial state is a C, $P_{C(0)} = 1$. At time 1, the probability of still having a C is

$$P_{C(1)} = 1 - 3\alpha,$$

which is the probability that the nucleotide remains the same. The probability of having C at time 2 is

$$P_{C(2)} = (1 - 3\alpha) P_{C(1)} + \alpha \left[ 1 - P_{C(1)} \right].$$

This includes both the probability of no change at the site and the probability of the nucleotide changing to another state and then changing back. This relationship will hold for any $t$:

$$P_{C(t+1)} = (1 - 3\alpha) P_{C(t)} + \alpha \left[ 1 - P_{C(t)} \right].$$

If we approximate this as a continuous time model, we obtain the first order differential equation

$$\frac{dP_{C(t)}}{dt} = -4\alpha P_{C(t)} + \alpha.$$

Solving it, we see that for the case of $P_{C(0)} = 1$, the solution is

$$P_{C(t)} = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}.$$

For $P_{C(0)} = 0$, the probability of having C at time $t$ is

$$P_{C(t)} = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}.$$

We can generalize the formulas above. The first gives $P_{ii(t)}$, the probability that a site that started with nucleotide $i$ has nucleotide $i$ at time $t$, and the second gives $P_{ij(t)}$, the probability that a site that started with nucleotide $i$ has a different nucleotide $j$ at time $t$. Under this model, the equilibrium frequency of each of the four nucleotides is $1/4$.
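The step from the discrete recursion to the continuous solution can be checked numerically. The sketch below is only an illustration: the function names and the rate `alpha = 0.001` are hypothetical, and the closed forms are the standard Jukes-Cantor expressions, 1/4 + (3/4)exp(-4 alpha t) for staying the same and 1/4 - (1/4)exp(-4 alpha t) for a given different nucleotide.

```python
import math


def jc_same_discrete(alpha, steps, p0=1.0):
    """Iterate P(t+1) = (1 - 3*alpha) * P(t) + alpha * (1 - P(t))."""
    p = p0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def jc_same(alpha, t):
    """Continuous-time probability that the site has its starting nucleotide at time t."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


def jc_diff(alpha, t):
    """Continuous-time probability of one particular different nucleotide at time t."""
    return 0.25 - 0.25 * math.exp(-4 * alpha * t)


alpha = 0.001  # hypothetical rate per unit time
for t in (1, 10, 100, 1000):
    print(t, jc_same_discrete(alpha, t), jc_same(alpha, t), jc_diff(alpha, t))
```

For a small `alpha` the discrete and continuous values stay close, and as `t` grows both probabilities approach the equilibrium frequency of 1/4.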

The key assumption of the Jukes-Cantor model, that every type of nucleotide substitution occurs at the same rate, can be quite unrealistic. Nucleotide substitutions can be categorized as transitions and transversions. Transitions are substitutions between A and G (purines) or between C and T (pyrimidines). Transversions are substitutions between a purine and a pyrimidine (e.g., A changing to C). Transitions are generally more frequent than transversions. Kimura (1980) incorporated this in a two-parameter model, in which the rate of transitional substitution at each nucleotide site is $\alpha$ per unit time, and the rate of each type of transversion is $\beta$ per unit time.
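The distinction between transitions and transversions can be made concrete with a short sketch. The function names and the example rates are hypothetical. The code assumes the usual reading of Kimura's two-parameter model, with rate `alpha` for the single transition available to each base and rate `beta` for each of its two transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def substitution_type(i, j):
    """Classify a change from base i to base j."""
    if i == j:
        raise ValueError("i and j must be different bases")
    # A transition stays within purines or within pyrimidines.
    same_class = {i, j} <= PURINES or {i, j} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def k2p_rate(i, j, alpha, beta):
    """Rate of change from i to j under the assumed two-parameter scheme."""
    return alpha if substitution_type(i, j) == "transition" else beta


alpha, beta = 0.02, 0.005  # hypothetical rates per unit time
for i in BASES:
    rates = {j: k2p_rate(i, j, alpha, beta) for j in BASES if j != i}
    print(i, rates, "total:", sum(rates.values()))
```

Under these assumptions the total rate of substitution at a site is `alpha + 2 * beta`, and setting `alpha` equal to `beta` recovers the one-parameter Jukes-Cantor case.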






Simon Cawley
Tue May 12 16:54:23 PDT 1998