Understanding how nucleotides change over time is essential for studying the evolution of DNA sequences, estimating the rate of evolution, and reconstructing the evolutionary history of organisms (Li 1997). To study this process, we must make assumptions regarding the probability of substitution of one nucleotide by another. A variety of models have been proposed (Jukes and Cantor 1969; Kimura 1980, 1981; Hasegawa et al. 1985; Tamura and Nei 1993).
For DNA sequences, the rate of substitution per site can be expressed as a 4 × 4 instantaneous rate matrix, $Q$, in which each element $Q_{ij}$ represents the rate of change from base $i$ to base $j$ during an infinitesimal time period $dt$. Models may assume a single constant rate of change regardless of the nucleotide, allow different rates for different nucleotides, or take into account the base composition of the sequences being studied.
In the simplest case, there is one constant rate, $\alpha$, of change from any nucleotide to any other nucleotide. Substitutions occur randomly among the four nucleotides. The rate of substitution for each nucleotide is $3\alpha$ per unit time, and the rate of substitution in each of the three possible directions of change is $\alpha$. This one-parameter model was proposed by Jukes and Cantor (1969).
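As a concrete illustration of the rate matrix $Q$ for this one-parameter model, the sketch below builds it in plain Python. The function name `jc69_rate_matrix`, the base ordering `ACGT`, and the example value `alpha = 0.01` are illustrative assumptions rather than part of the original description. Each diagonal entry is set to $-3\alpha$ so that every row sums to zero, which is the usual convention for instantaneous rate matrices.

```python
BASES = "ACGT"  # assumed ordering of rows and columns


def jc69_rate_matrix(alpha):
    """Return the 4 x 4 Jukes-Cantor rate matrix as a nested list.

    Every off-diagonal entry is alpha (the rate of change to each other
    base); each diagonal entry is -3 * alpha, so every row sums to zero.
    """
    return [
        [-3.0 * alpha if i == j else alpha for j in range(4)]
        for i in range(4)
    ]


if __name__ == "__main__":
    Q = jc69_rate_matrix(0.01)  # alpha = 0.01 is an arbitrary example value
    for base, row in zip(BASES, Q):
        print(base, row)
```

The total rate at which a nucleotide leaves its current state is the negative of the diagonal entry, which is three times the rate in any single direction.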
Suppose that the nucleotide at a specific site at time 0 is a C. We then consider the probability $P_C(t)$ that this site will have a C at time $t$. Since the initial state is a C, $P_C(0) = 1$. At time 1, the probability of still having a C is

$$P_C(1) = 1 - 3\alpha,$$

which is the probability that the nucleotide remains the same. The probability of having C at time 2 is

$$P_C(2) = (1 - 3\alpha)\,P_C(1) + \alpha\,[1 - P_C(1)].$$

This includes both the probability of no change at the site and the probability of the nucleotide changing to another state and then changing back. This relationship holds for any $t$:

$$P_C(t+1) = (1 - 3\alpha)\,P_C(t) + \alpha\,[1 - P_C(t)].$$
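The step-by-step reasoning above can be sketched as a short iteration. This is a minimal illustration, assuming a per-direction substitution rate `alpha` and the update $P(t+1) = (1 - 3\alpha)P(t) + \alpha[1 - P(t)]$; the function name `prob_c_discrete` and the example rate are hypothetical choices made for this sketch.

```python
def prob_c_discrete(alpha, steps, p0=1.0):
    """Iterate P(t+1) = (1 - 3*alpha) * P(t) + alpha * (1 - P(t)).

    Returns the list [P(0), P(1), ..., P(steps)], where P(t) is the
    probability that the site holds a C after t time units.
    """
    probs = [p0]
    for _ in range(steps):
        p = probs[-1]
        stay = (1.0 - 3.0 * alpha) * p  # site was C and did not change
        back = alpha * (1.0 - p)        # site was not C and changed to C
        probs.append(stay + back)
    return probs


if __name__ == "__main__":
    alpha = 0.01  # arbitrary example rate
    for t, p in enumerate(prob_c_discrete(alpha, 5)):
        print(t, round(p, 6))
```

Starting from `p0=1.0` corresponds to a site that initially holds a C; passing `p0=0.0` follows a site that initially holds a different nucleotide.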
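If we approximate this as a continuous-time model and solve the resulting first-order differential equation, $dP_C(t)/dt = \alpha - 4\alpha P_C(t)$, we see that for the case of $P_C(0) = 1$, the solution is

$$P_C(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}.$$

For $P_C(0) = 0$ (that is, the initial nucleotide is not C), the probability of having C at time $t$ is

$$P_C(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}.$$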
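One way to check the continuous-time solution is to confirm numerically that it satisfies the differential equation $dP/dt = \alpha - 4\alpha P$. The sketch below does this with a central finite difference. The general form $P(t) = 1/4 + (P(0) - 1/4)e^{-4\alpha t}$ covers both starting conditions; the function names and the step size `h` are assumptions made for illustration.

```python
import math


def prob_c_continuous(alpha, t, p0=1.0):
    """Closed-form P_C(t) = 1/4 + (p0 - 1/4) * exp(-4 * alpha * t)."""
    return 0.25 + (p0 - 0.25) * math.exp(-4.0 * alpha * t)


def ode_residual(alpha, t, p0=1.0, h=1e-6):
    """Numerical dP/dt minus (alpha - 4 * alpha * P(t)); should be near 0."""
    slope = (
        prob_c_continuous(alpha, t + h, p0)
        - prob_c_continuous(alpha, t - h, p0)
    ) / (2.0 * h)
    return slope - (alpha - 4.0 * alpha * prob_c_continuous(alpha, t, p0))


if __name__ == "__main__":
    alpha = 0.01  # arbitrary example rate
    for t in (0.0, 10.0, 50.0):
        print(t, prob_c_continuous(alpha, t), ode_residual(alpha, t))
```

With `p0=1.0` the function reduces to the first solution and with `p0=0.0` to the second.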
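We can generalize the formulas above to give the probability $P_{ii}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$ that a site starting with nucleotide $i$ has nucleotide $i$ at time $t$, as well as the probability $P_{ij}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}$ that a site starting with nucleotide $i$ has a different nucleotide $j$ at time $t$. Under this model, the equilibrium frequency of each of the four nucleotides is $1/4$.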
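To see the equilibrium behavior concretely, the sketch below assembles the full 4 × 4 matrix of substitution probabilities from the Jukes-Cantor closed forms $P_{ii}(t) = 1/4 + (3/4)e^{-4\alpha t}$ and $P_{ij}(t) = 1/4 - (1/4)e^{-4\alpha t}$. The function name `jc69_transition_matrix` and the example values of `alpha` and `t` are assumptions chosen for illustration.

```python
import math


def jc69_transition_matrix(alpha, t):
    """Return the 4 x 4 matrix of P_ij(t) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * alpha * t)
    p_same = 0.25 + 0.75 * decay  # P_ii(t): same nucleotide at time t
    p_diff = 0.25 - 0.25 * decay  # P_ij(t): a specific different nucleotide
    return [
        [p_same if i == j else p_diff for j in range(4)]
        for i in range(4)
    ]


if __name__ == "__main__":
    alpha = 0.01  # arbitrary example rate
    for t in (0, 10, 100, 1000):
        first_row = jc69_transition_matrix(alpha, t)[0]
        print(t, [round(p, 4) for p in first_row])
```

Each row sums to one at every `t`, and as `t` grows every entry approaches 1/4 regardless of the starting nucleotide.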
The key assumption of the Jukes-Cantor model, that all nucleotide substitutions occur randomly and at the same rate, can be quite unrealistic. Nucleotide substitutions can be categorized as transitions and transversions. Transitions are substitutions between A and G (purines) or between C and T (pyrimidines). Transversions are substitutions between a purine and a pyrimidine (e.g., A changing to C). Transitions are generally more frequent than transversions. Kimura (1980) incorporated this in a two-parameter model, in which the rate of transitional substitution at each nucleotide site is $\alpha$ per unit time, and the rate of transversions is $2\beta$ per unit time ($\beta$ for each of the two possible transversions).
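The distinction between transitions and transversions can be sketched in code as follows. This is a minimal illustration of a two-parameter rate matrix, assuming a rate `alpha` for the single transition available to each nucleotide and a rate `beta` for each of its two transversions; the names `is_transition` and `k2p_rate_matrix` and the example rates are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def is_transition(a, b):
    """True if a -> b stays within the purines or within the pyrimidines."""
    if a == b:
        raise ValueError("a and b must be different bases")
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def k2p_rate_matrix(alpha, beta):
    """Return Q as a dict of dicts, Q[a][b] = rate of change from a to b."""
    Q = {}
    for a in BASES:
        row = {}
        for b in BASES:
            if a != b:
                row[b] = alpha if is_transition(a, b) else beta
        row[a] = -sum(row.values())  # diagonal makes the row sum to zero
        Q[a] = row
    return Q


if __name__ == "__main__":
    Q = k2p_rate_matrix(alpha=0.02, beta=0.005)  # arbitrary example rates
    for a in BASES:
        print(a, [Q[a][b] for b in BASES])
```

Setting `alpha` equal to `beta` recovers the one-parameter case in which every direction of change has the same rate.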