Like any biochemical process, PCR is imperfect. Occasionally, DNA polymerase substitutes, adds, or deletes a nucleotide in the growing DNA chain, and these errors, arising in vitro, cannot be corrected the way the cell's DNA replication machinery corrects them. Statistical models can describe the distribution of the number of mutations in PCR; see Sun [4]. We present some basic results below.
Without loss of generality, we study only one strand of the DNA.
Let $S_0$ be the initial number of identical copies of the single-stranded sequence that serve as templates for DNA replication. During each cycle, we assume that DNA polymerase forms a new strand from each existing template with probability $\lambda$. These newly formed strands, together with the old templates, serve as templates in the next cycle. Let $S_n$ be the number of single-stranded sequences containing the target and the two primers after n PCR cycles. Then the sequence $\{S_n\}$ is a branching process. Further, we make the following two assumptions.
$\{S_n\}$ is a Markov process, that is, the distribution of $S_{n+1}$ depends only on $S_n$.
$\{S_n\}$ is a Galton-Watson process (see [1]).
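The replication step described above can be sketched as a small simulation. This is a minimal illustration under the stated model only: the names `simulate_copies`, `s0`, and `lam` are hypothetical, and the parameter values (10 initial copies, replication probability 0.8, 8 cycles) are arbitrary choices, not values from the source. Since each template yields on average `lam` new strands per cycle, the mean population should grow by a factor of `1 + lam` per cycle, which the sketch compares against the simulated average.

```python
import random

def simulate_copies(s0, lam, n_cycles, rng):
    """Return the list of population sizes after 0, 1, ..., n_cycles cycles."""
    sizes = [s0]
    for _ in range(n_cycles):
        current = sizes[-1]
        # Each existing template independently yields one new strand
        # with probability lam; old templates are kept.
        new_strands = sum(1 for _ in range(current) if rng.random() < lam)
        sizes.append(current + new_strands)
    return sizes

rng = random.Random(0)  # fixed seed so the run is repeatable
runs = [simulate_copies(s0=10, lam=0.8, n_cycles=8, rng=rng) for _ in range(200)]

mean_final = sum(r[-1] for r in runs) / len(runs)
expected_final = 10 * (1 + 0.8) ** 8  # mean growth factor (1 + lam) per cycle
print(mean_final, expected_final)
```

The simulated mean should land close to the theoretical value, with the gap shrinking as the number of runs grows.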
Next we define the generation number of a sequence. The original sequences are called the 0-th generation. The sequences generated directly from the original sequences are called the first generation. Inductively, the sequences generated from the k-th generation are called the (k+1)-th generation.
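The generation bookkeeping can be made concrete with a short sketch that tracks how many sequences of each generation exist after every cycle. The function name `simulate_generations` and its parameters `s0` (initial copies) and `lam` (per-cycle replication probability) are hypothetical names chosen for illustration, and the values used in the example call are arbitrary.

```python
import random

def simulate_generations(s0, lam, n_cycles, rng):
    """Return counts, where counts[k] is the number of k-th generation sequences."""
    counts = [s0] + [0] * n_cycles
    for _ in range(n_cycles):
        new_counts = counts[:]  # existing sequences persist into the next cycle
        for k in range(n_cycles):
            # A copy made from a k-th generation template belongs to
            # the (k+1)-th generation.
            copies = sum(1 for _ in range(counts[k]) if rng.random() < lam)
            new_counts[k + 1] += copies
        counts = new_counts
    return counts

counts = simulate_generations(s0=100, lam=0.8, n_cycles=5, rng=random.Random(1))
print(counts)
```

With `lam=1.0` every template is copied in every cycle, so the counts become deterministic and proportional to binomial coefficients; this gives a quick sanity check on the bookkeeping.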
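Let $S_n^{(k)}$ be the number of k-th generation sequences after n cycles. Then $S_n = \sum_{k=0}^{n} S_n^{(k)}$. Writing $S_0$ for the initial copy number and $\lambda$ for the per-cycle replication probability, it can be shown that $E[S_n^{(k)}] = S_0 \binom{n}{k} \lambda^k$, and thus $E[S_n] = S_0 (1+\lambda)^n$. After n cycles, the probability that a randomly chosen sequence belongs to the k-th generation is $S_n^{(k)} / S_n$, which can be approximated by $\binom{n}{k} \lambda^k (1+\lambda)^{-n}$ if $S_0$ is sufficiently large. This is a consequence of the strong law of large numbers. Therefore, the following assumption can be made when $S_0$ is sufficiently large.
Assumption (A1). The distribution of the generation number $K$ of a randomly chosen sequence after n PCR cycles is $\mathrm{Binomial}\left(n, \frac{\lambda}{1+\lambda}\right)$.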
For the proof of this and a number of related results, see [4].
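As an informal check of the binomial form (not a substitute for the proof in [4]), the expected generation counts can be computed exactly from the model's one-step recursion and compared with a binomial probability mass function. In this sketch the names `expected_generation_counts`, `binomial_pmf`, `s0`, and `lam` are hypothetical, the parameter values are arbitrary, and the comparison uses ratios of expected counts rather than the random counts themselves. Exact rational arithmetic avoids floating-point rounding.

```python
from fractions import Fraction
from math import comb

def expected_generation_counts(s0, lam, n_cycles):
    """Expected number of k-th generation sequences after n_cycles, k = 0..n_cycles."""
    expected = [Fraction(s0)] + [Fraction(0)] * n_cycles
    for _ in range(n_cycles):
        # Old templates persist; each template yields on average lam
        # copies, which belong to the next generation.
        expected = [
            expected[k] + (lam * expected[k - 1] if k > 0 else 0)
            for k in range(n_cycles + 1)
        ]
    return expected

def binomial_pmf(n, p, k):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam = Fraction(4, 5)  # assumed replication probability, for illustration only
n = 6
counts = expected_generation_counts(s0=1000, lam=lam, n_cycles=n)
total = sum(counts)

p = lam / (1 + lam)
fractions_by_generation = [c / total for c in counts]
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
print(fractions_by_generation == pmf)
```

The two lists agree term by term because the recursion reproduces Pascal's rule for binomial coefficients, so the share of each generation in the expected population matches a binomial distribution with n trials and success probability `lam / (1 + lam)`.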