By far the most popular stochastic model for reproduction in population genetics is the Wright-Fisher model (developed implicitly by Fisher in [4] and earlier papers and explicitly by Wright [14]). It is, of course, highly idealized in the form presented below. However, it can be (and has been) generalized and its assumptions relaxed, and even in its pure form it succeeds in capturing the essence of the biology involved.
Wright-Fisher has many of the same assumptions as Hardy-Weinberg Equilibrium, with the important exception of finite population size N (after all, it is the effects of sampling gametes in a finite population that we are interested in modeling). The really crucial assumptions are:
First, note that because allelic variants are neutral, it makes no
difference to the fate of individuals (and thus the descent of their
alleles) how the alleles are distributed among individuals. We can
therefore treat a haploid model as equivalent to a diploid one and, in
general, whatever the ploidy level, take the ``individuals" in
the model to be gametes, without regard to their arrangement in the
organisms themselves. Because most organisms of biological study are
diploid, we
will keep that convention, but the only effect in this case is that our
population size is 2N gametes, rather than N. The remaining necessary
specifications to the model are as follows:
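two alleles, $A$ and $a$, segregate at the locus, and
$X_t$ denotes the number of
$A$ alleles at time $t$, so that $0 \le X_t \le 2N$.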
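In the Wright-Fisher model, we imagine that gametes are chosen
randomly each generation from an effectively infinite gamete pool
reflecting the parental allele frequencies (a frequency of $i/2N$ for an
allele present in $X_t = i$ copies). Then the sampling is
binomial, and

\[
P(X_{t+1} = j \mid X_t = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}, \qquad 0 \le i, j \le 2N.
\]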

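The binomial sampling step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `next_generation` and `trajectory`, the parameter `two_n` (standing for 2N), and the fixed seed are all assumptions introduced here.

```python
import random


def next_generation(count, two_n, rng):
    """Draw next generation's allele count: Binomial(two_n, count / two_n)."""
    p = count / two_n  # parental allele frequency
    # Each of the 2N offspring gametes independently carries the allele
    # with probability p (sampling from an effectively infinite pool).
    return sum(1 for _ in range(two_n) if rng.random() < p)


def trajectory(count0, two_n, generations, seed=0):
    """Return the allele counts for generations 0 through `generations`."""
    rng = random.Random(seed)
    counts = [count0]
    for _ in range(generations):
        counts.append(next_generation(counts[-1], two_n, rng))
    return counts


if __name__ == "__main__":
    # Hypothetical example: 2N = 20 gametes, 10 copies initially.
    print(trajectory(count0=10, two_n=20, generations=15))
```

Once the count reaches 0 or 2N it stays there, because the sampling probability is then exactly 0 or 1.
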
Recall that one of the implications of Hardy-Weinberg was that
under random mating and absent any directional perturbing forces such as
mutation and selection, genetic systems will be at a stable equilibrium.
Here, although we are allowing stochastic fluctuations in the allele
count $X_t$ from
generation-to-generation sampling, there is no directionality expected in
the changes. This, plus the observation that $X_t$ is Markovian,
justifies the assertion that

\[
E[X_{t+1} \mid X_0, \ldots, X_t] = E[X_{t+1} \mid X_t] = X_t,
\]

and now we see that $X_t$ is also a martingale, with two
possible limits, 0 and 2N. We can further write

and derive

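The absence of directional change can be checked numerically. The sketch below computes the exact conditional mean of next generation's allele count from the binomial transition probabilities; it should equal the current count for every starting value. The names `transition_row` and `conditional_mean` and the parameter `two_n` (standing for 2N) are assumptions made for this illustration.

```python
from math import comb


def transition_row(i, two_n):
    """Binomial transition probabilities from a count of i to j = 0..two_n."""
    p = i / two_n
    return [comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
            for j in range(two_n + 1)]


def conditional_mean(i, two_n):
    """Expected allele count next generation, given i copies now."""
    return sum(j * prob for j, prob in enumerate(transition_row(i, two_n)))


if __name__ == "__main__":
    two_n = 20  # hypothetical population of 20 gametes
    for i in (0, 1, 7, 20):
        print(i, conditional_mean(i, two_n))
```

Each printed mean should agree with its starting count up to floating-point rounding, which is the martingale property in numerical form.
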
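It follows from the stopping time theorem for bounded
martingales that $E[X_\infty] = X_0$, where $X_\infty$ is the value at
which $X_t$ is eventually absorbed. Since $X_\infty$ can only be 0 or 2N,
the probability of $X_t$ being absorbed at either of
the two boundaries, starting from $X_0 = i$ copies, is

\[
P(X_\infty = 2N \mid X_0 = i) = \frac{i}{2N}, \qquad P(X_\infty = 0 \mid X_0 = i) = 1 - \frac{i}{2N}.
\]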

We are interested mainly in the situation where a new allele $A$ has
entered a population monomorphic for $a$ (through, perhaps, mutation).
This result tells us that when the new mutant $A$ enters the
population in a single copy ($X_0 = 1$), the probability that it
eventually fixes and replaces the resident $a$ is its initial frequency,
$1/2N$.
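
The fixation probability can also be checked by brute force. The sketch below builds the binomial transition matrix and pushes the distribution of allele counts forward until essentially all probability mass sits on the two boundaries. It is an illustration under stated assumptions: the names `transition_matrix` and `fixation_probability`, the tolerance, and the generation cap are choices made here, and `two_n` stands for 2N.

```python
from math import comb


def transition_matrix(two_n):
    """Rows i = 0..two_n of binomial transition probabilities to j = 0..two_n."""
    rows = []
    for i in range(two_n + 1):
        p = i / two_n
        rows.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows


def fixation_probability(i, two_n, tol=1e-12, max_generations=100_000):
    """Probability of absorption at two_n, starting from i copies."""
    matrix = transition_matrix(two_n)
    dist = [0.0] * (two_n + 1)
    dist[i] = 1.0  # all mass starts on the initial count
    for _ in range(max_generations):
        if sum(dist[1:two_n]) < tol:  # interior (polymorphic) mass is gone
            break
        # One generation: new_dist[j] = sum over k of dist[k] * P(k -> j)
        dist = [sum(dist[k] * matrix[k][j] for k in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist[two_n]


if __name__ == "__main__":
    two_n = 20  # hypothetical population of 20 gametes
    print(fixation_probability(1, two_n), 1 / two_n)
```

For a single new copy the two printed values should agree closely, and the same holds for any starting count i when compared with i / two_n.
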
There are other ways to derive this result, one being to solve the Markov
chain directly. Another makes use of the ``coalescent"
reasoning described earlier by considering the genealogy of alleles in the
reasoning described earlier by considering the genealogy of alleles in the
following way: at time 0, there will be 2N gametes in the population,
any of which might or might not leave descendants in the next generation.
If they do not, the lineage of that allele copy is extinct in the
population. If we follow the population through time, eventually all but
one of the 2N original lineages will be extinct, and the remaining one
will be fixed in the population. Because all of the original gametes have
equal probability of generating the surviving lineage, the fixation
probability of any allelic type is simply the frequency of that type.
Although this is simply a verbal argument, the genealogical perspective
underlying it is an extremely powerful one in analyzing molecular sequence
data, and it is thus worth thinking about some long-solved problems in
this way.
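
The genealogical argument can be illustrated by simulation. In the sketch below each of the 2N original gametes gets its own label, offspring inherit the label of a randomly chosen parent, and the simulation runs until a single original lineage remains. The function names, the parameter `two_n` (standing for 2N), and the seed are assumptions introduced for this example; it is a forward-in-time illustration of the verbal argument, not a coalescent algorithm.

```python
import random
from collections import Counter


def surviving_founder(two_n, rng):
    """Return the label of the original gamete whose lineage takes over."""
    labels = list(range(two_n))  # generation 0: every gamete is its own founder
    while len(set(labels)) > 1:
        # Each offspring gamete picks a parent uniformly at random and
        # inherits that parent's founder label.
        labels = [labels[rng.randrange(two_n)] for _ in range(two_n)]
    return labels[0]


def founder_fixation_counts(two_n, replicates, seed=0):
    """Count how often each original gamete is the sole surviving lineage."""
    rng = random.Random(seed)
    return Counter(surviving_founder(two_n, rng) for _ in range(replicates))


if __name__ == "__main__":
    # Hypothetical example: 6 gametes, 600 independent replicates.
    print(sorted(founder_fixation_counts(6, 600).items()))
```

Because every original gamete is equally likely to found the surviving lineage, the counts should be roughly equal across labels, so an allelic type carried by k of the original gametes fixes in about a fraction k / two_n of the replicates.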