We present here the efficient algorithm introduced by Sobel and
Lange [6] and Kruglyak et al. [1] to
compute
at a locus l. To simplify the notation, we
drop the subscript l.
As before, let
be a vector of alleles
assigned to the founders of the pedigree. The idea is to represent by
a graph the restrictions imposed by the observed marker genotypes on
the vectors a that can be assigned to the founder genes. The
algorithm extracts from the graph only the vectors a compatible with
the marker data, and
is obtained by summing over the
compatible vectors.
We define a founder graph as a graph whose vertices are the founder
genes and whose edges connect the genes appearing together in a
genotyped individual for the gene flow specified by the inheritance
vector v. The edges are labeled with the genotype of the
corresponding individuals.
An example helps understanding the concept. Let's say
figure 3(a) represents the gene flow in the pedigree of
figure 2. Adopting the convention that paternally inherited
genes are on the left, and placing the non-founders in increasing
order, this correspond to the inheritance vector
.
Figure 2: Example of marker data on a pedigree.
Figure 3: Examples of descent and founder graphs.
The founder graph associated with this realization of the gene flow is
represented in figure 3(b). An edge connects genes 3 and
5 because they both pass through individual 12 whose genotype is
observed. Then, the gene 3 is transmitted to individuals 21 and 22
along with gene 1 from individual 11, so we trace an edge between
genes 1 and 3, and so on.
A founder graph comprises a given number m of connected components
that we label
. By construction, the founder genes
assigned to different components appear in different genotyped
individuals. Under random mating and Hardy-Weinberg equilibrium, the
vectors of alleles assigned to different components are independent
and each component can be processed individually.
It may happen that some genes never pass through genotyped
individuals. Their vertices form singleton components, and any allele
type can be assigned to them. In the founder graph of
figure 3(b), the vertex 2 forms a singleton.
We find the set of compatible allelic assignments of the non-singleton
components by doing the following. First, we identify the set of
compatible alleles for each vertex. It is the intersection of the
genotypes attached to the edges incident to the vertex. This is
basically saying that a gene can only be of the allele type or types
that are observed in all the individuals where the gene is present.
Notice that the genotype of one individual (one edge) already reduces
the number of compatible alleles to at most two.
In our example, the two edges incident to gene 3 are labeled
,
so its set of possible alleles is
. The edges incident to
gene 6 represent individuals with genotypes
and
. This
forces gene 6 to be of type b.
The intersection of the genotypes may be empty. In
this case, there is no compatible assignment for the component, and
hence for the graph, so
.
In the second part of the algorithm, the whole structure of the connected component is utilized to further restrict the allelic assignments to only those compatible with the observed genotypes for the component. The following steps are performed:
At the end of the process, we get a set of compatible allelic
assignments for each connected component and we denote them by
. Table 1 gives the sets of compatible
assignments for the components in the founder graph of
figure 3(b).
Table 1: Compatible allelic assignments for the components of the example
founder graph.
Each set
contains 0,1 or 2 assignments, except for singleton
components. An empty set for at least one of the components means that
for the inheritance vector we are considering, there is no compatible
allelic assignment and therefore
. The only global
compatible assignments are those in the Cartesian product of the sets
.
The final step is the computation of the desired probability. We
ignore singleton components since their probability is 1. For the
other components, let
be an element of
(a
vector of alleles assigned to the vertices of the component
). Then,

The summation over the
contains a maximum of 2
terms. We take the product over 2f elements. The maximum number of
operations is therefore 4f, and we see that the size of the
computation scales linearly with the number of founders in the pedigree.