next up previous
Next: Yates' algorithm Up: Stat 260: Statistics Previous: The HMM formulation

Computing the probability of marker genotypes given an inheritance vector

 

We present here the efficient algorithm introduced by Sobel and Lange [6] and Kruglyak et al. [1] to compute at a locus l. To simplify the notation, we drop the subscript l.

As before, let be a vector of alleles assigned to the founders of the pedigree. The idea is to represent by a graph the restrictions imposed by the observed marker genotypes on the vectors a that can be assigned to the founder genes. The algorithm extracts from the graph only the vectors a compatible with the marker data, and is obtained by summing over the compatible vectors.

We define a founder graph as a graph whose vertices are the founder genes and whose edges connect the genes appearing together in a genotyped individual for the gene flow specified by the inheritance vector v. The edges are labeled with the genotype of the corresponding individuals.

An example helps understanding the concept. Let's say figure 3(a) represents the gene flow in the pedigree of figure 2. Adopting the convention that paternally inherited genes are on the left, and placing the non-founders in increasing order, this correspond to the inheritance vector .

  
Figure 2: Example of marker data on a pedigree.

  
Figure 3: Examples of descent and founder graphs.

The founder graph associated with this realization of the gene flow is represented in figure 3(b). An edge connects genes 3 and 5 because they both pass through individual 12 whose genotype is observed. Then, the gene 3 is transmitted to individuals 21 and 22 along with gene 1 from individual 11, so we trace an edge between genes 1 and 3, and so on.

A founder graph comprises a given number m of connected components that we label . By construction, the founder genes assigned to different components appear in different genotyped individuals. Under random mating and Hardy-Weinberg equilibrium, the vectors of alleles assigned to different components are independent and each component can be processed individually.

It may happen that some genes never pass through genotyped individuals. Their vertices form singleton components, and any allele type can be assigned to them. In the founder graph of figure 3(b), the vertex 2 forms a singleton.

We find the set of compatible allelic assignments of the non-singleton components by doing the following. First, we identify the set of compatible alleles for each vertex. It is the intersection of the genotypes attached to the edges incident to the vertex. This is basically saying that a gene can only be of the allele type or types that are observed in all the individuals where the gene is present. Notice that the genotype of one individual (one edge) already reduces the number of compatible alleles to at most two.

In our example, the two edges incident to gene 3 are labeled , so its set of possible alleles is . The edges incident to gene 6 represent individuals with genotypes and . This forces gene 6 to be of type b.

The intersection of the genotypes may be empty. In this case, there is no compatible assignment for the component, and hence for the graph, so .

In the second part of the algorithm, the whole structure of the connected component is utilized to further restrict the allelic assignments to only those compatible with the observed genotypes for the component. The following steps are performed:

  1. Pick an arbitrary vertex in the component.
  2. If the set of compatible alleles for that vertex contains one element, select that allele type. If the set contains 2 elements, repeat step 3 for each of the 2 allele types.
  3. Selecting one allele type forces the allele type of the adjacent vertices. In turn, the allele type of these vertices forces the allele type of their adjacent vertices. By continuing like that, we set the allele type of all the vertices in the component. We just have to traverse the graph and to record the alleles assigned to each vertex to obtain a compatible allelic assignment. If, at any point, an incompatibility is encountered, we can't get a compatible assignment from the allele type we started from.

At the end of the process, we get a set of compatible allelic assignments for each connected component and we denote them by . Table 1 gives the sets of compatible assignments for the components in the founder graph of figure 3(b).

  
Table 1: Compatible allelic assignments for the components of the example founder graph.

Each set contains 0,1 or 2 assignments, except for singleton components. An empty set for at least one of the components means that for the inheritance vector we are considering, there is no compatible allelic assignment and therefore . The only global compatible assignments are those in the Cartesian product of the sets .

The final step is the computation of the desired probability. We ignore singleton components since their probability is 1. For the other components, let be an element of (a vector of alleles assigned to the vertices of the component ). Then,

The summation over the contains a maximum of 2 terms. We take the product over 2f elements. The maximum number of operations is therefore 4f, and we see that the size of the computation scales linearly with the number of founders in the pedigree.



next up previous
Next: Yates' algorithm Up: Stat 260: Statistics Previous: The HMM formulation



Simon Cawley
Thu Apr 16 15:30:12 PDT 1998