Incomplete IBD information: single markers

In this section we consider the modifications to our analysis that are needed when marker data at or near a locus of interest are available on the sibs, and possibly on the parents too, but do not necessarily determine IBD status. Let's begin by looking at some examples.

Example A. At a single marker locus, parental mating type is and sib genotypes are and . Clearly, the sibs share DNA IBD on 1 chromosome (paternally inherited allele "1").

Example B. As above, but parental mating type is and sib genotypes are and . Again, it is clear that the sibs share DNA IBD on 0 chromosomes at this locus.

Example C. Parental mating type now is , and sib genotypes are (a) , , or (b) , . In case (a) we could have IBD = 0 (the "3" alleles are distinct by descent, i.e. they come from different maternal chromosomes) or IBD = 1 (the "3" alleles are IBD, i.e. they come from the same maternal chromosome). Intuitively, these two possibilities are equally likely, given no extra information, but what if the sibs are both affected? Similarly, for case (b), the sibs can share DNA IBD on either 1 or 2 chromosomes.

Example D. Parental mating type is and sib phenotypes are (a) , ; or (b) , ; or (c) , ; or (d) , . It is clear that in cases (a), (b) and (c) the sibs share 2, 1 and 0 alleles IBD at this locus, respectively, while in case (d) they could share 0 (the "1" alleles come from different parents, as do the "2" alleles) or 2 (the "1" alleles come from the same parent, as do the "2" alleles). Again, it is intuitively clear that these last two options are equally probable, given no further information. To check this a bit more formally, list the 16 possible inheritance vectors and look at those compatible with these data. If we consider the ordered parental genotype compatible with mating type , where denotes a maternal allele, then the four inheritance vectors compatible with the data on the sibs are , , , and , which involve sharing of 2, 0, 0 and 2 alleles IBD, respectively. These 4 vectors are equally likely given no further information. The same holds for the other three ordered parental genotypes. What if both sibs are affected?
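The enumeration described in Example D can be checked mechanically. The sketch below is only an illustration under stated assumptions: it takes both parents to be heterozygous for alleles "1" and "2" and both sibs to have genotype 1/2, which is one configuration consistent with the "0 or 2" sharing described in case (d). The encoding of an inheritance vector as four bits (p1, m1, p2, m2), one paternal and one maternal choice per sib, and all function names are hypothetical conveniences, not notation from these notes.

```python
from itertools import product

# Assumed ordered parental genotypes (hypothetical stand-ins for the missing display):
# each parent carries allele "1" on chromosome 0 and allele "2" on chromosome 1.
FATHER = ("1", "2")
MOTHER = ("1", "2")


def sib_genotype(pat_bit, mat_bit):
    """Unordered genotype of a sib receiving the indicated parental chromosomes."""
    return tuple(sorted((FATHER[pat_bit], MOTHER[mat_bit])))


def ibd_sharing(v):
    """Number of alleles the two sibs share IBD under inheritance vector v."""
    p1, m1, p2, m2 = v
    return int(p1 == p2) + int(m1 == m2)


def compatible_vectors(sib1, sib2):
    """Inheritance vectors (p1, m1, p2, m2) that reproduce the sib genotypes in order."""
    target1, target2 = tuple(sorted(sib1)), tuple(sorted(sib2))
    return [
        v
        for v in product((0, 1), repeat=4)  # all 16 inheritance vectors
        if sib_genotype(v[0], v[1]) == target1
        and sib_genotype(v[2], v[3]) == target2
    ]


vectors = compatible_vectors(("1", "2"), ("1", "2"))
for v in vectors:
    print(v, "IBD sharing =", ibd_sharing(v))
```

Under these assumptions four of the 16 vectors survive, two with sharing 2 and two with sharing 0, in line with the equally-likely argument in the text.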

Let pgm denote ordered parental genotypes at the marker, mtm parental mating type at the marker (as defined in Section 2.2), and sgm sib genotypes at the marker (unordered and sib genotypes can be permuted). The observed marker data is m, and or if no parental data are available.

With no phenotype information on the sibs

where pgm is any ordered genotype compatible with mtm.

The general approach to incomplete IBD information is to expand in terms of expressions which we can evaluate. To do this we need the following additional genetic assumptions:

Assumption G3. Within a family, sib phenotypes are conditionally independent of any marker genotype data given multilocus genotypes at the DS loci.

Assumption G4. There is linkage equilibrium between marker and DS loci, i.e. parental genotypes at the marker are independent of parental genotypes at the DS loci.

Assumption G4 seems fairly strong, and it clearly excludes "markers" right on top of a DS locus. Nevertheless, we are unable to get the conclusion of the following proposition without it. Of course, one doesn't need the proposition if IBD can be established directly.

In practice, we don't need to sum over all parental genotypes pgm, since ,

Hence

where the first sum is over all parental mating types mtm at the marker compatible with observed parental marker data (if any), and pgm is any ordered parental genotype compatible with mtm. When the parental mating type mtm is known, likelihood-based tests of linkage do not depend on , the parental mating-type frequencies.

Let's use this proposition on Examples C and D(d) above. With Example C, we have and . There are 8 inheritance vectors consistent with sgm and a representative ordered parental genotype , namely , , , and , together with 4 other vectors obtained by permuting the two sibs. Of these, 4 have IBD = 0 and 4 have IBD = 1. Hence,

Similarly, with Example D(d),
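The counting behind these two examples can be sketched in code. This is a hedged illustration, not a reproduction of the displays above: it computes the probability of the (unordered, permutable) sib genotypes given an ordered parental genotype and IBD = j by counting inheritance vectors, treating all 16 vectors as equally likely a priori. The specific genotypes used (father 1/2 and mother 3/3 with sibs 1/3 and 2/3 for Example C(a); both parents 1/2 with both sibs 1/2 for Example D(d)) are assumptions chosen to be consistent with the verbal descriptions, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_probs(father, mother, sib_a, sib_b):
    """Return [P(sgm | pgm, IBD = j) for j = 0, 1, 2] by counting inheritance vectors.

    father, mother: ordered parental genotypes, e.g. ("1", "2").
    sib_a, sib_b: observed sib genotypes; order within and between sibs is ignored.
    Each probability is (# compatible vectors with IBD = j) / (# vectors with IBD = j).
    """
    observed = sorted([tuple(sorted(sib_a)), tuple(sorted(sib_b))])
    total = [0, 0, 0]  # number of inheritance vectors with IBD = j
    hits = [0, 0, 0]   # those that also reproduce the observed sib genotypes
    for p1, m1, p2, m2 in product((0, 1), repeat=4):
        j = int(p1 == p2) + int(m1 == m2)
        total[j] += 1
        sibs = sorted([
            tuple(sorted((father[p1], mother[m1]))),
            tuple(sorted((father[p2], mother[m2]))),
        ])
        if sibs == observed:
            hits[j] += 1
    return [Fraction(h, t) for h, t in zip(hits, total)]


# Assumed configuration for Example C(a): 8 compatible vectors, 4 with IBD = 0, 4 with IBD = 1.
example_c = conditional_probs(("1", "2"), ("3", "3"), ("1", "3"), ("2", "3"))
# Assumed configuration for Example D(d): 4 compatible vectors, 2 with IBD = 0, 2 with IBD = 2.
example_d = conditional_probs(("1", "2"), ("1", "2"), ("1", "2"), ("1", "2"))
print(example_c, example_d)
```

Exact fractions are used so that the counts can be read off directly; since the prior probabilities of IBD = 0, 1, 2 are 1/4, 1/2, 1/4, the denominators are 4, 8 and 4 vectors respectively.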

If we had two sibs with marker genotypes and NO parental information, it would be necessary to sum over those parental mating types at the marker compatible with sgm, specifically . The resulting probability would then involve the parental mating type frequencies, typically calculated under the assumptions of Hardy-Weinberg equilibrium and random mating, from observed marker allele frequencies. Just how sensitive the results will be to violations of these (usually unexamined) assumptions is difficult to say.
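To make the dependence on allele frequencies concrete, here is a minimal sketch of the no-parental-data case. It assumes Hardy-Weinberg equilibrium and random mating, so that the four parental alleles are independent draws from the marker allele frequencies, and it sums by brute force over every ordered parental genotype pair and every inheritance vector. The function name and the allele frequencies in the usage line are hypothetical.

```python
from fractions import Fraction
from itertools import product


def sib_pair_probs_no_parents(allele_freqs, sib_a, sib_b):
    """Return [P(sgm | IBD = j) for j = 0, 1, 2] when no parental marker data exist.

    allele_freqs: dict mapping allele label -> population frequency.
    Assumes Hardy-Weinberg equilibrium and random mating (a labeled assumption),
    so each ordered parental genotype pair has probability equal to the product
    of its four allele frequencies.
    """
    observed = sorted([tuple(sorted(sib_a)), tuple(sorted(sib_b))])
    prior = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # P(IBD = j)
    joint = [Fraction(0)] * 3                                 # P(sgm, IBD = j)
    for f1, f2, m1, m2 in product(list(allele_freqs), repeat=4):
        weight = (allele_freqs[f1] * allele_freqs[f2]
                  * allele_freqs[m1] * allele_freqs[m2])
        father, mother = (f1, f2), (m1, m2)
        for p1, q1, p2, q2 in product((0, 1), repeat=4):
            j = int(p1 == p2) + int(q1 == q2)
            sibs = sorted([
                tuple(sorted((father[p1], mother[q1]))),
                tuple(sorted((father[p2], mother[q2]))),
            ])
            if sibs == observed:
                joint[j] += weight * Fraction(1, 16)  # each vector has prior 1/16
    return [joint[j] / prior[j] for j in range(3)]


# Hypothetical two-allele marker; both sibs observed as 1/1.
freqs = {"1": Fraction(1, 4), "2": Fraction(3, 4)}
probs = sib_pair_probs_no_parents(freqs, ("1", "1"), ("1", "1"))
print(probs)
```

For two sibs who are both homozygous for an allele of frequency p, the result reduces to p^4, p^3 and p^2 for IBD = 0, 1 and 2, which shows how directly the answer depends on the assumed allele frequencies.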

What do we do with these expressions? Under the sampling assumptions introduced earlier (Assumptions S1, S2), plus the following additional assumption, we calculate the likelihood of marker data , , on n ASPs.

Assumption S3. For a particular sib-pair, the parental genotypes at the marker and at the DS loci are independent of any phenotype and marker data from OTHer families, i.e.

where OTH denotes any marker and phenotype data from OTHer families.

Under Assumptions S1, S2, S3, the likelihood of the marker data given the phenotype data is:

We can now go on to carry out likelihood-based tests as before, e.g. likelihood ratio test of vs. or of vs. , or a score test of . Maximization of the likelihood with respect to the 's is done by using the EM algorithm.
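As an illustration of the EM step, the sketch below assumes the likelihood has the mixture form in which each sib-pair i contributes a sum over j = 0, 1, 2 of pi[j] * w[i][j], where pi[j] is the IBD probability for an affected pair and w[i][j] is the probability of that family's marker data given IBD = j. That form, the names `em_ibd_proportions` and `weights`, and the starting value are assumptions made for the example; no constraints on the parameters are imposed.

```python
def em_ibd_proportions(weights, n_iter=500, tol=1e-12):
    """EM estimate of pi maximizing prod_i sum_j pi[j] * weights[i][j].

    weights: list of rows [w_i0, w_i1, w_i2], one row per sib-pair.
    Returns the estimated list [pi_0, pi_1, pi_2].
    """
    pi = [0.25, 0.5, 0.25]  # start at the no-linkage sharing probabilities
    n = len(weights)
    for _ in range(n_iter):
        expected = [0.0, 0.0, 0.0]
        # E-step: posterior probability of IBD = j for each pair, summed over pairs.
        for w in weights:
            denom = sum(p * x for p, x in zip(pi, w))
            for j in range(3):
                expected[j] += pi[j] * w[j] / denom
        # M-step: new proportions are the average posterior probabilities.
        new_pi = [e / n for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Fully informative pairs: each row picks out exactly one IBD state.
informative = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
print(em_ibd_proportions(informative))

# Partially informative pairs, with rows shaped like the conditional
# probabilities from the Example C(a) and D(d) counting argument.
partial = [[1, 0.5, 0], [0.5, 0, 0.5], [0.5, 0, 0.5]]
print(em_ibd_proportions(partial))
```

When every pair is fully informative the estimate is just the observed proportion of pairs in each IBD state; partially informative pairs contribute fractional counts instead.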




Simon Cawley
Tue May 26 19:30:26 PDT 1998