Next: Multilocus mapping in Up: Stat 260: Statistics Previous: Three-locus ordering.

ML estimation of order with 3 loci.

First let us note that if we put our data into a table of counts as above, then it is not necessary to know the order of the three loci. In the above table, the smallest entry is for the double recombinant category because the best order is in fact w-m-f. But if we had labelled our rows by recombination or not across w-f (say), leaving the columns the same, the entries would simply be permuted.

Exercise 4: Check this last assertion.
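A numerical check of this assertion (not a proof) can be sketched as follows. The loci are called 1, 2, 3 rather than w, m, f, the counts are hypothetical, and the function name is mine. The only fact used is that a gamete is recombinant across 1-3 exactly when it is recombinant across one, but not both, of 1-2 and 2-3, whatever the true order.

```python
def relabel_rows_by_interval_13(counts):
    """Re-tabulate counts keyed by (rec 1-2, rec 2-3) as (rec 1-3, rec 2-3).

    counts: dict mapping (r12, r23), each 0 or 1, to a cell count.
    """
    relabelled = {}
    for (r12, r23), n in counts.items():
        # Recombinant across 1-3 iff exactly one of 1-2, 2-3 is recombinant.
        r13 = r12 ^ r23
        relabelled[(r13, r23)] = n
    return relabelled


# Hypothetical counts, keyed by (rec. across 1-2, rec. across 2-3).
counts = {(1, 1): 5, (1, 0): 40, (0, 1): 60, (0, 0): 395}
relabelled = relabel_rows_by_interval_13(counts)

# The same four numbers appear, only in different cells.
print(sorted(counts.values()) == sorted(relabelled.values()))
```

With these counts the entry 5 moves from cell (1, 1) to cell (0, 1), and the entry 60 moves from (0, 1) to (1, 1); the other two cells stay put.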

Some time early in the study of such three-point test-crosses (as the above cross is known in the genetic literature), it was realized that the best order is that which assigns the smallest of the four cell entries to the double recombinant category. (Here and below, I assume that we have four counts with a unique smallest and largest. When there are ties, the details are a bit messier, and the conclusion less neat.)
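The rule is easy to state as code. The sketch below assumes the table is keyed by recombination (1) or not (0) across 1-2 and across 2-3; the function name and the counts are hypothetical. The case where the non-recombinant cell is the smallest is not covered by the rule as stated, so the sketch simply refuses it.

```python
def order_from_counts(counts):
    """Apply the smallest-count rule to a three-point test-cross table.

    counts: dict mapping (rec 1-2, rec 2-3), each 0 or 1, to a cell count.
    Returns the order that makes the smallest cell the double recombinants.
    """
    # The cell that is the double-recombinant class under each candidate order.
    order_for_cell = {
        (1, 1): "1-2-3",  # rec across 1-2 and 2-3: locus 2 in the middle
        (1, 0): "2-1-3",  # locus 1 differs from 2 and 3: locus 1 in the middle
        (0, 1): "1-3-2",  # locus 3 differs from 1 and 2: locus 3 in the middle
    }
    ranked = sorted(counts.values())
    if ranked[0] == ranked[1]:
        raise ValueError("tie for the smallest count; the rule assumes a unique smallest")
    smallest_cell = min(counts, key=counts.get)
    if smallest_cell not in order_for_cell:
        raise ValueError("non-recombinant cell is smallest; not covered by the rule")
    return order_for_cell[smallest_cell]


# Hypothetical counts, keyed by (rec. across 1-2, rec. across 2-3).
print(order_from_counts({(1, 1): 5, (1, 0): 40, (0, 1): 60, (0, 0): 395}))  # 1-2-3
print(order_from_counts({(1, 1): 40, (1, 0): 5, (0, 1): 60, (0, 0): 395}))  # 2-1-3
```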

My aim in this section is to prove that the above rule for determining order is in fact the ML order, under the assumption of No Chromatid Interference (NCI). It is also true under the assumption of independence of recombinations across disjoint intervals, but this stronger assumption is not necessary. Indeed the fact that it is true under an assumption that is often contradicted by the data is of interest in itself.

To get the proof going, we need some notation and some preliminary facts.

First, suppose that we have three loci whose true order is 1-2-3, and denote the intervals 1-2 and 2-3 by $I_1$ and $I_2$. Define the joint recombination probabilities as follows:

$p_{11}$ = pr( rec. across $I_1$ & rec. across $I_2$ )

$p_{10}$ = pr( rec. across $I_1$ & no rec. across $I_2$ )

and similarly for $p_{01}$ and $p_{00}$. These probabilities can be put into a table with rows labelled by rec. or not across $I_1$ (1 or 0) and columns labelled by rec. or not across $I_2$ (1 or 0), but we repeat an earlier assertion in a slightly different form: if we labelled the rows and columns by rec. across different intervals (e.g. 1-3 and 3-2), the entries would simply be permuted. A reformulation of this assertion is as follows: the joint distribution of recombinations across the two intervals can be described by such a set of $p_{ij}$ relative to any order, not necessarily the true one. For what we want next, however, the labelling must refer to the true order; since the expressions involved do not refer to observable quantities, this is no impediment. We define probabilities $q_{ij}$ by:

$q_{11}$ = pr( # exchanges in $I_1 > 0$ & # exchanges in $I_2 > 0$ )

$q_{10}$ = pr( # exchanges in $I_1 > 0$ & # exchanges in $I_2 = 0$ )

and similarly for $q_{01}$ and $q_{00}$. Clearly these qs refer to the 4-strand bundle, and so to unobservable events. Nevertheless, they play an important role in the theory under NCI.

An exercise for last week was to generalize Mather's formula to 2 intervals, and here is where we need it:

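Mather formula for 2 intervals: Under NCI, with the first index of each $p_{ij}$ and $q_{ij}$ referring to the interval 1-2 and the second to the interval 2-3,

$$p_{11} = \tfrac{1}{4} q_{11}, \quad p_{10} = \tfrac{1}{2} q_{10} + \tfrac{1}{4} q_{11}, \quad p_{01} = \tfrac{1}{2} q_{01} + \tfrac{1}{4} q_{11}, \quad p_{00} = q_{00} + \tfrac{1}{2} q_{10} + \tfrac{1}{2} q_{01} + \tfrac{1}{4} q_{11}.$$

If we assume that all the $q_{ij}$ are non-zero, then we see that under NCI, $p_{11}$ is always the smallest, and $p_{00}$ the largest, of a set of such probabilities, when labelled in this way corresponding to the true order. In general, however they are labelled, the smallest p corresponds to the double recombinant category, and the largest to the non-recombinant category, and this is sufficient to single out just one locus order (up to a reversal).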
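The ordering of the ps can be checked numerically. The sketch below assumes the two-interval Mather relations take the form $p_{11} = q_{11}/4$, $p_{10} = q_{10}/2 + q_{11}/4$, $p_{01} = q_{01}/2 + q_{11}/4$, $p_{00} = q_{00} + q_{10}/2 + q_{01}/2 + q_{11}/4$; the function name and the q values are hypothetical.

```python
def mather_two_intervals(q11, q10, q01, q00):
    """Map exchange probabilities q to recombination probabilities p under NCI.

    Assumes the two-interval Mather relations stated in the lead-in.
    Returns (p11, p10, p01, p00).
    """
    p11 = q11 / 4
    p10 = q10 / 2 + q11 / 4
    p01 = q01 / 2 + q11 / 4
    p00 = q00 + q10 / 2 + q01 / 2 + q11 / 4
    return p11, p10, p01, p00


# Hypothetical exchange probabilities, all non-zero and summing to 1.
p11, p10, p01, p00 = mather_two_intervals(q11=0.2, q10=0.3, q01=0.3, q00=0.2)

print(abs(p11 + p10 + p01 + p00 - 1.0) < 1e-12)  # the ps sum to 1
print(p11 < min(p10, p01) and p00 > max(p10, p01))  # p11 smallest, p00 largest
```

The inequalities do not depend on the particular values: $p_{10} - p_{11} = q_{10}/2$ and $p_{00} - p_{10} = q_{00} + q_{01}/2$ are positive whenever the qs are, and likewise with the roles of the two intervals exchanged.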

Exercise 5: Prove this last assertion, that is, that a set of ps satisfying the inequalities that follow from the Mather relations defines a unique order (up to inversion) for the loci.

If the $p_{ij}$ are not necessarily given relative to the true order, then there are three possible sets of inequalities they might satisfy (under NCI), and we call them $\Theta_1$ (corresponding to 2-1-3), $\Theta_2$ (1-2-3) and $\Theta_3$ (1-3-2).

Exercise 6: Write out the three sets of inequalities, with the ps written with respect to order 1-2-3.

Estimating the order is in essence deciding which of these sets of inequalities is most compatible with the data. The easiest case is when there are no ties, for then there is a unique such order, and this must be the ML order. The proof of our main result is left to you, now that all the preliminaries are available.

Exercise 7: Suppose that our data are the four cell counts tabulated as above, with rows labelled by recombination across 1-2 and columns by recombination across 2-3. Assume that there is a unique smallest and a unique largest count. Prove that under a multinomial model for the data, there is an ML order and it is the one with the smallest count as the count of double recombinants.

The final item of interest in this story is the equivalence of the ML order under NCI with that under the independence model. The proof (first given by Mary Sara McPeek in her Berkeley thesis) rests on a neat inequality. As before we suppose that our counts and the ps are written relative to the true order, leaving the modification to other orders as an exercise. Under the independence model, the ps have the form

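$p_{11} = \theta_1 \theta_2$, $p_{10} = \theta_1 (1 - \theta_2)$, $p_{01} = (1 - \theta_1) \theta_2$, $p_{00} = (1 - \theta_1)(1 - \theta_2)$,

where $\theta_1$ and $\theta_2$ are the recombination fractions across the intervals 1-2 and 2-3. If the counts in cells 11, 10, 01 and 00 are a, b, c and d with a < b, c < d, then the likelihood under a multinomial model and order 1-2-3 is

$$L(\theta_1, \theta_2) \propto \theta_1^{a+b} (1 - \theta_1)^{c+d} \, \theta_2^{a+c} (1 - \theta_2)^{b+d}$$

and its maximum value over the parameter space $0 \le \theta_1, \theta_2 \le 1/2$ is

$$\hat{L}_{123} \propto \left(\frac{a+b}{n}\right)^{a+b} \left(\frac{c+d}{n}\right)^{c+d} \left(\frac{a+c}{n}\right)^{a+c} \left(\frac{b+d}{n}\right)^{b+d},$$

where n = a+b+c+d. (The maximizing values $(a+b)/n$ and $(a+c)/n$ lie below 1/2 because a < c < d and a < b < d.)

Similarly we can calculate the maximized likelihoods $\hat{L}_{213}$ and $\hat{L}_{132}$ under the orders 2-1-3 and 1-3-2. (Note that we have assumed that $\theta_1, \theta_2 \le 1/2$ here, without assuming NCI. Can you prove these bounds under the independence model?)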

Exercise 8: Using the strict monotonicity of $x^x (1-x)^{1-x}$ on $[0, 1/2]$, prove that the ML order here is 1-2-3.
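Before attempting the proof, it may help to compare the three maximized likelihoods numerically. The sketch below is my own illustration, not McPeek's argument. It assumes counts a, b, c, d for cells 11, 10, 01, 00 relative to labelling 1-2 of rows and 2-3 of columns, uses the fact that under independence the likelihood for any order factors into one binomial term per adjacent interval, and restricts each recombination fraction to [0, 1/2]. The multinomial coefficient is dropped since it is the same for every order. Function names and counts are hypothetical.

```python
import math


def max_loglik_interval(k, n):
    """Maximize k*log(t) + (n-k)*log(1-t) over t in [0, 1/2]; assumes n > 0.

    The binomial log-likelihood is concave in t, so the constrained
    maximizer is min(k/n, 1/2).
    """
    theta = min(k / n, 0.5)
    loglik = 0.0
    if k > 0:
        loglik += k * math.log(theta)
    if n - k > 0:
        loglik += (n - k) * math.log(1 - theta)
    return loglik


def max_loglik_by_order(a, b, c, d):
    """Maximized log-likelihood (up to a constant) for each of the three orders."""
    n = a + b + c + d
    rec12 = a + b  # recombinants across 1-2
    rec23 = a + c  # recombinants across 2-3
    rec13 = b + c  # recombinants across 1-3: exactly one of 1-2, 2-3 recombinant
    return {
        "1-2-3": max_loglik_interval(rec12, n) + max_loglik_interval(rec23, n),
        "2-1-3": max_loglik_interval(rec12, n) + max_loglik_interval(rec13, n),
        "1-3-2": max_loglik_interval(rec13, n) + max_loglik_interval(rec23, n),
    }


# Hypothetical counts with a < b, c < d.
result = max_loglik_by_order(a=5, b=40, c=60, d=395)
print(max(result, key=result.get))  # 1-2-3
```

In this example the order 1-2-3 wins because its two intervals have the smallest recombinant counts (45 and 65, against 100 across 1-3), which is the pattern the monotonicity argument of Exercise 8 makes precise.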






Simon Cawley
Mon Apr 20 19:52:17 PDT 1998