Next: Deciding upon significance Up: Stat 260: Statistics Previous: Mapping QTL: preliminaries

Interval mapping for QTL

The idea behind interval mapping is simple: one can gain power in testing the null hypothesis against the alternative that there is a QTL at a specified locus by incorporating marker information from either side of the locus. Specifically, if we are testing at $s$, we make use of the two markers flanking $s$, the ones at loci $L$ (for left) and $R$ (for right), say. The original interval mapping method assumed normality, and since it is a nice exercise in the use of the EM algorithm with mixtures, I'll run through it. These days a rank-based variant is more popular, particularly for quantitative traits (QTs) which are obviously not normally distributed in either of the pure lines or the $F_1$.

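Suppose that our DATA are $(y_i, g_i)$, $i = 1, \dots, n$, where $y_i$ is the trait value and $g_i$ the marker genotype of individual $i$, and that our statistical model is:

Given the entire genotype $g$, $y \sim N(\mu_{g(s)}, \sigma^2)$, where $g(s)$, equal to A or H, is the genotype at $s$.

Write $\Delta = \mu_H - \mu_A$. Our null hypothesis is $\Delta = 0$, and our alternative $\Delta \neq 0$. The genotype at $s$ is not part of our DATA, so we enlarge DATA to the augmented data ADATA, defined to be $(y_i, g_i, g_i(s))$, $i = 1, \dots, n$. Although it would not be hard to carry out this analysis using a realistic model for recombination, we adopt the standard

Assumption: Recombinations across disjoint intervals (same meiosis) are mutually independent.

A little thought reveals that our model for DATA is a set of four 2-component normal mixtures, one for each of the 4 combinations of genotype at $L$ and $R$. However, the model for ADATA is straightforward, and the corresponding likelihood has logarithm, up to a constant in $\theta = (\mu_A, \mu_H, \sigma^2)$, given by

$$\log L(\theta; \mathrm{ADATA}) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - \mu_{g_i(s)}\bigr)^2 .$$

Now the EM algorithm makes use of the Baum et al. lemma given last week, where here $Q(\theta \mid \theta') = E_{\theta'}[\log L(\theta; \mathrm{ADATA}) \mid \mathrm{DATA}]$.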

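In order to carry out the E-step and so compute $Q$, we need the following table, where $r_1$, $r_2$ and $r$ are the recombination fractions between $L$ and $s$, $s$ and $R$, and $L$ and $R$ respectively. Under the Assumption, $r = r_1(1 - r_2) + (1 - r_1) r_2$.

Conditional distribution of $g(s)$ given $g(L)$ and $g(R)$

$$\begin{array}{cc|cc}
g(L) & g(R) & \mathrm{pr}(g(s) = A) & \mathrm{pr}(g(s) = H) \\ \hline
A & A & (1 - r_1)(1 - r_2)/(1 - r) & r_1 r_2/(1 - r) \\
A & H & (1 - r_1)\, r_2 / r & r_1 (1 - r_2) / r \\
H & A & r_1 (1 - r_2) / r & (1 - r_1)\, r_2 / r \\
H & H & r_1 r_2/(1 - r) & (1 - r_1)(1 - r_2)/(1 - r)
\end{array}$$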
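The conditional probabilities of the genotype at $s$ given the flanking genotypes follow from the independence Assumption by Bayes' rule. Here is a minimal sketch of that calculation for a backcross, in plain Python. The function names (`recomb_LR`, `prob_at_s`, `conditional_table`) and the argument names `r1`, `r2` for the two recombination fractions are mine, chosen for illustration, and the recombination fractions are assumed to lie strictly between 0 and 1.

```python
def recomb_LR(r1, r2):
    """Recombination fraction between L and R, assuming recombination in
    (L, s) and (s, R) is independent (no interference)."""
    return r1 * (1.0 - r2) + (1.0 - r1) * r2


def prob_at_s(g_s, g_L, g_R, r1, r2):
    """pr(g(s) = g_s | g(L) = g_L, g(R) = g_R) in a backcross (genotypes 'A' or 'H')."""
    def step(a, b, r):
        # Probability of moving from genotype a to genotype b across an
        # interval with recombination fraction r.
        return 1.0 - r if a == b else r

    # Joint probability of the path L -> s -> R for each candidate genotype at s.
    joint = {g: step(g_L, g, r1) * step(g, g_R, r2) for g in ("A", "H")}
    return joint[g_s] / (joint["A"] + joint["H"])


def conditional_table(r1, r2):
    """All four rows of the table, keyed by (g(L), g(R))."""
    return {
        (g_L, g_R): {g: prob_at_s(g, g_L, g_R, r1, r2) for g in ("A", "H")}
        for g_L in ("A", "H")
        for g_R in ("A", "H")
    }
```

For example, `conditional_table(0.1, 0.2)[("A", "H")]["H"]` gives the probability of H at $s$ when the left marker is A and the right marker is H. Note that the denominator in each row is either $r$ or $1 - r$, which is where `recomb_LR` comes in.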

Having computed $Q$, we next need to maximize it in the parameters $\theta = (\mu_A, \mu_H, \sigma^2)$, which is easy, because $Q$ is essentially the log likelihood associated with a normal linear regression with a single regressor. Many iterations later, we have $\hat\theta_1$ under the alternative, and again $\hat\theta_0$ under the null, and so can compute the LOD at $s$:

$$\mathrm{LOD}(s) = \log_{10}\frac{L_1(\hat\theta_1)}{L_0(\hat\theta_0)},$$

where $L_0$ (resp. $L_1$) is the likelihood under the null (resp. alternative), and the $\hat\theta$s are the corresponding MLEs.
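To make the iteration concrete, here is a rough sketch of the EM fit and the LOD at a single position $s$ for a backcross, in plain Python. It is one way of organizing the computation, not a transcription of MAPMAKER/QTL. The names `interval_lod`, `normal_pdf`, `p_H` and `n_iter` are mine: `p_H[i]` is assumed to be the conditional probability that individual `i` is H at $s$ given its flanking markers, and is assumed not to be 0 for everyone or 1 for everyone. Under the null the mixture collapses to a single normal, so the sketch uses the closed-form MLE there rather than iterating, and it runs a fixed number of iterations instead of checking convergence.

```python
import math


def normal_pdf(y, mu, var):
    """Density of N(mu, var) at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def interval_lod(y, p_H, n_iter=200):
    """Return (lod, mu_A, mu_H, var) at one test position.

    y    : list of trait values
    p_H  : list of pr(genotype at s is H | flanking markers), one per individual
    """
    n = len(y)

    # Null model: a single normal, fitted in closed form.
    mu0 = sum(y) / n
    var0 = sum((v - mu0) ** 2 for v in y) / n
    loglik0 = sum(math.log(normal_pdf(v, mu0, var0)) for v in y)

    # Alternative model: start from the prior weights, then alternate M and E steps.
    w = list(p_H)
    for _ in range(n_iter):
        # M-step: weighted means and a common variance.
        sw = sum(w)
        mu_H = sum(wi * v for wi, v in zip(w, y)) / sw
        mu_A = sum((1.0 - wi) * v for wi, v in zip(w, y)) / (n - sw)
        var = sum(wi * (v - mu_H) ** 2 + (1.0 - wi) * (v - mu_A) ** 2
                  for wi, v in zip(w, y)) / n
        # E-step: posterior probability of H at s given the trait value and markers.
        w = []
        for v, p in zip(y, p_H):
            a = (1.0 - p) * normal_pdf(v, mu_A, var)
            h = p * normal_pdf(v, mu_H, var)
            w.append(h / (a + h))

    # Observed-data (mixture) log likelihood at the fitted parameters.
    loglik1 = sum(
        math.log((1.0 - p) * normal_pdf(v, mu_A, var) + p * normal_pdf(v, mu_H, var))
        for v, p in zip(y, p_H)
    )
    lod = (loglik1 - loglik0) / math.log(10.0)
    return lod, mu_A, mu_H, var
```

Calling `interval_lod(y, p_H)` at each position on a grid, with `p_H` recomputed from the flanking markers each time, would give a LOD curve.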

Exercise 4: Complete the details of the E part of the above EM.

What happens next is that the LOD score is plotted as a function of s. A significant peak (see next section) is then declared to be a putative QTL. That is almost what we see in the Nature Genetics paper, which used MAPMAKER/QTL. There is a difference between those plots and our current discussion. They were considering an intercross, and so there were 3 possible genotypes at each locus, arising in proportions 1:2:1, and leaving 2 d.f. for differences in means. However, we can easily define contrasts essentially equivalent to our 1 d.f. story.

A recent enhancement to MAPMAKER/QTL permits a more robust analysis, suitable for QTs which are not normally distributed, given genotype. Here is a quick description of what this is, following Kruglyak and Lander (1995). Define the test statistic at $s$ by

$$Y_W(s) = \sum_{i=1}^{n} \bigl(n + 1 - 2\,\mathrm{rank}(i)\bigr)\, x_i(s),$$

where $\mathrm{rank}(i)$ is the rank of $y_i$ within $(y_1, \dots, y_n)$, and $x_i(s)$ is $-1$ or $+1$ according as the genotype of individual $i$ at $s$ is A or H. When all data are available at the flanking markers $L$ and $R$, the conditional expectation of $x_i(s)$ can be calculated using the table above. In general one can go to the first pair of flanking markers on which full data are available.

The statistic can be normalized by the square root of its mean square under the null hypothesis, as it has null mean zero. Kruglyak and Lander call the resulting statistic $Z_W$, and either plot that or a LOD equivalent.
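A small sketch of the unnormalized rank statistic may help fix ideas. The weights $n + 1 - 2\,\mathrm{rank}(i)$ below are my reading of the form used by Kruglyak and Lander (1995) and should be treated as an assumption, as should the names `ranks`, `rank_statistic` and `p_H`. Here `p_H[i]` stands for the conditional probability that individual `i` is H at $s$ given its flanking markers, so that the conditional expectation of the $\pm 1$ genotype score is `2 * p_H[i] - 1`. Ties in the trait values are not handled, and the normalization step is left out.

```python
def ranks(y):
    """Ranks 1..n of the trait values (assumes no ties)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    r = [0] * len(y)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r


def rank_statistic(y, p_H):
    """Unnormalized rank statistic at one position.

    y    : list of trait values
    p_H  : list of pr(genotype at s is H | flanking markers), one per individual
    """
    n = len(y)
    r = ranks(y)
    # Conditional expectation of the genotype score (-1 for A, +1 for H).
    x_exp = [2.0 * p - 1.0 for p in p_H]
    return sum((n + 1 - 2 * ri) * xe for ri, xe in zip(r, x_exp))
```

With this sign convention, individuals with large trait values get negative weights, so a locus at which H raises the trait produces a negative value.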

Exercise 5: Check that , and that .

Haley and Knott (1992) have given a simple approximation to interval mapping, which requires only one regression (rather than an iteration) at each location $s$. Their idea is to regress $y_i$ on $\mathrm{pr}(g_i(s) = H \mid g_i(L), g_i(R))$, the conditional probability that individual $i$ has genotype H at $s$ given its flanking marker genotypes, and to compute the LOD at $s$ from the RSS of this regression in the natural way. This appears to give a pretty good approximation to the LOD curve.
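The regression version is short enough to write out. The sketch below assumes a backcross and takes "the natural way" to mean the usual normal-theory likelihood ratio, $\mathrm{LOD} = (n/2)\log_{10}(\mathrm{RSS}_0/\mathrm{RSS}_1)$, where $\mathrm{RSS}_0$ is the residual sum of squares about the overall mean and $\mathrm{RSS}_1$ that of the regression. The names `haley_knott_lod` and `p_H` are mine, and `p_H` is assumed not to be constant across individuals.

```python
import math


def haley_knott_lod(y, p_H):
    """Return (lod, slope) from the simple regression of y on p_H.

    y    : list of trait values
    p_H  : list of pr(genotype at s is H | flanking markers), one per individual
    """
    n = len(y)
    ybar = sum(y) / n
    pbar = sum(p_H) / n
    sxx = sum((p - pbar) ** 2 for p in p_H)
    sxy = sum((p - pbar) * (v - ybar) for p, v in zip(p_H, y))
    slope = sxy / sxx                       # estimates the difference in means, H minus A
    rss0 = sum((v - ybar) ** 2 for v in y)  # null model: common mean
    rss1 = rss0 - slope * sxy               # residual sum of squares of the regression
    lod = (n / 2.0) * math.log10(rss0 / rss1)
    return lod, slope
```

The slope is an estimate of the difference in means because, under the model, the expected trait value given the markers is linear in `p_H` with that difference as its coefficient.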

Finally, we must turn to the ever important question of






Simon Cawley
Mon Apr 20 19:59:26 PDT 1998