next up previous
Next: Final Comments Up: Stat 260: Statistics Previous: Interval mapping for

Deciding upon significance in genome-wide scans.

Conducting one of the analyses I have just described at many loci, say every 1 cM along a genome, is known as a genome(-wide) scan in search of QTL. Clearly we are only looking for one QTL at a time, and ignoring the possibility of closely linked QTL, of interactions, and of other complications, but this is a start. A question of great interest to biologists is: what value of the LOD or $Z$ score should be regarded as high enough to warrant rejecting the null hypothesis? These days there are always two approaches to significance: one using some formula based upon an exact or asymptotic distribution, and one using a computer: simulation, randomization, or bootstrapping. In this context, MAPMAKER/QTL makes use of a formula, but there is also a very simple method of obtaining critical values computationally, first outlined by Churchill and Doerge (1994). It is so simple I can say it quickly: after carrying out a genome scan, using any approach at all, randomly reassign the genotype data to the QT values using a random permutation of $1, \ldots, n$, where $n$ is the number of individuals, and redo the analysis using the same method with this relabelled data. Record the largest LOD or $Z$ score attained anywhere in the relabelled scan. Repeating this random reassignment another 999 (or 9,999) times, one can then obtain a threshold for the LOD or $Z$ score which, under the null hypothesis, will be exceeded in no more than any prescribed fraction of scans.
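The permutation recipe can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the Churchill and Doerge implementation: the scan statistic here is a simple marker-by-marker Z-like score (square root of n times the sample correlation between genotype and trait), standing in for whatever analysis was actually used, and the function names `marker_z`, `scan_max` and `permutation_threshold` are hypothetical.

```python
import math
import random


def marker_z(genotypes, phenotypes):
    """Z-like score at one marker: sqrt(n) times the sample correlation."""
    n = len(phenotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotypes) / n
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    sgg = sum((g - mg) ** 2 for g in genotypes)
    spp = sum((p - mp) ** 2 for p in phenotypes)
    if sgg == 0 or spp == 0:
        return 0.0  # no variation at this marker or in the trait
    return math.sqrt(n) * sgp / math.sqrt(sgg * spp)


def scan_max(geno, pheno):
    """Largest |Z| over all markers; geno[i][j] is individual i at marker j."""
    n_markers = len(geno[0])
    return max(
        abs(marker_z([row[j] for row in geno], pheno)) for j in range(n_markers)
    )


def permutation_threshold(geno, pheno, n_perm=999, alpha=0.05, seed=0):
    """Return (observed max, permutation threshold, permutation p-value)."""
    rng = random.Random(seed)
    observed = scan_max(geno, pheno)
    shuffled = list(pheno)
    null_max = []
    for _ in range(n_perm):
        # Shuffling trait values against intact genotype rows is the same
        # as randomly reassigning genotype data to QT values.
        rng.shuffle(shuffled)
        null_max.append(scan_max(geno, shuffled))
    null_max.sort()
    # Empirical (1 - alpha) quantile of the permutation maxima.
    k = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    threshold = null_max[k]
    # Count the observed scan as one of the n_perm + 1 relabellings.
    p_value = (1 + sum(m >= observed for m in null_max)) / (n_perm + 1)
    return observed, threshold, p_value
```

Note that each permuted scan contributes only its genome-wide maximum, so the resulting threshold automatically accounts for the number of loci examined and for the correlation between them.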

Exercise 6: What precisely does the procedure just described test? How would you adapt it to looking for multiple unlinked QTL?

We close this section by outlining the derivation of Lander and Botstein's asymptotic formula giving thresholds for carrying out genome-wide scans using a statistic which is marginally like a Z-score. Note that we only get a genuine Z score at markers. In between markers the EM algorithm has been in action, and this will mean that the test ``Z-statistic'' at a locus $s$ in between markers is really only asymptotically $N(0,1)$, as the sample size gets large and as the intermarker spacing gets uniformly small. It should also be apparent that Z scores at nearby loci are highly correlated, because most marker data at two nearby loci will be the same. It is thus not surprising that the correlation function is essentially given by the relation between recombination fractions and map distance. Specifically, under our independence of recombinations assumption, the process of Z scores is, asymptotically as $n \to \infty$ and the inter-marker spacing $\to 0$ uniformly, and under the null, an Ornstein-Uhlenbeck (OU) process, with mean zero and covariance function $\exp(-\beta |s|)$, where $s$ is measured in Morgans, and $\beta$ is a constant, here 2. (A different model for the relation between map distance and recombination fraction would lead to a different Gaussian process, but it would have the same correlation function near $s=0$, and that is what matters here.)
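To see where the constant 2 comes from, here is a short worked calculation. It is a sketch under assumptions not spelled out above: a backcross, genotypes at two loci a distance $s$ Morgans apart coded as $X(0), X(s) = \pm 1$ with equal probability, and independence of recombinations taken to mean Haldane's map function $\theta(s)$ for the recombination fraction.

```latex
\theta(s) = \tfrac{1}{2}\bigl(1 - e^{-2s}\bigr)
\qquad \text{(Haldane's map function, $s \ge 0$ in Morgans)}

\operatorname{Corr}\bigl(X(0), X(s)\bigr)
  = P\bigl(X(0) = X(s)\bigr) - P\bigl(X(0) \ne X(s)\bigr)
  = \bigl(1 - \theta(s)\bigr) - \theta(s)
  = 1 - 2\theta(s)
  = e^{-2s}.
```

Since the Z score at a locus is, asymptotically, a normalized sum over individuals of genotype times trait value, the Z scores inherit this correlation, which has the OU form with constant 2.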

Our problem of determining the threshold for genome-wide scans has thus been reduced to calculating a boundary-crossing probability for an OU process, at least in the case of dense markers and large n. (Note that the permutation approach does not make any assumptions about sample size and marker density.)

The approximation Lander et al. give is as follows: the chance that the Z process exceeds a threshold $T$, say, in a genome-wide scan of $C$ chromosomes having total map length $G$ Morgans, is $(C + 2GT^2)$ times the corresponding marginal threshold probability, whether one- or two-sided.
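As a numerical illustration, the approximation can be inverted to find the threshold giving a prescribed genome-wide error rate. The sketch below assumes the multiplier has the form $C + \beta G T^2$ with $\beta = 2$, and that the marginal probability is the standard normal tail; the function names and the example values of $C$ and $G$ are hypothetical choices for illustration only.

```python
import math


def upper_tail(t):
    """Standard normal upper tail probability, 1 - Phi(t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def genomewide_pvalue(t, n_chrom, map_length, two_sided=True, beta=2.0):
    """Approximate chance that the Z process exceeds t somewhere in the scan.

    n_chrom is C, map_length is G in Morgans; beta = 2 is the OU constant.
    """
    marginal = upper_tail(t) * (2.0 if two_sided else 1.0)
    return (n_chrom + beta * map_length * t * t) * marginal


def genomewide_threshold(alpha, n_chrom, map_length, two_sided=True, beta=2.0):
    """Solve genomewide_pvalue(T) = alpha for T by bisection on [0, 10]."""
    lo, hi = 0.0, 10.0  # approximation exceeds alpha at lo, is below it at hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if genomewide_pvalue(mid, n_chrom, map_length, two_sided, beta) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# Illustrative values only: C = 20 chromosomes, G = 16 Morgans, 5% level.
T = genomewide_threshold(0.05, n_chrom=20, map_length=16.0)
lod = T * T / (2.0 * math.log(10.0))  # equivalent LOD score, LOD = Z^2 / (2 ln 10)
print(f"Z threshold {T:.2f}, LOD threshold {lod:.2f}")
```

Bisection is used rather than a closed form because the multiplier itself depends on $T$; the approximation is only meaningful for small alpha, where the threshold is large.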

This result can be pieced together from different approximations to OU crossing probabilities, for which I refer you to books by Leadbetter et al. (1983, chap. 12), Siegmund (1985, chap. 4), or Aldous (1989, chap. D). One first needs to establish the approximation for a single chromosome of length $L$, say, obtaining a factor of $1 + 2LT^2$. Then these need to be combined across the $C$ chromosomes, using a Bonferroni argument. For some forms of the approximate crossing probabilities, you will need to recall that for large $x$, $1 - \Phi(x)$ is approximately $\phi(x)/x$, where $\Phi$ and $\phi$ are the standard normal distribution function and density.

Exercise 8: Complete the details in the derivation just outlined.






Simon Cawley
Mon Apr 20 19:59:26 PDT 1998