Next: Difficulties with MCMC Up: Stat 260: Statistics Previous: Assessing the power

MCMC in pedigree analysis

 

In the context of pedigree analysis, the states of the Markov chain are the unknown genetic states X. MCMC implementations described in the literature have used haplotypes, genotypes and inheritance vectors as genetic states X. Notice that X includes all the loci for all the pedigree members (or all the non-founders in the case of inheritance vectors). With the Gibbs sampler, X is partitioned by individual, with one component for each member of the pedigree.

The probability distribution that we want to simulate from is $P_\theta(X \mid Y)$, where Y represents the observed data (disease phenotype, observed marker data, etc.) and $\theta$ represents the parameters of the genetic model (recombination fractions, penetrance functions, etc.).

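That probability distribution can be expressed as:

$$P_\theta(X \mid Y) = \frac{P_\theta(X, Y)}{P_\theta(Y)}$$

The denominator is the likelihood of the data at $\theta$ and is hard or impossible to work out, but the numerator is usually doable. It can be written as

$$P_\theta(X, Y) = P_\theta(Y \mid X)\, P_\theta(X)$$

$P_\theta(X)$ is the "prior" probability of X and is usually straightforward. The acceptance probabilities of the Metropolis-Hastings algorithm are computed using only ratios of $P_\theta(X, Y)$, so the denominator $P_\theta(Y)$ never has to be evaluated.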
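The point that the acceptance probability needs only ratios of the joint probability of X and Y can be illustrated with a minimal sketch. Everything in it is a hypothetical toy model, not a pedigree model: X is a tuple of three binary components with independent Bernoulli prior, and the observed Y is assumed to be compatible only with states having at least two components equal to 1. The names `prior`, `likelihood`, `joint` and `metropolis_chain` are illustrative.

```python
import random

THETA = 0.3  # hypothetical model parameter (assumption for this toy example)

def prior(x, theta):
    """P_theta(X): independent Bernoulli(theta) components (toy assumption)."""
    p = 1.0
    for xi in x:
        p *= theta if xi == 1 else 1.0 - theta
    return p

def likelihood(x):
    """P(Y | X) for the fixed observed Y: 1 if at least two components are 1, else 0."""
    return 1.0 if sum(x) >= 2 else 0.0

def joint(x, theta):
    """Numerator P_theta(X, Y) = P(Y | X) * P_theta(X); no normalizing constant needed."""
    return likelihood(x) * prior(x, theta)

def metropolis_chain(theta, n_steps, seed=0):
    """Metropolis sampler with a symmetric single-component flip proposal."""
    rng = random.Random(seed)
    x = (1, 1, 1)  # start from a state compatible with Y
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(len(x))
        proposal = x[:i] + (1 - x[i],) + x[i + 1:]
        # Only the ratio of joint probabilities is used; P_theta(Y) cancels.
        ratio = joint(proposal, theta) / joint(x, theta)
        if rng.random() < min(1.0, ratio):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis_chain(THETA, 20000)
freq_all_ones = sum(s == (1, 1, 1) for s in samples) / len(samples)
print("estimated P(X = (1,1,1) | Y):", freq_all_ones)
print("exact value:", prior((1, 1, 1), THETA) / (3 * THETA**2 * (1 - THETA) + THETA**3))
```

Because proposals that are incompatible with Y have joint probability zero, they are always rejected, so the chain never leaves the set of states compatible with the data.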

The ability to simulate genetic states X conditional on the observed data Y is an important feature of MCMC. Before the introduction of MCMC in pedigree analysis, the likelihood of the data under a specified genetic model had been expressed as an expectation that could be approximated by Monte Carlo simulation (see Ott [11]):

$$L(\theta) = P_\theta(Y) = \sum_X P_\theta(Y \mid X)\, P_\theta(X) = E_\theta\left[ P_\theta(Y \mid X) \right]$$

The advantage of Lange's formulation is that it allows us to simulate at a value $\theta_0$, which could be easier or more effective than simulating at $\theta$, the parameter value of interest.
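As a sketch of how simulating at one parameter value can give the likelihood at another, the standard importance-sampling identity is written out below. That this is the form of the formulation referred to above is an assumption; the identity itself requires $P_{\theta_0}(X) > 0$ wherever $P_\theta(X) > 0$.

```latex
\begin{aligned}
P_\theta(Y) &= \sum_X P_\theta(Y \mid X)\, P_\theta(X) \\
            &= \sum_X P_\theta(Y \mid X)\, \frac{P_\theta(X)}{P_{\theta_0}(X)}\, P_{\theta_0}(X) \\
            &= E_{\theta_0}\!\left[ P_\theta(Y \mid X)\, \frac{P_\theta(X)}{P_{\theta_0}(X)} \right]
\end{aligned}
```

Under this form, X is drawn from the prior at $\theta_0$ and each draw is reweighted by the ratio of priors.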

The problem is that the distribution of X that we simulate from, $P_{\theta_0}(X)$, is not conditional on the data. Most values of X are incompatible with the data, so that $P_\theta(Y \mid X) = 0$. The result is that a large proportion of the simulated Xs contribute nothing to the likelihood. With MCMC, we simulate from $P_{\theta_0}(X \mid Y)$, and the chain always steps between values of X compatible with Y.
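The waste involved in simulating X without conditioning on the data can be seen in a small sketch. The model is a hypothetical toy, not a pedigree model: X has three independent binary components, each equal to 1 with probability theta, and the observed Y is assumed to be compatible only with states having at least two components equal to 1. The function names are illustrative.

```python
import random

def likelihood(x):
    """P(Y | X) for the fixed observed Y: 1 if at least two components are 1, else 0."""
    return 1.0 if sum(x) >= 2 else 0.0

def naive_likelihood_estimate(theta, n_draws, seed=0):
    """Estimate P_theta(Y) = E_theta[P(Y | X)] by drawing X from its prior."""
    rng = random.Random(seed)
    total = 0.0
    wasted = 0
    for _ in range(n_draws):
        # Draw X from the prior, ignoring the data entirely.
        x = tuple(1 if rng.random() < theta else 0 for _ in range(3))
        w = likelihood(x)
        total += w
        if w == 0.0:
            wasted += 1  # this draw is incompatible with Y and contributes nothing
    return total / n_draws, wasted / n_draws

estimate, wasted_fraction = naive_likelihood_estimate(0.3, 20000)
exact = 3 * 0.3**2 * 0.7 + 0.3**3  # exact P_theta(Y) for this toy model
print("estimate:", estimate, "exact:", exact)
print("fraction of draws contributing nothing:", wasted_fraction)
```

Even in this tiny example most draws are incompatible with Y; with many individuals and loci the compatible fraction would typically be far smaller.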

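One thing we can do with MCMC is to estimate a likelihood ratio using the following equality (Thompson and Guo [13], Thompson [14]):

$$\frac{L(\theta)}{L(\theta_0)} = \frac{P_\theta(Y)}{P_{\theta_0}(Y)} = E_{\theta_0}\left[ \frac{P_\theta(X, Y)}{P_{\theta_0}(X, Y)} \,\Big|\, Y \right]$$

Proof of the equality:

$$E_{\theta_0}\left[ \frac{P_\theta(X, Y)}{P_{\theta_0}(X, Y)} \,\Big|\, Y \right] = \sum_X \frac{P_\theta(X, Y)}{P_{\theta_0}(X, Y)}\, P_{\theta_0}(X \mid Y) = \sum_X \frac{P_\theta(X, Y)}{P_{\theta_0}(X, Y)} \cdot \frac{P_{\theta_0}(X, Y)}{P_{\theta_0}(Y)} = \frac{1}{P_{\theta_0}(Y)} \sum_X P_\theta(X, Y) = \frac{P_\theta(Y)}{P_{\theta_0}(Y)}$$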

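To estimate the likelihood ratio using MCMC, we simulate from $P_{\theta_0}(X \mid Y)$, getting $X^{(1)}, \ldots, X^{(N)}$, and forming

$$\frac{1}{N} \sum_{i=1}^{N} \frac{P_\theta(X^{(i)}, Y)}{P_{\theta_0}(X^{(i)}, Y)}$$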

In practice, we may want to discard the first observations because the chain has not yet reached equilibrium. This initial phase is called the burn-in.

We can estimate the likelihood ratio over a range of values of $\theta$ by simulating at a single value $\theta_0$. However, the estimate will be good only for $\theta$ in the neighborhood of $\theta_0$, where the distribution $P_\theta(X \mid Y)$ is close to $P_{\theta_0}(X \mid Y)$.
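The likelihood ratio estimate, including the burn-in, can be sketched as follows. The model is a hypothetical toy chosen so that the exact likelihood is known: X has three independent binary components with prior probability theta of being 1, and the observed Y is assumed to be compatible only with states having at least two components equal to 1. All function names and the chain length and burn-in values are illustrative assumptions.

```python
import random

def prior(x, theta):
    """P_theta(X): independent Bernoulli(theta) components (toy assumption)."""
    p = 1.0
    for xi in x:
        p *= theta if xi == 1 else 1.0 - theta
    return p

def joint(x, theta):
    """P_theta(X, Y): prior times a 0/1 compatibility indicator for the observed Y."""
    return prior(x, theta) if sum(x) >= 2 else 0.0

def sample_posterior(theta0, n_steps, burn_in, seed=0):
    """Metropolis chain targeting P_theta0(X | Y); the first burn_in states are dropped."""
    rng = random.Random(seed)
    x = (1, 1, 1)  # start from a state compatible with Y
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(len(x))
        proposal = x[:i] + (1 - x[i],) + x[i + 1:]
        if rng.random() < min(1.0, joint(proposal, theta0) / joint(x, theta0)):
            x = proposal
        samples.append(x)
    return samples[burn_in:]

def likelihood_ratio_estimate(samples, theta, theta0):
    """Average of P_theta(X, Y) / P_theta0(X, Y) over the simulated states."""
    return sum(joint(x, theta) / joint(x, theta0) for x in samples) / len(samples)

def exact_likelihood(theta):
    """Exact P_theta(Y) for this toy model: P(at least two of three components are 1)."""
    return 3 * theta**2 * (1 - theta) + theta**3

theta0 = 0.3
samples = sample_posterior(theta0, n_steps=30000, burn_in=1000)
estimates = {}
for theta in (0.25, 0.3, 0.35):
    estimates[theta] = likelihood_ratio_estimate(samples, theta, theta0)
    exact = exact_likelihood(theta) / exact_likelihood(theta0)
    print(f"theta={theta}: estimated ratio={estimates[theta]:.4f}, exact ratio={exact:.4f}")
```

A single chain simulated at `theta0` is reused for every value of `theta`. This toy state space is too small to show the loss of accuracy away from `theta0`; it only checks the estimator against a known answer.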






Simon Cawley
Wed Apr 22 19:50:08 PDT 1998