next up previous
Next: References Up: Phylogenetic Methods Previous: Parsimony Methods

Likelihood Methods

Likelihood methods for phylogenies were first introduced by Edwards and Cavalli-Sforza (1964) for gene frequency data. Neyman (1971) applied likelihood to molecular sequences and this work was extended by Kashyap and Subas (1974). Felsenstein (1973,1981) brought the maximum likelihood framework to nucleotide-based phylogenetic inference. As an example, we will compute the likelihood for a given tree using DNA sequences, but these procedures can be used for other characters as well. This example is borrowed from Felsenstein (1981).

Figure 2. An example tree for Maximum Likelihood

We have a set of aligned sequences with m sites. To compute the probability of a tree, we must have a phylogeny with branch lengths and an evolutionary model giving the probabilities of change along the tree. This model will allow us to compute the transition probabilities , the probability that state j will exist at the end of a branch of length t, if the state at the start of the branch is i. We must remember the t is a measure of branch length, not time. There are two central assumptions: (1) evolution at different sites on the tree is independent (and identically distributed) and (2) evolution in different lineages is independent. This gives us the Markov property down trees (cf. pedigrees).

The first assumption allows us to express the probability as a product, with one term for each site:

where is the data at the i-th site. Therefore, we need to know how to compute the probability at a single site. If we have the tree and the data at a site, the probability is the sum, over all possible nucleotides that may have existed at the interior nodes of the tree, of the probabilities of each scenario. )

Based on the Markov property (2) above, we can decompose the probability on the right side of the equation into a product:

If we assume that evolution has been proceeding for a long time according to the model of substitution that we are using, then the Pr(x) is the equilibrium probability of base x under that model.

However, there are still a large number of terms in the previous equation, such that for each site we sum terms. The number of terms will rise exponentially with the number of species. On a tree with n species, there are n-1 interior nodes, each of which can have one of 4 states. If we need terms, then for 10 taxa, this would be 262,144 terms!

In order to make the computation more realistic, Felsenstein (1973, 1981) introduced a "pruning" method, which is a version of the "peeling algorithm" introduced by Hilden (1970), Elston and Stewart (1971) and Heuch and Li (1972). As we saw in a previous week, these in turn are analogous to the HMM forward-backward algorithms, and all are special cases of the idea of dynamic programming over a directed acyclic graph (of Week 14). The method can be derived simply by trying to move the summation signs as far right as possible and enclose them in parentheses where possible.

The pattern of parentheses and terms for tips in this expression is

which corresponds exactly to the structure of the tree. The flow of computations is from the inside of the innermost parentheses outwards, with a flow of information down the tree).

We will utilize the conditional probability of a subtree, , which is the probability of everything observed from node k on the tree on up, at site i, conditional on node k having state s (cf. HMM backward probabilities). In the previous equation, the term

is one of the quantities, i.e. it is the probability of everything seen above that node, given that the node has base w. There will be four such quantities, one for each of the values of w. The key to the pruning algorithm is that once the four numbers are computed, they don't need to to be recomputed again. The algorithm is a recursion that computes at each node of the tree from the same quantities in the immediate descendent node. The algorithm is applied starting at the node which has all of its immediate descendents being tips. Then it is applied successively further down the tree, not applying it to any node until all of its descendents have been processed. The result is for the bottom node. The evaluation of likelihood is then completed for this site by making a weighted average of these over all four bases, weighted by their prior probabilities under the model:

Estimation of branch lengths () can be done by a form of the EM algorithm adapted to trees in the same way as the HMM estimation algorithm is adapted to (linear) Markov chains. Forward (above) and backward (below) probabilities can be combined to give joint probabilities of states at adjacent nodes given the data, and these become the E-terms in the EM iteration. Details can be found in Felsenstein (1981) or worked out independently by analogy with the HMM case.



next up previous
Next: References Up: Phylogenetic Methods Previous: Parsimony Methods



Simon Cawley
Tue May 12 16:54:23 PDT 1998