Next: References Up: Stat 260: Statistics Previous: Markovian Models

Markov chain




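Higher-order Markov chains are usually required for DNA sequences, but for simplicity we consider only first-order chains. Note that a Markov chain (MC) of any order can be vectorized to give a first-order MC: a chain of order $m$ on single letters is a first-order chain on $m$-tuples of consecutive letters. Recall the notation $p_{ij} = P(X_{t+1} = j \mid X_t = i)$ for stationary transition probabilities. Under such a model, we can write, for a sequence $S = s_1 s_2 \ldots s_N$,

\[ P(S) = P(s_1) \prod_{t=1}^{N-1} p_{s_t s_{t+1}}. \]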
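As a minimal sketch of the first-order model: the probability of a sequence is the probability of its first letter times the product of the transition probabilities along the sequence. The function name `log_prob_first_order` and the uniform `init` and `trans` tables are hypothetical, chosen only for illustration.

```python
import math

def log_prob_first_order(seq, init, trans):
    """Log-probability of seq under a first-order Markov chain.

    init[a]     : probability that the first letter is a (assumed given)
    trans[a][b] : stationary transition probability from a to b
    """
    logp = math.log(init[seq[0]])
    # One transition term per pair of adjacent letters.
    for a, b in zip(seq, seq[1:]):
        logp += math.log(trans[a][b])
    return logp

# Hypothetical uniform chain on the DNA alphabet, for illustration only.
alphabet = "ACGT"
init = {a: 0.25 for a in alphabet}
trans = {a: {b: 0.25 for b in alphabet} for a in alphabet}
lp = log_prob_first_order("ACGTAC", init, trans)
```

Working on the log scale avoids underflow for long sequences; with the uniform tables above every letter contributes the same term.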

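Let $n_{ij}$ be the dinucleotide frequencies (counts) in a string of length $N$, i.e. $n_{AA}$ is the number of occurrences of AA, $n_{AC}$ the number of occurrences of AC, etc. Write $n_{i\cdot} = \sum_j n_{ij}$ and $n_{\cdot j} = \sum_i n_{ij}$ for the row and column sums.

If the sequence starts at $s$ and finishes at $f$, with $s \neq f$, the following holds: $n_{s\cdot} - n_{\cdot s} = 1$; $n_{f\cdot} - n_{\cdot f} = -1$; $n_{i\cdot} = n_{\cdot i}$ if $i \neq s, f$.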
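The constraints linking the dinucleotide counts to the first and last letters can be checked numerically. The sketch below counts adjacent pairs in an arbitrary example string and compares, for each letter, the number of pairs that start with it against the number that end with it. The helper names and the example sequence are illustrative choices, not part of the original notes.

```python
from collections import Counter

def dinucleotide_counts(seq):
    """Counts of ordered adjacent pairs (a, b) in seq."""
    return Counter(zip(seq, seq[1:]))

def row_col_sums(counts):
    """Row sums (pairs starting with a) and column sums (pairs ending with b)."""
    row, col = Counter(), Counter()
    for (a, b), c in counts.items():
        row[a] += c
        col[b] += c
    return row, col

seq = "ACGTTAGC"  # arbitrary example: starts at A, finishes at C
counts = dinucleotide_counts(seq)
row, col = row_col_sums(counts)
# Row sum minus column sum for each letter.
diffs = {a: row[a] - col[a] for a in "ACGT"}
```

Here `diffs` is +1 for the starting letter, -1 for the final letter, and 0 for the others, as the constraints require.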

All sequences $S$ with given base counts $(n_A, n_C, n_G, n_T)$ are equally probable in the independent case, and there are $N!/(n_A!\, n_C!\, n_G!\, n_T!)$ of them. It is known that a similar expression holds for the Markovian model.

Proof: See Whittle (1955).
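The counting formula is not reproduced in the text above. The sketch below assumes the form usually attributed to Whittle (1955): the number of sequences with transition counts $n_{ij}$, first letter $s$ and last letter $f$ is $C_{fs} \prod_i n_{i\cdot}! / \prod_{i,j} n_{ij}!$, where $n_{i\cdot} = \sum_j n_{ij}$ and $C_{fs}$ is the $(f, s)$ cofactor of the matrix with entries $\delta_{ij} - n_{ij}/n_{i\cdot}$ (rows with $n_{i\cdot} = 0$ are taken as identity rows). This notation and the function names are assumptions made for illustration. The code compares the formula against brute-force enumeration of all DNA sequences of length 6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial

ALPHABET = "ACGT"

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j, x in enumerate(m[0]):
        if x != 0:
            minor = [r[:j] + r[j + 1:] for r in m[1:]]
            total += (-1) ** j * x * det(minor)
    return total

def whittle_count(counts, start, end):
    """Assumed Whittle formula: sequences with these transition counts and end letters."""
    idx = {a: i for i, a in enumerate(ALPHABET)}
    k = len(ALPHABET)
    row = [sum(counts.get((a, b), 0) for b in ALPHABET) for a in ALPHABET]
    # Identity minus row-normalized counts; empty rows stay identity rows.
    fstar = []
    for i, a in enumerate(ALPHABET):
        fstar_row = []
        for j, b in enumerate(ALPHABET):
            entry = Fraction(int(i == j))
            if row[i]:
                entry -= Fraction(counts.get((a, b), 0), row[i])
            fstar_row.append(entry)
        fstar.append(fstar_row)
    r, c = idx[end], idx[start]  # (end, start) cofactor
    minor = [[fstar[i][j] for j in range(k) if j != c] for i in range(k) if i != r]
    cofactor = (-1) ** (r + c) * det(minor)
    num = 1
    for x in row:
        num *= factorial(x)
    den = 1
    for v in counts.values():
        den *= factorial(v)
    return cofactor * Fraction(num, den)

def brute_force_classes(length):
    """Group all sequences of the given length by (first, last, transition counts)."""
    classes = Counter()
    for letters in product(ALPHABET, repeat=length):
        counts = Counter(zip(letters, letters[1:]))
        classes[(letters[0], letters[-1], frozenset(counts.items()))] += 1
    return classes

classes = brute_force_classes(6)
mismatches = [key for key, n in classes.items()
              if whittle_count(dict(key[2]), key[0], key[1]) != n]
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with the enumerated class sizes is not affected by rounding.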

Here is an application of this to DNA sequences.

Proof: We follow the line of proof for the independence model. Let k be the length of w.

Here where is , with the dinucleotide counts in w subtracted and added.

Observing that we get the desired result.

An expression for was recently calculated by Prum et al. (1995) (see also S. Schbath (1995)), and used to standardize. Leung et al. (1996) found that words of the same length () could be ranked fairly accurately using the simplest standardization: . However, the more elaborate formulas are to be preferred for particular words of interest.
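The standardization formulas themselves are not reproduced in the text above. As a rough sketch only, the code below assumes the commonly used plug-in estimate of the expected count of a word $w = w_1 \ldots w_k$ under a first-order chain, namely the product of the observed counts of the dinucleotides $w_i w_{i+1}$ divided by the product of the observed counts of the interior letters $w_2, \ldots, w_{k-1}$, and assumes that the "simplest standardization" is (observed - expected) / sqrt(expected). Both choices, and all function names, are assumptions and may differ from the formulas intended in the notes.

```python
from collections import Counter
from math import sqrt

def count_occurrences(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

def expected_count_m1(seq, word):
    """Assumed plug-in expected count under a first-order chain (len(word) >= 2)."""
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    mono = Counter(seq)
    num = 1.0
    for a, b in zip(word, word[1:]):
        num *= di[a + b]      # dinucleotide counts along the word
    den = 1.0
    for a in word[1:-1]:
        den *= mono[a]        # counts of the interior letters
    return num / den if den else 0.0

def simple_score(seq, word):
    """Assumed simple standardization: (observed - expected) / sqrt(expected)."""
    e = expected_count_m1(seq, word)
    if e <= 0:
        return float("nan")
    return (count_occurrences(seq, word) - e) / sqrt(e)

seq = "ACGACGACGT"  # toy sequence, for illustration only
observed = count_occurrences(seq, "ACG")
expected = expected_count_m1(seq, "ACG")
score = simple_score(seq, "ACG")
```

On this toy sequence the observed and expected counts of ACG coincide, so the score is zero. On real data, scores of this kind would only be compared among words of the same length.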



Simon Cawley
Thu May 14 03:30:08 PDT 1998

