next up previous
Next: Distance model: Maximum Up: Stat 260: Statistics Previous: Variant 4: Double-end-sequencing

STS-content mapping

Another important step in the physical mapping process is the assembly of the map. In this section, we will discuss methods of reconstructing clone and marker orders in the context of STS-content mapping.

In STS-content mapping, we have N clones and M STSs, and also the incidences between the STSs and clones. Clones hit by a common STS overlap due to the uniqueness of the STS in the genome. The goal is to retrieve the order of the clones and the STSs along the genome, and estimate the distances between them if plausible. Ideally either ordering result can be inferred by the other, provided we have enough information. We will not worry about the condition for the implication of clone order from the STS order in this discussion. We can reformulate the problem as the following: given a incidence matrix I = where if STS j hits clone i; otherwise, find a permutation of the rows and the columns of I such that the rows and the columns both have the consecutive ones property. A 0-1 matrix is said to have the consecutive ones property in the columns if all the 1s in each column are grouped. The incidence matrix in the correct order has the consecutive ones property since STSs hit by a common clone should be grouped, and the clones hit by a STS are overlapping neighboring clones. Apart from the impractical exhaustive search, this problem actually can be solved in linear time ()by a nice PQ-tree algorithm (Booth and Leuker, 1976 [3]), assuming no errors in the incidences. See Figure 6 for illustration of an example.

  
Figure 6: An example of PQ-tree algorithm: A P-node is denoted by a circle, whose children may be rearranged in any order; a Q-node is represented by a rectangle, whose children are in order, up to reversal. The process illustrates the use of the PQ-tree data structure to permute the columns (the STSs) of the above clone-STS incidence matrix such that the resulting matrix has the consecutive-ones property. To start, put every STS (column) as a child node of a root P-node. At each step i, group the STSs that hit the same clone i, i.e., those nodes with a 1 in row i should be neighbors. If three or more nodes can only be arranged in one way (up to reversal) accordingly, put them as the children of a Q-node. Iterate the above step until all rows (constraints) have been checked. The final PQ-tree actually gives all compatible orderings in a very compact way. In this example, only two orderings are possible (which is the correct order of the STSs): FAEBCDG, and its reversal GDCBEAF.

With the presence of errors, one can either screen out possible errors prior to assembly and employ this PQ-tree algorithm, or use methods that are more robust to errors. There have been a lot of algorithms for this type of reordering problem in the literature, most of them use combinatorial optimization approaches that optimize an objective function w.r.t. the order, and use different techniques to search the permutation space. Here is a survey of them:





next up previous
Next: Distance model: Maximum Up: Stat 260: Statistics Previous: Variant 4: Double-end-sequencing



Simon Cawley
Thu Apr 30 03:30:28 PDT 1998