
Basic coverage result: random clones of constant length, fixed point

Suppose we have a clone library of $N$ clones with mean length $L$ for a genome of length $G$. (In the example parameters used at LBL, $L$ and $G$ are given in bp and $N = 5{,}000$.) We want to calculate the coverage of these clones, and the number of islands.

 


Figure: Sample clone overlap configuration.

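We model the genome as the line $[0, G]$, and assume initially for simplicity that the clones have constant length $L$. Suppose the left ends of the clones are "random", i.e. i.i.d. uniform on $[0, G]$. Then at a given point $x$ (with $x \ge L$, ignoring end effects),

$$P(x \text{ is not covered}) = (1 - L/G)^N \approx e^{-NL/G}.$$

More generally,

$$P(x \text{ is covered by exactly } k \text{ clones}) = \binom{N}{k} (L/G)^k (1 - L/G)^{N-k} \approx \frac{e^{-c} c^k}{k!},$$

where $c = NL/G$ is the coverage = average number of times the genome is covered. Denote the ordered left ends of clones by $X_{(1)} \le \dots \le X_{(N)}$, and the spacings between consecutive left ends by $S_i = X_{(i+1)} - X_{(i)}$ (with the conventions $X_{(0)} = 0$ and $X_{(N+1)} = G$).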
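As a quick numerical check of the point-coverage result, the sketch below compares the binomial probability that a fixed interior point is covered by exactly k clones with its Poisson approximation at coverage c = NL/G. Only N = 5,000 comes from the text; the values of L and G, and the function names, are hypothetical choices made for illustration.

```python
import math

def p_cover_binomial(k, N, L, G):
    """P(a fixed interior point is covered by exactly k clones), binomial form."""
    p = L / G  # chance that one clone's left end falls in the window [x - L, x]
    return math.comb(N, k) * p**k * (1 - p) ** (N - k)

def p_cover_poisson(k, N, L, G):
    """Poisson approximation with coverage c = N * L / G."""
    c = N * L / G
    return math.exp(-c) * c**k / math.factorial(k)

# N is from the text; L and G are hypothetical values chosen only for illustration.
N, L, G = 5000, 1_000, 1_000_000
for k in range(6):
    print(k, round(p_cover_binomial(k, N, L, G), 5), round(p_cover_poisson(k, N, L, G), 5))
```

With L/G small, the two columns should agree closely; the k = 0 row is the probability that the point is not covered at all.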

Ex 1. Show that the spacings $\{S_i\}$ are exchangeable (the joint distribution is symmetric), and have the marginal density

$$f(s) = \frac{N}{G} \left(1 - \frac{s}{G}\right)^{N-1}, \quad 0 \le s \le G,$$

with mean $G/(N+1)$.
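The exercise can be sanity-checked by simulation. The sketch below assumes the standard uniform-spacings setup: N i.i.d. uniform left ends cut $[0, G]$ into N + 1 pieces, each with mean G/(N+1) and tail probability P(S > s) = (1 - s/G)^N. It estimates both quantities for two different spacing indices, which should give similar numbers if the spacings are exchangeable. All parameter values and function names are hypothetical.

```python
import random

def spacings(N, G, rng):
    """Cut [0, G] at N i.i.d. uniform left ends and return the N + 1 pieces."""
    cuts = [0.0] + sorted(rng.uniform(0, G) for _ in range(N)) + [G]
    return [b - a for a, b in zip(cuts, cuts[1:])]

def summarize(index, s, N, G, trials, rng):
    """Monte Carlo mean of spacing `index` and the fraction of trials where it exceeds s."""
    values = [spacings(N, G, rng)[index] for _ in range(trials)]
    mean = sum(values) / trials
    tail = sum(v > s for v in values) / trials
    return mean, tail

rng = random.Random(0)
N, G, s, trials = 50, 1.0, 0.03, 10_000  # hypothetical illustration values
for index in (0, N // 2):
    mean, tail = summarize(index, s, N, G, trials, rng)
    print("spacing", index, "mean", mean, "P(S > s)", tail)
print("theory: mean", G / (N + 1), "P(S > s)", (1 - s / G) ** N)
```

This is evidence rather than a proof; the exercise itself still calls for the symmetry argument.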

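So an ocean (a gap between neighboring islands) follows a fragment with left end at $X_{(i)}$ iff $S_i > L$:

$$P(S_i > L) = (1 - L/G)^N \approx e^{-NL/G} = e^{-c}, \quad c = NL/G.$$

Assume that the spacings are independent (not quite true, but ok for high clone redundancy), so the number of gaps is approximately binomial:

$$\#\text{gaps} \sim \mathrm{Binomial}(N, e^{-c}).$$

So

$$E(\#\text{gaps}) \approx N e^{-c},$$

implying

$$E(\#\text{islands}) \approx N e^{-c}.$$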

 
Figure 3: Comparison of the exact formula, the exponential approximation, and the simulated results.
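A comparison of this kind can be reproduced in miniature. The sketch below simulates uniform left ends on $[0, G]$, counts islands by starting a new island whenever the gap between consecutive left ends exceeds L, and compares the average with the exponential approximation N e^{-NL/G}. The parameter values and function names are hypothetical and are not those behind the figure.

```python
import math
import random

def count_islands(left_ends, L):
    """Number of islands formed by clones of length L with the given left ends."""
    xs = sorted(left_ends)
    # A new island starts whenever the gap to the previous left end exceeds L.
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if b - a > L)

def mean_islands(N, L, G, trials, rng):
    """Average island count over `trials` independent random clone libraries."""
    total = 0
    for _ in range(trials):
        total += count_islands([rng.uniform(0, G) for _ in range(N)], L)
    return total / trials

rng = random.Random(0)
N, L, G, trials = 200, 1.0, 100.0, 2000  # hypothetical values, coverage c = 2
simulated = mean_islands(N, L, G, trials, rng)
approx = N * math.exp(-N * L / G)
print("simulated:", simulated, "exponential approximation:", approx)
```

At moderate coverage the two numbers should be close, with the simulated mean slightly higher.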

Note that the exponential approximation for the expected number of islands tends to 0 as $N \to \infty$, which is inconsistent with the correct limit 1. However, the approximation is pretty good for smaller $N$. This indicates that caution should be taken when using the exponential approximation in this type of analysis. For the calculation of the expected size of a random island, see Roach [15].
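The limiting behaviour can be seen numerically. The sketch below tabulates the exponential approximation N e^{-NL/G} against the expression 1 + (N - 1)(1 - L/G)^N. The second expression is an assumption about what the exact formula looks like in this model: it follows from linearity of expectation over the N - 1 interior spacings, each exceeding L with probability (1 - L/G)^N, and may differ in form from the one used in the notes. The values of L and G are hypothetical.

```python
import math

def islands_exponential(N, L, G):
    """Exponential approximation to the expected number of islands."""
    return N * math.exp(-N * L / G)

def islands_exact(N, L, G):
    """Assumed exact expectation: one island plus one per interior spacing exceeding L."""
    return 1 + (N - 1) * (1 - L / G) ** N

L, G = 1.0, 100.0  # hypothetical clone and genome lengths
for N in (50, 200, 1000, 5000):
    print(N, islands_exponential(N, L, G), islands_exact(N, L, G))
```

For small N the two columns stay within a few percent of each other, while for large N the first tends to 0 and the second to 1.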



Simon Cawley
Thu Apr 30 03:30:28 PDT 1998