Suppose we have a clone library of N clones of length
with mean
for a genome of length G. Example of the parameters as in LBL:
bp,
bp, N = 5,000.
We want to calculate the coverage of these
clones, and the number of islands.

Figure: Sample clone overlap configuration.
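We model the genome as the line $[0, G]$, and assume initially for
simplicity that the clones have constant length $L$.
Suppose left ends of clones are ``random'', i.e. i.i.d. uniform on
$[0, G]$.
Then at a given point $x$ (ignoring end effects),
\[ P(x \text{ is covered by a given clone}) = \frac{L}{G}, \qquad
   P(x \text{ is not covered}) = \left(1 - \frac{L}{G}\right)^{N} \approx e^{-NL/G}. \]
More generally,
\[ P(x \text{ is covered by exactly } k \text{ clones})
   = \binom{N}{k} \left(\frac{L}{G}\right)^{k} \left(1 - \frac{L}{G}\right)^{N-k}
   \approx \frac{e^{-c} c^{k}}{k!}, \]
where $c = NL/G$ is the
coverage = average number of times
the genome is covered.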
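As a quick numerical check of the Poisson approximation, the sketch below compares the binomial probability that a fixed point is covered by exactly $k$ clones (each clone covering it independently with probability $L/G$, ignoring end effects) with the Poisson value $e^{-c} c^k / k!$, where $c = NL/G$. Only $N = 5{,}000$ is taken from the example above; the values of $L$ and $G$ are hypothetical round numbers chosen for illustration, and the function names are ours.

```python
from math import exp, factorial, lgamma, log, log1p

def p_covered_k_exact(k, N, L, G):
    """Binomial(N, L/G) probability of exactly k covering clones.

    Computed in log space so that large N does not overflow.
    """
    p = L / G
    log_binom = lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
    return exp(log_binom + k * log(p) + (N - k) * log1p(-p))

def p_covered_k_poisson(k, N, L, G):
    """Poisson(c) approximation with c = N * L / G (intended for small k)."""
    c = N * L / G
    return exp(-c) * c**k / factorial(k)

# N is from the text; L and G are illustrative assumptions only.
N, L, G = 5000, 1_000, 1_000_000
for k in range(4):
    print(k, p_covered_k_exact(k, N, L, G), p_covered_k_poisson(k, N, L, G))
```

With these assumed values the coverage is $c = 5$, and the two columns should agree to several decimal places.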
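Denote the ordered left ends of clones by
$X_{(1)} \le X_{(2)} \le \cdots \le X_{(N)}$, and the
spacings between consecutive left ends by
$S_i = X_{(i+1)} - X_{(i)}$.
Ex 1. Show that the spacings $\{S_i\}$ are exchangeable
(the joint distribution is symmetric), and have the marginal density
\[ f(s) = \frac{N}{G} \left(1 - \frac{s}{G}\right)^{N-1}, \qquad 0 \le s \le G, \]
with mean
$G/(N+1)$.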
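The exercise can be sanity-checked numerically before attempting the proof. The sketch below assumes the marginal density $f(s) = (N/G)(1 - s/G)^{N-1}$, which gives mean $G/(N+1)$ and tail probability $P(S > L) = (1 - L/G)^N$. It draws $N$ uniform left ends repeatedly and compares the first and last interior spacings with those values; similar means for both are consistent with exchangeability. The parameter values and helper names are illustrative assumptions.

```python
import random

def interior_spacings(N, G, rng):
    """Spacings between consecutive ordered left ends of N uniform points on [0, G]."""
    lefts = sorted(rng.uniform(0, G) for _ in range(N))
    return [b - a for a, b in zip(lefts, lefts[1:])]

def check_spacings(N, G, L, reps, seed=0):
    """Return (mean first spacing, mean last spacing, frequency of first spacing > L)."""
    rng = random.Random(seed)
    first = last = 0.0
    tail = 0
    for _ in range(reps):
        s = interior_spacings(N, G, rng)
        first += s[0]
        last += s[-1]
        tail += s[0] > L
    return first / reps, last / reps, tail / reps

# Small illustrative values (assumptions, not taken from the text).
N, G, L = 50, 1.0, 0.05
mean_first, mean_last, tail_freq = check_spacings(N, G, L, reps=5000)
print("simulated means:", mean_first, mean_last, " theory:", G / (N + 1))
print("simulated P(S > L):", tail_freq, " theory:", (1 - L / G) ** N)
```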
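So an ocean
(a gap between neighboring islands) follows a fragment with left end at
$X_{(i)}$ iff
the following spacing $S_i$ exceeds the clone length, $S_i > L$:
\[ P(S_i > L) = \int_L^G \frac{N}{G} \left(1 - \frac{s}{G}\right)^{N-1} ds
   = \left(1 - \frac{L}{G}\right)^{N} \approx e^{-NL/G}. \]
Assume that the spacings are independent (not quite true, but ok for
high clone redundancy), so the
number of gaps
$K$:
\[ K \sim \mathrm{Binomial}\left(N - 1,\ \left(1 - \frac{L}{G}\right)^{N}\right). \]
So
\[ E(K) = (N - 1) \left(1 - \frac{L}{G}\right)^{N}, \]
implying
\[ E(\text{number of islands}) = 1 + E(K)
   = 1 + (N - 1) \left(1 - \frac{L}{G}\right)^{N} \approx N e^{-NL/G}. \]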
Figure 3: Comparison of the exact formula, the exponential approximation, and the simulated results.
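A comparison in the spirit of this figure can be sketched as a small Monte Carlo experiment. The code assumes constant clone length $L$, left ends uniform on $[0, G]$, and the exact expectation $1 + (N-1)(1 - L/G)^N$ for the number of islands, which follows from $P(S_i > L) = (1 - L/G)^N$ and linearity of expectation. The parameter values and function names are illustrative assumptions, not the settings used for the figure.

```python
import random

def count_islands(lefts, L):
    """Number of islands for clones of constant length L with the given left ends."""
    lefts = sorted(lefts)
    # A gap (ocean) occurs wherever consecutive left ends are more than L apart.
    gaps = sum(1 for a, b in zip(lefts, lefts[1:]) if b - a > L)
    return gaps + 1

def mean_islands_simulated(N, L, G, reps, seed=0):
    """Average island count over `reps` simulated clone libraries."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        total += count_islands([rng.uniform(0, G) for _ in range(N)], L)
    return total / reps

def expected_islands_exact(N, L, G):
    """1 + (N - 1) * P(spacing > L), using P(spacing > L) = (1 - L/G)^N."""
    return 1 + (N - 1) * (1 - L / G) ** N

# Illustrative values only (coverage c = N * L / G = 5).
N, L, G = 500, 1_000, 100_000
print("simulated:", mean_islands_simulated(N, L, G, reps=1000))
print("exact:    ", expected_islands_exact(N, L, G))
```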
Note that the exponential approximation for the expected number of
islands, $N e^{-NL/G}$, tends to 0
as $N \to \infty$, which is
inconsistent with the correct limit 1. However, the approximation is
pretty good for smaller $N$. This indicates that caution should be
taken when using the exponential approximation in this type of analysis.
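The limiting behaviour is easy to see numerically. The sketch below tabulates the exact expectation $1 + (N-1)(1 - L/G)^N$ against the exponential approximation $N e^{-NL/G}$ for increasing $N$; the values of $L$ and $G$ are hypothetical and chosen only so that the coverage $c = NL/G$ ranges from small to very large.

```python
from math import exp

def expected_islands_exact(N, L, G):
    """Exact expectation under the uniform left-end model: 1 + (N - 1)(1 - L/G)^N."""
    return 1 + (N - 1) * (1 - L / G) ** N

def expected_islands_exp_approx(N, L, G):
    """Exponential approximation N * exp(-c) with coverage c = N * L / G."""
    return N * exp(-N * L / G)

# Illustrative clone and genome lengths (assumptions, not from the text).
L, G = 1_000, 1_000_000
for N in (100, 1_000, 5_000, 20_000, 100_000):
    c = N * L / G
    exact = expected_islands_exact(N, L, G)
    approx = expected_islands_exp_approx(N, L, G)
    print(f"N={N:>7}  c={c:6.1f}  exact={exact:12.4f}  approx={approx:12.4f}")
```

For moderate coverage the two columns stay close, while for very large $N$ the exact value settles at 1 and the approximation decays to 0.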
For the calculation of the expected size of a random island, see
Roach [15].