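Suppose you're tossing a fair coin independently, and are interested in the event of observing `HHHH'. It's easy to see that any four given consecutive tosses are HHHH with probability

$$P(\mathrm{HHHH}) = (1/2)^4 = 1/16.$$

If you toss a fair coin $n$ times, there are $n-3$ places where an HHHH could start, so the expected number of occurrences (overlapping ones included) is $(n-3)/16$.

Then, what will be the average number of tosses from the beginning
to the end of the 1st HHHH?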
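One easy way to solve this is to consider
a Markov chain consisting of the states
`start', `H', `HH', `HHH' and end=`HHHH',
and assign a number $\mu_i$
to each of them,
where
$\mu_i$ = average number of tosses to the end, starting
with $i$ heads already (so $\mu_i = 0$ at the end state).
(See Fig. 1)
For end=`HH', say, we have the following equations at hand.

$$\mu_0 = 1 + \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_0, \qquad \mu_1 = 1 + \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\mu_0.$$

We get $\mu_0 = 6$ (and $\mu_1 = 4$)
as the solution.
Similarly, we can get $\mu_0 = 30$
for 4 H's.
In general, $\mu_0$
for end=`n consecutive Hs' is
$2^{n+1} - 2$.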
Figure 1: Markov chain for the case when end=HHHH
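The equations for a run of heads can be solved mechanically by back-substitution. The following is a minimal sketch, not part of the original notes: the function name `mean_tosses_to_run` and the parameter `p` (the probability of heads) are chosen here for illustration. It writes each unknown as $\mu_i = a + b\,\mu_0$ and works backwards from the end state.

```python
from fractions import Fraction

def mean_tosses_to_run(n, p=Fraction(1, 2)):
    """Expected number of tosses until n consecutive heads, with P(H) = p.

    Uses the chain equations mu_i = 1 + p*mu_{i+1} + (1-p)*mu_0 with
    mu_n = 0. Each mu_i is represented as a + b*mu_0 (hypothetical
    helper representation), filled in from i = n-1 down to i = 0.
    """
    a, b = Fraction(0), Fraction(0)      # mu_n = 0 + 0*mu_0
    for _ in range(n):
        # substitute mu_{i+1} = a + b*mu_0 into the equation for mu_i
        a, b = 1 + p * a, p * b + (1 - p)
    # now mu_0 = a + b*mu_0, so mu_0 = a / (1 - b)
    return a / (1 - b)

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, mean_tosses_to_run(n))
```

With the default fair coin, `mean_tosses_to_run(2)` gives 6 and `mean_tosses_to_run(4)` gives 30, matching the values in the text. Exact fractions are used so the comparison with $2^{n+1}-2$ involves no rounding.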
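How about the sequence HT? The answer changes as follows.
Writing $\mu_0$ and $\mu_1$ for the average number of tosses to the end from `start' and from `H', we now have

$$\mu_0 = 1 + \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_0, \qquad \mu_1 = 1 + \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\cdot 0.$$

The only change from the HH case is that a wrong toss from `H' (another H) leaves us at `H' instead of sending us back to `start'.
Thus, $\mu_1 = 2$
and $\mu_0 = 4$.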
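We can see a tension between two seemingly inconsistent answers for HHHH: any four given consecutive tosses are HHHH with probability 1/16, so in the long run HHHH occurs about once every 16 tosses, and yet the average wait for the first HHHH is 30 tosses, not 16.

This can be resolved by observing the fact that
you are very likely to get more HHHHs following your first one.
The average number of further HHHHs following
your first one is 1, since the expected number of tosses up to the first T
under a geometric(1/2) distribution is 2 (counting that tail).
On the average, we wait 30 tosses for HHHH, then get it again immediately,
followed by a T, i.e. 2 occurrences of HHHH in 30 + 2 = 32 tosses, which is one per 16 tosses.
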
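The clumping argument can be checked by simulation. The sketch below is illustrative only: the names `one_cycle` and `summarize`, the number of cycles and the seed are all choices made here, not taken from the notes. One "cycle" is everything from a fresh start up to the first T after the first HHHH.

```python
import random

def one_cycle(rng):
    """Toss a fair coin until the first HHHH, then on until the first T.

    Returns (number of tosses in the cycle, number of HHHH occurrences).
    """
    run = tosses = 0
    while run < 4:                       # wait for the first HHHH
        tosses += 1
        run = run + 1 if rng.random() < 0.5 else 0
    hits = 1
    while True:                          # each further H is another HHHH
        tosses += 1
        if rng.random() < 0.5:
            hits += 1
        else:
            return tosses, hits          # the T ends the clump

def summarize(n_cycles=20000, seed=0):
    """Average cycle length and average HHHH count per cycle."""
    rng = random.Random(seed)
    total_tosses = total_hits = 0
    for _ in range(n_cycles):
        t, h = one_cycle(rng)
        total_tosses += t
        total_hits += h
    return total_tosses / n_cycles, total_hits / n_cycles

if __name__ == "__main__":
    mean_len, mean_hits = summarize()
    print(mean_len, mean_hits, mean_hits / mean_len)
```

If the argument is right, the average cycle length should come out near 32, the average number of occurrences per cycle near 2, and their ratio near 1/16, up to simulation noise.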
Think about how the above answers will change for other `words', such as HTHT, HHHT etc.
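To experiment with other words, the same Markov-chain idea can be automated. This is a sketch under assumptions, not the notes' own method: state $k$ means "the last $k$ tosses match the first $k$ letters of the word", letters are taken to be i.i.d. and equally likely, and the names `next_state` and `mean_wait` are hypothetical.

```python
from fractions import Fraction

def next_state(word, k, c):
    """Longest prefix of `word` that is a suffix of word[:k] + c."""
    s = word[:k] + c
    for j in range(min(len(s), len(word)), -1, -1):
        if s.endswith(word[:j]):         # j = 0 always matches
            return j

def mean_wait(word, alphabet="HT"):
    """Expected number of tosses until `word` first appears."""
    m = len(word)
    p = Fraction(1, len(alphabet))       # assumed: equally likely letters
    # Row k encodes mu_k - p * sum_c mu_{next(k, c)} = 1, with mu_m = 0.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for k in range(m):
        A[k][k] += 1
        A[k][m] = Fraction(1)
        for c in alphabet:
            j = next_state(word, k, c)
            if j < m:
                A[k][j] -= p
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return A[0][m]                       # mu_0, the wait from `start'

if __name__ == "__main__":
    for w in ("HH", "HT", "HHHH", "HTHT", "HHHT"):
        print(w, mean_wait(w))
```

For a fair coin this reproduces 6 for HH, 4 for HT and 30 for HHHH, and gives 20 for HTHT and 16 for HHHT: words of the same length can have different mean waiting times, and the more a word overlaps itself, the longer the wait.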
We can see now that, unlike the average, the variability of the number of occurrences of a particular word depends on the self-overlap of that word. Let's introduce some notation to generalize the above computation.

In summary, we've shown the following:
the expected number of occurrences of a word $w$ is relatively easy,
depending only on the model.
Meanwhile,
the variance of the number of occurrences is tricky, even in the i.i.d. case.
The correct answer depends on the self-overlap of $w$.
For more details on this topic, and proofs,
see Waterman's 1995 book.
Exercise 1. Prove that the variance of the time to the first HH with a fair coin is 22. (Above we found the mean time to be 6.)
One can get a recursion for the variances as I did for the mean times above.
The full story, involving probability generating functions and
renewal theory, can be found in Feller
(An Introduction to Probability Theory and Its Applications,
Vol. 1).
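Before attempting the proof in Exercise 1, the target value can be checked numerically. The sketch below is a sanity check only, not a proof: it propagates the probabilities of being in `start' or `H' toss by toss and accumulates the first two moments of the stopping time. The function name `hh_wait_moments` and the truncation point `max_tosses` are choices made here.

```python
def hh_wait_moments(max_tosses=2000):
    """Mean and variance of the time to the first HH for a fair coin.

    The sum over the stopping time is truncated at max_tosses; the
    neglected tail probability decays geometrically, so it is far
    below floating-point precision at the default.
    """
    p_start, p_h = 1.0, 0.0              # not yet finished, by state
    mean = second = 0.0
    for k in range(1, max_tosses + 1):
        done = 0.5 * p_h                 # an H from state `H' ends at toss k
        # T from either state returns to `start'; H from `start' moves to `H'
        p_start, p_h = 0.5 * p_start + 0.5 * p_h, 0.5 * p_start
        mean += k * done
        second += k * k * done
    return mean, second - mean ** 2

if __name__ == "__main__":
    print(hh_wait_moments())
```

The output should agree with a mean of 6 and a variance of 22 up to floating-point error.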
Simple independence models for DNA sequences have their uses,
but only go a little way. One crude application is
to counting the possible number of, say, 20-mers.
Another is to restriction enzymes: sites for 4-cutters are more frequent
than sites for 6-cutters or 8-cutters. Loosely,
with the four bases independent and equally likely,
their relative frequencies would be in proportions
$4^{-4} : 4^{-6} : 4^{-8}$.
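These counts are simple arithmetic under the crudest independence model. The sketch below assumes the four bases are i.i.d. and equally likely, which is an illustrative assumption and not a property of real genomes; the names `num_kmers` and `site_frequency` are hypothetical.

```python
from fractions import Fraction

def num_kmers(k, alphabet_size=4):
    """Number of distinct words of length k over the alphabet."""
    return alphabet_size ** k

def site_frequency(k, alphabet_size=4):
    """Chance that a fixed k-letter site starts at a given position,
    assuming i.i.d. equally likely letters (illustrative assumption)."""
    return Fraction(1, alphabet_size ** k)

if __name__ == "__main__":
    print("possible 20-mers:", num_kmers(20))
    for k in (4, 6, 8):
        f = site_frequency(k)
        # 1/f is the average spacing between sites, in bases
        print(f"{k}-cutter: frequency {f}, about one site per {1 / f} bases")
```

Under this model a 4-cutter site is expected about once per 256 bases, a 6-cutter site once per 4096 and an 8-cutter site once per 65536, so each two extra bases in the recognition site make it 16 times rarer.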