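When looking at word frequencies under any model (including the
i.i.d. one), one needs $\mathbb{E}[N(w)]$ and $\operatorname{Var}[N(w)]$,
the mean and variance of the number of occurrences $N(w)$ of a word $w$.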
Even if they have a known form, they will involve unknown parameters.
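For example, if you have a formula like
$$\mathbb{E}[N(w)] = (N-\ell+1)\,p_{w_1}\cdots p_{w_\ell}$$
for the expected count of a word $w = w_1\cdots w_\ell$ in an i.i.d. sequence of length $N$ with letter probabilities $p_a$,
you still need an estimator of the $p_a$
(and perhaps plug it in).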
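The plug-in idea can be sketched in a few lines of Python. This is only an illustration, assuming the i.i.d. letter model and the product form $(N-\ell+1)\prod_i p_{w_i}$ for the expected count; the function names `plugin_expected_count` and `observed_count` are hypothetical and not part of any library.

```python
from collections import Counter
from math import prod


def plugin_expected_count(seq: str, word: str) -> float:
    """Plug-in estimate of E[N(word)] under an assumed i.i.d. letter model."""
    n, ell = len(seq), len(word)
    # Estimate each letter probability by its observed relative frequency.
    p_hat = {a: c / n for a, c in Counter(seq).items()}
    # There are n - ell + 1 start positions; each matches with probability
    # equal to the product of the estimated letter probabilities.
    return max(n - ell + 1, 0) * prod(p_hat.get(a, 0.0) for a in word)


def observed_count(seq: str, word: str) -> int:
    """Number of (possibly overlapping) occurrences of word in seq."""
    return sum(seq.startswith(word, k) for k in range(len(seq) - len(word) + 1))


if __name__ == "__main__":
    seq = "AACGTAAATTAAAAC"  # toy sequence, made up for illustration
    print(observed_count(seq, "AAA"), plugin_expected_count(seq, "AAA"))
```

Comparing the observed count with the plug-in expectation is only a first step: the estimated probabilities are themselves random, which is one motivation for conditioning on the letter counts instead.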
As an alternative, under the independence model, let's calculate
$$\mathbb{E}[N(w) \mid N_A, N_C, N_G, N_T],$$
where
$N_A$ = frequency (number of occurrences) of A in the sequence, etc.
Suppose w=AAA.
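All strings $S = S_1 \cdots S_N$ of length $N$ with given letter counts
$N_A, N_C, N_G, N_T$
are equally probable (check!), so that we have
$$P(S_k = \text{A} \mid N_A, N_C, N_G, N_T) = \frac{N_A}{N} \quad \text{for every position } k.$$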
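Now,
$$
\begin{aligned}
\mathbb{E}[N(\text{AAA}) \mid \mathbf{N}]
&= \mathbb{E}\Big[\sum_{k=3}^{N} \mathbf{1}\{S_{k-2}S_{k-1}S_k = \text{AAA}\} \,\Big|\, \mathbf{N}\Big] \\
&= \sum_{k=3}^{N} P(S_{k-2}S_{k-1}S_k = \text{AAA} \mid \mathbf{N}) \\
&= \sum_{k=3}^{N} P(S_{k-2}S_{k-1} = \text{AA} \mid \mathbf{N}) \cdot \frac{N_A-2}{N-2} \\
&= (N-2) \cdot \frac{N_A(N_A-1)}{N(N-1)} \cdot \frac{N_A-2}{N-2} \\
&= \frac{N_A(N_A-1)(N_A-2)}{N(N-1)},
\end{aligned}
$$
where
$\mathbf{N} = (N_A, N_C, N_G, N_T)$ is the vector of letter counts and $S_k$ is the letter at position $k$ of $S$.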
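What we're doing in the third line is
nothing more than
taking out the two As in front of the A at location $k$,
compressing the string to length $N-2$ with $N_A-2$ As, and using the known value of
$P(S_k = \text{A} \mid \text{letter counts}) = N_A/N$ on the compressed string.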
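You can actually get from line 2 to the answer by direct calculation, since among the strings with the given letter counts, the fraction having AAA at three fixed consecutive positions is
$$\frac{(N-3)!\,/\,\big((N_A-3)!\,N_C!\,N_G!\,N_T!\big)}{N!\,/\,\big(N_A!\,N_C!\,N_G!\,N_T!\big)} = \frac{N_A(N_A-1)(N_A-2)}{N(N-1)(N-2)}.$$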
However, the above argument will give you a feel for the way we prove the similar result in a Markovian context.
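The conditional expectation for $w = \text{AAA}$ can also be checked by brute force on a small example. The sketch below assumes the closed form $N_A(N_A-1)(N_A-2)/(N(N-1))$ implied by the equal-probability argument, and enumerates every distinct string with the given letter counts. The function names and the toy counts are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import permutations


def count_word(s: str, word: str = "AAA") -> int:
    """Number of (possibly overlapping) occurrences of word in s."""
    return sum(s.startswith(word, k) for k in range(len(s) - len(word) + 1))


def conditional_mean_bruteforce(counts: dict, word: str = "AAA") -> Fraction:
    """Average word count over all distinct strings with the given letter counts.

    Each such string is equally probable under the i.i.d. model given the
    counts, so the plain average is the conditional expectation.
    """
    letters = "".join(a * c for a, c in counts.items())
    strings = set(permutations(letters))  # distinct arrangements only
    total = sum(count_word("".join(s), word) for s in strings)
    return Fraction(total, len(strings))


def conditional_mean_formula(n_a: int, n: int) -> Fraction:
    """Assumed closed form N_A (N_A - 1)(N_A - 2) / (N (N - 1)), for n >= 2."""
    return Fraction(n_a * (n_a - 1) * (n_a - 2), n * (n - 1))


if __name__ == "__main__":
    counts = {"A": 4, "C": 2, "G": 1, "T": 1}  # toy counts, N = 8
    n = sum(counts.values())
    print(conditional_mean_bruteforce(counts))           # 3/7
    print(conditional_mean_formula(counts["A"], n))      # 3/7
```

Exact rational arithmetic via `Fraction` avoids floating-point noise in the comparison. The enumeration grows factorially, so this is only practical for strings of length around ten or less.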