next up previous
Next: Applying Information Theory Up: Finding Genes Previous: Locating splice sites

A brief introduction to some information theory

Given a distribution , define the entropy of p, , by

(Unless otherwise specified, it is assumed that we are using base 2, in which case the unit of entropy is ``bits''). The entropy of a distribution can be thought of in some sense as the ``surprise index'' of a random variable associated with that distribution. Think of it this way: the surprise index in playing the lottery is small (as we expect to lose and almost always we are not surprised by the final outcome) whereas the surprise index in tossing a coin is high (we can't make much of a guess as to the final outcome, so the surprise index is higher). In fact, it can be shown that if the ``surprise'' of an event...
is a function of
increases as decreases
is additive over independent events
Then is the ``correct'' measure of surprise to use.

Another way of thinking about entropy is to consider it to be the average number of ``yes/no'' questions needed to identify the outcome of a random experiment. For example, consider drawing randomly from with probabilities , on average two questions will be required to identify the outcome (``is it G or T?'' followed by ``it is T?'' for example). But if the probabilities were instead then most of the time just one question will do (``is it T?''). See Cover & Thomas [2] for much more on information theory generally, including a more complete discussion on what follows.

For distributions p,q on define the divergence (also know as the relative entropy or the Kullback-Leibler distance) between p and q, by

(As with entropy, we assume the log is using base 2, and the unit of divergence is the bit). So if distribution p corresponds to a splice site, and q corresponds to background, then is the average value in the log likelihood ratio when at a splice site. This can be used as a measure of the effectiveness of the log likelihood ratio, see below.

Exercise 2: show that , equality iff p is uniform.

Exercise 3: show that , equality iff p=q.

Exercise 4: Show that and , where is n iid trials of p.

An important consequence of exercise 4 is that if you have two distributions p,q and an ability to distinguish between them, then with 2 independent trials you have twice the ability to distinguish between them.

Stein's Lemma: Suppose we want to discriminate between a null q and an alternative p, using n iid random variables distributed as p or q. Let be the acceptance region for q, and for any let

Then

or in other words, like . This gives us a very direct interpretation of , it is a measure of the rate at which the power of the hypothesis test (of q against p) increases.





next up previous
Next: Applying Information Theory Up: Finding Genes Previous: Locating splice sites



Simon Cawley
Fri May 1 15:50:13 PDT 1998