How the TDT came to pass
Ott (1989) and Terwilliger & Ott (1992): saw merit in disaggregating the two genotypes containing a given marker allele, and reformatting the data along the following lines:

A contribution of 1 to a cell in this table corresponds to the fate of one parent's alleles, in relation to one affected child.
They called their approach haplotype-based (H)HRR, and dubbed Falk & Rubinstein's approach genotype-based (G)HRR. They also gave a variety of tests and reformulations of the problem whose details need not concern us. However, they were clear that at all times they were concerned with the null hypothesis of no population gametic association between M and D.
Spielman et al (1993): In this paper the TDT was invented. It came from the recognition that Ott & Terwilliger's reformulation of Falk & Rubinstein's HRR, namely, writing the data in the form

permits a test of

rather than (or as well as) a test of the null hypothesis of no gametic
association. The TDT is based upon the observation that under the null hypothesis of no linkage,
and the fact that all the information concerning
is in b and c. In general

where A depends on penetrances and joint haplotype frequencies, and A = 0 when there is no gametic association (suitably defined).
A different approach to the TDT
In the usual analysis of the TDT, the population assumptions of random mating and Hardy-Weinberg equilibrium for haplotypes are made. In this context the relevant notion of gametic association is linkage disequilibrium. (An exception to these sweeping population assumptions was an analysis by Ewens and Spielman (1995), which specifically considered mixed populations.) We shall take a different path and define a notion of gametic association that allows derivation of the TDT without the population assumptions of random mating and Hardy-Weinberg equilibrium.
To define this notion of gametic association we first set up
some notation. Let
be a marker locus with alleles
and
be a disease susceptibility locus with
alleles
as above. Then we write the population mating-type
frequencies as follows

where the genotype on the left refers to the maternal genotype, and the genotype on the right refers to the paternal genotype. (Writing genotypes in this way is not meant to imply that the M and D loci are linked. It indicates that, for example, the mother received the marker allele and the disease allele written on the same side of her genotype from the same grandparent.) The notation is invariant under switches between the mother's two haplotypes, and between the father's two haplotypes.
That is

The relevant notion of no association (for me) is the following.
No mating-type gametic association is defined to be symmetry of the mating-type frequencies under switches between i and j, and between k and l.
That is, for all index combinations,

Notice that this is equivalent to symmetry under switches between s and t, and between u and v.
It is not hard to show that under the assumptions of Hardy-Weinberg equilibrium, random mating and linkage equilibrium between the DS locus and the marker, the above symmetry holds. However, if the symmetry is all that is needed to make our proof work, why assume more?
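The claim about Hardy-Weinberg equilibrium, random mating and linkage equilibrium can be checked numerically. What follows is only a minimal sketch for two alleles at each locus. It assumes that, under Hardy-Weinberg equilibrium for haplotypes and random mating, an ordered mating type has frequency equal to the product of its four haplotype frequencies. The function names, the disequilibrium coefficient `ld` and all numerical values are hypothetical and chosen purely for illustration.

```python
from itertools import product

def haplotype_freqs(marker_freqs, disease_freqs, ld=0.0):
    """Two-allele haplotype frequencies h[(i, s)].

    With ld = 0 the loci are in linkage equilibrium, so h[(i, s)] is the
    product of the allele frequencies. A nonzero ld (a hypothetical
    disequilibrium coefficient) adds ld to haplotypes (0, 0) and (1, 1)
    and subtracts it from the other two.
    """
    h = {}
    for i, s in product(range(2), repeat=2):
        sign = 1 if i == s else -1
        h[(i, s)] = marker_freqs[i] * disease_freqs[s] + sign * ld
    return h

def mating_type_freq(h, i, s, j, t, k, u, l, v):
    """Frequency of the ordered mating type (i s / j t) x (k u / l v),
    ASSUMING Hardy-Weinberg equilibrium for haplotypes and random mating."""
    return h[(i, s)] * h[(j, t)] * h[(k, u)] * h[(l, v)]

def is_symmetric(h, tol=1e-12):
    """Check symmetry under i <-> j and under k <-> l for all indices."""
    for i, s, j, t, k, u, l, v in product(range(2), repeat=8):
        base = mating_type_freq(h, i, s, j, t, k, u, l, v)
        swap_ij = mating_type_freq(h, j, s, i, t, k, u, l, v)
        swap_kl = mating_type_freq(h, i, s, j, t, l, u, k, v)
        if abs(base - swap_ij) > tol or abs(base - swap_kl) > tol:
            return False
    return True

h_le = haplotype_freqs([0.3, 0.7], [0.1, 0.9])           # linkage equilibrium
h_ld = haplotype_freqs([0.3, 0.7], [0.1, 0.9], ld=0.02)  # linkage disequilibrium

print(is_symmetric(h_le))  # True
print(is_symmetric(h_ld))  # False
```

The second call shows that the symmetry is a genuine restriction: with linkage disequilibrium between marker and disease alleles it fails, even though Hardy-Weinberg equilibrium and random mating are still assumed in the sketch.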
What we wish to analyse with the TDT are data consisting of the marker alleles transmitted and not transmitted by parents to affected children. These data can be put in a table as follows (where the allele pairs consist of a maternal allele on the left and a paternal allele on the right):

Take note of the subscripts on the cell counts. The first two subscripts refer to the maternal alleles transmitted and not transmitted (in that order), and the second two subscripts refer to the paternal alleles in the same way.
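As an illustration of how such cell counts might be tallied, here is a minimal sketch. It assumes each affected child has already been reduced to four marker alleles in the order just described (maternal transmitted, maternal not transmitted, paternal transmitted, paternal not transmitted). The function names and the example data are made up for illustration.

```python
from collections import Counter

def tally_transmissions(trios):
    """Count affected children by the 4-tuple
    (maternal transmitted, maternal not transmitted,
     paternal transmitted, paternal not transmitted)."""
    counts = Counter()
    for mat_t, mat_nt, pat_t, pat_nt in trios:
        counts[(mat_t, mat_nt, pat_t, pat_nt)] += 1
    return counts

def maternal_margin(counts):
    """Collapse over the father: counts of (transmitted, not transmitted)
    for mothers only, as in the one-parent version of the argument."""
    margin = Counter()
    for (mat_t, mat_nt, _pat_t, _pat_nt), n in counts.items():
        margin[(mat_t, mat_nt)] += n
    return margin

# Made-up example data: one tuple per affected child.
trios = [(1, 2, 1, 1), (1, 2, 2, 1), (2, 1, 1, 2), (1, 2, 1, 1)]
counts = tally_transmissions(trios)

print(counts[(1, 2, 1, 1)])             # 2
print(maternal_margin(counts)[(1, 2)])  # 3
```

Keeping the full four-subscript key lets us run the two-parent argument, while the margin gives the table used when only one parent is considered.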
In most existing discussions of this material, only one parent is considered. Run
the argument for this case in parallel with what follows.
In deriving the TDT we shall first of all consider a probability associated with each cell in the table. For the cell with subscripts i, j, k, l, we define it to be the probability of the mother transmitting marker allele i and not transmitting marker allele j, and the father transmitting marker allele k and not transmitting marker allele l, conditional on the parents' genotypes at the marker locus and affectedness of the child. In order to write
the necessary probability expressions more compactly, we shall write
as short-hand for ``transmits marker allele
'', and
as short-hand for ``does not transmit marker allele
''. Using this notation we write

If we are considering the parents separately, then we use the notation

The null hypothesis of no association between parental alleles transmitted and affectedness of the child can be defined very naturally using the
(or
the
or the
) :

It turns out that this null hypothesis is the disjunction of two related nulls, that of no mating-type gametic association and that of no linkage between M and D. Either of these nulls implies that our transmission probabilities are symmetric in ij and kl.
To show this we prove the following in the case of a single disease
susceptibility locus
(where
and
are the
recombination fractions between the marker locus
and the
disease susceptibility locus
for mothers and fathers respectively). Fix i,j,k and l.
Proposition. Assume either no mating-type gametic association or
.

Proof: First we define some terms
, related to the
terms, as follows:

What we will eventually prove is that either no mating-type gametic association or
implies that
. This gives us the result we want because it
is easy to show that

Exercise 4: Check this result.
Now we derive an expression for
which allows for comparison
between
, etc. This will involve expanding the expression to include the disease locus, and then expanding further with conditional probabilities. Then we shall use two assumptions to simplify the expansion into the form we want. These assumptions are frequently used in the
literature without being spelt out. Below is the definition of
followed by an expansion which includes all the possible joint genotypes at the disease and marker loci.

In the next step we expand this expression to include the different transmission
possibilities involving the alleles at the disease locus. We also expand the
notation so that
means ``transmitted allele i at the marker locus, and transmitted allele s at the disease locus''. The redundant ``not transmitted'' terms are omitted.
In the next step we split up each of the four probabilities above into products of conditional probabilities. We'll just show it for the first!
We examine each of the three probabilities in the derived expression. The first probability is the mating-type frequency. The second probability is a simple expression involving the recombination fractions for mothers and fathers. But to write it we need to use the following assumption that we've been implicitly using all along.
Assumption 1: No segregation distortion (cf. assumption G2 in week 6 notes).
This means that parents are equally likely to transmit either of their two alleles at a locus. This is extended to multilocus transmission by conditioning on presence or absence of recombination in each parent.
Then we write

assuming independence of recombination events between mothers and fathers, and no segregation distortion, respectively.
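To make the transmission step concrete, here is a minimal sketch of the two-locus gamete probabilities implied by Assumption 1, together with the product over parents that follows from independence of recombination events. The function names and the symbol `r` for a parent's recombination fraction are assumptions made for this illustration.

```python
def gamete_probs(hap1, hap2, r):
    """Probabilities of the two-locus gametes from a parent whose
    haplotypes are hap1 = (m1, d1) and hap2 = (m2, d2).

    ASSUMES no segregation distortion: each non-recombinant haplotype is
    transmitted with probability (1 - r) / 2 and each recombinant one
    with probability r / 2, where r is the recombination fraction.
    """
    (m1, d1), (m2, d2) = hap1, hap2
    outcomes = [
        ((m1, d1), (1 - r) / 2),  # non-recombinant
        ((m2, d2), (1 - r) / 2),  # non-recombinant
        ((m1, d2), r / 2),        # recombinant
        ((m2, d1), r / 2),        # recombinant
    ]
    probs = {}
    for gamete, p in outcomes:
        # Accumulate, so that homozygous parents are handled correctly.
        probs[gamete] = probs.get(gamete, 0.0) + p
    return probs

def joint_transmission_prob(mother, father, mat_gamete, pat_gamete,
                            r_mother, r_father):
    """Probability of the pair of transmitted gametes, ASSUMING
    recombination events are independent between mother and father."""
    p_mat = gamete_probs(mother[0], mother[1], r_mother).get(mat_gamete, 0.0)
    p_pat = gamete_probs(father[0], father[1], r_father).get(pat_gamete, 0.0)
    return p_mat * p_pat

mother = ((1, "D"), (2, "d"))
father = ((1, "d"), (2, "d"))
print(gamete_probs(mother[0], mother[1], 0.5))
print(joint_transmission_prob(mother, father, (1, "D"), (2, "d"), 0.1, 0.5))
```

With r = 1/2 every gamete from a double heterozygote has probability 1/4, which is the sense in which no linkage makes transmission at the marker uninformative about the disease locus.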
To simplify the third probability we use the following
Assumption 2: The child's phenotype is determined solely by his/her genotype at the DS locus.
Here
are the parents' genotypes at the locus/loci in the bracket,
is the parents' transmission at the locus/loci in the bracket
and
is the child's genotype at the locus/loci in the bracket. (cf. assumption G3 in week 6 notes).
This enables us to write the third probability as
(since we are assuming only one disease locus
, so that what is transmitted at the marker locus to the child can be ignored too). Now we put all
this together to get

Similarly we can derive

and so on for the other two probabilities in the
expansion. Putting these together gives

To show that this gives the result we want, consider the similar expression
for
. It is

Under no mating-type gametic association we can swap
i and j in the expression on the right, and this gives the expression for
. So
. A similar argument establishes equality also with
and
. Under no linkage, i.e.
, the expression for
becomes

This concludes the proof of the proposition.
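A numerical check of the one-parent version of the proposition may help fix ideas. This is only a sketch under simplifying assumptions that are not part of the proof. There are two alleles at each locus. The mother's ordered genotype has frequency equal to the product of its two haplotype frequencies. The child's risk is summarised by a hypothetical marginal penetrance `f[d]` that depends only on the disease allele received from the mother. All names and numbers are illustrative.

```python
from itertools import product

def joint_prob(h, f, i, j, r):
    """Unnormalised P(mother transmits marker i, not j, and child affected).

    h[(m, d)]: haplotype frequencies; the ordered genotype (i s)/(j t) is
    ASSUMED to have frequency h[(i, s)] * h[(j, t)].
    f[d]: hypothetical probability that the child is affected, given the
    disease allele d received from the mother.
    r: recombination fraction between marker and disease locus.
    """
    total = 0.0
    for s, t in product(range(2), repeat=2):
        # Marker i travels with disease allele s without recombination
        # (probability (1 - r) / 2) and with t after recombination (r / 2).
        total += h[(i, s)] * h[(j, t)] * ((1 - r) / 2 * f[s] + r / 2 * f[t])
    return total

def transmission_prob(h, f, i, j, r):
    """P(transmits i, not j | heterozygous mother i/j, affected child)."""
    a = joint_prob(h, f, i, j, r)
    b = joint_prob(h, f, j, i, r)
    return a / (a + b)

# Haplotype frequencies without and with marker-disease association.
h_le = {(0, 0): 0.03, (0, 1): 0.27, (1, 0): 0.07, (1, 1): 0.63}
h_ld = {(0, 0): 0.05, (0, 1): 0.25, (1, 0): 0.05, (1, 1): 0.65}
f = [0.5, 0.1]  # disease allele 0 carries the higher risk

print(round(transmission_prob(h_le, f, 0, 1, 0.0), 4))  # no association, tight linkage
print(round(transmission_prob(h_ld, f, 0, 1, 0.5), 4))  # association, no linkage
print(round(transmission_prob(h_ld, f, 0, 1, 0.0), 4))  # association and linkage
```

In the first two cases the transmission probability is 1/2, as the proposition says it should be. Only when association and linkage are both present does it move away from 1/2.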
Testing the hypothesis.
The likelihood expression.
In order to test hypotheses concerning the
, we first need
to obtain a likelihood for the data. We'd like to believe that
the results for each set of two parents and an affected child are mutually
independent, when appropriately conditioned. Then we can write the likelihood expression as a product
of
terms. If we have data from family
in the form

where
is phenotype information (in this case, that the child is affected),
then we'd like to be able to split up the likelihood function into terms

This could be achieved directly by making the following sampling assumption
Assumption: Conditional independence of marker transmission between families, given marker genotypes and disease phenotypes.
where OTH means all the data for all of the OTHer families in the data set. This assumption means that data from the other families should not tell us anything about likely segregation at the marker locus if we know the parents' genotypes at the marker locus and that the child is affected. This rules out using related individuals from a pedigree.
However I don't assume this directly (how would we know?),
but rather go back
to the way in which the probabilities
were calculated
(via
) to see where we meet the need for assumptions.
What we do is write the
likelihood function

Next we analyse the probability

which is
conditioned on the rest of the data, and see what assumptions we need to make. In the first step we simply write out the conditional probability (leaving out the redundant terms):
The numerator is similar to the joint probability expanded in the proof of the proposition. Indeed we can manipulate it in virtually the same way to get the analogue of the four-term expansion given there, which is

Each of the four terms in this expansion can be written as a product of conditional probabilities in a similar way to before, but with an extra term in the product to account for the data from the other families. We show this for the first term. Writing the expression out in this way enables us to see the assumptions we need to make in order to get the desired result. The expression is

To get the forms we want for the last three terms in this expression we make the following sampling assumptions (i.e. assumptions referring to the extent to which independence between families is necessary).
Assumption A. Independence of parental genotypes between families
This assumption can be thought of as ruling out related families from the sample. (cf. S2 in the week 6 notes).
Assumption B. The child's phenotype is (still) determined solely by his/her genotype at the DS locus.
This is the sampling version of assumption 2 above. (cf. S1 in week 6 notes).
Using these assumptions, and then the same argumentation and assumptions (1 & 2) as for
gives

Finally we get

Now we go back to the conditional probability written out above. The numerator has just been dealt with,
and we note that the denominator can be written as the sum of four probabilities
expressing the four different transmission possibilities (same as for
in terms of the
). Writing it like this, the
terms cancel, and
what is left is
. This gives us the result that under assumptions 1, 2, A and B we can write the likelihood function as a product of the
.
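To make the product form concrete, here is a minimal sketch of the corresponding log-likelihood. It assumes the data have been reduced to cell counts and that a candidate probability is supplied for each cell. The function name, the dictionary layout and the numbers are hypothetical, and the example uses the one-parent table for brevity.

```python
import math

def log_likelihood(counts, cell_probs):
    """Log of a product of per-family terms: the sum over cells of
    (number of families in the cell) * log(cell probability)."""
    return sum(n * math.log(cell_probs[cell])
               for cell, n in counts.items() if n > 0)

# Made-up counts for mothers heterozygous for marker alleles 1 and 2:
# key = (transmitted allele, not-transmitted allele).
counts = {(1, 2): 60, (2, 1): 40}

symmetric = {(1, 2): 0.5, (2, 1): 0.5}  # cell probabilities under the null
fitted = {(1, 2): 0.6, (2, 1): 0.4}     # observed transmission proportions

ll_null = log_likelihood(counts, symmetric)
ll_fit = log_likelihood(counts, fitted)
print(ll_null, ll_fit)
```

The fitted value is never smaller than the null value, since the observed proportions maximise this likelihood over all cell probabilities.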
Possible statistical tests.
To simplify the story, let us suppose that we only have data on the mothers (say) of affected children. In this case
simplifies to

where
and
.
Then the
terms simplify to

The likelihood function (of r) generated by data on mothers
with genotype
at the marker locus is

where n is the number of mothers with marker genotype
, and
is the number of these mothers
who transmitted
to their affected child, and so
.
There are various ways in which we might go about testing the null hypothesis of no linkage, r = 1/2, e.g. a likelihood ratio test (after specifying the alternative hypothesis), a score test, a Wald test, etc. Let us try a likelihood ratio test to compare the null hypothesis r = 1/2 against the alternative r = 0 (for example). Then we would compute

and compare the observed value to the null distribution. However, we don't know
and
.
What turns out to be much better is to use the score test. The score statistic
is

and it is computable.
This is the TDT. In essence, it tests whether there are an unusually large or small number of transmissions of i, under the null hypothesis r = 1/2 ... a binomial test. Ideally, we would prefer to combine all such likelihoods over i and j and do a single score test in r. Unfortunately, our luck fails here, for there is no computable score test for all transmission data: the unknown quantities play an important role as weights. There are a number of sensible ways to proceed, but we must stop here. Read the American Journal of Human Genetics over the last couple of years for some of this research, and/or try Ex. 5.
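The binomial comparison can be sketched in a few lines. This is a minimal illustration for a single heterozygous genotype, assuming the statistic takes the familiar McNemar form (b - c)^2 / (b + c). Here b counts parents who transmitted i and not j, and c counts parents who transmitted j and not i. The function names and the counts are hypothetical.

```python
import math

def tdt_statistic(b, c):
    """McNemar-type statistic (b - c)^2 / (b + c), referred to a
    chi-square distribution with 1 degree of freedom."""
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (b - c) ** 2 / (b + c)

def chi2_1df_pvalue(x):
    """Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

def exact_binomial_pvalue(b, c):
    """Two-sided exact test of Binomial(b + c, 1/2): sum the probabilities
    of all outcomes that are no more likely than the observed one."""
    n = b + c
    pmf = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    cutoff = pmf[b] * (1 + 1e-9)  # small slack for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))

b, c = 60, 40  # made-up transmission counts
stat = tdt_statistic(b, c)
print(stat)  # 4.0
print(chi2_1df_pvalue(stat))
print(exact_binomial_pvalue(b, c))
```

The chi-square version is the large-sample approximation. With few informative parents the exact binomial calculation is the safer choice.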
Exercise 5: A kind of "main-effects only" model for the ratios
is
, where the
are quantities expressing
the extent to which allele i is preferentially transmitted. With this
model for the
, describe the MLEs of the
and an overall
likelihood ratio test of the null r = 1/2 vs the alternative r = 0.
Exercise 6: Discuss the extent to which we can include both parents' transmission data in a single analysis like the preceding one. Similarly, can we include transmission data for more than one affected child in the same family in an analysis like the above one?
References
G.H. Hardy, Mendelian proportions in a mixed population, Science, vol. 28, 1908, pp. 49-50.
Mourant et al, Blood groups and disease: a study of associations of diseases with blood groups and other polymorphisms, New York: Oxford University Press, 1978.
Warren J. Ewens & Richard S. Spielman, The Transmission/Disequilibrium Test: History, Subdivision, and Admixture, American Journal of Human Genetics, vol. 57, 1995, pp. 455-465.