Next: The transmission disequilibrium Up: Stat 260: Statistics Previous: Hardy-Weinberg Equilibrium.

Rudiments of human disease gene mapping.

Disease associations involving red blood cell antigens

First came observations of associations between diseases and certain blood types.

Aird et al (1954): Data were collected from hospital patients in London. The blood groups of patients admitted for peptic ulcers were compared with those of a control group of patients admitted for other reasons. The data show a relatively higher frequency of blood type O, compared with blood type A, among peptic ulcer cases than among the controls.

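The ``incidence ratio'' (Woolf (1954)), now called the odds ratio and commonly referred to as the relative risk (RR), which it approximates when the disease is rare, is calculated as follows. Writing $h_O$ and $h_A$ for the numbers of cases with blood types O and A, and $c_O$ and $c_A$ for the corresponding numbers of controls,

$$\mathrm{RR} = \frac{h_O\, c_A}{h_A\, c_O}.$$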
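As a sketch of this calculation, the function below computes the incidence (odds) ratio from the four counts of a 2×2 table: cases with types O and A ($h_O$, $h_A$) and controls with types O and A ($c_O$, $c_A$). It also gives an approximate interval based on Woolf's standard error of the log odds ratio, $\sqrt{1/h_O + 1/h_A + 1/c_O + 1/c_A}$. The function names are hypothetical, the interval is an addition for illustration, and the counts in the usage lines are made up; they are not the Aird et al. data.

```python
import math


def odds_ratio(h_o, h_a, c_o, c_a):
    """Incidence (odds) ratio of type O versus type A, cases versus controls.

    h_o, h_a: numbers of cases with blood type O and A
    c_o, c_a: numbers of controls with blood type O and A
    """
    return (h_o * c_a) / (h_a * c_o)


def woolf_interval(h_o, h_a, c_o, c_a, z=1.96):
    """Approximate interval for the odds ratio, using Woolf's standard error
    of log(OR): sqrt(1/h_o + 1/h_a + 1/c_o + 1/c_a).
    z = 1.96 gives a nominal 95% interval."""
    log_or = math.log(odds_ratio(h_o, h_a, c_o, c_a))
    se = math.sqrt(1 / h_o + 1 / h_a + 1 / c_o + 1 / c_a)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only.
print(odds_ratio(60, 40, 500, 500))  # 1.5
print(woolf_interval(60, 40, 500, 500))
```

The interval is computed on the log scale because the sampling distribution of log(RR) is much closer to normal than that of RR itself.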

Similar associations were found for stomach cancer (an excess of blood type A). Indeed, many diseases have now been found to be associated with different blood types; finding such associations was a popular pursuit in the 1950s. See, for example, Mourant et al (1978).

Disease associations involving HLA antigens

Many diseases have been shown to be associated with particular alleles at one or more HLA (human leucocyte antigen) loci.

Svejgaard et al (1979): This study is concerned with the association between the HLA region and idiopathic Addison's disease. The data show an association between this disease and allele 8 of the HLA-B locus.

Note: ``present'' here means presence of the antigen, so ``B8 present'' means genotype B8/B8 or B8/BX, where X is different from 8.

There are many more examples. Between 50 and 100 diseases show a high RR with one or more alleles at an HLA locus. Pick your favourite auto-immune disease - there's a good chance it has such an association! For example...

Danish HLA and Disease Registry

A genetic model for these population associations

We have a marker locus $\mathcal{M}$ (e.g. the ABO or HLA locus) with alleles $M_1, M_2, \ldots$, and a disease susceptibility (DS) locus $\mathcal{D}$ with alleles $D_1, D_2, \ldots$. Suppose that one allele at $\mathcal{D}$, D say, confers susceptibility or resistance to the disease, and tends to occur with marker allele M in some population. Group the remaining alleles at $\mathcal{D}$ into d, and the other alleles at $\mathcal{M}$ into m. We postulate that our observed disease association is a result of gametic association in the population.

Gametic association in the population

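This is where the combinations DM, Dm, dM and dm appear in gametes with frequencies incompatible with independence between D and M in the following table of population gamete (haplotype) frequencies:

$$\begin{array}{c|cc|c} & M & m & \text{total} \\ \hline D & p_{DM} & p_{Dm} & p_D \\ d & p_{dM} & p_{dm} & p_d \\ \hline \text{total} & p_M & p_m & 1 \end{array}$$

Association is non-independence of haplotypic frequencies, which means

$$p_{DM} \neq p_D\, p_M, \quad \text{or equivalently} \quad \delta = p_{DM}\, p_{dm} - p_{Dm}\, p_{dM} \neq 0.$$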
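A minimal sketch of this definition: given the four gamete frequencies, compute the departure from independence $\delta = p_{DM} - p_D\, p_M$, which (because the four frequencies sum to 1) equals the cross-product form $p_{DM}\, p_{dm} - p_{Dm}\, p_{dM}$. The function name and the example frequencies are hypothetical.

```python
import math


def gametic_association(p_DM, p_Dm, p_dM, p_dm):
    """Return delta = p_DM - p_D * p_M for a 2x2 table of gamete frequencies.

    delta is 0 exactly when D and M occur independently in gametes.
    Because the frequencies sum to 1, delta also equals the cross-product
    form p_DM * p_dm - p_Dm * p_dM.
    """
    if not math.isclose(p_DM + p_Dm + p_dM + p_dm, 1.0, abs_tol=1e-9):
        raise ValueError("gamete frequencies must sum to 1")
    p_D = p_DM + p_Dm  # marginal frequency of D
    p_M = p_DM + p_dM  # marginal frequency of M
    return p_DM - p_D * p_M


print(gametic_association(0.2, 0.2, 0.3, 0.3))  # independence: 0.0
print(gametic_association(0.3, 0.1, 0.2, 0.4))  # association: about 0.1
```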

Here are population haplotype frequencies.

From gametic association to marker-disease gene linkage

If marker genotypes or alleles are associated with those of a disease susceptibility locus, this may be an indication of linkage between the marker and the DS locus. Why? By reasoning backwards from the following story:

General theoretical ``fact''

If two loci $\mathcal{D}$ and $\mathcal{M}$ are closely linked, and a new allele, D say, appears at $\mathcal{D}$ (for example by mutation), it will tend to remain in the population (or disappear) together with the allele, M say, at $\mathcal{M}$ on the chromosome on which it appeared. This tendency will generally diminish with each subsequent generation, as recombination tends to separate D and M and hence reduce the population association. That is just what our derivation of convergence to linkage equilibrium showed. Of course, that was theory: the conclusion followed within the model.
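A minimal numerical sketch of that derivation, assuming an infinite randomly mating population with discrete generations and recombination fraction θ between the two loci. A gamete is non-recombinant with probability 1 − θ, in which case it copies a parental haplotype; otherwise its two alleles come from two independent parental haplotypes. Iterating this, the association $\delta = p_{DM} p_{dm} - p_{Dm} p_{dM}$ shrinks by the factor 1 − θ each generation. The starting frequencies (a new D that has just arisen on an M chromosome) and the function names are hypothetical.

```python
def next_generation(h, theta):
    """One generation of random mating for haplotype frequencies h.

    h maps "DM", "Dm", "dM", "dm" to frequencies; theta is the
    recombination fraction between the two loci.
    """
    p_D = h["DM"] + h["Dm"]
    p_M = h["DM"] + h["dM"]
    p_d, p_m = 1.0 - p_D, 1.0 - p_M
    # Non-recombinant (prob 1 - theta): copy of a parental haplotype.
    # Recombinant (prob theta): alleles drawn independently from two haplotypes.
    return {
        "DM": (1 - theta) * h["DM"] + theta * p_D * p_M,
        "Dm": (1 - theta) * h["Dm"] + theta * p_D * p_m,
        "dM": (1 - theta) * h["dM"] + theta * p_d * p_M,
        "dm": (1 - theta) * h["dm"] + theta * p_d * p_m,
    }


def delta(h):
    """Gametic association p_DM * p_dm - p_Dm * p_dM."""
    return h["DM"] * h["dm"] - h["Dm"] * h["dM"]


# Hypothetical start: D has just arisen, on an M-bearing chromosome.
h0 = {"DM": 0.01, "Dm": 0.0, "dM": 0.49, "dm": 0.50}
for theta in (0.01, 0.1):
    h = dict(h0)
    for _ in range(50):
        h = next_generation(h, theta)
    # Simulated delta after 50 generations versus (1 - theta)**50 * delta_0.
    print(theta, delta(h), (1 - theta) ** 50 * delta(h0))
```

Allele frequencies are unchanged by the recursion; only the association decays, and it decays far more slowly for tightly linked loci (small θ).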

The need for caution

The term linkage disequilibrium is more frequently used than gametic association. However, it seems to presuppose linkage, which is usually what we want to demonstrate; hence it is preferable to avoid the term.

Falk and Rubinstein's (1987) idea

If we are worried that gametic association may arise for reasons other than linkage (population structure, for example), then ideally we would like to take our cases and controls from exactly the same population. Falk & Rubinstein had the idea of using the unborn ``complement'' siblings of the cases as a control group:

If two parents are a/b and c/d at a marker locus, and their affected child is a/c, then b/d, made up of the two parental alleles not transmitted to the child, is a suitable genotype for an unaffected ``control''. Clearly this ``matching'' should reduce the impact of population structure.
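The construction of the ``control'' can be sketched as a small function: find which allele each parent could have transmitted to the affected child, and pair up the two alleles left over. The function name and the tuple encoding of genotypes are hypothetical choices for illustration.

```python
from collections import Counter


def complement_genotype(father, mother, child):
    """Return the 'control' genotype made of the two parental alleles that
    were not transmitted to the child.

    father, mother and child are 2-tuples of allele labels (unordered genotypes).
    """
    for x in set(father):          # allele the father might have transmitted
        for y in set(mother):      # allele the mother might have transmitted
            if Counter((x, y)) == Counter(child):
                f_rest = list(father)
                f_rest.remove(x)
                m_rest = list(mother)
                m_rest.remove(y)
                # The untransmitted pair is (parental alleles) minus (child
                # alleles) as a multiset, so it does not depend on which
                # consistent transmission pattern is found first.
                return tuple(sorted((f_rest[0], m_rest[0])))
    raise ValueError("child genotype is not consistent with the parents")


print(complement_genotype(("a", "b"), ("c", "d"), ("a", "c")))  # ('b', 'd')
```

Both parents must be typed: with one parent missing, the untransmitted allele from that parent is generally unknown, which is why some ``controls'' cannot be determined.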

Using data from the 9th HLA Workshop on families with a single child affected by IDDM (insulin-dependent diabetes mellitus) and typed parents, they found:

They defined . Note that the number of ``controls'' is not equal to the number of cases. This is because not all of the parents were available for typing, so the genotypes of some ``controls'' could not be determined.

Exercise 3: Does the exclusion of the ``controls'' that could not be determined create bias?

They also considered the alleles separately:

Here + means DR3 (resp. DR4) together with any allele, i.e. at least one copy of DR3 (resp. DR4), so - means notDR3/notDR3 (resp. notDR4/notDR4).

The task ahead of us

Association between genotypes or alleles of the marker locus and disease status may be due to linkage. How do we turn this into a formal test?

Aim: to use disease-marker associations to test the hypothesis $H_0$: marker unlinked to any disease susceptibility locus. (The alternative may be ``linked'' or ``on top of''; the latter has become more popular in the context of genome scans.)

To get this going we need three things:

a) A disease model

b) A model for genes in the population

c) A sampling model

Concerning a), it is most often assumed that the disease model involves a single susceptibility locus (or sometimes two loci). Below we define some notation which allows us to talk about disease models with any number of loci.

Concerning b), we usually see lots of independence assumptions (like random mating, Hardy-Weinberg equilibrium, linkage equilibrium, ...). We'll try to avoid such assumptions.

Concerning c), we usually hear silence, so I'll try and make some noise.

Some notation

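We will suppose that we have L unlinked disease susceptibility loci, $\mathcal{D}_1, \ldots, \mathcal{D}_L$, where $\mathcal{D}_l$ has $n_l$ alleles, $D_{l1}, \ldots, D_{l n_l}$, $l = 1, \ldots, L$. Then we write multilocus genotypes in the form:

$$g_{\mathbf{ij}} = (D_{1 i_1} D_{1 j_1}, \ldots, D_{L i_L} D_{L j_L}).$$

In this notation there is the possibility of specifying the paternal and maternal contributions, by taking the i subscripts as referring to the paternal contribution and the j subscripts as referring to the maternal contribution at each locus. We also define the multilocus penetrances as follows:

$$f_{\mathbf{ij}} = P(\text{affected} \mid g_{\mathbf{ij}}),$$

where $\mathbf{i} = (i_1, \ldots, i_L)$, $\mathbf{j} = (j_1, \ldots, j_L)$. In this notation we have the following expression for the probability of being affected:

$$P(\text{affected}) = \sum_{\mathbf{i}, \mathbf{j}} f_{\mathbf{ij}}\, P(g_{\mathbf{ij}}).$$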
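To make the sum over multilocus genotypes concrete, here is a sketch that computes the probability of being affected as the sum, over ordered multilocus genotypes, of penetrance times genotype probability. A genotype is encoded as one (paternal allele index, maternal allele index) pair per DS locus; this encoding, the function name and all numbers are hypothetical. Any genotype distribution may be supplied, so no Hardy-Weinberg or linkage equilibrium assumption is built in.

```python
import math


def prob_affected(genotype_probs, penetrance):
    """P(affected) = sum over ordered multilocus genotypes g of
    penetrance[g] * P(g).

    Each key is a tuple of (i_l, j_l) pairs, one per DS locus l, where i_l
    indexes the paternal allele and j_l the maternal allele.
    """
    if not math.isclose(sum(genotype_probs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("genotype probabilities must sum to 1")
    return sum(p * penetrance[g] for g, p in genotype_probs.items())


# Hypothetical one-locus example (L = 1, two alleles); the genotype
# frequencies are deliberately not in Hardy-Weinberg proportions.
probs = {((1, 1),): 0.3, ((1, 2),): 0.2, ((2, 1),): 0.2, ((2, 2),): 0.3}
pen = {((1, 1),): 0.8, ((1, 2),): 0.1, ((2, 1),): 0.1, ((2, 2),): 0.0}
print(prob_affected(probs, pen))  # about 0.28
```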

Where appropriate we will also suppose there is a marker locus $\mathcal{M}$ with any number of alleles $M_1, M_2, \ldots$.

Association is suggestive of linkage

Next we prove a little theorem which tells us that association between the genotypes at a marker locus and disease status is grounds for suspecting linkage. We use the notation $\perp$ to mean ``independent of'', a vertical bar $\mid$ to denote conditioning, $g$ to mean (complete) genotype and $g_{\mathcal{X}}$ to mean genotype at locus $\mathcal{X}$. For notational simplicity we consider just one DS locus, but it will be clear that the argument is general.

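Let $\mathcal{M}$ be a marker locus and $\mathcal{D}$ a disease susceptibility locus. Assume the following two conditions:

(1) affectedness $\perp g \mid g_{\mathcal{D}}$, i.e. given the genotype at $\mathcal{D}$, the rest of the genotype carries no further information about disease status;

(2) $g_{\mathcal{M}} \perp g_{\mathcal{D}}$,

which means independence of the table of joint genotypes at $\mathcal{M}$ and $\mathcal{D}$:

$$P(g_{\mathcal{M}}, g_{\mathcal{D}}) = P(g_{\mathcal{M}})\, P(g_{\mathcal{D}}) \quad \text{for all } g_{\mathcal{M}}, g_{\mathcal{D}}.$$

Then affectedness $\perp g_{\mathcal{M}}$.

Proof:

$$P(\text{affected} \mid g_{\mathcal{M}}) = \sum_{g_{\mathcal{D}}} P(\text{affected} \mid g_{\mathcal{M}}, g_{\mathcal{D}})\, P(g_{\mathcal{D}} \mid g_{\mathcal{M}}).$$

For the next step we note that, since $(g_{\mathcal{M}}, g_{\mathcal{D}})$ is a function of $g$, condition (1) gives $P(\text{affected} \mid g_{\mathcal{M}}, g_{\mathcal{D}}) = P(\text{affected} \mid g_{\mathcal{D}})$, while condition (2) gives $P(g_{\mathcal{D}} \mid g_{\mathcal{M}}) = P(g_{\mathcal{D}})$, which gives the result

$$P(\text{affected} \mid g_{\mathcal{M}}) = \sum_{g_{\mathcal{D}}} P(\text{affected} \mid g_{\mathcal{D}})\, P(g_{\mathcal{D}}) = P(\text{affected}).$$

This concludes the proof.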
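A small numerical check of the theorem, using hypothetical genotype labels and probabilities: when disease risk depends on genotype only through the DS locus and the joint table of marker and DS genotypes is the product of its margins, P(affected | marker genotype) is the same for every marker genotype; once the two genotypes are associated, it varies with the marker genotype. The function name is hypothetical.

```python
from collections import defaultdict


def prob_affected_given_marker(joint, penetrance_D):
    """Return {g_M: P(affected | g_M)}.

    joint[(g_M, g_D)] is the joint probability of marker and DS genotypes;
    penetrance_D[g_D] is P(affected | g_D), so disease risk depends on the
    genotype only through the DS locus.
    """
    p_marker = defaultdict(float)
    p_aff_and_marker = defaultdict(float)
    for (g_M, g_D), p in joint.items():
        p_marker[g_M] += p
        p_aff_and_marker[g_M] += p * penetrance_D[g_D]
    return {g_M: p_aff_and_marker[g_M] / p_marker[g_M] for g_M in p_marker}


pen = {"DD": 0.9, "Dd": 0.5, "dd": 0.01}
p_M = {"MM": 0.25, "Mm": 0.5, "mm": 0.25}
p_D = {"DD": 0.01, "Dd": 0.18, "dd": 0.81}

# Independent marker and DS genotypes: the same risk for every g_M.
independent = {(gm, gd): p_M[gm] * p_D[gd] for gm in p_M for gd in p_D}
print(prob_affected_given_marker(independent, pen))

# Associated genotypes (hypothetical): the risk now depends on g_M.
associated = {("MM", "DD"): 0.1, ("mm", "dd"): 0.9}
print(prob_affected_given_marker(associated, pen))
```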

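We use this result to justify thinking in reverse, i.e. taking population association of marker alleles, genotypes or phenotypes with disease status to be suggestive of linkage between the marker and a disease susceptibility locus. By the contrapositive, if marker genotype and disease status are associated, at least one of the two conditions must fail; if we accept that disease risk depends on genotype only through the DS locus, then the genotypes at the marker and DS loci must be associated, and the story above makes linkage one natural explanation.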

Beyond this observation is the question of how to establish linkage rigorously. Next week we will discuss the affected sib-pairs method, which predates by about 20 years the method to be discussed this week. This week we will derive the transmission disequilibrium test (TDT) for linkage, a new method (about 5 years old) which has received a lot of attention.






Simon Cawley
Mon Apr 20 20:03:22 PDT 1998