Definition

Consider a pair of genes A and B, each with two alleles (A/a and B/b, respectively). Applying Mendelian inheritance at the population level, one can predict the relative frequency of each of the four allele combinations (i.e. haplotypes) from the relative frequency of each allele.

| Allele | Relative frequency |
| --- | --- |
| A | \(p_A\) |
| a | \(p_a = 1 - p_A\) |
| B | \(p_B\) |
| b | \(p_b = 1 - p_B\) |

| Haplotype | Population frequency predicted by Mendelian inheritance |
| --- | --- |
| AB | \(p_A \cdot p_B\) |
| Ab | \(p_A \cdot p_b\) |
| aB | \(p_a \cdot p_B\) |
| ab | \(p_a \cdot p_b\) |
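As a quick check of the table, the predicted haplotype frequencies can be computed directly from the two allele frequencies. The Python sketch below is only an illustration; the function name `expected_haplotype_freqs` and the example frequencies are hypothetical.

```python
from itertools import product


def expected_haplotype_freqs(p_A: float, p_B: float) -> dict[str, float]:
    """Haplotype frequencies predicted if the two loci are inherited independently."""
    locus_1 = {"A": p_A, "a": 1.0 - p_A}  # p_a = 1 - p_A
    locus_2 = {"B": p_B, "b": 1.0 - p_B}  # p_b = 1 - p_B
    # Each haplotype frequency is the product of its two allele frequencies.
    return {
        x + y: locus_1[x] * locus_2[y]
        for x, y in product(locus_1, locus_2)
    }


# Hypothetical allele frequencies, chosen only for illustration.
expected = expected_haplotype_freqs(p_A=0.6, p_B=0.3)
print(expected)
```

Because the four products expand \((p_A + p_a)(p_B + p_b) = 1\), the predicted frequencies always sum to 1.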

Sometimes a haplotype occurs more or less often than predicted, because the two genes are not inherited independently as Mendelian inheritance assumes. As you may recall from your statistics class, \(p_{AB} = p_A \cdot p_B\) if and only if \(A\) and \(B\) are independent. So when \(p_{AB} \neq p_A \cdot p_B\), they are dependent, or in genetics language, \(A\) and \(B\) are in linkage disequilibrium (LD): some allele combinations appear together more often, and others less often, than chance alone would predict. Several factors affect this association; the easiest to picture is probably physical distance on the genome: the closer two genes are, the less likely they are to be separated during recombination.

Measures of LD

Directly stemming from the definition above, LD is quantified by the difference between the observed probability of co-occurrence and the one predicted under the independence assumption:

\[ D_{AB} = p_{AB} - p_A \cdot p_B \]

The raw quantity \(D\) is difficult to interpret: its sign depends on which pair of alleles is considered (e.g. \(D_{Ab} = -D_{AB}\)), and the range of values it can take depends on the allele frequencies, making it hard to compare across different pairs of markers. Hence the need for a normalized version of \(D\).
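A small Python sketch of the definition: given observed haplotype frequencies, it recovers the allele frequencies as marginal sums and computes \(D\) for every haplotype. The helper name `ld_D` and the observed frequencies are hypothetical, made up for illustration.

```python
def ld_D(hap: dict[str, float]) -> dict[str, float]:
    """D for each haplotype, given observed haplotype frequencies.

    `hap` maps "AB", "Ab", "aB", "ab" to frequencies summing to 1.
    """
    # Allele frequencies are marginal sums of the haplotype frequencies.
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    allele = {"A": p_A, "a": 1.0 - p_A, "B": p_B, "b": 1.0 - p_B}
    # D = observed frequency minus frequency predicted under independence.
    return {h: f - allele[h[0]] * allele[h[1]] for h, f in hap.items()}


# Hypothetical observed haplotype frequencies.
observed = {"AB": 0.30, "Ab": 0.30, "aB": 0.00, "ab": 0.40}
D = ld_D(observed)
print(D)
```

Here \(p_A = 0.6\) and \(p_B = 0.3\). All four values have magnitude 0.12, with \(D_{AB} = D_{ab} = -D_{Ab} = -D_{aB}\), so for two biallelic loci a single \(D\) describes the association up to its sign.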

Normalized LD

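A common normalization divides \(D\) by the most extreme value it can take, in the same direction, given the allele frequencies:

\[ D' = \frac{D}{D_{\max}}, \qquad D_{\max} = \begin{cases} \max\{-p_A p_B,\; -(1 - p_A)(1 - p_B)\} & \text{if } D < 0 \\ \min\{p_A(1 - p_B),\; p_B(1 - p_A)\} & \text{if } D > 0 \end{cases} \]

With this choice \(D'\) lies between 0 and 1, and \(D' = 1\) when at least one of the four haplotypes is absent from the population.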
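The piecewise definition translates directly into code. The Python sketch below uses a hypothetical function name `d_prime` and returns 0 when \(D = 0\), since the cases above do not define \(D_{\max}\) there.

```python
def d_prime(p_AB: float, p_A: float, p_B: float) -> float:
    """Normalized LD D' = D / D_max, following the piecewise D_max above."""
    D = p_AB - p_A * p_B
    if D > 0:
        d_max = min(p_A * (1 - p_B), p_B * (1 - p_A))
    elif D < 0:
        d_max = max(-p_A * p_B, -(1 - p_A) * (1 - p_B))
    else:
        return 0.0  # no disequilibrium, so no normalization is needed
    return D / d_max


print(d_prime(0.30, 0.6, 0.3))
print(d_prime(0.10, 0.5, 0.4))
```

The first call gives 1 (up to floating-point rounding) because haplotype aB has frequency \(p_B - p_{AB} = 0.3 - 0.30 = 0\). The second gives 0.5: \(D = 0.10 - 0.20 = -0.10\) and \(D_{\max} = \max\{-0.20, -0.30\} = -0.20\).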

\(r^2\): squared correlation coefficient for biallelic loci

For biallelic loci (two alleles per locus, as in the example above), LD can also be quantified by the squared correlation coefficient \(r^2\) [1]:

\[ r^2 = \frac{D^2}{p_A(1-p_A) p_B(1 - p_B)} \]
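A direct Python transcription of the formula, with \(D\) computed from \(p_{AB}\) as in the definition of \(D_{AB}\). The function name `r_squared` and the example numbers are hypothetical.

```python
def r_squared(p_AB: float, p_A: float, p_B: float) -> float:
    """r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)) for two biallelic loci."""
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # A monomorphic locus has zero variance, so the correlation is undefined.
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    D = p_AB - p_A * p_B
    return D * D / denom


print(r_squared(0.30, 0.6, 0.3))
```

With \(p_{AB} = 0.30\), \(p_A = 0.6\), \(p_B = 0.3\) this gives \(r^2 = 0.0144 / 0.0504 = 2/7 \approx 0.286\), even though the same frequencies give \(D' = 1\) under the definition of \(D'\): the two measures summarize LD differently.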

This coefficient has a useful property: under the null hypothesis of no association (\(r^2 = 0\)), the statistic \(\chi^2 = n r^2\), where \(n\) is the sample size (the number of gametes), asymptotically follows a \(\chi_{(1)}^2\) distribution [3]. A significance test is therefore readily available.
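The test can be sketched in a few lines of Python from gamete (haplotype) counts. The function name `ld_chi2_test` and the counts are hypothetical. The p-value uses the fact that a \(\chi^2_{(1)}\) variable is a squared standard normal, so \(P(\chi^2_{(1)} > x) = \operatorname{erfc}(\sqrt{x/2})\); no continuity correction is applied.

```python
import math


def ld_chi2_test(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float]:
    """Return (chi2, p_value) for H0: r^2 = 0, from haplotype (gamete) counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    D = p_AB - p_A * p_B
    chi2 = n * D * D / denom  # chi^2 = n * r^2
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = ld_chi2_test(30, 30, 0, 40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

For counts 30, 30, 0, 40 (\(n = 100\)), \(n r^2 = 100 \cdot 2/7 \approx 28.6\), which coincides with Pearson's \(\chi^2\) statistic for the corresponding 2 × 2 table of counts.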

LD with multiple alleles or multiple loci

Unfortunately, the matter quickly becomes complicated once more alleles are involved. A complete description of the association between loci with multiple alleles requires more than one value of the \(r^2\) coefficient [4]. Special methods have been proposed to quantify the relationship between two multi-allelic loci [3] (using \(R^2\), an approximation of \(r^2\) for the multi-allelic case), and to quantify linkage disequilibrium among more than two loci [5].
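To see why one number is no longer enough, the Python sketch below computes \(D_{ij} = p_{ij} - p_i q_j\) for every allele \(i\) at the first locus and allele \(j\) at the second, from a list of sampled haplotypes; \(p_i\) and \(q_j\) (notation introduced here) are the allele frequencies at the two loci. The function name `pairwise_D` and the toy data are hypothetical, and this is the plain pairwise \(D\), not the \(R^2\) statistic of [3].

```python
from collections import Counter


def pairwise_D(haplotypes: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Return {(i, j): D_ij} for all allele pairs of two multi-allelic loci.

    `haplotypes` holds one (allele_at_locus_1, allele_at_locus_2) tuple per gamete.
    """
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)        # counts of each (i, j) haplotype
    p = Counter(i for i, _ in haplotypes)   # allele counts at locus 1
    q = Counter(j for _, j in haplotypes)   # allele counts at locus 2
    # Counter returns 0 for haplotypes that never occur in the sample.
    return {
        (i, j): hap_counts[(i, j)] / n - (p[i] / n) * (q[j] / n)
        for i in p
        for j in q
    }


# Toy sample (made-up data): 3 alleles at locus 1, 2 alleles at locus 2.
sample = (
    [("A1", "B1")] * 3
    + [("A2", "B1")] * 1
    + [("A2", "B2")] * 2
    + [("A3", "B2")] * 2
)
D = pairwise_D(sample)
for (i, j), d in sorted(D.items()):
    print(f"D[{i},{j}] = {d:+.4f}")
```

Each row and each column of the resulting table sums to zero, so a pair of loci with \(k\) and \(m\) alleles has \((k-1)(m-1)\) free \(D_{ij}\) values; the toy example prints 6 entries, of which only 2 are free.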

References

[1] “Linkage disequilibrium,” Wikipedia, Aug. 2018.

[2] J. M. VanLiere and N. A. Rosenberg, “Mathematical properties of the r² measure of linkage disequilibrium,” Theoretical Population Biology, vol. 74, no. 1, pp. 130–137, Aug. 2008.

[3] D. V. Zaykin, A. Pudovkin, and B. S. Weir, “Correlation-Based Inference for Linkage Disequilibrium With Multiple Alleles,” Genetics, vol. 180, no. 1, pp. 533–545, Sep. 2008.

[4] D. Weissman, “Linkage disequilibrium with multiple alleles and loci,” Biology Stack Exchange, Nov. 2014. https://biology.stackexchange.com/questions/15743/linkage-disequilibrium-with-multiple-alleles-and-loci

[5] M. Kirkpatrick, T. Johnson, and N. Barton, “General Models of Multilocus Evolution,” Genetics, vol. 161, no. 4, pp. 1727–1750, Aug. 2002.