J Physiol. 2004 Jan 1; 554(Pt 1): 40–45.

Identifying genes and genetic variation underlying human diseases and complex phenotypes via recombination mapping

Ulrich Broeckel

Department of Cardiovascular Medicine, Department of Physiology, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, USA

Nicholas J Schork

Department of Psychiatry, School of Medicine, University of California San Diego, San Diego, CA, USA

Received 2003 Jul 10; Accepted 2003 Oct 16.

Abstract

Understanding the mechanisms by which DNA and DNA variation influence diseases, naturally occurring phenotypic variation and complex biological systems has been one of the major tasks of contemporary human genetics research. The identification and characterization of specific genetic variations that influence particular human diseases and phenotypes is complicated by the fact that most diseases and phenotypes are influenced by many genetic and environmental factors. Thus, the identification of any particular phenotypically relevant factor may be hampered because other relevant factors can obscure its individual effect. Over the years, numerous methods and study designs have been described for identifying disease-causing genes and mutations. One in particular, meiotic or recombination mapping, has received considerable attention over the last 50 years and has been used widely with varying degrees of success. This review describes the motivation behind, and the problems associated with, recombination mapping, in terms of both linkage mapping and linkage disequilibrium mapping.

Recombination mapping: basic principles

The intuition behind recombination mapping is fairly straightforward. One determines which genetic variants ('alleles') individuals within a family possess at a number of particular sites ('marker loci') on each chromosome (i.e. one 'genotypes' them at a number of polymorphic genomic loci) and then assesses whether a particular variant appears to be coinherited with (or to 'cosegregate' with) a particular trait or disease. If evidence for such cosegregation is found, one can infer that a locus that actually influences the expression of the trait or disease resides near (or at) the locus harbouring the cosegregating variants. Further genotyping can then be pursued in the genomic vicinity of the marker showing 'linkage' to the trait or disease, in order to identify loci whose alleles cosegregate even more closely with it, until the 'causal' or 'functional' locus is found.
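
Evidence for cosegregation is conventionally summarized as a LOD score, the log10 odds of the observed meioses under linkage versus no linkage. The sketch below covers only the simplest textbook case, assuming phase-known, fully informative meioses scored at a single marker; the function names and the counts in the example are hypothetical, and real linkage software must additionally handle unknown phase, missing genotypes and penetrance models.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Phase-known two-point LOD score: log10[L(theta) / L(0.5)].

    recombinants: informative meioses in which marker allele and trait did NOT cosegregate.
    meioses: total informative meioses scored.
    theta: hypothesised recombination fraction between marker and trait locus.
    """
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need meioses > 0 and 0 <= recombinants <= meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    # Under linkage, each meiosis is recombinant with probability theta.
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # any recombinant is impossible at theta = 0
        log_linked = 0.0
    else:
        log_linked = (recombinants * math.log10(theta)
                      + nonrecombinants * math.log10(1.0 - theta))
    # Under no linkage (theta = 0.5), every meiosis has probability 1/2.
    log_null = meioses * math.log10(0.5)
    return log_linked - log_null


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Maximum-likelihood theta (capped at 0.5) and the LOD score at that value."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


# Hypothetical data: 10 informative meioses, 1 recombinant.
theta_hat, lod = max_lod(1, 10)
print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.2f}")
```

With one recombinant in ten meioses the maximum LOD is about 1.6, short of the traditional threshold of 3 for declaring linkage; ten non-recombinant meioses would give about 3.01.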

The biological phenomenon at the foundation of this gene discovery strategy is the recombination of homologous chromosomes during meiosis (hence the labels 'meiotic', 'recombination', or, less formally but more frequently, 'linkage' mapping). During meiosis, homologous chromosomes pair up and exchange material. The probability that a recombination event occurs between two loci is larger for loci far apart on a chromosome than for loci close together. Hence, alleles at loci near each other are generally inherited together (i.e. they cosegregate), making it possible to track influential loci transmitted from generation to generation by using alleles at neighbouring loci; the neighbouring-locus alleles act as surrogates for the presence of alleles at the influential loci. By studying the inheritance patterns of a trait or disease together with marker alleles at loci across the entire genome, one can 'scan' the genome for loci that might influence that trait or disease without a priori information about the location of relevant genes.
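
The relationship between distance along a chromosome and the probability of recombination can be made concrete with a map function. The sketch below uses Haldane's map function, which assumes that crossovers occur independently along the chromosome (no crossover interference); that assumption, and the choice of Haldane's function over alternatives such as Kosambi's, are ours rather than the article's.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for a map distance in Morgans (Haldane, no interference)."""
    if d_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_d(r: float) -> float:
    """Inverse map function: Morgans from a recombination fraction in [0, 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


# Loci further apart recombine more often, but r never exceeds 0.5.
for cm in (1, 10, 50, 100, 200):
    print(f"{cm:>4} cM apart -> recombination fraction {haldane_r(cm / 100):.3f}")
```

Under these assumptions, loci 1 cM apart recombine in roughly 1% of meioses, whereas loci 200 cM apart give r of about 0.49, essentially indistinguishable from unlinked loci (r = 0.5).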

Figure 1 provides a graphical representation of some of these concepts. The alleles in the shaded areas are part of an ancestral chromosomal segment harbouring a mutation that has been transmitted to individuals in ensuing generations. Recombination events occurring along the genealogical links that connect the common ancestor, who introduced the mutation into the pedigree, to the two nuclear families in the latest generation reduce the size of the ancestral chromosomal segment (or haplotype) around the mutation that is shared by all individuals who received the mutation (i.e. the reduced shaded region). Thus, the alleles in the boxed areas cosegregate with the mutation within each of the two families in the latest generation, but only the alleles within the shaded areas cosegregate with the mutation across the families. This distinction is important for putting recombination-based mapping strategies into specific contexts.

Figure 1. Graphical representation of the inheritance patterns of alleles at neighbouring loci associated with an individual chromosome segment (i.e. 'haplotypes')

The locus with alleles '–' and '*' harbours a disease allele or disease mutation ('*') and a non-disease allele ('–'). The slightly shaded chromosome carries the disease-causing mutation. The two offspring in the first generation both inherited the mutation and, because no recombination event shuffled the alleles on the parental chromosome harbouring the mutation, they also received all the alleles possessed by that parent at loci in the vicinity of the mutation (i.e. the offspring were transmitted the 'haplotype' TGC*TAC). The dotted lines represent generations passing. The two families in the most recent generation both have parents who inherited the mutation. However, because of recombination events in the line of descent from their common ancestor possessing the mutation, they received different alleles at the two outermost loci, but were still transmitted, intact, a 'core' ancestral haplotype encompassing the mutation (i.e. GC*TA). Note that within each of these families, the offspring who were transmitted the mutation inherited longer haplotypes (i.e. AGC*TAC for the 'left' family and TGC*TAG for the 'right' family).
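
The 'core' haplotype in the caption is simply the longest run of loci around the mutation at which all carriers' haplotypes agree. A minimal sketch, assuming the haplotypes are already phased, aligned locus by locus and written one character per locus as in the caption, with the mutation at index 3; the function name is hypothetical.

```python
def shared_core(haplotypes: list[str], anchor: int) -> str:
    """Longest contiguous run of loci around `anchor` at which all haplotypes agree."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same loci")

    def agrees(i: int) -> bool:
        return all(h[i] == haplotypes[0][i] for h in haplotypes)

    if not agrees(anchor):
        return ""
    left = anchor
    while left > 0 and agrees(left - 1):  # extend until the first discordant locus
        left -= 1
    right = anchor
    while right < length - 1 and agrees(right + 1):
        right += 1
    return haplotypes[0][left:right + 1]


left_family = "AGC*TAC"
right_family = "TGC*TAG"
print(shared_core([left_family, right_family], anchor=3))
```

With the two families' haplotypes from the caption this returns GC*TA, while comparing carriers within a single family returns the longer family-specific haplotype.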

Recombination mapping: linkage versus linkage disequilibrium

The technical and mathematical details of recombination mapping (e.g. non-parametric versus parametric methods, variance components models, etc.) are beyond the scope of this review; we refer the interested reader to the books by Ott (1999) and Rao & Province (1999). There is, however, a broad distinction between methods that exploit within-family associations (as discussed above), which are generally associated with standard 'linkage mapping' studies, and those that exploit across-family associations, which are generally referred to as 'linkage disequilibrium mapping' studies. The distinction has practical consequences: the chromosomal segments shared by members of a family who have been transmitted a mutation are larger than those shared across members of different families, because more meiotic events separate individuals in different families. Thus, a denser set of markers may be needed to assess across-family associations (i.e. to identify the small shared chromosomal segments, and the alleles at loci neighbouring the site of a mutation that influences a trait or disease) than to assess within-family associations. These concepts will be taken up later in the discussion on the motivation for the 'haplotype map' initiative and in the section discussing study populations.
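
The claim that segments shared across families are shorter can be illustrated with a small simulation. The sketch assumes the no-interference model behind Haldane's map function (crossovers as a Poisson process at one per Morgan per meiosis), ignores chromosome ends, and counts the meioses m on the path linking individuals to their common ancestor; under these assumptions the expected shared segment around the mutation is 2/m Morgans (200/m cM). All names and parameter values are hypothetical.

```python
import random


def shared_segment_morgans(meioses: int, rng: random.Random) -> float:
    """Length of the intact ancestral segment around a fixed locus after `meioses` meioses.

    In each meiosis the nearest crossover on either side of the locus lies an
    Exponential(1) distance away (in Morgans); the segment ends at the closest
    crossover over all meioses on each side.
    """
    left = min(rng.expovariate(1.0) for _ in range(meioses))
    right = min(rng.expovariate(1.0) for _ in range(meioses))
    return left + right


def mean_segment_cm(meioses: int, trials: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean shared segment length in centiMorgans."""
    rng = random.Random(seed)
    total = sum(shared_segment_morgans(meioses, rng) for _ in range(trials))
    return 100.0 * total / trials


for m in (2, 4, 10, 40):
    print(f"{m:>3} meioses: simulated {mean_segment_cm(m):6.1f} cM, expected {200 / m:6.1f} cM")
```

Individuals separated by only a few meioses share segments tens of cM long, whereas individuals connected through many generations share only a few cM, which is why across-family mapping calls for denser markers.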

Heritability: genes at work

Before embarking on a mapping study to identify individual genes that might influence a trait or disease (i.e. a 'phenotype'), it is important to assess whether the trait has an obvious genetic basis. Although this might sound trivial, critically evaluating a phenotype is an important first step in estimating the likelihood of success of a genetic mapping project. A number of methods have been proposed to obtain information about the overall genetic makeup of a disease or phenotype. For example, estimating the concordance of a phenotype between twin pairs can provide evidence for its genetic basis, as comparing the frequency of the trait among monozygotic twins (who share all their genes) and dizygotic twins (who share, on average, half of their genes) allows the overall contributions of genetic and shared environmental factors to the phenotype to be estimated and distinguished (MacGregor et al. 2000). Twin studies have been used for many years and have demonstrated, for example, that genetic factors contribute significantly to cardiovascular diseases such as hypertension, diabetes and coronary artery disease (Marenberg et al. 1994; McCaffery et al. 1999; Hyttinen et al. 2003). Familial clustering of a phenotype, or the resemblance of relatives of any degree (e.g. siblings, cousins, etc.) with regard to a phenotype, can provide further evidence that genetic factors contribute to it. Depending on the phenotype in question, however, such analyses have to be interpreted cautiously, as familial resemblance can be caused by shared familial environmental factors rather than by overt, inherited genetic factors (Guo, 1998).
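
One classical way to turn twin data into variance components is Falconer's formulas, which compare the within-pair correlations of monozygotic (r_MZ) and dizygotic (r_DZ) twins. The sketch assumes purely additive genetic effects, equally shared environments for both twin types and no assortative mating; for a binary trait such as disease concordance the inputs would be liability-scale (tetrachoric) correlations. The numbers in the example are hypothetical, and this is not necessarily the model used in the studies cited.

```python
def falconer(r_mz: float, r_dz: float) -> dict[str, float]:
    """Split phenotypic variance into A (additive genetic), C (shared environment)
    and E (unique environment plus measurement error) from twin correlations."""
    if not (-1.0 <= r_mz <= 1.0 and -1.0 <= r_dz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    a2 = 2.0 * (r_mz - r_dz)  # MZ pairs share twice the additive variance of DZ pairs
    c2 = 2.0 * r_dz - r_mz    # resemblance not explained by shared genes
    e2 = 1.0 - r_mz           # whatever makes even MZ co-twins differ
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical twin correlations for a quantitative trait such as blood pressure.
print({k: round(v, 2) for k, v in falconer(r_mz=0.8, r_dz=0.5).items()})
```

Under these assumptions the hypothetical correlations imply roughly 60% additive genetic, 20% shared environmental and 20% unique environmental variance.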

Overtly Mendelian versus polygenic (or 'complex') phenotypes

In the simplest case, a single gene or allele may unequivocally determine the expression of a disease or phenotype. Such phenotypes are likely to show clear and obvious patterns of inheritance in families (i.e. simple Mendelian inheritance patterns), which makes gene mapping fairly easy and straightforward (see, e.g. mapping studies on cystic fibrosis (Collins & Morton, 1998) and neurofibromatosis (Barker et al. 1987)). Unfortunately, each overtly 'Mendelian' or 'monogenic' disease or phenotype is rare, affecting relatively few individuals (Lifton & Jeunemaitre, 1993). For the vast majority of phenotypes, such as cancer, blood pressure level and height, there are likely many genes, genetic variants or alleles, and environmental factors that contribute to their expression. These phenotypes have a multifactorial aetiology and are thus often referred to as 'complex' phenotypes (Lander & Schork, 1994).

As might be expected, while recombination mapping strategies have been very successful in identifying genes for simple Mendelian phenotypes, success in identifying genes contributing to complex phenotypes has not been as forthcoming. The identification of such genes is made especially difficult by the fact that the contribution of any one gene or genetic variant to the phenotype may be obscured or confounded by the others. Thus, any one genetic variant may not show strict coinheritance with the phenotype in a family or pedigree, since different family members can express the disease for different reasons.
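
Why a causal variant need not cosegregate strictly can be seen with Bayes' rule. The sketch assumes a single hypothetical locus at which a carrier parent transmits the risk allele with probability 1/2, carriers become affected with probability `penetrance`, and non-carriers become affected at a `phenocopy_rate` through other genes or environmental factors; the function name and all parameter values are illustrative.

```python
def prob_carrier_given_affected(penetrance: float, phenocopy_rate: float,
                                transmission: float = 0.5) -> float:
    """P(offspring carries the risk allele | offspring is affected)."""
    affected_carrier = transmission * penetrance
    affected_noncarrier = (1.0 - transmission) * phenocopy_rate
    total = affected_carrier + affected_noncarrier
    if total == 0.0:
        raise ValueError("no offspring can be affected under these parameters")
    return affected_carrier / total


# As other causes of disease become common, fewer affected relatives carry the allele.
for phenocopy in (0.0, 0.1, 0.2, 0.4):
    p = prob_carrier_given_affected(penetrance=0.8, phenocopy_rate=phenocopy)
    print(f"phenocopy rate {phenocopy:.1f}: P(carrier | affected) = {p:.2f}")
```

With no phenocopies every affected offspring carries the allele; with a phenocopy rate of 0.4 only about two-thirds do, so the variant appears not to cosegregate in a third of affected relatives.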

Phenotypic definition

One way to facilitate the identification of the genetic determinants of a phenotype is to refine the definition of the phenotype. Conversely, misdiagnosis, or ignoring important disease sequelae and aetiological factors, will corrupt a gene mapping effort. This is widely accepted and is often used as an excuse for the failure of gene mapping studies (see, e.g. Risch & Botstein, 1996). The selection of individuals with the phenotype is typically based on previously established criteria. For example, in studies aiming to identify genes for hypertension, a blood pressure criterion of >140/90 mmHg is often used to identify affected individuals. This definition is based on epidemiological studies showing that individuals with blood pressure above this cut-off have an increased risk of cardiovascular disease (Kannel, 2000). However, such diagnostic criteria have changed considerably over time as new clinical and pathophysiological insights into the nature of hypertension have been obtained. In fact, it is now known that there are many subtypes of hypertension, each with its own genetic determinants (Lifton, 1993).

One option for dealing with problems of phenotypic definition is to measure phenotypes that are assumed to be functionally related to, or to underlie, a broader phenotype (such as a disease). Measuring such 'intermediate phenotypes' or 'endophenotypes' (e.g. catecholamine levels in hypertension, or cholesterol fractions in atherosclerosis) is therefore widely practised, as they may have stronger genetic determinants than the more remote phenotype to which they relate.

It should also be noted that in many cases the definition of a disease 'dichotomizes' what is actually a 'quantitative' or 'continuous' physiological measurement (e.g. hypertension status dichotomizes blood pressure level). While this may be useful for clinical purposes, in genetic studies it can discard important variation and hence weaken the underlying genetic 'signals' (Korczak & Goldstein, 1997). As a gene might affect not just one phenotype but an entire network of correlated biological systems, the joint analysis of multiple phenotypes can also increase the power to detect genes (Stoll et al. 2001). Finally, every measurement and definition of a disease carries some measurement error or diagnostic inaccuracy. Although the size of this effect is difficult to estimate precisely, measurement error and misclassification both reduce the power of a mapping experiment. The importance of a clear definition of the phenotype should not be underestimated, as it critically influences the power of any genetic study.
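
The loss of genetic signal from dichotomizing a quantitative trait can be illustrated by simulation. The sketch assumes a hypothetical additive locus (allele frequency 0.5, effect of 0.5 residual standard deviations per allele), normally distributed residual variation and an arbitrary cut-off classifying the top 20% as 'affected'; it compares the genotype-phenotype correlation before and after dichotomizing. All names and values are illustrative.

```python
import math
import random
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n: int = 20_000, effect: float = 0.5, affected_fraction: float = 0.2,
             seed: int = 7) -> tuple[float, float]:
    rng = random.Random(seed)
    genotypes, trait = [], []
    for _ in range(n):
        g = (rng.random() < 0.5) + (rng.random() < 0.5)  # 0, 1 or 2 risk alleles
        genotypes.append(float(g))
        trait.append(effect * g + rng.gauss(0.0, 1.0))  # additive effect plus noise
    cutoff = sorted(trait)[int((1.0 - affected_fraction) * n)]
    status = [1.0 if value >= cutoff else 0.0 for value in trait]  # top 20% 'affected'
    return pearson(genotypes, trait), pearson(genotypes, status)


r_quant, r_binary = simulate()
print(f"correlation with quantitative trait: {r_quant:.3f}")
print(f"correlation with dichotomized trait: {r_binary:.3f}")
```

Because the squared correlation reflects how much trait variance the locus explains, the noticeably weaker correlation with the dichotomized trait corresponds to a loss of power for the same sample size.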

The case for studying isolated populations

To conduct a mapping study, one needs either families or individuals whose genealogical links are known or can be inferred. In this light, the selection of the study population, or of the set of individuals to contrast, is as important as defining the phenotype. For complex trait gene mapping, it has been argued that populations that are relatively young and isolated have advantages, for intuitive reasons. First, there is likely to be greater environmental homogeneity among the people in such populations (e.g. rural China, Iceland, etc.), which minimizes the number of environmental factors that influence a phenotype and could thereby dampen relevant genetic effects. Second, there is likely to be greater genetic homogeneity; that is, such populations may harbour less of the genetic variation that influences the trait in question (i.e. they have a more restricted 'gene pool'). Third, because fewer meiotic events are likely to separate the individuals in younger populations, the chromosomal segments shared by individuals with the same phenotype (because those segments harbour variants that influence the expression of that trait) are likely to be larger. While other factors, such as population bottlenecks and the rate of population growth, also shape the extent of linkage disequilibrium (the non-random association of alleles at neighbouring loci, reflecting their coinheritance), there is support for the notion that linkage disequilibrium is stronger in these populations (Jorde et al. 2001).
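
The link between population age and linkage disequilibrium can be sketched with the standard measure D = P(AB) - P(A)P(B) and its textbook decay under recombination, D_t = (1 - r)^t D_0, where r is the recombination fraction and t the number of generations. The sketch assumes an idealized, infinitely large, randomly mating population with no drift, selection, mutation or migration (the bottlenecks and growth mentioned above violate this); the founder haplotype frequencies and the value of r are hypothetical.

```python
def ld_coefficient(p_ab: float, p_a: float, p_b: float) -> float:
    """D for two biallelic loci: haplotype frequency minus the product of allele frequencies."""
    return p_ab - p_a * p_b


def ld_after(d0: float, r: float, generations: int) -> float:
    """Expected D after `generations` of random mating with recombination fraction r."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - r) ** generations


# Hypothetical founder haplotype frequencies: P(AB) = 0.4, P(A) = P(B) = 0.5.
d0 = ld_coefficient(0.4, 0.5, 0.5)
for t in (10, 50, 200, 1000):
    print(f"{t:>5} generations: D = {ld_after(d0, r=0.01, generations=t):.4f}")
```

For loci about 1 cM apart (r of about 0.01), D retains roughly 60% of its founder value after 50 generations but has practically vanished after 1000, which is consistent with linkage disequilibrium around a mutation extending further in young populations.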

In some instances the family structure or genealogical relationships of these populations are also available, making genetic studies that much easier (see, e.g. the example of the Hutterites; Newman et al. 2001). There are also obvious limitations to the use of such populations. For example, it might not be possible to study many diseases in them because of the diseases' low frequencies and the limited number of individuals in such populations. In addition, a gene localized or identified in one population might not play a role in other populations. Overall, isolated populations represent exceptional situations, and in most cases the study population is drawn from non-isolated populations.

Genetic maps and the haplotype map initiative

To facilitate gene mapping efforts based on recombination, genetic researchers are constantly developing resources and assays for identifying and using polymorphic markers to assess cosegregation. Knowledge of the frequency of recombination between loci across the genome is crucial in mapping efforts and has led to the development of 'genetic maps' (which chart recombination frequencies) complementary to 'physical maps' (which simply tally the number of bases or nucleotides separating loci) (Lander & Weinberg, 2000). Virtually all of the genetic and physical map data gathered by geneticists have been put in the public domain, in an almost unprecedented collaborative attempt to facilitate gene discovery (see, e.g. the LocusLink website, dbSNP, the SNP Consortium website, etc.). The release of drafts of the entire human DNA sequence has, of course, greatly enhanced the ability to identify polymorphic sites and assess recombination (Lander et al. 2001; Venter et al. 2001).
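
Because recombination rates vary along the genome, a physical position cannot be converted into a genetic position with a single constant; a common approach is to interpolate linearly between markers whose positions are known on both maps. The sketch below uses an entirely hypothetical four-marker map, not data from any of the resources named above.

```python
import bisect

# Hypothetical markers: (physical position in base pairs, genetic position in cM).
MARKER_MAP = [(1_000_000, 0.0), (3_000_000, 2.5), (6_000_000, 4.0), (9_000_000, 9.5)]


def genetic_position(bp: int, marker_map=MARKER_MAP) -> float:
    """Linearly interpolate a genetic position (cM) from a physical position (bp)."""
    positions = [p for p, _ in marker_map]
    if not positions[0] <= bp <= positions[-1]:
        raise ValueError("position lies outside the mapped interval")
    i = bisect.bisect_left(positions, bp)
    if positions[i] == bp:
        return marker_map[i][1]
    (x0, y0), (x1, y1) = marker_map[i - 1], marker_map[i]
    return y0 + (y1 - y0) * (bp - x0) / (x1 - x0)


# The local recombination rate (cM per Mb) differs from interval to interval.
for (x0, y0), (x1, y1) in zip(MARKER_MAP, MARKER_MAP[1:]):
    print(f"{x0:>9,}-{x1:>9,} bp: {(y1 - y0) / ((x1 - x0) / 1e6):.2f} cM/Mb")
print(genetic_position(2_000_000))
```

In this toy map the same physical distance corresponds to very different genetic distances in different intervals, which is the reason genetic maps are charted separately from physical maps.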

One particularly interesting initiative aimed at facilitating gene discovery via recombination mapping is the 'Human Haplotype Map' Initiative (Couzin, 2002). As noted above, examining evidence for shared chromosomal segments or haplotypes across individuals in different families is complicated by the fact that, without knowing how many meiotic events separate the relevant individuals, it is difficult to predict how large the common segments might be (i.e. how much recombination may have reduced the size of an ancestral chromosomal segment harbouring a trait-influencing mutation). The Hap-Map Initiative (as it is known) is a large-scale effort to determine empirically the sizes and 'block-like structure' of common, apparently conserved chromosomal segments across any set of (arbitrarily related) individuals. With such knowledge of block sizes, one could assess evidence for 'across-family associations' between DNA markers and, e.g., a disease, and potentially not have to collect family material, since the genealogical links between individuals would be reflected in the conserved haplotypes. Ideally, then, one could walk through the entire genome and simply tally how often, e.g., diseased and non-diseased individuals carry a certain chromosomal segment or haplotype. Haplotypes whose frequencies differ markedly between diseased and non-diseased individuals are likely to harbour the relevant disease-influencing mutations. Although a great oversimplification, this description captures the main elements of linkage disequilibrium mapping and the motivation for the Hap-Map Initiative.
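
The 'tally' described above amounts to comparing haplotype carrier frequencies between affected and unaffected individuals. A minimal sketch, assuming counts of individuals carrying at least one copy of a given haplotype and a Pearson chi-square test with one degree of freedom (no continuity correction); the function name and counts are hypothetical, and a genome-wide scan over many haplotypes would also need a correction for multiple testing.

```python
import math


def haplotype_association(case_carriers: int, cases: int,
                          control_carriers: int, controls: int) -> tuple[float, float, float]:
    """Odds ratio, chi-square (1 df) and p-value for a 2x2 carrier table."""
    a, b = case_carriers, cases - case_carriers
    c, d = control_carriers, controls - control_carriers
    if min(a, b, c, d) < 0:
        raise ValueError("carrier counts cannot exceed group sizes")
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    return odds_ratio, chi2, p_value


# Hypothetical counts: 120 of 400 cases and 80 of 400 controls carry the haplotype.
or_, chi2, p = haplotype_association(120, 400, 80, 400)
print(f"OR = {or_:.2f}, chi-square = {chi2:.2f}, p = {p:.4f}")
```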

Modern sequence and functional analysis

Ultimately, once a marker or set of markers is identified whose alleles appear to cosegregate with a phenotype, the actual variation in the DNA sequence in the relevant region of the genome must be assessed. While in some cases a single variant in a particular gene might causally influence a phenotype, in others a number of different variants in the same gene or its regulatory elements might do so (Loots et al. 2000; Rioux et al. 2001). It is therefore imperative to survey all polymorphic sites in a genomic region and assess their putative 'functionality'. Because sequencing remains expensive and very high-throughput sequencing still faces technological constraints, this task can be laborious.
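
Once a region has been resequenced in several individuals, surveying its polymorphic sites begins with finding the alignment columns at which the sequences differ. A minimal sketch, assuming the sequences are already aligned to equal length and that 'N' marks an uncalled base and '-' a gap (both ignored); the function name and sequences are hypothetical, and real pipelines also weigh base quality and handle insertions and deletions properly.

```python
def polymorphic_sites(sequences: list[str]) -> list[tuple[int, list[str]]]:
    """Return (0-based position, observed alleles) for every variable alignment column."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for pos in range(length):
        # Ignore uncalled bases and gaps when deciding whether a column varies.
        alleles = sorted({s[pos] for s in sequences} - {"N", "-"})
        if len(alleles) > 1:
            sites.append((pos, alleles))
    return sites


# Hypothetical aligned sequences of the same region from four individuals.
reads = ["ACGTTAGC", "ACGTCAGC", "ACATTAGN", "ACGTTAGC"]
print(polymorphic_sites(reads))
```

Here the sketch reports variable columns at positions 2 and 4; the uncalled base in the third sequence does not create a spurious site.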

Ultimately, proof that a particular genetic variant affects a phenotype requires analyses of that variant that go well beyond mapping. For example, in vitro assays may be appropriate for assessing the effect of polymorphisms in regulatory regions, via cell culture-based transfection studies that allow indirect measurement of transcriptional activation (Trinklein et al. 2003). In addition, animal models (e.g. knocking out a gene or inserting a gene) can be used to assess the phenotypic effect of particular variants.

Conclusions

The identification of genes underlying complex diseases in humans via recombination mapping is a major challenge, involving geneticists (for the overall study design); molecular biologists (for developing assays and polymorphic markers); bioinformaticists (to manipulate and store DNA sequence information); statisticians (to develop mathematical constructs for assessing cosegregation phenomena); clinicians (to make diagnoses and partition diseases into homogeneous categories); and epidemiologists (to understand the population prevalence of a disease, assess environmental contributions to disease, and collect families). This brief review omits many associated uncertainties and complicating issues. Nevertheless, examples of genes that have been identified via large-scale recombination mapping efforts include NOD2/CARD15 as a susceptibility gene for Crohn's disease (Hugot et al. 2001; Ogura et al. 2001), the presenilin genes that influence Alzheimer's disease (Schellenberg et al. 2000), and the calpain-10 gene that influences type II diabetes (Horikawa et al. 2000).

Ultimately, knowledge and understanding of how genes and genetic variation influence complex phenotypes will change not only the way we diagnose and treat disease, but will also give us an unprecedented view of inheritance and the history of the human species.

Acknowledgments

U.B. is funded in part by grants from the National Heart, Lung, and Blood Institute (Hypertension SCOR program, HL54998; SCOR program on Ischaemic Heart Disease, HL65203; and HL074321). N.J.S. is funded in part by NIH grants associated with the NHLBI Family Blood Pressure Program ('FBPP'; HL64777-01); the NHLBI hypertension SCOR program (HL54998); the NIH Pharmacogenetics Network (HL69758-01); and the NIMH Consortium on the Genetics of Schizophrenia ('COGS'; MH06557-01A1).

References

  • Barker D, Wright E, Nguyen K, Cannon L, Fain P, Goldgar D, Bishop DT, Carey J, Baty B, Kivlin J, et al. Gene for von Recklinghausen neurofibromatosis is in the pericentromeric region of chromosome 17. Science. 1987;236:1100–1102.
  • Collins A, Morton NE. Mapping a disease locus by allelic association. Proc Natl Acad Sci U S A. 1998;95:1741–1745.
  • Collins FS, Green ED, Guttmacher AE, Guyer MS. A vision for the future of genomics research. Nature. 2003;422:835–847.
  • Couzin J. Human genome. HapMap launched with pledges of $100 million. Science. 2002;298:941–942.
  • Guo SW. Inflation of sibling recurrence-risk ratio, due to ascertainment bias and/or overreporting. Am J Hum Genet. 1998;63:252–258.
  • Horikawa Y, Oda N, Cox NJ, Li X, Orho-Melander M, Hara M, Hinokio Y, Lindner TH, Mashima H, Schwarz PE, et al. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat Genet. 2000;26:163–175.
  • Hugot JP, Chamaillard M, Zouali H, et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature. 2001;411:599–603.
  • Hyttinen V, Kaprio J, Kinnunen L, Koskenvuo M, Tuomilehto J. Genetic liability of type 1 diabetes and the onset age among 22,650 young Finnish twin pairs: a nationwide follow-up study. Diabetes. 2003;52:1052–1055.
  • Jorde LB, Watkins WS, Bamshad MJ. Population genomics: a bridge from evolutionary history to genetic medicine. Hum Mol Genet. 2001;10:2199–2207.
  • Kannel WB. Elevated systolic blood pressure as a cardiovascular risk factor. Am J Cardiol. 2000;85:251–255.
  • Korczak JF, Goldstein AM. Sib-pair linkage analyses of nuclear family data: quantitative versus dichotomous disease classification. Genet Epidemiol. 1997;14:827–832.
  • Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
  • Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048.
  • Lander ES, Weinberg RA. Genomics: journey to the center of biology. Science. 2000;287:1777–1782.
  • Lifton RP. Genetic factors in hypertension. Curr Opin Nephrol Hypertens. 1993;2:258–264.
  • Lifton RP, Jeunemaitre X. Finding genes that cause human hypertension. J Hypertens. 1993;11:231–236.
  • Loots GG, Locksley RM, Blankespoor CM, et al. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross–species sequence comparisons. Science. 2000;288:136–140.
  • MacGregor AJ, Snieder H, Schork NJ, Spector TD. Twins. Novel uses to study complex traits and genetic diseases. Trends Genet. 2000;16:131–134.
  • Marenberg ME, Risch N, Berkman LF, Floderus B, de Faire U. Genetic susceptibility to death from coronary heart disease in a study of twins. N Engl J Med. 1994;330:1041–1046.
  • McCaffery JM, Pogue-Geile MF, Debski TT, Manuck SB. Genetic and environmental causes of covariation among blood pressure, body mass and serum lipids during young adulthood: a twin study. J Hypertens. 1999;17:1677–1685.
  • Newman DL, Abney M, McPeek MS, Ober C, Cox NJ. The importance of genealogy in determining genetic associations with complex traits. Am J Hum Genet. 2001;69:1146–1148.
  • Ogura Y, Bonen DK, Inohara N, et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature. 2001;411:603–606.
  • Ott J. Analysis of Human Genetic Linkage. Baltimore: Johns Hopkins University Press; 1999.
  • Rao D, Province MA. Genetic Dissection of Complex Traits. Advances in Genetics. Academic Press; 1999.
  • Rioux JD, Daly MJ, Silverberg MS, Lindblad K, Steinhart H, Cohen Z, Delmonte T, Kocher K, Miller K, Guschwan S, et al. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat Genet. 2001;29:223–228.
  • Risch N, Botstein D. A manic depressive history. Nat Genet. 1996;12:351–353.
  • Schellenberg GD, D'Souza I, Poorkaj P. The genetics of Alzheimer's disease. Curr Psychiatry Rep. 2000;2:158–164.
  • Stoll M, Cowley AW, Jr, Tonellato PJ, Greene AS, Kaldunski ML, Roman RJ, Dumas P, Schork NJ, Wang Z, Jacob HJ. A genomic-systems biology map for cardiovascular function. Science. 2001;294:1723–1726.
  • Trinklein ND, Aldred SJ, Saldanha AJ, Myers RM. Identification and functional analysis of human transcriptional promoters. Genome Res. 2003;13:308–312.
  • Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. The sequence of the human genome. Science. 2001;291:1304–1351.


Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1664744/