The Human Genome Project revealed a newfound wealth of knowledge, a blueprint for human variability. However, researchers quickly realized that the human genome was a very basic structure, and that the protein-coding genes could not completely explain the vast differences seen among humans. Researchers consequently began to search for other parameters that contributed to human variability. In addition to examining the protein-coding regions of DNA, researchers began to analyze the structure, function, and products of the entire human genome. Multiple disciplines quickly developed and flourished, ushering in a new phase of discovery: the era of omics.
Following the completion of the Human Genome Project, interest reignited in the already established discipline of epigenetics. The term was coined by British researcher Conrad Waddington in the early 1940s; epigenetics describes the mechanisms through which undifferentiated cells develop into differentiated cell types such as myocytes, neurons, and adipocytes.16,17 It is now well established that DNA undergoes reversible, non-encoded modifications, which ultimately influence phenotype.18 Multiple factors, including developmental stage, age, and environment, have been linked to epigenetic changes. Research has also revealed that these modifications can be maintained and propagated to daughter cells.
Within the past two decades, efforts have focused on examining the epigenomes (epigenetic changes) of the nearly 200 cell types contained within the human body. Researchers have identified several epigenetic modifications affecting transcription, including DNA methylation, histone modification, nucleosome/chromatin packaging, and RNA transcripts (Figure 88–4).19 DNA methylation describes the addition of a methyl group to cytosine, creating 5-methylcytosine (5 MeC).20 Less commonly, cytosine can also undergo hydroxymethylation. Numerous studies have demonstrated that hypermethylation at promoter regions represses transcription.21 Histones, which are the scaffolding proteins that support DNA packaging, similarly undergo methylation, as well as acetylation, phosphorylation, and ubiquitination. These changes alter the structure of the histone tail, which restricts or facilitates the transcription machinery’s access to nucleotides. Nucleosome positioning within the chromatin has also been demonstrated to influence transcription. As with histone modification, the packaging of the nucleosomes exposes certain areas of DNA, which influences the binding of transcription factors to regulatory elements such as enhancers, silencers, and insulators.
Epigenetic modifications. Methyl groups bind to both DNA and histones, altering their structure (methylation). Histones also undergo acetylation, phosphorylation, and ubiquitination, which influences coiling of DNA around histones. These modifications consequently influence gene transcription. (Reproduced with permission from National Human Genome Research Institute.)
With renewed interest in epigenetics, the National Human Genome Research Institute established two large-scale research projects to examine the human epigenome. In 2003, the ENCyclopedia of DNA Elements (ENCODE) Project was created, with the objective of identifying all functional elements: RNA-transcribed regions, protein-coding regions, transcription-factor-binding sites, chromatin structure, and DNA methylation.22,23 At its inception, 35 international research groups examined 30 million bases of human DNA, equivalent to about 1% of the genome. In the second phase of the project, researchers analyzed 1640 genome-wide data sets from 147 cell types.
Through the ENCODE project, an intricate regulatory system was revealed.24,25 Among the notable findings was pervasive transcription, meaning that the majority of the genome was transcribed. Areas outside of the protein-coding regions that were previously thought to be silent were found to undergo transcription. Researchers further revealed that the majority of the human genome (80.4%) participated in a biochemical function. Much of what was thought to be inert DNA was found to contain regulators of expression, including RNA elements and transcription-factor-binding sites. Analysis further demonstrated regulatory elements acting both locally (cis) and distally (trans).
In 2008, the National Human Genome Research Institute initiated the Roadmap Epigenomics Program (REP) to more thoroughly characterize the human epigenome (Figure 88–5).26 Specifically, the project sought to examine the temporal changes of the epigenome from stem cells to mature cells in human tissues. The project further aimed to examine the epigenetic changes associated with diseased states. In 2015, the REP published its preliminary integrative analysis of 127 reference human epigenomes, 111 generated by the program and 16 contributed by ENCODE.27
Tissue and cell types utilized to examine the human epigenome in the Roadmap Epigenomics Program. (Reproduced with permission from Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, et al: Integrative analysis of 111 reference human epigenomes, Nature 2015 Feb 19;518(7539):317-330.)
REP analysis revealed several epigenetic modifications including histone marks, DNA methylation, DNA accessibility, and RNA expression. As confirmed by previous studies, REP demonstrated that low DNA methylation states were associated with high accessibility for transcription, whereas hypermethylated states were associated with low accessibility. Notably, on average 68% of the reference epigenome was found to be quiescent. Further analysis revealed that embryonic-stem-cell-derived cells and pluripotent cells often exhibited methylation near regulatory elements, whereas differentiated cells exhibited methylation loss. Finally, the REP demonstrated specific epigenetic modifications that are associated with diseased states, including enrichment of enhancers, promoters, and open chromatin.
Following the Human Genome Project, international groups collaborated further to catalog the human genomes of various populations, forming the International HapMap Project (2002–2007). This project sought to identify and catalog human genomes to assist with linking genetic variants to disease states.28,29 To accomplish this objective, the project mapped single nucleotide polymorphisms (SNPs) in 1184 individuals across 11 populations. Similar to fingerprints, representative SNPs (tag SNPs) were recognized as unique identifiers, allowing for identification of areas linked to individual genes (Figure 88–6).
SNPs, haplotypes, and corresponding tag SNPs. Within short segments of chromosomes, single nucleotide polymorphisms (SNPs) have been identified through sequencing. These variations can be utilized as unique identifiers. Within a series of SNPs, specifically designated polymorphisms known as tag SNPs can be further utilized to serve as surrogate markers for a specific chromosome. (Reproduced with permission from International HapMap Consortium: The International HapMap Project, Nature. 2003 Dec 18;426(6968):789-796.)
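The tag SNP concept can be illustrated with a minimal sketch. The haplotypes below are hypothetical values invented for illustration, not HapMap data, and the function name `find_tag_snps` is likewise an assumption. The sketch performs a brute-force search for the smallest set of SNP positions whose alleles distinguish every haplotype; actual tag SNP selection relies on linkage disequilibrium statistics across populations, which this simplification does not model.

```python
from itertools import combinations

# Hypothetical haplotypes: each string lists the alleles at six SNP positions.
# These values are invented for illustration; they are not HapMap data.
HAPLOTYPES = {
    "hap1": "ACGTAC",
    "hap2": "ACGTGC",
    "hap3": "GCATAC",
    "hap4": "GTATGC",
}

def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that tells all haplotypes apart."""
    sequences = list(haplotypes.values())
    n_positions = len(sequences[0])
    # Try subsets of increasing size so the first hit is a minimal set.
    for size in range(1, n_positions + 1):
        for positions in combinations(range(n_positions), size):
            # The alleles at the chosen positions form each haplotype's signature.
            signatures = {tuple(seq[p] for p in positions) for seq in sequences}
            if len(signatures) == len(sequences):
                return positions
    # Reached only if two haplotypes are identical at every position.
    return tuple(range(n_positions))

tag_positions = find_tag_snps(HAPLOTYPES)
print(tag_positions)  # → (0, 4)
```

In this toy data set, genotyping only positions 0 and 4 is enough to identify which of the four haplotypes is present, which is the sense in which tag SNPs act as surrogate markers.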
The database created by the HapMap project has subsequently been utilized to conduct genome-wide association studies (GWAS) between healthy and diseased populations. GWAS have allowed researchers to grossly sift through entire genomes and identify allele variants associated with diseases. Thus far, GWAS have facilitated the isolation of genes associated with cancer, diabetes, obesity, and dyslipidemia.30,31,32 Additional GWAS have examined autoimmune diseases such as Crohn disease, ulcerative colitis, and psoriasis. Despite their success, GWAS have demonstrated limitations. Tag SNPs do not precisely localize involved genes; they only serve as general landmarks. Furthermore, disease processes are multifaceted, involving numerous interactions between genes and gene products rather than a single locus.
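The comparison at the heart of a GWAS can be sketched for a single SNP as an allelic association test. The allele counts below are hypothetical, and the function name `allelic_association` is an assumption introduced for illustration. The sketch computes a Pearson chi-square statistic with one degree of freedom and an odds ratio from a 2×2 table of allele counts in diseased (case) and healthy (control) groups. An actual GWAS repeats such a test across hundreds of thousands of SNPs and therefore requires a far stricter significance threshold than a single test would.

```python
import math

def allelic_association(case_a, case_b, control_a, control_b):
    """Chi-square test (1 df) and odds ratio for a 2x2 allele-count table.

    Assumes every count is nonzero, so no division by zero occurs.
    """
    n = case_a + case_b + control_a + control_b
    row_totals = (case_a + case_b, control_a + control_b)
    col_totals = (case_a + control_a, case_b + control_b)
    observed = ((case_a, case_b), (control_a, control_b))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were equal in both groups.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * control_b) / (case_b * control_a)
    return chi2, p_value, odds_ratio

# Hypothetical counts: 400 alleles sampled from cases, 400 from controls.
chi2, p_value, odds_ratio = allelic_association(240, 160, 200, 200)
print(round(chi2, 2), odds_ratio)  # → 8.08 1.5
```

An odds ratio above 1 indicates that the first allele is more common among cases. As the text notes, such a signal marks a region of interest rather than a specific causal gene.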
To better understand human variability and the complexity of disease states, researchers have also examined the human transcriptome. In 2000, the National Human Genome Research Institute initiated the Mammalian Gene Collection (MGC). This project was tasked with creating a database containing at least one complementary DNA (cDNA) per gene for both human and mouse genomes.33 The MGC later broadened its initial objective, further constructing rat and cow cDNA databases (completed 2009). The MGC and similar international consortia anticipated that these databases would facilitate functional and comparative genomics. Specifically, transcript analysis would provide insight into the process of gene transcription, protein expression, and the networks of communication occurring at the molecular level.
Within the human transcriptome, approximately 80,000 transcript products have been identified originating from the 20,000 to 25,000 protein-coding genes.34 This disproportionate relationship of genes to transcription products spurred considerable research focusing on transcription initiation and termination.35,36,37 Researchers have since discovered that an individual gene may contain several transcription start sites (TSS), which can then produce various transcript products. Selection of a specific start site has been found to depend on multiple factors, including a cell’s developmental phase, cell-cell signaling, and tissue type. Variation in transcription has also been linked to the terminal processing of transcripts through the addition of adenosine monophosphate moieties. This modification, referred to as polyadenylation, influences processing and nuclear transport of transcripts and has been linked to various developmental phases of the cell. After transcription occurs, transcription products undergo additional processing, in which certain areas are spliced out to create various alternatively spliced products.
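The arithmetic behind this disproportion, and the way the three mechanisms described above can multiply transcript diversity, can be sketched briefly. The transcript and gene counts come from the text; the example gene, with its start sites, splice patterns, and polyadenylation sites, is hypothetical and chosen only for illustration.

```python
from itertools import product

# Figures quoted in the text.
TRANSCRIPTS = 80_000
GENES_LOW, GENES_HIGH = 20_000, 25_000

# Average transcript products per protein-coding gene.
ratio_low = TRANSCRIPTS / GENES_HIGH   # 3.2
ratio_high = TRANSCRIPTS / GENES_LOW   # 4.0
print(ratio_low, ratio_high)  # → 3.2 4.0

# Hypothetical gene: all names below are invented for illustration.
start_sites = ["TSS1", "TSS2"]
splice_patterns = ["exons 1-2-3", "exons 1-3"]
polya_sites = ["proximal", "distal"]

# Upper bound on distinct transcripts if every combination were used.
isoforms = list(product(start_sites, splice_patterns, polya_sites))
print(len(isoforms))  # → 8
```

Even two options at each step yield up to eight distinct transcripts from one gene. This is an upper bound, since not every combination is necessarily produced in a given cell.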
Numerous studies have compared human transcriptomes during healthy and diseased states.38 Transcriptomes of neurodegenerative diseases (eg, Alzheimer’s disease), malignancies (eg, breast, prostate cancer), and respiratory diseases (eg, asthma, COPD) are a few of the many disease states that have been analyzed.39 Transcriptome analysis of brain tissue from patients with Alzheimer’s disease has revealed alternative promoter regions and transcription start sites.40 In various malignancies, transcript studies have also identified abnormal fusion transcripts and alternative splicing (Table 88–2). It is theorized that these alternate transcript products result in cellular dysfunction and disease. Although incredibly informative, transcriptomics has limitations. Specifically, transcript products may be extremely fragile and present at such low concentrations that current technologies cannot characterize them. Furthermore, transcriptomics primarily identifies genomic metabolites and has limited capabilities to identify the functional properties of transcript products.
Table 88–2. Transcription derangements identified in cancers.
|Cancer Type ||Analysis Type ||Results |
|Hodgkin lymphoma ||PE WT ||Identification of gene fusions, including fusions involving CIITA |
|Non-Hodgkin lymphoma ||PE poly-A+ ||Detection of 109 genes with multiple somatic mutations, including those involved in histone modifications |
|MDS ||FR small RNA ||Discovery of novel miRNA differentially expressed in tumor |
|Breast cancer ||FR poly-A+ ||Alternative splicing and alterations in gene expression (ie, LOX, ATP5L, GALNT3 and MME) have been identified in modulated ERBB2 overexpressing mammary cells |
| ||PE poly-A+ ||Identification of 3 known and 24 novel fusion transcripts (including VAPB-IKZF3) |
| ||SE, PE poly-A+ ||Discovery of gene fusions in breast cancer transcriptomes with BRCA1 mutations, including novel in-frame WWC1-ADRBK2 fusion in HCC3153 cell line and ADNP-C20orf132 in a primary tumor |
| ||FR poly-A+ ||Investigation of EMT-associated alternative splicing events regulated by different classes of splicing factors (RBFOX, MBNL, CELF, hnRNP, or ESRP) |
|Prostate cancer ||SE poly-A+ ||Detection of transcription-induced chimeras in prostate adenocarcinoma |
| ||PE WT ||Discovery and characterization of seven novel cancer-specific gene fusions (four involving non-ETS) |
| ||PE poly-A+ ||Identification of 121 unannotated prostate cancer-associated ncRNA transcripts, including the characterization of PCAT-1 |
| ||FR poly-A+ ||25 previously undescribed alternative splicing events involving known exons, and high-quality single-nucleotide discrepancies, have been detected in prostate cancer cell line LNCaP |
|Melanoma ||PE poly-A+ ||Identification of 11 novel gene fusions, 12 readthrough transcripts, somatic mutations and unannotated splice variants |
| ||FR poly-A+ ||Somatic CNVs affecting gene expression and new potential genes and pathways involved in tumorigenesis have been identified in seven human metastatic melanoma cell lines |
|Ovarian cancer ||PE poly-A+ ||Discovery of the first gene fusions in ovarian cancer through a novel computational method |
|Sarcoma ||PE poly-A+ ||Detection of novel gene fusions in sarcoma through a novel computational method |
| ||FR ribodepletion ||Evidence of a closer relationship between gene expression levels and protein expression in a human osteosarcoma cell line |
|Oral carcinoma ||MP WT ||Association of allelic imbalance with copy number mutations and with differential gene expression |
|Hepatocellular carcinoma ||SE WT ||Characterization of HBV-related HCC transcriptome, including identification of exon-level expression changes and novel splicing variants |
After the completion of the Human Genome Project, researchers also focused efforts towards examining the protein products of the human genome. In 2009, the Human Proteome Organization (HUPO) announced an international effort to expand upon previous studies examining the human proteome in healthy and diseased states.41 In September 2010, HUPO officially initiated the Human Proteome Project (HPP) to construct a comprehensive library of human proteins. The HPP was also tasked with examining protein expression, splice variants, post-translational modifications, and localization of proteins in cells, tissues, and organs. Furthermore, the project planned to analyze proteins during all developmental stages of adult life and under various physiologic and pathologic conditions. Nearly four years after the inception of the HPP, the first draft of the human proteome was published in 2014.42 Ongoing international efforts continue to examine each chromosome to complete a more comprehensive human proteome database.
In the last two decades, research has extended beyond the human chromosome to examine the microorganisms that inhabit the human body, establishing the discipline of microbiomics. One of the first projects examining the human microbiome was France’s Human Intestinal Metagenome Initiative (HIMI), established in 2005. The National Institutes of Health (NIH) followed soon thereafter, initiating the Human Microbiome Project (HMP) in 2007.43 The project’s initial objective was to examine the microbiomes of the mouth, gastrointestinal tract (stool), skin, and vagina. These databases could then be utilized to identify dysfunction, develop treatments, and possibly prevent illnesses linked to dysbiosis. The HMP also included an initiative to examine ethical, legal, and social implications associated with genomics. Approximately one year after the establishment of the HMP, the International Human Microbiome Consortium formed in 2008 to further foster collaboration worldwide.
Microbiomes of multiple disease processes have since been analyzed.44,45 Most research to date has focused on the microbiomes of the gut and skin.46 Numerous studies have repeatedly shown a strong association between diseased states and alterations in the gut microbiome.47 For example, obesity and Crohn disease are associated with changes in the diversity and composition of gut flora. Environmental stressors have also been shown to induce virulent behavior of bacteria in the gut. Microbiomic research suggests that there are numerous symbiotic and dysbiotic relationships between the human body and microorganisms, which appear to directly influence human health and disease.
As omic research develops, it becomes increasingly apparent that the human genome exists within a highly integrated functional complex. Researchers are only beginning to understand the intricacies of the human genome, epigenome, transcriptome, proteome, and microbiome. Omic research is in its formative years and is growing exponentially. Each discovery raises new questions and hypotheses, giving rise to an ever-expanding discipline.