The term genome refers to the entire chromosomal DNA of an organism, including all of its genes. The proteome refers to all of the organism's proteins. Genes exert their influence either directly, by producing proteins, or indirectly, by regulating protein-coding genes. Each protein-coding gene encodes a polypeptide. Some proteins are made of two or more polypeptides, and a significant proportion of genes use alternative splicing to produce more than one form of the same polypeptide. In addition to genes that encode proteins, there are genes that are transcribed into RNA without encoding protein.
The human genome within each cell consists of approximately 3 billion bases arranged in 23 pairs of chromosomes. Twenty-two pairs, referred to as autosomes, are homologous (their sequences are similar); the remaining pair consists of the sex chromosomes, which are X and Y in the male and two X chromosomes in the female. Each chromosome is a long molecule of DNA. DNA is composed of only four bases: adenine (A), guanine (G), cytosine (C), and thymine (T) (see Chap. 6). The sequence of these four bases determines all inherited characteristics. The average length of a chromosome is about 135,000,000 base pairs (bp). The longest chromosome, chromosome 1, has more than 250,000,000 bp. The smallest, chromosome 21, has only about 50,000,000 bp. The 23 chromosomes together contain a total of 3 billion bp (Table 10–1).
The chromosomes contain the genes, which are discrete units with a start and a stop point and vary in size from 10,000 to more than 2,000,000 bp. The estimated average size of a gene is about 20,000 bp. Most genes encode RNA transcripts that do not code for a messenger RNA (mRNA) or protein but exert their ultimate influence by regulating protein-coding genes. Only a minority of RNA transcripts encode proteins.
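The chromosome-length figures above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the constant and function names are illustrative assumptions, and the inputs are the rounded values quoted in the text, so the result is only approximate.

```python
# Rounded figures quoted in the text (approximate values).
TOTAL_BP = 3_000_000_000   # total base pairs across the 23 chromosomes
N_CHROMOSOMES = 23         # 22 autosomes plus one sex chromosome


def average_chromosome_length(total_bp: int, n_chromosomes: int) -> float:
    """Return the mean chromosome length in base pairs."""
    return total_bp / n_chromosomes


avg_len = average_chromosome_length(TOTAL_BP, N_CHROMOSOMES)
# Prints a value of roughly 130 million bp.
print(f"Average chromosome length: {avg_len:,.0f} bp")
```

With these rounded inputs the mean comes out near 130 million bp, close to the approximate figure of 135,000,000 bp quoted above; the small gap reflects rounding of the total genome size.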
The genes are distributed nonrandomly along the chromosomes, in gene-rich and gene-poor regions. The ends of a chromosome are referred to as telomeres. The centromeres are the regions of a chromosome that attach to the mitotic apparatus during cell division. Both telomeres and centromeres are comparatively gene-poor regions. Embedded in the DNA sequence of each chromosome are genes that encode proteins (2% of the DNA)1 and others that encode all of the RNAs. Interspersed between the sequences coding for proteins and RNAs are sequences to which factors bind to regulate when a gene is transcribed and how much RNA or protein it produces. These sequences are referred to as regulatory DNA elements. The factors binding to these elements are proteins referred to as transcription factors. These proteins have specific DNA-binding sites that enable them to attach to a regulatory DNA element and exert their control over transcription.
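To give a sense of scale for the protein-coding share, the short sketch below converts the 2% figure into base pairs. It assumes the rounded genome size of 3 billion bp used in this chapter; the names are illustrative only.

```python
# Rounded figures from the text (approximate values).
TOTAL_BP = 3_000_000_000       # approximate size of the human genome in bp
PROTEIN_CODING_PERCENT = 2     # share of the DNA that encodes proteins


def protein_coding_bp(total_bp: int, percent: int) -> int:
    """Return the number of base pairs covered by the given percentage."""
    # Integer arithmetic avoids floating-point rounding on large counts.
    return total_bp * percent // 100


coding_bp = protein_coding_bp(TOTAL_BP, PROTEIN_CODING_PERCENT)
noncoding_bp = TOTAL_BP - coding_bp
print(f"Protein-coding: {coding_bp:,} bp")
print(f"Remaining DNA:  {noncoding_bp:,} bp")
```

Under these assumptions, roughly 60 million bp encode proteins, and the remaining 2.94 billion bp comprise RNA-encoding genes, regulatory DNA elements, and other sequence.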
TABLE 10–1. The Human Genome