Image by Micah Baldwin
The human genome is present in virtually every cell of our bodies and contains the complete set of instructions to build a human being. The first effort to read that instruction book, the Human Genome Project, published its first draft in 2001. Even then, it was clear that our genome was a large, complex, and puzzling thing. Fourteen years later, we’re still working to unravel all of its mysteries. Here’s a whirlwind tour of what we know so far.
DNA: Deoxyribonucleic acid, the material that carries genetic information and is present in virtually every cell in our bodies. The structure of DNA is a double helix: two sugar-phosphate backbones, with nucleotide bases between them that pair up (adenine with thymine, guanine with cytosine) to encode information.
Chromosome: A thread-like structure comprising DNA and the scaffolding proteins that package it. In humans, chromosomes are either autosomes (numbered 1 through 22) or allosomes (the sex chromosomes X and Y). Most people have two copies of each autosome and either two copies of X (females) or one X and one Y (males).
Gene: A stretch of DNA that encodes a functional product, most often a protein (proteins are the building blocks of cells). Structural elements of genes include the promoter (where RNA polymerase binds), exons (which encode amino acids), introns (“spacers” between exons that are spliced out before translation), and untranslated regions (UTRs, which are transcribed by RNA polymerase but not translated into protein).
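Two of the ideas in these definitions, base pairing and splicing, are simple enough to sketch in a few lines of Python. This is a toy illustration, not a bioinformatics tool: the function names are our own, and the intron coordinates are supplied by hand, whereas in a real cell the spliceosome finds splice sites from sequence signals in the RNA itself. The example transcript is made up.

```python
# Watson-Crick pairing from the DNA definition: A-T and G-C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in the same 5'-to-3' direction as the input."""
    # Complement each base, reading backward because the two strands run antiparallel.
    return "".join(DNA_PAIRS[base] for base in reversed(strand.upper()))


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Cut out introns given as half-open (start, end) positions and join the rest."""
    pieces, cursor = [], 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[cursor:start])  # keep the exon (or UTR) before this intron
        cursor = end                           # jump past the intron
    pieces.append(pre_mrna[cursor:])           # keep whatever follows the last intron
    return "".join(pieces)


print(reverse_complement("ATGCGT"))  # ACGCAT

# Made-up transcript: two exons (uppercase) around one intron (lowercase).
pre_mrna = "AUGGCCguaaagUAA"
print(splice(pre_mrna, [(6, 12)]))  # AUGGCCUAA
```

Because each strand fully determines the other, applying `reverse_complement` twice returns the original sequence.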
The Big Picture
The human genome comprises 3.2 billion base pairs, spread across 22 autosomes and two sex chromosomes. The autosomes are generally ordered by size; chromosome 1 is the largest (about 250 million base pairs), while chromosomes 21 and 22 are the smallest (48 and 51 million, respectively). Amusingly, the two sex chromosomes are dramatically different in size: chromosome X is 155 million base pairs (about the size of chromosome 7), but chromosome Y is just 59 million.
There’s also a tiny, often-overlooked chromosome in mitochondria, the energy-producing organelles found in human cells. The mitochondrial genome is minuscule (about 16,500 base pairs), but a single cell might have as many as 2,000 copies of it. Unlike the autosomes and sex chromosomes, the mitochondrial genome is inherited only from the mother. Between that and the multiple copies, it can give rise to some odd patterns of genetic inheritance.
Most of us picture chromosomes as the X-shaped things we learned about when studying mitosis in high school biology. That’s how they look under a light microscope during metaphase, when two sister chromatids (the original and its shiny new copy) are joined together at the centromere, a region of highly repetitive DNA sequence where proteins bind to pull sister chromatids apart.
Because the DNA replication machinery can’t copy all the way to the end of the molecule, chromosomes also have special structures at each end called telomeres. These are stretches of a six-letter sequence (TTAGGG, in humans) repeated over and over again. They’re essentially disposable bases, and they have to be, because a chromosome’s ends get a little shorter every time a cell divides. The shortening is steady enough that measuring telomere length gives a rough estimate of how many times a cell has divided and, from that, a very approximate indication of the person’s age.
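Counting telomere repeats is a simple string scan. The Python sketch below assumes the sequence is written so the telomere sits at its end. The function names, the model in which every division trims the same number of bases (real cells only approximate this), and the numbers in the usage lines are all hypothetical, chosen to show the arithmetic rather than to reflect real measurements.

```python
TELOMERE_REPEAT = "TTAGGG"  # human telomere repeat unit


def count_terminal_repeats(seq: str, unit: str = TELOMERE_REPEAT) -> int:
    """Count back-to-back copies of `unit` at the very end of `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    count, end = 0, len(seq)
    # Step backward from the end one repeat unit at a time until the pattern breaks.
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


def estimate_divisions(start_bp: int, current_bp: int, loss_per_division_bp: int) -> float:
    """Toy model (hypothetical): assumes every division trims the same number of bases."""
    return (start_bp - current_bp) / loss_per_division_bp


# Made-up example: a read ending in five telomere repeats.
read = "ACGTACGT" + TELOMERE_REPEAT * 5
print(count_terminal_repeats(read))            # 5
print(estimate_divisions(10_000, 7_000, 100))  # 30.0
```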
Genes and Functional Elements
There are about 20,000 known genes in our genome that encode proteins (i.e., they make messenger RNA that’s translated into protein). The fraction of bases that eventually encode protein sequence is exceedingly small: about 1.5%. The rest of the genome, the noncoding genome, nevertheless contains many other types of elements that can regulate what happens in a cell. Many of the elements you’ve probably heard about (promoters, untranslated regions (UTRs), splice sites, exons, and introns) are structures that help govern transcription (making messenger RNA) and translation (making proteins). We’ve discovered, however, that there are many other kinds of noncoding elements that help regulate when and how proteins are made:
- Transcription factor binding sites are short, specific base sequences that are recognized and bound by transcription factors, the proteins that control transcription. For example, the sequence TATAAA (the “TATA box”) is found in many gene promoters, a short distance upstream of where transcription starts, and helps position RNA polymerase II (the enzyme that makes messenger RNA from DNA) to start in the right place.
- Enhancers are stretches of noncoding DNA that boost the activity of certain genes. These regions contain binding sites for transcription factors and other proteins. Often they sit near the genes whose activity they enhance, but they can also be located thousands or even hundreds of thousands of base pairs away.
- Silencers (sometimes called repressor elements) do the opposite: they prevent genes from being transcribed. Usually this is accomplished by recruiting proteins that either bind DNA or chemically modify DNA and its packaging proteins so that it’s inaccessible to the transcription machinery.
- Noncoding RNA genes are transcribed into various kinds of functional RNA, such as transfer RNA (tRNA; matches amino acids to specific codons) and ribosomal RNA (rRNA; forms the core of the ribosome, which carries out translation). There are also about 800 genes that encode microRNAs, very short sequences (18 to 24 nucleotides long) that can block messenger RNA from being translated into protein. They do this by binding complementary sequences in the untranslated region of the target mRNA.
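Both a TATAAA promoter motif and a microRNA target site come down to sequence matching, which a short Python sketch can illustrate. It makes a big simplifying assumption: it finds only exact matches, whereas real transcription factors tolerate some variation in their binding sites and many microRNAs pair only partially with their targets. The function names and every sequence below are made up for illustration.

```python
import re


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every exact (possibly overlapping) match."""
    # A zero-width lookahead lets matches overlap, e.g. "AA" twice inside "AAA".
    pattern = f"(?={re.escape(motif.upper())})"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def mirna_target_site(mirna: str) -> str:
    """Return the mRNA sequence that would pair perfectly with `mirna`."""
    # Pair each base, reading backward because the two RNA strands bind antiparallel.
    return "".join(RNA_PAIRS[base] for base in reversed(mirna.upper()))


# Made-up promoter fragment containing one TATAAA motif.
promoter = "GCGCTATAAAGGCCGT"
print(find_motif(promoter, "TATAAA"))  # [4]

# Made-up 20-nt "microRNA" and a made-up UTR containing its perfect target site.
toy_mirna = "UAGCAAGGCUUAACGGAUCA"
site = mirna_target_site(toy_mirna)
utr = "CCAA" + site + "GG"
print(site)                   # UGAUCCGUUAAGCCUUGCUA
print(find_motif(utr, site))  # [4]
```

A more realistic scanner would score near-matches instead of requiring identity, but the exact-match version shows the basic idea of “recognizing” a sequence.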
If you counted the bases in all of the genes and other functional elements I’ve described so far, you’d come well short of 3.2 billion. Even if we understood all of the elements above perfectly (which we don’t), that raises the question: what the heck does the rest of the genome do?
Honestly, we don’t know. I think that a lot of it will probably turn out to have no function whatsoever. Other parts might have a function that we simply don’t know about.
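For a sense of scale, here is the back-of-the-envelope arithmetic behind “about 1.5%” of a 3.2-billion-base genome, written as a few lines of Python that use only those two round numbers. Integer math keeps the result exact.

```python
GENOME_BP = 3_200_000_000  # approximate human genome size, in base pairs
CODING_PER_THOUSAND = 15   # "about 1.5%" of bases encode protein

coding_bp = GENOME_BP * CODING_PER_THOUSAND // 1000
noncoding_bp = GENOME_BP - coding_bp
print(f"coding:    ~{coding_bp:,} bp")     # coding:    ~48,000,000 bp
print(f"noncoding: ~{noncoding_bp:,} bp")  # noncoding: ~3,152,000,000 bp
```

Even the tiny coding fraction amounts to tens of millions of bases, yet it is dwarfed by the more than three billion that remain.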
The Genome and Genetic Diseases
Get ready, because I’m about to make this relevant to speculative fiction.
When people hear the phrase “genetic disease,” the examples that often come to mind are severe inherited disorders, like sickle-cell disease, cystic fibrosis, and Huntington’s disease. Most of these are caused by very rare mutations in the coding region of a gene. This makes sense, because a mutation that disrupts or alters protein sequence is understandably capable of having a severe, immediate effect. Yet the vast majority of human traits that are “heritable” (i.e., have a genetic component) are not so simply explained.
Many researchers, myself included, think that much of the genetic variation behind these traits lies outside the known coding regions. Think about it: a subtle change to a regulatory element could easily have an effect on a human being. For this mental exercise, let’s use the low-density lipoprotein receptor (LDLR) gene. It makes a protein that transports LDL (the carrier of most cholesterol) out of the blood. Severe mutations in the coding region of LDLR cause familial hypercholesterolemia, an autosomal dominant lipid disorder. Instead, picture a subtle change in a regulatory element that influences LDLR gene activity. It might not cause a severe, obvious effect. Over the 70+ years of the average human lifespan, however, even a very minor change can have long-term ramifications.
Now, picture the same scenario, but change “transports LDL” to “prevents magic use” or “protects against becoming a zombie.” There’s your SF/F story.