With fewer human genes than expected, where do all those proteins get their instructions?
The Human Genome Project's discovery that the human body runs on an instruction manual of a mere 35,000 or so genes--compared to the worm's 19,000, the fruit fly's 13,000, and the tiny mustard relative Arabidopsis thaliana's 25,000--placed humanity on an even playing field with these other, supposedly simpler, organisms. It was a humbling experience, but humility quickly gave way to awe with the realization that the human genome might encode 100,000 to 200,000 proteins. Scientists base this estimate on the analysis of DNA sequences--called expressed sequence tags, or ESTs--that are reverse-transcribed from mRNAs. The question is, where is the information for all those extra proteins?
The disparity between gene and protein number challenges the classic 'one gene/one polypeptide' paradigm that has governed molecular biology for decades. Still, some researchers decided long ago that the paradigm was simplistic; to them, news of the paucity of protein-encoding genes in the human genome wasn't surprising. "It seems that gene number is not as important as gene interactions," says Xiangyin Kong, an investigator at the Shanghai Research Center of Biotechnology at the Chinese Academy of Sciences.
The human genome sequence, particularly its repetitive nature, also rejuvenated interest in genetic mechanisms that underlie the evolution of biological complexity. Eukaryotes, with their myriad cell organelles and other compartments, are far more complex than the evolutionarily much older bacteria and archaeans. How does a large eukaryotic organism like a plant or animal extract the most information from its genome? Recent research reveals a number of strategies for genome economy, and some are indeed surprising.
INTRONS, EXONS, AND ALTERNATIVE SPLICING
Many genes in multicellular eukaryotes are discontinuous, composed of exon sequences that encode the actual protein, and introns, whose transcripts are spliced out from a much longer initial mRNA. This discovery spawned the idea that a 'split' gene can encode more than one protein by splicing together alternative exon combinations. In effect, the cell shuffles exons, assembling a winning protein hand from assorted cards in its genetic deck. At least 10,000 human genes mix and match exons that encode specific protein regions, or domains. Even exons from different genes can come together. Tissue plasminogen activator, for example, is cobbled together from gene parts that encode plasminogen, epidermal growth factor and fibronectin.
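To make the card-shuffling picture concrete, here is a minimal Python sketch of how one split gene could yield several transcripts. It is a toy model, not a description of real splicing machinery: the exon names (E1 to E4), the helper `splice_isoforms`, and the assumption that the first and last exons are always retained are all hypothetical, chosen only for illustration.

```python
from itertools import combinations

def splice_isoforms(exons, constitutive):
    """Yield every exon combination that keeps all constitutive exons.

    Exons stay in their original genomic order; only the optional
    exons are included or skipped. (Hypothetical toy model.)
    """
    optional = [e for e in exons if e not in constitutive]
    for k in range(len(optional) + 1):
        for included in combinations(optional, k):
            kept = set(constitutive) | set(included)
            # Preserve genomic order when joining the kept exons.
            yield tuple(e for e in exons if e in kept)

# Hypothetical four-exon gene; E1 and E4 are assumed always present.
exons = ["E1", "E2", "E3", "E4"]
isoforms = list(splice_isoforms(exons, constitutive={"E1", "E4"}))

for iso in isoforms:
    print("-".join(iso))
```

With two optional exons, this toy gene gives four possible transcripts from a single stretch of DNA; each additional optional exon doubles the count, which hints at how a modest gene tally might support a much larger protein repertoire.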
The human genome displays considerable internal redundancy--genes comprising families and superfamilies--so the number of sequences that are truly different is much lower than the total. Throughout the human genome, …