Pseudogenes: are they "junk" or functional DNA?

Annual Review of Genetics

January 01, 2003 | Balakirev, Evgeniy S.; Ayala, Francisco J. | This material is published under license from the publisher through the Gale Group, Farmington Hills, Michigan. All inquiries regarding rights should be directed to the Gale Group.

Key Words neutral evolution, evolutionary rate, DNA polymorphism, Drosophila, potogene

Abstract Pseudogenes have been defined as nonfunctional sequences of genomic DNA originally derived from functional genes. It is therefore assumed that all pseudogene mutations are selectively neutral and have equal probability to become fixed in the population. Rather, pseudogenes that have been suitably investigated often exhibit functional roles, such as gene expression, gene regulation, and generation of genetic (antibody, antigenic, and other) diversity. Pseudogenes are involved in gene conversion or recombination with functional genes. Pseudogenes exhibit evolutionary conservation of gene sequence, reduced nucleotide variability, excess synonymous over nonsynonymous nucleotide polymorphism, and other features that are expected in genes or DNA sequences that have functional roles. We first review the Drosophila literature and then extend the discussion to the various functional features identified in the pseudogenes of other organisms. A pseudogene that has arisen by duplication or retroposition may, at first, not be subject to natural selection if the source gene remains functional. Mutant alleles that incorporate new functions may, nevertheless, be favored by natural selection and will have enhanced probability of becoming fixed in the population. We agree with the proposal that pseudogenes be considered as potogenes, i.e., DNA sequences with a potentiality for becoming new genes.

 
CONTENTS 
 
INTRODUCTION 
PSEUDOGENES IN DROSOPHILA 
  Evolutionary Conservation 
  Pseudogene Expression 
  Regulation of Gene Expression 
  The [beta]-esterase Cluster of D. melanogaster: Expression, Evolutionary Conservation, Recombination, and Gene Conversion
EVOLUTIONARY CONSERVATION OF PSEUDOGENES 
PSEUDOGENE TRANSCRIPTION AND EXPRESSION 
REGULATION OF GENE EXPRESSION 
PSEUDOGENES AS RESERVOIRS FOR GENERATING GENETIC DIVERSITY
  Antibody Diversity 
  Antigenic Variation 
  Other Gene Systems 
RECOMBINATION AND SEQUENCE POLYMORPHISM 
CONCLUSIONS 

INTRODUCTION

Pseudogenes have been defined as nonfunctional sequences of genomic DNA that are originally derived from functional genes, but exhibit such degenerative features as premature stop codons and frameshift mutations that prevent their expression (97, 107, 132, 145, 164, 226, 231, 237). Pseudogenes are thought to arise by tandem duplication of genes, with ensuing loss of function as a result of gradual accumulation of disabling mutations (107, 132, 164, 237). "Processed" pseudogenes lack introns, and presumably arise by reverse transcription of processed mRNA, followed by integration into the genome (150, 226, 227, 231). Partially processed pseudogenes, generated by an RNA-mediated mechanism and with or without the complete coding region, have been described in tomato (210) and human (60, 62, 194, 209). Chimeric nonfunctional genes combining parts of a functional gene and its pseudogene have been found in lemurs (108) and humans (129).

If the presence of one copy of the gene suffices for the needs of the organism, it is assumed that pseudogene mutations (disabling or not) will not be subject to purifying selection and will all have equal probability of becoming fixed in the population (85, 87, 116, 130, 131). It follows that pseudogenes will generally degenerate, owing to the rapid accumulation of recurrent mutations, and melt into the background of the surrounding DNA (e.g., 88). That process has indeed been detected in bacterial genomes (7). However, genomes of eukaryotes contain many pseudogenes that appear to have avoided full degeneration (96, 145). A large-scope analysis of pseudogene distribution based on complete genome draft sequences shows that there is less pressure to delete pseudogenes in eukaryotes than in prokaryotes (96).

We review a variety of pseudogene features in diverse organisms. We begin by reviewing the Drosophila literature. We then extend the discussion to other organisms, starting with cases of evolutionary conservation and proceeding to explore the various functions detected in putative pseudogenes, such as gene expression, regulation of gene expression, generation of genetic diversity (antibody, antigenic, and other), and impact on gene conversion and recombination.

We review those pseudogenes for which there is functional or evolutionary information, excluding from consideration some pseudogenes that have been characterized only by their sequence, such as tRNA (53, 197), U4 RNA (186), glutathione S-transferase (218, 219), and histone H2B (1).

PSEUDOGENES IN DROSOPHILA

Pseudogenes are rare in Drosophila (98, 161) relative to some other animals, especially vertebrates (145). DNA sequence evolution in many of the pseudogenes found in Drosophila manifests functional constraints, reflected in lower than expected (if the pseudogenes were not subject to selection) intraspecific variability and interspecific divergence; significant heterogeneity of nucleotide variability and divergence along the sequence; higher rate of substitution at synonymous than at nonsynonymous nucleotide positions; conservation of important functional regions; transcriptional activity; and codon bias (12, 16, 56, 110, 111, 162, 167, 208). Moreover, two Adh Drosophila genes originally identified as pseudogenes (70, 110) have later been considered to be novel functional genes (19, 136).

Evolutionary Conservation

Snyder et al. (203) studied a small multigene family encoding the larval cuticle proteins (LCP) of Drosophila melanogaster, located within a 9-kb region of the right arm of the second chromosome, at cytogenetic map position 44D. One of the five sequences was thought to be a pseudogene ([psi] Lcp) because of several structural features and the absence of detectable transcripts. The disabling features included a 35-bp deletion eliminating the TATA box, two premature stop codons at positions 23 and 72 (162), a mutated splicing acceptor sequence, and several substitutions and indels that would prevent secretion of a functional protein (203). Another cuticle gene cluster located at 65A on the left arm of the third chromosome of D. melanogaster includes an intronless pseudogene, Lcp-[a.sup.[psi]], with a 21-bp frame-shifting deletion in the signal-peptide coding region and without a consensus polyadenylation site (42).
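The two kinds of disabling feature mentioned above, premature stop codons and frame-shifting deletions, can be illustrated with a minimal Python sketch. The function name `premature_stops` and the toy sequences are hypothetical and are not taken from any of the cited studies; the sketch assumes a coding sequence that starts in frame and uses the standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def premature_stops(cds):
    """Return 1-based codon positions of in-frame stop codons
    that occur before the final complete codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3  # incomplete trailing codon is ignored
    return [
        i + 1
        for i in range(n_codons - 1)
        if cds[3 * i:3 * i + 3] in STOP_CODONS
    ]

# Toy open reading frame: ATG GCA TTG AAC GTT TAA
intact = "ATGGCATTGAACGTTTAA"
# Deleting a single base shifts the frame: ATG GCT TGA ACG TTT AA
shifted = intact[:5] + intact[6:]

intact_stops = premature_stops(intact)
shifted_stops = premature_stops(shifted)
```

In this toy case the intact frame has no premature stop, whereas the one-base deletion brings a TGA into frame at codon 3, which is the kind of change that would truncate the encoded protein.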

The nucleotide polymorphism of [psi] Lcp in D. melanogaster shows extensive sequence length variations that also result in premature stop codons in D. simulans, supporting the contention that [psi] Lcp is not a functional gene (162). As expected in a pseudogene not subject to selection, the rates of synonymous and nonsynonymous substitutions are equal, and the overall nucleotide divergence between D. melanogaster and D. simulans is extremely high. However, the within-species nucleotide variation of [psi] Lcp ([pi] = 0.001 [+ or -] 0.001) is lower than for many functional genes. The HKA test (105) reveals that the [psi] Lcp polymorphism in D. melanogaster is significantly lower than expected, given the amount of divergence between D. melanogaster and D. simulans. The low level of intraspecific polymorphism of [psi] Lcp has been attributed to background selection (162).
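The within-species variation [pi] quoted above is nucleotide diversity, the average number of pairwise differences per site among sampled sequences. A minimal Python sketch of that calculation follows; the function name `nucleotide_diversity` and the toy alignment are hypothetical, and the sketch assumes gap-free sequences already aligned to equal length. It does not reproduce the estimator or data used in the cited study.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average pairwise differences per site (pi) for aligned,
    equal-length, gap-free sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(sequences, 2):
        # Count mismatched sites for this pair of sequences.
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
        n_pairs += 1
    return total_diffs / (n_pairs * length)

# Toy sample: four alleles, one of which carries a single variant site.
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
pi = nucleotide_diversity(sample)
```

Here three of the six pairs differ at one of ten sites, so pi is 3/60 = 0.05. Tests such as HKA then compare such within-species values against between-species divergence, a step this sketch does not attempt.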

Currie & Sullivan (56) have characterized an intronless phosphoglyceromutase pseudogene (Pglym87) located on the right arm of chromosome 3 at position 87B4,5, which most likely originated from the retroposition of a Pglym78 transcript, a gene located on the left arm of chromosome 3 at bands 78A/B. RNase protection experiments and primer extension analyses have failed to detect any transcript from Pglym87, although the Pglym87 open reading frame remains intact except for deletion of the first two codons. The structural properties suggest that Pglym87 is a pseudogene, whereas the intact reading frame and codon bias suggest that it might be a functional gene (56).

A processed Adh pseudogene ([psi] Adh) has been found in D. yakuba and D. teissieri (melanogaster species subgroup) (110, 111). The pseudogene is located on the third chromosome, whereas the functional Adh gene is on the second chromosome of the species. [psi] Adh exhibits a peculiar pattern of evolution for a pseudogene, including retention of reading frame, codon bias, and a higher rate of substitution at synonymous than at nonsynonymous nucleotide positions. Long & Langley (136) have proposed that [psi] Adh is a functional gene, which they called jingwei. They obtained a 27:4 ratio of silent to replacement polymorphisms, a pattern typical of active genes; detected transcripts of the gene; and suggested that it arose by retroposition from the alcohol dehydrogenase gene, followed by recruitment of additional 5' exons and introns from an unrelated gene.

An unprocessed Adh pseudogene ([psi] Adh) has been detected in the repleta group of Drosophila (70; reviewed in 208). Some evidence suggests that the [psi] Adh is a pseudogene, since mutations have rendered the gene incapable of being translated into a functional alcohol dehydrogenase (208). However, the molecular evolution of [psi] Adh exhibits characteristics that are atypical of pseudogenes: (a) the rate of evolution is substantially slower in the exons of [psi] Adh than in the intergenic region, and only slightly faster than the rate of exons of functional Adh genes; (b) codon bias is retained in most species studied; (c) silent substitutions in [psi] Adh significantly exceed replacement substitutions. The Ks/Ka ratio (silent:replacement substitutions) ranges from 10 to 14, in pairwise interspecific comparisons involving sequences from seven species. These ratios greatly depart from unity, which is the value expected for a pseudogene (assuming no selective constraints), and are only slightly lower than those obtained from equivalent comparisons of the functional Adh genes (208). The repleta group [psi] Adh may be a chimeric functional gene (19).
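The distinction between silent and replacement substitutions that underlies the Ks/Ka ratio can be sketched in Python as below. This is a deliberately simplified illustration: it only counts codon differences between two aligned coding sequences under the standard genetic code, and it sets aside codons that differ at more than one position. Published Ks/Ka estimates additionally normalize by the numbers of synonymous and nonsynonymous sites and correct for multiple substitutions, which this sketch does not do. The function name `count_silent_replacement` and the toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def count_silent_replacement(seq_a, seq_b):
    """Count codons differing at exactly one site as silent or replacement.

    Returns (silent, replacement, skipped), where 'skipped' is the number
    of codons that differ at two or three positions.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    silent = replacement = skipped = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        n_diffs = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diffs == 0:
            continue
        if n_diffs > 1:
            skipped += 1  # pathway between codons is ambiguous
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
        else:
            replacement += 1
    return silent, replacement, skipped

# Toy pair: GCT -> GCC is silent (Ala), AAA -> AGA is a replacement (Lys -> Arg).
counts = count_silent_replacement("ATGGCTAAAGGT", "ATGGCCAGAGGT")
```

For a sequence free of selective constraint, silent and replacement changes are expected to accumulate at similar per-site rates, so a large excess of silent changes, as reported for [psi] Adh, points toward constraint on the encoded amino acids.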

A pseudogene that appears to be bona fide has been described in the [alpha]-esterase gene cluster of D. melanogaster (Dm[alpha]E4a-[psi]), where it has multiple inactivating mutations that are fixed in natural populations (177, 178, 185). The [alpha]-cluster includes 11 carboxyl/cholinesterase genes (10 active genes and 1 pseudogene) and is located within cytological region 84D3-E2 on chromosome 3R of D. melanogaster. The [alpha]-esterases have diverged substantially from one another (amino acid similarity ranging from 37% to 66%). No gene conversion or intergenic recombination has been detected between the genes and pseudogene. However, the D. simulans and D. yakuba orthologs of Dm[alpha]E4a-[psi] do not have inactivating mutations and appear to be functional (178).

The Amylase multigene family of D. pseudoobscura is located within a series of highly polymorphic inversions on the third chromosome. Four pseudogenes have been reported in three gene arrangements (ST, SC, and TL): ST Amy3-[psi], with a premature stop codon shortening the protein to 31.6% of its normal length (36); and TL Amy2-[psi], TL Amy3-[psi], and SC Amy3-[psi], with large deletions in their coding regions (160). The divergence among the four pseudogene sequences reveals a retardation of sequence evolution in the SC and ST arrangements, whereas the rates of substitution in the two TL pseudogenes do not depart from neutral expectations. Gene conversion is most likely responsible for slowing down the divergence of SC Amy3-[psi], and ST Amy3-[psi] (160). In the closely related D. miranda, an Amylase pseudogene (Amy3), located on the secondary sex chromosome X2, contains two large deletions (one 445 bp and the other 872 bp long) (205).

If pseudogenes are devoid of function, their pattern of nucleotide substitution is expected to reflect the pattern of spontaneous point mutations, which would make pseudogenes ideal models to study neutral evolution at the molecular level (131). Of the four D. pseudoobscura Amy pseudogenes, only two (TL Amy2-[psi] and TL Amy3-[psi]) seem to behave as bona fide pseudogenes, whereas in the other two, SC Amy3-[psi] and ST Amy3-[psi], sequence evolution has been retarded, most likely by homogenization caused by gene conversion. Clearly, caution should be exercised when using pseudogenes as exemplars for determining patterns and rates of neutral nucleotide substitution.

Pseudogene Expression

A genomic region encompassing the Cecropin locus (Cec, located at 99E on the right arm of the third chromosome of D. melanogaster) includes three functional genes and two pseudogenes (Cec[psi]1 and Cec[psi]2) (126). The pseudogenes have diverged considerably from the functional Cec genes (only 50% of the residues remain identical), have no consensus splice signals, and contain multiple premature stop codons and deletions. However, both Cec[psi]1 and Cec[psi]2 have retained a promoter-like region with a TATA box and capping site homology (126). The two pseudogenes are also highly diverged between themselves. But nonsynonymous polymorphism is lower than synonymous polymorphism in the coding region of both pseudogenes, which suggests functional constraint on amino acid replacement changes, whereas the level of silent variation is the same (for Cec[psi]1) or higher (for Cec[psi]2) than in the functional genes. Both pseudogenes have conserved transcriptional signals and splice sites, and present an open reading frame; also, correctly spliced transcripts have been detected for both pseudogenes. Cec[psi]1 and Cec[psi]2 may be either active genes with some null alleles or young pseudogenes (167).

Regulation of Gene Expression

The X chromosome Stellate (Ste) gene of D. melanogaster (at 47.5 and band 12E) is homologous to a moderately repeated sequence that maps to the Suppressor of Stellate locus [Su(Ste)] (133) located on the long arm of the Y chromosome (93), which contains tandemly repeated pseudogene sequences with premature stop codons and frameshift mutations (17). Nevertheless, some of the…
