Richard Gayle
Book, Worm
March 10, 2000
I enjoy reading science fiction when I am in the mood for entertainment. The best science fiction makes one think about the world in a novel way. What rules would autonomous robots follow to work effectively? Could technology be used to physically recreate the Hindu pantheon on another planet? Could you produce dinosaurs from the ancient blood of fossilized mosquitoes?
The science fiction I like the most usually has a strong science foundation. And one of my favorite authors is Greg Bear. He has a new book out called Darwin's Radio that was just reviewed in Nature. It is not often that Nature reviews a science fiction novel, but this book sounds like it has taken fairly solid scientific observations and extrapolated from them. While the exact details of the book (i.e., that the creation of new species may be directed by specific processes; that there are endogenous retroviruses waiting to be activated and produce wholesale change in a genome) may not be correct, like Jurassic Park it is close enough to current scientific speculation to be thought-provoking.
Almost 30 years ago Niles Eldredge and Stephen Jay Gould first published their ideas on 'punctuated equilibrium'. They postulated that an isolated group, existing at the fringes of the species' range, could sometimes evolve very rapidly (in geologic terms this would be thousands of years). The resulting new species might then overwhelm the environmental niches of the parent species and take over.
Initially, punctuated equilibrium was denigrated by many paleontologists (my favorite epigram: evolution by jerks). It seemed to go against a central dogma of evolution: that species came into being by the gradual accumulation of genetic changes. Punctuated equilibrium said this could happen so fast that in geologic terms it was instantaneous. No mechanism was known for how this might occur. A recent observation may provide such a mechanism, may demonstrate that Darwin's Radio is not too far off, and may suggest that we are looking in the wrong place for novel genes.
In this week's Science, Eric Lander and Robert Weinberg provide a vivid history of genetics. Darwin proposed Natural Selection without any knowledge of Mendelian inheritance. Mendelian inheritance was examined without any knowledge of the genetic code. cDNA sequences are decoded without any knowledge of how the genome is structured. Can genomic structures lead to knowledge of evolutionary processes? Thanks to an amazingly good choice made 40 years ago, we may have a clearer answer to that question.
In the early 1960s Sydney Brenner, in discussions with Francis Crick and Max Perutz, hit upon the idea of developing a new model organism: something more complex than the viruses and bacteria of the day, but less complex than Drosophila. He chose a small nematode, Caenorhabditis elegans, that had many useful attributes. It reproduces rapidly, has a small number of cells, and is tiny, allowing a huge number to be grown easily.
The first metazoan genome completely sequenced was that of C. elegans. We probably know more about this creature, how it develops and why, than any other multicellular species. However, we now have in hand genomic sequences of several other species that can be compared with C. elegans. While Nature was publishing the review of Darwin's Radio, a paper appeared in Science that provides some interesting insights, not only into the organization of a worm's genome, but into a possible mechanism for rapid genomic change.
About 20% of all the genes of C. elegans are similar to ones found in yeast, indicating that these genes are quite ancient. Another 30% of the worm's genes are found in insects and vertebrates, indicating that they came into existence after the split from yeast but before the split of C. elegans and higher eukaryotes. So 50% of all of C. elegans' genes are found in other metazoans, including humans. The remaining 50% appear to be specific to worms and of more recent origin than the other 50%.
This paper examines the superfamilies that the different sequences fall into. Protein superfamilies describe groups whose overall folding pattern is similar. For example, many proteins have similar structures, even if they have different activities. Often they have similar structures because of conservation in the gene sequences. We have mined the TNF receptor superfamily for a wide range of novel proteins, all by looking for telltale similarities in their sequences.
How are gene superfamilies created? Perhaps a gene is duplicated by any of several known mechanisms. One gene then retains the original function, allowing the other one to experience independent, divergent evolution, eventually gaining a new function. Since the protein products of both genes are derived from the same primordial one, they will maintain a similar folding pattern, producing two members of a gene family. The process can then begin again. Given enough gene duplication and selection, new species might develop.
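The duplicate-then-diverge idea can be made concrete with a toy sketch. This is only an illustration under loud assumptions: the sequence, the mutation rate, the number of generations, and the function names `duplicate_and_diverge` and `identity` are all hypothetical, only point substitutions are modeled, and there is no selection at all. It is not the method of the paper discussed here.

```python
import random

BASES = "ACGT"

def duplicate_and_diverge(gene, generations, mutation_rate, seed=0):
    """Duplicate `gene` and let only the copy accumulate point mutations.

    The untouched original stands in for the copy that keeps the
    ancestral function. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    copy = list(gene)  # the duplicate, free to change
    for _ in range(generations):
        for i in range(len(copy)):
            if rng.random() < mutation_rate:
                # Replace the base with one of the three other bases.
                copy[i] = rng.choice([b for b in BASES if b != copy[i]])
    return gene, "".join(copy)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 1.0  # two empty sequences are trivially identical
    return sum(x == y for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    ancestral = "ATGGCCATTGTAATGGGCCGC"  # made-up sequence
    original, diverged = duplicate_and_diverge(ancestral, 200, 0.001)
    print(original, diverged, identity(original, diverged))
```

In this toy model the two copies stay the same length and remain recognizably related for a long time, which is the intuition behind finding family members by telltale sequence similarity.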
This proposal presumes that gene duplication can occur anywhere in the genome, that any gene anywhere can be duplicated. However, this paper discusses a somewhat different mechanism for the generation of the superfamilies, one that has significant ramifications, not only for how life evolves but for where we should be searching for novel genes. It postulates that most significant gene duplication and superfamily creation occur only in very specific regions of the chromosome; that expression of genes from these regions is tightly controlled; and that these regions serve as the R&D sites of the chromosome, with the most useful genes eventually moving to more dispersed locations on the chromosome.
When we look at nuclei under a microscope, the chromosomes are in two different forms. One, termed euchromatin, is diffuse and extended during gene expression. The other, heterochromatin, which stains much more intensely, remains condensed with little gene expression. Heterochromatin and euchromatin are replicated during different stages of a cell cycle. Moving a gene from heterochromatin to euchromatin (either by mutational events or by directed processes) usually results in its derepression, with high-level expression resulting. The opposite occurs when active genes are inserted into heterochromatin. Heterochromatic regions are often formed following simple gene duplication, with a resulting reduction in expression of the duplicated gene.
Many heterochromatic regions have not been intensely studied because of the belief that little of consequence could be found there. Most of these regions are not genetically stable, making cloning difficult. Only a few genes are expressed, and those only at low levels. Almost all adaptive mutations, those affecting phenotype, are in euchromatin, making genetic analysis of heterochromatin very difficult. Heterochromatic regions are filled with transposons, pseudo-genes, ancestral retroviruses, and highly repetitive DNA: a veritable hodgepodge of genomic structure. Heterochromatin is the very reservoir of 'junk' DNA that is often ignored. This view may be utterly false.
When examining the various superfamilies found in C. elegans, an interesting observation was made. If gene duplication were possible anywhere in the genome, with no regard to the underlying structure, the oldest superfamilies should have the greatest number of members and the greatest diversity. They have been around longer, giving them more time to be duplicated and to find new uses. By this same logic, gene families that have only recently come into existence should have few family members. Well, exactly the opposite was seen.
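The expectation being tested here, that older families should be larger if duplication can strike any gene equally, can be sketched with a simple pure-birth model. Everything in it is an assumption made for illustration: the per-copy duplication rate, the family ages, and the helper names are hypothetical and do not come from the paper, and gene loss and selection are ignored.

```python
import random

def expected_family_size(age, dup_rate):
    """Expected number of copies after `age` steps, starting from one gene,
    if every copy duplicates with probability `dup_rate` per step."""
    return (1.0 + dup_rate) ** age

def simulate_family_size(age, dup_rate, rng):
    """Simulate one family under the same uniform-duplication assumption."""
    copies = 1
    for _ in range(age):
        # Each existing copy independently duplicates during this step.
        copies += sum(1 for _ in range(copies) if rng.random() < dup_rate)
    return copies

def mean_simulated_size(age, dup_rate, n_families, seed=0):
    """Average size over many simulated families of the same age."""
    rng = random.Random(seed)
    sizes = [simulate_family_size(age, dup_rate, rng) for _ in range(n_families)]
    return sum(sizes) / n_families

if __name__ == "__main__":
    rate = 0.01                  # assumed duplication chance per copy per step
    old_age, young_age = 300, 100  # assumed ages in arbitrary time steps
    print("expected:", expected_family_size(old_age, rate),
          expected_family_size(young_age, rate))
    print("simulated:", mean_simulated_size(old_age, rate, 200),
          mean_simulated_size(young_age, rate, 200))
```

With these made-up numbers the older families average about 20 copies and the younger ones fewer than 3. That is the pattern uniform duplication would predict, and the one the worm genome fails to show.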
The oldest gene superfamilies, ones that are present in both yeast and worms, have relatively few members, particularly compared to worm-specific genes. Those superfamilies consisting of genes that developed recently, after the worm lineage split from other metazoans, have many more members. And not only are there more, but they are much more diverse in their genomic structures. These family members are present in large numbers, lie in different orientations on the chromosomes, and have different exon structures: a veritable hodgepodge of genomic structure.
The oldest superfamilies are dispersed throughout the entire worm genome. The newest superfamilies tend to reside close to each other on a chromosome. Older superfamily members are generally expressed at much higher levels than those of more recent superfamilies. The newer superfamilies are grossly underrepresented in EST libraries, which examine expressed genes. Finally, mutations characterized in C. elegans are almost always in the genes of the older superfamilies, not in genes from more recent superfamilies. New, worm-specific superfamilies are found bunched together, with lots of surrounding pseudo-genes, inversions, altered exons. They are not expressed highly nor do any adaptive mutations map to them.
Low expression, few phenotypic mutations, lots of 'junk' DNA. What does that sound like? Could the newest superfamilies be found in heterochromatic regions? High expression, lots of adaptive mutations and few repeats. Could the oldest superfamilies be found in euchromatic regions? Might the heterochromatin be the site of the newest genes, undergoing rapid changes and rearrangement, yet kept under tight control? The provocative hypothesis of this paper is that novel genes are created in heterochromatin. Lots of rearrangements, duplications, etc., but none of it is allowed to be expressed at high levels. In these areas, cDNAs could be taken up, gene duplications could occur, retroviruses could be incorporated, all sorts of selfish DNA sequences could be spawned, but few of these could produce protein. If something novel with an adaptive effect is created in the heterochromatin, it is then moved out into the euchromatin, where its genomic structure would be much more stable. Sounds like a process to explain punctuated equilibrium.
The genomic structure of the newest genes in C. elegans has properties similar to those of specific regions of a chromosome. Are these directly related? As it turns out, heterochromatic regions of C. elegans are not well defined cytologically, so it is not easy to tell which superfamilies fall in heterochromatin. However, it might be worth examining heterochromatic regions in other species. Are there gems hidden in the junk DNA? The very complexity of these regions may make this difficult, but the low-level expression of one important gene could have huge effects.
Are the nascent genes of a new species in heterochromatin? Chimpanzees and humans are said to be at least 96% identical. But this is only comparing the single-copy sequences found in euchromatin. These comparisons almost always ignore the repetitive, junk DNA most likely to be found in the heterochromatic regions. But it may be that in this very DNA lie the genes responsible for major differences between us and other primates.
A good hypothesis leads us to fruitful scientific investigations. If this hypothesis is true, there could be a large number of novel sequences hiding in an organism's DNA. The areas of greatest change in a genome could be in the heterochromatin. A multitude of genes could be present in the tightly regulated heterochromatin waiting to get expressed, if the right conditions came about. Perhaps a new species might be created more through the low-level expression of novel genes in the heterochromatin than through the genes seen in the euchromatin.
So, instead of searching EST libraries based on genes expressed in euchromatin, maybe we should examine libraries based on heterochromatic sequences. These might be the ones that are human-specific, genes that have only recently appeared but that could be of great present, or future, utility. These could be the genes that really separate us from our cousins, the great apes. Or, as expounded in Darwin's Radio, they could be the fountainhead for the creation of a new species. I will follow the developments of this interesting hypothesis. In the meantime, I will spend the weekend huddled over Darwin's Radio, hoping that the mechanisms it describes really are ONLY fiction.