My View

Richard Gayle

Genomic Reformer

May 12, 2000

Sometimes coming up with a title for these columns is pretty easy, and sometimes it is more difficult. This was one of the difficult ones, mainly because the topic lent itself to so many easy titles. This week's topic deals with certain mobile elements called LINEs. In researching them, I found articles entitled 'Do all SINEs lead to LINEs?', 'Drawing the LINE', and 'SINEs of the LINEs'. Everyone was making a play on the acronym. So, being different, I chose another track.

Now, often the title just comes to me as I write. The old right brain subtly adds its two cents' worth and slips in a title. But with the puns from other articles already preloaded, it was having a hard time. Still, coming up with useful titles is something we all have to deal with when writing scientific articles.

Not being of the school that makes a title a statement (e.g., The prismatic receptor, ponceR, is the source of eternal youth), I started with a descriptive one for this week's topic. 'Our current understanding of LINEs and their ability to remake and reform the human genome.' Too long. 'LINEs - remakers and reformers of the human genome.' Too awkward. 'LINEs - suffragettes of the human genome.' Too obscure. But this did lead me to 'Genomic Reformer', which works much better. Kind of grabs you and creates a desire to learn more, right? Well, at least it is short.

I talked last week about group II introns and their ability to move around the genome; they might be the origin of spliceosomal introns. However, there is another group of mobile elements that appears to figure prominently in the mammalian genome: two groups of retrotransposable, non-LTR elements called LINEs and SINEs (Jargon Alert: transposable elements are able to insert themselves into new locations in the genome; retro- means that they go through an RNA intermediate that is copied into DNA by a reverse transcriptase; LTR - long terminal repeat, found at both ends of retroviruses and LTR retrotransposons; LINE - Long INterspersed Element; SINE - Short INterspersed Element.)

There are a huge number of copies of SINEs and LINEs in the human genome. Over 30% of the human genome consists of retrotransposed sequences. About 15% is LINE sequence, with L1 the most common at over 100,000 copies. Most of the rest is SINE sequence, with Alu the most common. Both SINEs and LINEs carry their own promoters, but only LINEs have the coding sequence needed for their own transposition. It appears that SINEs are paired with particular LINEs and are unable to retrotranspose on their own.

An active LINE sequence in the genome contains two open reading frames: ORF 1, of unknown function, and ORF 2, which has endonuclease and reverse transcriptase activities similar to those encoded by group II introns. The genomic LINE is flanked by very short direct repeats (target-site duplications) left over from the retrotransposition, and it ends in a telltale poly-A tract. LINEs do not normally contain introns.
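Those two hallmarks, the short repeats on either side and the poly-A tract at the 3' end, are what you look for when deciding whether a stretch of DNA arrived by L1-style retrotransposition. Here is a minimal Python sketch, assuming the boundaries of the insertion are already known (say, from comparison with an allele that lacks it). The helper names and the thresholds (at least 7 bp of duplicated target site, at least 10 A's) are hypothetical choices for illustration, not published cutoffs, and the sequences are toy strings.

```python
def longest_tsd(left_flank: str, right_flank: str, max_len: int = 20) -> str:
    """Longest k-mer (k <= max_len) that ends the left flank and also begins
    the right flank: a candidate target-site duplication."""
    for k in range(min(max_len, len(left_flank), len(right_flank)), 0, -1):
        if left_flank[-k:] == right_flank[:k]:
            return left_flank[-k:]
    return ""


def poly_a_length(insert: str) -> int:
    """Number of consecutive A's at the 3' end of the inserted sequence."""
    return len(insert) - len(insert.rstrip("A"))


def looks_like_line_insertion(left_flank: str, insert: str, right_flank: str,
                              min_tsd: int = 7, min_poly_a: int = 10) -> bool:
    """Hypothetical test: a 3' poly-A tract plus a flanking direct repeat.
    Both thresholds are illustrative assumptions."""
    tsd = longest_tsd(left_flank, right_flank)
    return len(tsd) >= min_tsd and poly_a_length(insert) >= min_poly_a


# Toy example: a 10 bp target site duplicated on both sides of the insert.
target = "AAGAATTCTT"
left = "GCGTCCA" + target
right = target + "GGTACCT"
insert = "GGGGAGGCCGAGCACTGC" + "A" * 15
print(longest_tsd(left, right))                        # AAGAATTCTT
print(poly_a_length(insert))                           # 15
print(looks_like_line_insertion(left, insert, right))  # True
```

Real insertions are messier (truncated tails, mismatches in the repeats), so a working tool would allow some slop; the point here is just what "short repeats plus a poly-A tract" means in sequence terms.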

L1 ORF2 does not copy either the 5' end or the 3' end of the LINE very efficiently, and insertion into the chromosome produces further alterations. Virtually all of the L1 sequences in the human genome are damaged, with 5' truncations and other changes. It is estimated that only 50-60 of the 100,000 copies are still active and capable of independent retrotransposition.

Last year, an article in Science described the ability of these retrotransposons to shuffle exons. It turns out that the poly-A site in LINEs is very poor, at least in humans, and this lets them hide more easily from selection. If a LINE inserts into an exon, its presence will obviously affect the original gene's expression. This sort of insertional mutagenesis is visible to natural selection and can be selected against if harmful. However, if the LINE inserts into an intron, it can remain there as long as it does not greatly alter the ability of the intron to be spliced out (i.e., all the normal exons still end up together in the final mRNA). The retrotransposon could still be functional, since it has its own promoter. A weak poly-A site allows transcription of the host gene to continue through the LINE and the surrounding sequence until a legitimate poly-A site is reached. A strong poly-A site here would cause premature termination of the mRNA transcript and probable loss of function, producing the same result as an insertional mutation.

A poor poly-A site also means that when the LINE itself is transcribed, its own poly-A site is often not used, and the transcript continues until it finds another poly-A site, most likely the normal one for the original gene (see figure). So when this LINE transcript is reverse-copied into DNA, it carries quite a bit of extra 3' sequence, often encoding another exon. Upon insertion, this exon will have been copied into another region, potentially placing a novel exon in a new setting. (A fuller explanation can be found here.)
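The read-through logic can be sketched as a toy model in Python. The assumptions are loud ones: the locus is a plain string, each poly-A signal is just a position flagged strong or weak, a weak signal is either always used or always skipped (in a cell this is a matter of probability), and `transcribe` and `retrotranspose` are hypothetical helpers, not a model of real polymerase behavior.

```python
from dataclasses import dataclass


@dataclass
class PolyASignal:
    position: int  # index in the locus where the transcript would be cut
    strong: bool   # strong signals always terminate; weak ones may be skipped


def transcribe(locus: str, start: int, signals: list[PolyASignal],
               read_through_weak: bool) -> str:
    """Toy transcription from `start` to the first poly-A signal that is used.
    Weak signals are ignored when read_through_weak is True."""
    for sig in sorted(signals, key=lambda s: s.position):
        if sig.position <= start:
            continue  # signal lies upstream of the promoter
        if sig.strong or not read_through_weak:
            return locus[start:sig.position]
    return locus[start:]  # no usable signal: run to the end of the toy locus


def retrotranspose(transcript: str, tail_len: int = 12) -> str:
    """Toy copy of the transcript as it lands at a new site: sequence plus poly-A."""
    return transcript + "A" * tail_len


# Toy locus: L1 body (weak signal at its end), then host sequence
# ending in a downstream exon that carries the gene's strong signal.
l1_body = "ATGGCC" * 4              # 24 nt stand-in for the element
downstream = "CCCC" + "CGTACGTTAG"  # intron remnant + toy exon
locus = l1_body + downstream + "TTT"
signals = [PolyASignal(len(l1_body), strong=False),
           PolyASignal(len(l1_body) + len(downstream), strong=True)]

normal = transcribe(locus, 0, signals, read_through_weak=False)
read_through = transcribe(locus, 0, signals, read_through_weak=True)
print(normal == l1_body)               # True: nothing extra is carried along
print(read_through[len(l1_body):])     # CCCCCGTACGTTAG: the transduced 3' sequence
print(retrotranspose(read_through).endswith("A" * 12))  # True
```

With the weak L1 signal skipped, everything between the end of the element and the next strong signal rides along in the transcript and, after reverse transcription, lands at the new site together with the L1.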

Now, this process was demonstrated using an engineered L1 element in HeLa cells. How effective are L1 sequences in vivo at transducing extra 3' sequences? A recent paper in Genome Research provides an initial answer. The authors searched the genome databases for L1 sequences that contained extra 3' sequence, and they found a lot. In several cases, they were able to take the 3' sequence and work backwards to its original location (remember, the original sequence will not have a poly-A tract). In most cases, there was an L1 sequence present there. So the original L1 picked up some 3' sequence and moved it to another location. From their estimates, some 25 Mbases, or nearly 1% of the entire human genome, is made up of these retrotransposed 3' sequences. To put this in perspective, that is about the same amount of the genome as is occupied by functional exons. A lot of DNA to be moved around.
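The "work backwards" step lends itself to a short sketch. Assuming we already have an L1 insertion that carries extra 3' sequence, we strip the poly-A tail and the L1 3' end, then search a reference for the leftover segment, keeping only hits that have an L1 just upstream and no poly-A tract just downstream; that last test is what separates the source locus from the copy. The sequences are toys (not real L1 sequence), and the function names, the 50-base window and the 10-A tail threshold are hypothetical.

```python
def transduced_segment(insertion: str, l1_3prime: str) -> str:
    """Drop the poly-A tail, then return whatever follows the last copy of the
    known L1 3' end. Caveat: rstrip also removes any A's that genuinely end
    the transduced segment."""
    body = insertion.rstrip("A")
    idx = body.rfind(l1_3prime)
    return "" if idx == -1 else body[idx + len(l1_3prime):]


def find_source_loci(reference: str, segment: str, l1_3prime: str,
                     window: int = 50, tail: int = 10) -> list[int]:
    """Positions where `segment` occurs with an L1 3' end within `window`
    bases upstream and no poly-A tract (`tail` A's) immediately downstream."""
    hits = []
    pos = reference.find(segment)
    while pos != -1:
        upstream = reference[max(0, pos - window):pos]
        downstream = reference[pos + len(segment):pos + len(segment) + tail]
        if l1_3prime in upstream and downstream != "A" * tail:
            hits.append(pos)
        pos = reference.find(segment, pos + 1)
    return hits


l1_end = "GCTTAGCTAGG"   # toy stand-in for the L1 3' end
extra = "CCGTTGACGTCG"   # toy flanking sequence picked up by read-through
new_copy = l1_end + extra + "A" * 15
reference = ("TTTT" + l1_end + "AC" + extra + "GGCATT"      # source locus
             + "CCCC" + l1_end + extra + "A" * 15 + "GTCC")  # the copy itself
seg = transduced_segment(new_copy, l1_end)
print(seg)                                       # CCGTTGACGTCG
print(find_source_loci(reference, seg, l1_end))  # [17]
```

Only the first occurrence is reported: the second one is followed by a poly-A tract, so it is the transduced copy rather than the place the sequence came from.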

The high mobility of these 3' sequences, coupled with the overall sloppiness of L1 retrotransposition, means that these elements alone give the genome a great deal of plasticity. The possibility of forming new juxtapositions of sequences and creating novel functions is obvious. The fact that these elements provide a promoter increases the likelihood of meaningful expression. The potential ability of LINEs to remold the genome is mind-boggling. Luckily most of them are inactive, but were they always so?

But wait, there's more, as the infomercials say. The proteins encoded by the LINE ORFs can do more than simply retrotranspose their own coding sequences in cis, that is, insert the RNA sequences that code for themselves. They can also act in trans, inserting other, non-transposon mRNA sequences into the genome. I mentioned their ability to do this with SINEs, but a recent paper in Nature Genetics demonstrates their ability to take normal mRNA transcripts and insert them into the chromosome.

There are many processed pseudogenes in the human genome. These genes lack introns and often have a poly-A tract at the 3' end. Several mechanisms have been proposed for how a normal mRNA transcript could be inserted into the genomic DNA, but this paper provides the most likely pathway.

They used a selectable marker, neo, that is expressed from the antisense strand, contains an intron, and is carried on a temperature-sensitive plasmid. This was transfected into cells along with another vector that supplied the L1 sequence. Following growth at the restrictive temperature, they could select for neo+ clones that had the gene incorporated into the chromosome. In most cases, the insertion looked exactly like a processed pseudogene: loss of the intron, a poly-A tract, and short repeats at the ends.
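Of those three features, intron loss is the easiest to check mechanically, provided the source gene's exon and intron sequences are known. Here is a minimal Python sketch with hypothetical helper names and a toy two-exon gene (not the real neo sequence). It checks both strands, on the assumption that an insertion can land in either orientation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def intron_lost(insert: str, exons: list[str], introns: list[str]) -> bool:
    """True if, on either strand, the insert carries the exons joined end to
    end (the spliced sequence) and none of the source gene's introns."""
    spliced = "".join(exons)
    for strand in (insert, reverse_complement(insert)):
        if spliced in strand and not any(intron in strand for intron in introns):
            return True
    return False


# Toy two-exon gene (not the real neo sequence).
exons = ["ATGGCTAAGC", "TTGACCGGTA"]
introns = ["GTAAGTCTTTTCTAG"]
genomic = exons[0] + introns[0] + exons[1]
processed = exons[0] + exons[1] + "A" * 12
print(intron_lost(processed, exons, introns))                      # True
print(intron_lost(reverse_complement(processed), exons, introns))  # True
print(intron_lost(genomic, exons, introns))                        # False
```

An intact, unspliced copy of the gene fails the test because the two exons are never adjacent in it, which is exactly why an exon-exon junction in the chromosome points to an mRNA intermediate.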

Using mutants of the L1 sequence, they showed that both ORFs were necessary for this retrotransposition of neo. They also showed that other possible processes, using retroviruses for instance, do not result in the canonical processed pseudogene format. And it is important to note that many of these processed genes are fully functional.

So, LINEs appear to play a very important part in determining how a genome is put together. They do more than simply alter the genome by insertional mutagenesis. LINEs play a major role in creating new DNA sequences and in juxtaposing sequences. The amount of DNA that they have been involved with is comparable to the total amount of coding DNA in the genome.

What selective advantage there might be in having such large tracts of DNA reformed by mobile elements will have to be the topic of a future paper. At the moment I am trying to get the vision of cDNA copies being inserted into my chromosomes out of my head. When I went to school, DNA was supposed to be the stable vessel for information archiving. It changed very slowly. There would be some intrinsic mutation rate due to the inaccuracies of DNA polymerase. But wholesale changes in large amounts of DNA were unthinkable as a standard process. However, it is looking like reverse transcriptase activities may be very important in the evolution of animals, not just in their destructive forms (e.g., retroviruses) but in their constructive forms (e.g., LINEs). Perhaps we should enshrine this molecule as the most important protein in the pantheon of enzymes!