DNA is copied into RNA in a process called genetic transcription. To transcribe means to "put down something in writing." The information in DNA is transcribed—or rewritten—into a smaller version (RNA) that can be used by the cell.
Understand the basic steps in the transcription of DNA into RNA
Describe the role of RNA polymerase
Understand the difference between pre-RNA and mRNA
Describe RNA post-transcriptional modification and its purpose
Transcription takes place in the nucleus. It uses DNA as a template to make an RNA (mRNA) molecule. During transcription, a strand of mRNA is made that is complementary to a strand of DNA. Figure 1 shows how this occurs.
Figure 1. Overview of Transcription. Transcription uses the sequence of bases in a strand of DNA to make a complementary strand of mRNA. Triplets are groups of three successive nucleotide bases in DNA. Codons are complementary groups of bases in mRNA.
Steps of Transcription
Transcription takes place in three steps: initiation, elongation, and termination. The steps are illustrated in Figure 2.
Figure 2. Transcription occurs in the three steps—initiation, elongation, and termination—all shown here.
Step 1: Initiation
Initiation is the beginning of transcription. It occurs when the enzyme RNA polymerase binds to a region of a gene called the promoter. This signals the DNA to unwind so the enzyme can "read" the bases in one of the DNA strands. The enzyme is now ready to make a strand of mRNA with a complementary sequence of bases.
Step 2: Elongation
Elongation is the addition of nucleotides to the mRNA strand. RNA polymerase reads the unwound DNA strand and builds the mRNA molecule, using complementary base pairs. There is a brief time during this process when the newly formed RNA is bound to the unwound DNA. During this process, an adenine (A) in the DNA pairs with a uracil (U) in the RNA.
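The base-pairing rule used during elongation can be sketched in a few lines of Python. This is only an illustration: the template sequence below is hypothetical, and the function name transcribe is our own. It assumes the template strand is written in the 3′ to 5′ direction, so the resulting mRNA reads 5′ to 3′.

```python
# Pairing rules followed by RNA polymerase: each base on the DNA template
# is matched with its RNA complement (A in DNA pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') complementary to a DNA template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


template = "TACGGCATT"      # hypothetical template strand, written 3'->5'
mrna = transcribe(template)

# Group the mRNA into codons (three bases each), as in Figure 1.
codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
print(mrna, codons)
```

Each triplet in the DNA template (TAC, GGC, ATT) yields one complementary codon in the mRNA.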
Step 3: Termination
Termination is the ending of transcription, and occurs when RNA polymerase crosses a stop (termination) sequence in the gene. The mRNA strand is complete, and it detaches from DNA.
This section will expand upon the specific role of RNA polymerases during transcription. Read on to learn the role of RNA polymerases at each stage of transcription.
Initiation of Transcription
Unlike the prokaryotic polymerase, which can bind to a DNA template on its own, eukaryotic polymerases require several other proteins, called transcription factors, to first bind to the promoter region and then help recruit the appropriate polymerase.
The Three Eukaryotic RNA Polymerases
The features of eukaryotic mRNA synthesis are markedly more complex than those of prokaryotes. Instead of a single polymerase comprising five subunits, the eukaryotes have three polymerases that are each made up of 10 subunits or more. Each eukaryotic polymerase also requires a distinct set of transcription factors to bring it to the DNA template.
RNA polymerase I is located in the nucleolus, a specialized nuclear substructure in which ribosomal RNA (rRNA) is transcribed, processed, and assembled into ribosomes (Table 1). The rRNA molecules are considered structural RNAs because they have a cellular role but are not translated into protein. The rRNAs are components of the ribosome and are essential to the process of translation. RNA polymerase I synthesizes all of the rRNAs except for the 5S rRNA molecule. The "S" designation applies to "Svedberg" units, a nonadditive value that characterizes the speed at which a particle sediments during centrifugation.
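Table 1. Locations, Products, and Sensitivities of the Three Eukaryotic RNA Polymerases
RNA Polymerase | Cellular Compartment | Product of Transcription | α-Amanitin Sensitivity
I | Nucleolus | All rRNAs except 5S rRNA | Insensitive
II | Nucleus | All protein-coding nuclear pre-mRNAs | Extremely sensitive
III | Nucleus | 5S rRNA, tRNAs, and small nuclear RNAs | Moderately sensitive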
RNA polymerase II is located in the nucleus and synthesizes all protein-coding nuclear pre-mRNAs. Eukaryotic pre-mRNAs undergo extensive processing after transcription but before translation (Figure 3). For clarity, this module's discussion of transcription and translation in eukaryotes will use the term "mRNAs" to describe only the mature, processed molecules that are ready to be translated. RNA polymerase II is responsible for transcribing the overwhelming majority of eukaryotic genes.
Figure 3. Eukaryotic mRNA contains introns that must be spliced out. A 5′ cap and 3′ poly-A tail are also added.
RNA polymerase III is also located in the nucleus. This polymerase transcribes a variety of structural RNAs that includes the 5S pre-rRNA, transfer pre-RNAs (pre-tRNAs), and small nuclear pre-RNAs. The tRNAs have a critical role in translation; they serve as the adaptor molecules between the mRNA template and the growing polypeptide chain. Small nuclear RNAs have a variety of functions, including "splicing" pre-mRNAs and regulating transcription factors.
A scientist characterizing a new gene can determine which polymerase transcribes it by testing whether the gene is expressed in the presence of a particular mushroom poison, α-amanitin (Table 1). Interestingly, α-amanitin produced by Amanita phalloides, the Death Cap mushroom, affects the three polymerases very differently. RNA polymerase I is completely insensitive to α-amanitin, meaning that the polymerase can transcribe DNA in vitro in the presence of this poison. In contrast, RNA polymerase II is extremely sensitive to α-amanitin, and RNA polymerase III is moderately sensitive. Knowing the transcribing polymerase can clue a researcher into the general function of the gene being studied. Because RNA polymerase II transcribes the vast majority of genes, we will focus on this polymerase in our subsequent discussions about eukaryotic transcription factors and promoters.
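The reasoning in the α-amanitin test amounts to a simple lookup, sketched below. The sensitivities and products come from the text and Table 1; the function name infer_polymerase and the exact wording of the keys are assumptions made for this illustration.

```python
# Sensitivity to alpha-amanitin -> (polymerase, products), as given in Table 1.
POLYMERASES = {
    "insensitive": ("RNA polymerase I", "all rRNAs except 5S rRNA"),
    "extremely sensitive": ("RNA polymerase II", "protein-coding nuclear pre-mRNAs"),
    "moderately sensitive": ("RNA polymerase III", "5S rRNA, tRNAs, and small nuclear RNAs"),
}


def infer_polymerase(amanitin_sensitivity):
    """Return the polymerase and its products for an observed sensitivity."""
    key = amanitin_sensitivity.strip().lower()
    if key not in POLYMERASES:
        raise ValueError(f"unknown sensitivity: {amanitin_sensitivity!r}")
    return POLYMERASES[key]


polymerase, products = infer_polymerase("Extremely sensitive")
print(polymerase, "->", products)
```

A gene whose expression stops at very low doses of the poison is therefore most likely a protein-coding gene transcribed by RNA polymerase II.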
Structure of an RNA Polymerase II Promoter
Eukaryotic promoters are much larger and more complex than prokaryotic promoters, but both have a TATA box. For example, in the mouse thymidine kinase gene, the TATA box is located at approximately -30 relative to the initiation (+1) site (Figure 4). For this gene, the exact TATA box sequence is TATAAAA, as read in the 5′ to 3′ direction on the nontemplate strand. The thermostability of A–T bonds is low and this helps the DNA template to locally unwind in preparation for transcription.
Figure 4. A generalized promoter of a gene transcribed by RNA polymerase II is shown. Transcription factors recognize the promoter. RNA polymerase II then binds and forms the transcription initiation complex.
The mouse genome includes one gene and two pseudogenes for cytoplasmic thymidine kinase. Pseudogenes are genes that have lost their protein-coding ability or are no longer expressed by the cell. These pseudogenes are copied from mRNA and incorporated into the chromosome. In addition to the TATA box, the mouse thymidine kinase promoter has a conserved CAAT box (GGCCAATCT) at approximately -80. This sequence is essential and is involved in binding transcription factors. Further upstream of the TATA box, eukaryotic promoters may also contain one or more GC-rich boxes (GGCG) or octamer boxes (ATTTGCAT). These elements bind cellular factors that increase the efficiency of transcription initiation and are often identified in more "active" genes that are constantly being expressed by the cell.
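As an illustration of how these conserved elements could be located, the following sketch scans a nontemplate strand (written 5′ to 3′) for the four motifs named above and reports each position relative to the +1 site. The test sequence is synthetic, built from filler bases so that the CAAT box lands at -80 and the TATA box at -30; it is not the real mouse thymidine kinase promoter. The function name find_elements and the parameter tss_index are assumptions for this example.

```python
import re

# Motifs quoted in the text, read 5'->3' on the nontemplate strand.
PROMOTER_ELEMENTS = {
    "TATA box": "TATAAAA",
    "CAAT box": "GGCCAATCT",
    "GC-rich box": "GGCG",
    "octamer box": "ATTTGCAT",
}


def find_elements(nontemplate, tss_index):
    """Return (name, position) pairs; tss_index is the 0-based index of +1.

    Positions follow the usual convention: the base just upstream of +1
    is -1 (there is no position 0).
    """
    hits = []
    for name, motif in PROMOTER_ELEMENTS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?={re.escape(motif)})", nontemplate):
            offset = match.start() - tss_index
            hits.append((name, offset if offset < 0 else offset + 1))
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic promoter: 100 bases upstream of +1, then a short transcribed stub.
sequence = "C" * 20 + "GGCCAATCT" + "C" * 41 + "TATAAAA" + "C" * 23 + "ACCC"
print(find_elements(sequence, tss_index=100))
```

Real promoters tolerate variation in these sequences, so an exact-match search like this one would miss many functional elements.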
Transcription Factors for RNA Polymerase II
The complexity of eukaryotic transcription does not end with the polymerases and promoters. An army of basal transcription factors, enhancers, and silencers also help to regulate the frequency with which pre-mRNA is synthesized from a gene. Enhancers and silencers affect the efficiency of transcription but are not necessary for transcription to proceed. Basal transcription factors are crucial in the formation of a preinitiation complex on the DNA template that subsequently recruits RNA polymerase II for transcription initiation.
The names of the basal transcription factors begin with "TFII" (this is the transcription factor for RNA polymerase II) and are specified with the letters A–J. The transcription factors systematically fall into place on the DNA template, with each one further stabilizing the preinitiation complex and contributing to the recruitment of RNA polymerase II.
The processes of bringing RNA polymerases I and III to the DNA template involve slightly less complex collections of transcription factors, but the general theme is the same. Eukaryotic transcription is a tightly regulated process that requires a variety of proteins to interact with each other and with the DNA strand. Although the process of transcription in eukaryotes involves a greater metabolic investment than in prokaryotes, it ensures that the cell transcribes precisely the pre-mRNAs that it needs for protein synthesis.
The Evolution of Promoters
The evolution of genes may be a familiar concept. Mutations can occur in genes during DNA replication, and the result may or may not be beneficial to the cell. By altering an enzyme, structural protein, or some other factor, the process of mutation can transform functions or physical features. However, eukaryotic promoters and other gene regulatory sequences may evolve as well. For instance, consider a gene that, over many generations, becomes more valuable to the cell. Maybe the gene encodes a structural protein that the cell needs to synthesize in abundance for a certain function. If this is the case, it would be beneficial to the cell for that gene's promoter to recruit transcription factors more efficiently and increase gene expression.
Scientists examining the evolution of promoter sequences have reported varying results. In part, this is because it is difficult to infer exactly where a eukaryotic promoter begins and ends. Some promoters occur within genes; others are located very far upstream, or even downstream, of the genes they are regulating. However, when researchers limited their examination to human core promoter sequences that were defined experimentally as sequences that bind the preinitiation complex, they found that promoters evolve even faster than protein-coding genes.
It is still unclear how promoter evolution might correspond to the evolution of humans or other higher organisms. However, the evolution of a promoter to effectively make more or less of a given gene product is an intriguing alternative to the evolution of the genes themselves.
Promoter Structures for RNA Polymerases I and III
In eukaryotes, the conserved promoter elements differ for genes transcribed by RNA polymerases I, II, and III. RNA polymerase I transcribes genes that have two GC-rich promoter sequences in the –45 to +20 region. These sequences alone are sufficient for transcription initiation to occur, but promoters with additional sequences in the region from –180 to –105 upstream of the initiation site will further enhance initiation. Genes that are transcribed by RNA polymerase III have upstream promoters or promoters that occur within the genes themselves.
Elongation and Termination
Following the formation of the preinitiation complex, the polymerase is released from the other transcription factors, and elongation is allowed to proceed as it does in prokaryotes with the polymerase synthesizing pre-mRNA in the 5′ to 3′ direction. As discussed previously, RNA polymerase II transcribes the major share of eukaryotic genes, so this section will focus on how this polymerase accomplishes elongation and termination.
Although the enzymatic process of elongation is essentially the same in eukaryotes and prokaryotes, the DNA template is more complex. When eukaryotic cells are not dividing, their genes exist as a diffuse mass of DNA and proteins called chromatin. The DNA is tightly packaged around charged histone proteins at repeated intervals. These DNA–histone complexes, collectively called nucleosomes, are regularly spaced and include 146 nucleotides of DNA wound around eight histones like thread around a spool.
For polynucleotide synthesis to occur, the transcription machinery needs to move histones out of the way every time it encounters a nucleosome. This is accomplished by a special protein complex called FACT, which stands for "facilitates chromatin transcription." This complex pulls histones away from the DNA template as the polymerase moves along it. Once the pre-mRNA is synthesized, the FACT complex replaces the histones to recreate the nucleosomes.
The termination of transcription is different for the different polymerases. Unlike in prokaryotes, elongation by RNA polymerase II in eukaryotes takes place 1,000–2,000 nucleotides beyond the end of the gene being transcribed. This pre-mRNA tail is subsequently removed by cleavage during mRNA processing. On the other hand, RNA polymerases I and III require termination signals. Genes transcribed by RNA polymerase I contain a specific 18-nucleotide sequence that is recognized by a termination protein. Termination by RNA polymerase III involves an RNA hairpin similar to the one used in rho-independent termination of transcription in prokaryotes.
pre-RNA and mRNA
After transcription, eukaryotic pre-mRNAs must undergo several processing steps before they can be translated. Eukaryotic (and prokaryotic) tRNAs and rRNAs also undergo processing before they can function as components in the protein synthesis machinery.
The additional processing steps involved in eukaryotic mRNA maturation create a molecule with a much longer half-life than a prokaryotic mRNA. Eukaryotic mRNAs last for several hours, whereas the typical E. coli mRNA lasts no more than five seconds.
Pre-mRNAs are first coated in RNA-stabilizing proteins; these protect the pre-mRNA from degradation while it is processed and exported out of the nucleus. The three most important steps of pre-mRNA processing are the addition of stabilizing and signaling factors at the 5′ and 3′ ends of the molecule, and the removal of intervening sequences that do not specify the appropriate amino acids. In rare cases, the mRNA transcript can be "edited" after it is transcribed.
5′ Capping
While the pre-mRNA is still being synthesized, a 7-methylguanosine cap is added to the 5′ end of the growing transcript by a phosphate linkage. This moiety (functional group) protects the nascent mRNA from degradation. In addition, factors involved in protein synthesis recognize the cap to help initiate translation by ribosomes.
3′ Poly-A Tail
Once elongation is complete, the pre-mRNA is cleaved by an endonuclease between an AAUAAA consensus sequence and a GU-rich sequence, leaving the AAUAAA sequence on the pre-mRNA. An enzyme called poly-A polymerase then adds a string of approximately 200 A residues, called the poly-A tail. This modification further protects the pre-mRNA from degradation and signals to cellular factors that the transcript needs to be exported to the cytoplasm.
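The cleavage and tailing steps can be sketched as string operations. Several details here are assumptions made only for illustration: the GU-rich sequence is approximated by the literal motif GUGUGU, the cut is placed at the midpoint between the two signals, and the example transcript is hypothetical. The cell's actual choice of cleavage site is more subtle.

```python
POLY_A_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(pre_mrna, gu_motif="GUGUGU", tail_length=200):
    """Cut between AAUAAA and a downstream GU-rich motif, then add a poly-A tail."""
    signal_start = pre_mrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA signal found")
    signal_end = signal_start + len(POLY_A_SIGNAL)

    gu_start = pre_mrna.find(gu_motif, signal_end)
    if gu_start == -1:
        raise ValueError("no GU-rich motif found downstream of AAUAAA")

    # Assumed cut site: halfway between the two signals, so AAUAAA stays
    # on the transcript and the GU-rich region is discarded.
    cut = (signal_end + gu_start) // 2
    return pre_mrna[:cut] + "A" * tail_length


pre_mrna = "AUGGCC" + "AAUAAA" + "CCCCCCCCCC" + "GUGUGU" + "CCCC"  # hypothetical
processed = cleave_and_polyadenylate(pre_mrna)
print(len(processed), processed[:20])
```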
Pre-mRNA Splicing
Eukaryotic genes are composed of exons, which correspond to protein-coding sequences (ex-on signifies that they are expressed), and intervening sequences called introns (intron denotes their intervening role), which may be involved in gene regulation but are removed from the pre-mRNA during processing. Intron sequences in mRNA do not encode functional proteins.
The discovery of introns came as a surprise to researchers in the 1970s who expected that pre-mRNAs would specify protein sequences without further processing, as they had observed in prokaryotes. The genes of higher eukaryotes very often contain one or more introns. These regions may correspond to regulatory sequences; however, the biological significance of having many introns or having very long introns in a gene is unclear. It is possible that introns slow down gene expression because it takes longer to transcribe pre-mRNAs with lots of introns. Alternatively, introns may be nonfunctional sequence remnants left over from the fusion of ancient genes throughout evolution. This is supported by the fact that separate exons often encode separate protein subunits or domains. For the most part, the sequences of introns can be mutated without ultimately affecting the protein product.
All of a pre-mRNA's introns must be completely and precisely removed before protein synthesis. If the process errs by even a single nucleotide, the reading frame of the rejoined exons would shift, and the resulting protein would be dysfunctional. The process of removing introns and reconnecting exons is called splicing (Figure 5). Introns are removed and degraded while the pre-mRNA is still in the nucleus. Splicing occurs by a sequence-specific mechanism that ensures introns will be removed and exons rejoined with the accuracy and precision of a single nucleotide. The splicing of pre-mRNAs is conducted by complexes of proteins and RNA molecules called spliceosomes.
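To see why single-nucleotide precision matters, consider the following sketch. It removes an intron from a short, hypothetical pre-mRNA twice: once at the correct boundaries and once with the 3′ boundary off by one base. The sequences, coordinates, and function names are assumptions chosen for this illustration.

```python
def splice(pre_mrna, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    exons, position = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before the intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


def codons(mrna):
    """Split an mRNA into complete codons, dropping any leftover bases."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


exon1, intron, exon2 = "AUGGCU", "GUAAGUCCCAG", "GAAUGGUAA"  # hypothetical
pre_mrna = exon1 + intron + exon2

correct = splice(pre_mrna, [(6, 17)])      # intron removed exactly
off_by_one = splice(pre_mrna, [(6, 16)])   # one intron base left behind

print(codons(correct))
print(codons(off_by_one))
```

In the second case, every codon downstream of the splice junction changes, which is the frameshift described above.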
Figure 5. Pre-mRNA splicing involves the precise removal of introns from the primary RNA transcript. The splicing process is catalyzed by protein complexes called spliceosomes that are composed of proteins and RNA molecules called snRNAs. Spliceosomes recognize sequences at the 5′ and 3′ end of the intron.
Errors in splicing are implicated in cancers and other human diseases. What kinds of mutations might lead to splicing errors?
Think of different possible outcomes if splicing errors occur. Mutations in the spliceosome recognition sequence at each end of the intron, or in the proteins and RNAs that make up the spliceosome, may impair splicing. Mutations may also add new spliceosome recognition sites. Splicing errors could lead to introns being retained in spliced RNA, exons being excised, or changes in the location of the splice site.
Note that more than 70 individual introns can be present, and each has to undergo the process of splicing—in addition to 5′ capping and the addition of a poly-A tail—just to generate a single, translatable mRNA molecule.
Figure 6. Trypanosoma brucei is the causative agent of sleeping sickness in humans. The mRNAs of this pathogen must be modified by the addition of nucleotides before protein synthesis can occur. (credit: modification of work by Torsten Ochsenreiter)
The trypanosomes are a group of protozoa that include the pathogen Trypanosoma brucei, which causes sleeping sickness in humans (Figure 6). Trypanosomes, and virtually all other eukaryotes, have organelles called mitochondria that supply the cell with chemical energy. Mitochondria are organelles that express their own DNA and are believed to be the remnants of a symbiotic relationship between a eukaryote and an engulfed prokaryote. The mitochondrial DNA of trypanosomes exhibits an interesting exception to the central dogma: their pre-mRNAs do not have the correct information to specify a functional protein. Usually, this is because the mRNA is missing several U nucleotides. The cell performs an additional RNA processing step called RNA editing to remedy this.
Other genes in the mitochondrial genome encode 40- to 80-nucleotide guide RNAs. One or more of these molecules interacts by complementary base pairing with some of the nucleotides in the pre-mRNA transcript. However, the guide RNA has more A nucleotides than the pre-mRNA has U nucleotides to bind with. In these regions, the guide RNA loops out. The 3′ ends of guide RNAs have a long poly-U tail, and these U bases are inserted in regions of the pre-mRNA transcript at which the guide RNAs are looped. This process is entirely mediated by RNA molecules. That is, guide RNAs—rather than proteins—serve as the catalysts in RNA editing.
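The logic of guide-directed U insertion can be modeled roughly as follows. This is a deliberately simplified sketch: it assumes strict Watson-Crick pairing, a guide written 3′ to 5′ so that it lines up base by base with the pre-mRNA, and hypothetical sequences. It does not model where the inserted U bases come from, only where they end up.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def edit_with_guide(pre_mrna, guide_3_to_5):
    """Insert a U wherever a guide A has no U partner in the pre-mRNA."""
    edited, i = [], 0
    for guide_base in guide_3_to_5:
        wanted = COMPLEMENT[guide_base]
        if i < len(pre_mrna) and pre_mrna[i] == wanted:
            edited.append(pre_mrna[i])   # bases pair; move along the pre-mRNA
            i += 1
        elif wanted == "U":
            edited.append("U")           # unpaired guide A: insert a U here
        else:
            raise ValueError("guide RNA does not match the pre-mRNA")
    return "".join(edited) + pre_mrna[i:]


pre_mrna = "GGACCG"      # hypothetical transcript missing three U bases
guide = "CCAUGAAGC"      # hypothetical guide RNA, written 3'->5'
print(edit_with_guide(pre_mrna, guide))
```

The guide carries three A bases with no partner in the pre-mRNA, so three U bases are inserted at exactly those positions.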
RNA editing is not just a phenomenon of trypanosomes. In the mitochondria of some plants, almost all pre-mRNAs are edited. RNA editing has also been identified in mammals such as rats, rabbits, and even humans. What could be the evolutionary reason for this additional step in pre-mRNA processing? One possibility is that the mitochondria, being remnants of ancient prokaryotes, have an equally ancient RNA-based method for regulating gene expression. In support of this hypothesis, edits made to pre-mRNAs differ depending on cellular conditions. Although speculative, the process of RNA editing may be a holdover from a primordial time when RNA molecules, instead of proteins, were responsible for catalyzing reactions.
RNA Post-Transcriptional Modification
The genes that a eukaryotic cell turns "on" largely determine its identity and properties. For instance, a photoreceptor cell in your eye can detect light because it expresses genes for light-sensitive proteins, as well as genes for neurotransmitters that allow signals to be relayed to the brain.
In eukaryotic cells like photoreceptors, gene expression is often controlled primarily at the level of transcription. However, that doesn't mean transcription is the last chance for regulation. Later stages of gene expression can also be regulated, including the following:
RNA processing, such as splicing, capping, and addition of a poly-A tail
Messenger RNA (mRNA) translation and lifetime in the cytosol
Protein modifications, such as addition of chemical groups or removal of amino acids
In the sections below, we’ll discuss some common types of gene regulation that occur after an RNA transcript has been made.
Regulation of RNA processing
When a eukaryotic gene is transcribed in the nucleus, the primary transcript (freshly made RNA molecule) isn't yet considered a messenger RNA. Instead, it's an "immature" molecule called a pre-mRNA.
The pre-mRNA has to go through some modifications to become a mature mRNA molecule that can leave the nucleus and be translated. These include splicing, capping, and addition of a poly-A tail, all of which can potentially be regulated – sped up, slowed down, or altered to result in a different product.
Most pre-mRNA molecules have sections that are removed from the molecule, called introns, and sections that are linked together to make the final mRNA, called exons. This process is called splicing.
In the process of alternative splicing, different portions of an mRNA can be selected for use as exons. This allows either of two (or more) mRNA molecules to be made from one pre-mRNA.
Figure 7. Image modified from "Eukaryotic Post-transcriptional Gene Regulation," by OpenStax College, Biology (CC BY 3.0).
Alternative splicing is not a random process. Instead, it's typically controlled by regulatory proteins. The proteins bind to specific sites on the pre-mRNA and "tell" the splicing factors which exons should be used. Different cell types may express different regulatory proteins, so different exon combinations can be used in each cell type, leading to the production of different proteins.
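A small sketch can make the idea concrete. Here, one hypothetical pre-mRNA has three exons, and each cell type's regulatory proteins are represented simply as a list of the exons to include. The exon sequences, cell-type names, and function name are all assumptions for illustration.

```python
def alternative_splice(exons, selected):
    """Join the selected exons, in order, into a mature mRNA."""
    return "".join(exons[name] for name in selected)


# Hypothetical exons from a single pre-mRNA (introns already set aside).
exons = {"E1": "AUGGCU", "E2": "GAACCU", "E3": "UGGUAA"}

# Stand-in for regulatory proteins: which exons each cell type includes.
CELL_TYPE_EXONS = {
    "cell type A": ["E1", "E2", "E3"],
    "cell type B": ["E1", "E3"],      # exon E2 is skipped
}

isoforms = {
    cell: alternative_splice(exons, names)
    for cell, names in CELL_TYPE_EXONS.items()
}
for cell, mrna in isoforms.items():
    print(cell, mrna)
```

The same gene thus yields two different mRNAs, and potentially two different proteins, depending on the cell type.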
Small regulatory RNA
Once an mRNA has left the nucleus, it may or may not be translated many times to make proteins. Two key determinants of how much protein is made from an mRNA are its "lifespan" (how long it floats around in the cytosol) and how readily the translation machinery, such as the ribosome, can attach to it.
A recently discovered class of regulators, called small regulatory RNAs, can control mRNA lifespan and translation. Let's see how this works.
microRNAs (miRNAs) were among the first small regulatory RNAs to be discovered. A miRNA is first transcribed as a long RNA molecule, which forms base pairs with itself and folds over to make a hairpin. Next, the hairpin is chopped up by enzymes, releasing a small double-stranded fragment of about 20 nucleotides. One of the strands in this fragment is the mature miRNA, which binds to a specific protein to make an RNA-protein complex.
Figure 8. Image modified from "miRNA biogenesis," by Narayanese (CC BY-SA 3.0). The modified image is licensed under a CC BY-SA 3.0 license.
The miRNA directs the protein complex to "matching" mRNA molecules (ones that form base pairs with the miRNA). When the RNA-protein complex binds:
If the miRNA and its target match perfectly, an enzyme in the RNA-protein complex will typically chop the mRNA in half, leading to its breakdown.
If the miRNA and its target have some mismatches, the RNA-protein complex may instead bind to the mRNA and keep it from being translated.
These are not the only ways that miRNAs inhibit expression of their targets, and scientists are still investigating their many modes of action.
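The two outcomes described above can be expressed as a simple decision rule. In this sketch, the miRNA sequence is hypothetical, and the max_mismatches threshold that separates "some mismatches" from "no binding" is an arbitrary assumption; real target recognition depends on where the mismatches fall, not just on how many there are.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that pairs perfectly with rna (both written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def mirna_outcome(mirna, target_site, max_mismatches=3):
    """Classify the effect of a miRNA on an equally long mRNA target site."""
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must be the same length")
    ideal_site = reverse_complement(mirna)
    mismatches = sum(a != b for a, b in zip(ideal_site, target_site))
    if mismatches == 0:
        return "cleave mRNA"           # perfect match: mRNA is cut and degraded
    if mismatches <= max_mismatches:
        return "block translation"     # imperfect match: translation is repressed
    return "no binding"


mirna = "ACGUACGUUGCAUGCAAUCG"                 # hypothetical 20-nucleotide miRNA
perfect_site = reverse_complement(mirna)
imperfect_site = "A" + perfect_site[1:]        # one mismatch at the first base

print(mirna_outcome(mirna, perfect_site))
print(mirna_outcome(mirna, imperfect_site))
```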
In Summary: RNA Post-Transcriptional Modification
Gene expression can be regulated at various stages after an RNA transcript has been produced. Some transcripts can undergo alternative splicing. This regulated process makes different mRNAs and proteins from the same initial RNA transcript. Some mRNAs are targeted by small regulatory RNAs, including miRNAs, which can cause mRNA degradation or block translation. A protein's activity may be regulated after translation by mechanisms such as proteolysis ("snipping out" of pieces) and addition of chemical groups.
Liang et al., "Fast evolution of core promoters in primate genomes," Molecular Biology and Evolution 25 (2008): 1239–44.
R. W. and Sontheimer, E. J. (2009). Origins and mechanisms of miRNAs and siRNAs. Cell, 136(4), 642–655. http://dx.doi.org/10.1016/j.cell.2009.01.035.