CHAPTER 8 NUCLEOTIDES AND NUCLEIC ACIDS

The discussion in this chapter reinforces or introduces five principles:

Nucleic acids are both repositories and functional expressions of biological information. Biological information is one of the required conditions for life, a blueprint for each species transmitted from one generation to the next. RNA can be a functional expression of that information, directing the synthesis of proteins or in some cases acting directly as a signal or a reaction catalyst.

The transmission of biological information relies on molecular complementarity. Chromosomes are the largest molecules in any cell. They are polymers composed of a small set of common nucleotides, with information embedded in the nucleotide sequence. The common nucleotides in RNA and DNA are organized so that two strands of nucleic acid can maintain a complementary and uniform structure over vast molecular distances. This extended potential for both variable sequence and complementarity, and thus information storage and transmission, is a property shared by no other class of biological molecule.

Biological information is subject to natural damage and change. DNA damage is a constant, and it results in occasional mutation, the raw material for evolution.

Biological information can be accessed, interpreted, and altered in the laboratory. The information embedded in nucleic acids is of singular importance to biochemistry and molecular biology. The techniques for sequencing, synthesizing, and altering nucleic acids are continually advancing.

Nucleoside triphosphates occupy a central role in cellular metabolism, serving as an energy currency and as important regulatory signals. ATP is the ultimate product of catabolic pathways, providing fuel for anabolic pathways.

This chapter provides an overview of the chemical nature of the nucleotides and nucleic acids found in most cells, as well as the tools used to study them.
A more detailed examination of the function of nucleic acids is the focus of Part III of this text.

8.1 Some Basic Definitions and Conventions

The amino acid sequence of every protein in a cell, and the nucleotide sequence of every RNA, is specified by a nucleotide sequence in the cell’s DNA. A segment of a DNA molecule that contains the information required for the synthesis of a functional biological product, whether protein or RNA, is referred to as a gene. A cell typically has many thousands of genes, and DNA molecules, not surprisingly, tend to be very large. The storage of biological information and the transmission of that information from one generation to the next are the only known functions of DNA.

RNAs have a broader range of functions, and several classes are found in cells. Ribosomal RNAs (rRNAs) are components of ribosomes, the complexes that carry out the synthesis of proteins. Messenger RNAs (mRNAs) are intermediaries, carrying information for the synthesis of a protein from one or a few genes to a ribosome. Transfer RNAs (tRNAs) are adapter molecules that faithfully translate the information in mRNA into a specific sequence of amino acids. In addition to these major classes, there are many RNAs (noncoding or ncRNAs) with a wide variety of special functions, described in depth in Part III.

Nucleotides and Nucleic Acids Have Characteristic Bases and Pentoses

A nucleotide has three characteristic components: (1) a nitrogenous (nitrogen-containing) base, (2) a pentose, and (3) one or more phosphates (Fig. 8-1). The molecule without a phosphate group is called a nucleoside. The nitrogenous bases are derivatives of two parent compounds, pyrimidine and purine. The bases and pentoses of the common nucleotides are heterocyclic compounds.

FIGURE 8-1 Structure of nucleotides. (a) General structure showing the numbering convention for the pentose ring. This is a ribonucleotide.
In deoxyribonucleotides the —OH group on the 2′ carbon (in red) is replaced with —H. (b) The parent compounds of the pyrimidine and purine bases of nucleotides and nucleic acids, showing the numbering conventions.

KEY CONVENTION The carbon atoms and nitrogen atoms in the parent structures are conventionally numbered to facilitate the naming and identification of the many derivative compounds. The convention for the pentose ring follows rules outlined in Chapter 7, but in the pentoses of nucleotides and nucleosides the carbon numbers are given a prime (′) designation to distinguish them from the numbered atoms of the nitrogenous bases.

The base of a nucleotide is joined covalently (at N-1 of pyrimidines and N-9 of purines) in an N-β-glycosyl bond to the 1′ carbon of the pentose, and the phosphate is esterified to the 5′ carbon. The N-β-glycosyl bond is formed by removal of the elements of water (a hydroxyl group from the pentose and hydrogen from the base), as in O-glycosidic bond formation.

Both DNA and RNA contain two major purine bases, adenine (A) and guanine (G), and two major pyrimidines. In both DNA and RNA, one of the pyrimidines is cytosine (C), but the second common pyrimidine is not the same in both: it is thymine (T) in DNA and uracil (U) in RNA. Only occasionally does thymine occur in RNA or uracil in DNA. The structures of the five major bases are shown in Figure 8-2, and the nomenclature of their corresponding nucleotides and nucleosides is summarized in Table 8-1.

FIGURE 8-2 Major purine and pyrimidine bases of nucleic acids. Some of the common names of these bases reflect the circumstances of their discovery. Guanine, for example, was first isolated from guano (bird manure), and thymine was first isolated from thymus tissue.
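The grouping of the five major bases by parent compound, and the ring nitrogen each one uses for the N-β-glycosyl bond, can be captured in a small lookup. This is a minimal Python sketch; the names PURINES, PYRIMIDINES, MAJOR_BASES, parent_class, and glycosidic_nitrogen are hypothetical illustrations, and only the five major bases are covered (minor bases are out of scope).

```python
# Minimal sketch: the five major bases grouped by parent compound.
PURINES = {"A": "adenine", "G": "guanine"}
PYRIMIDINES = {"C": "cytosine", "T": "thymine", "U": "uracil"}

# Major bases of each nucleic acid: T is a major base only of DNA, U only of RNA.
MAJOR_BASES = {
    "DNA": frozenset("AGCT"),
    "RNA": frozenset("AGCU"),
}


def parent_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for a one-letter major-base symbol."""
    symbol = base.upper()
    if symbol in PURINES:
        return "purine"
    if symbol in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not one of the five major bases: {base!r}")


def glycosidic_nitrogen(base: str) -> str:
    """Ring nitrogen joined to C-1' of the pentose in the N-beta-glycosyl bond."""
    return "N-9" if parent_class(base) == "purine" else "N-1"


for symbol in "AGCTU":
    print(symbol, parent_class(symbol), glycosidic_nitrogen(symbol))
```

Keeping the DNA and RNA base sets separate makes the one difference explicit: the set difference between them is just thymine in one direction and uracil in the other.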
TABLE 8-1 Nucleotide and Nucleic Acid Nomenclature

Base | Nucleoside | Nucleotide | Nucleic acid
Purines
Adenine | Adenosine | Adenylate | RNA
Adenine | Deoxyadenosine | Deoxyadenylate | DNA
Guanine | Guanosine | Guanylate | RNA
Guanine | Deoxyguanosine | Deoxyguanylate | DNA
Pyrimidines
Cytosine | Cytidine | Cytidylate | RNA
Cytosine | Deoxycytidine | Deoxycytidylate | DNA
Thymine | Thymidine or deoxythymidine | Thymidylate or deoxythymidylate | DNA
Uracil | Uridine | Uridylate | RNA

Note: “Nucleoside” and “nucleotide” are generic terms that include both ribo- and deoxyribo- forms. Also, ribonucleosides and ribonucleotides are here designated simply as nucleosides and nucleotides (e.g., riboadenosine as adenosine), and deoxyribonucleosides and deoxyribonucleotides as deoxynucleosides and deoxynucleotides (e.g., deoxyriboadenosine as deoxyadenosine). Both forms of naming are acceptable, but the shortened names are more commonly used. Thymine is an exception; “ribothymidine” is used to describe its unusual occurrence in RNA.

Nucleic acids have two kinds of pentoses. The recurring deoxyribonucleotide units of DNA contain 2′-deoxy-D-ribose, and the ribonucleotide units of RNA contain D-ribose. In nucleotides, both types of pentoses are in their β-furanose (closed five-membered ring) form (Fig. 8-3a). As Figure 8-3b shows, the pentose ring is not planar but occurs in one of a variety of conformations generally described as “puckered.”

FIGURE 8-3 Conformations of ribose. (a) In solution, the straight-chain (aldehyde) and ring (β-furanose) forms of free ribose are in equilibrium. RNA contains only the ring form, β-D-ribofuranose. Deoxyribose undergoes a similar interconversion in solution, but in DNA it exists solely as β-2′-deoxy-D-ribofuranose. (b) Ribofuranose rings in nucleotides can exist in four different puckered conformations. In all cases, four of the five atoms are nearly in a single plane. The fifth atom (C-2′ or C-3′) is on either the same (endo) side or the opposite (exo) side of the plane relative to the C-5′ atom.
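Table 8-1 can be read as a lookup from a base to the shortened names of its nucleoside and nucleotide. The Python sketch below is one hypothetical encoding of the table (NOMENCLATURE and common_name are illustrative names, not from any library); None marks a form that Table 8-1 does not list, and the thymine entries follow the table's note that the deoxy form is the usual one.

```python
# Hypothetical encoding of Table 8-1: (ribo form, deoxy form) for each entry.
# None marks a form that Table 8-1 does not list.
NOMENCLATURE = {
    "adenine": {"nucleoside": ("adenosine", "deoxyadenosine"),
                "nucleotide": ("adenylate", "deoxyadenylate")},
    "guanine": {"nucleoside": ("guanosine", "deoxyguanosine"),
                "nucleotide": ("guanylate", "deoxyguanylate")},
    "cytosine": {"nucleoside": ("cytidine", "deoxycytidine"),
                 "nucleotide": ("cytidylate", "deoxycytidylate")},
    # Thymine is the exception: its deoxy forms are simply "thymidine" and
    # "thymidylate" (the "deoxy" prefix is optional), and its rare ribo
    # nucleoside is called "ribothymidine".
    "thymine": {"nucleoside": ("ribothymidine", "thymidine"),
                "nucleotide": (None, "thymidylate")},
    "uracil": {"nucleoside": ("uridine", None),
               "nucleotide": ("uridylate", None)},
}


def common_name(base: str, kind: str, deoxy: bool) -> str:
    """Look up the shortened name of a nucleoside or nucleotide of `base`."""
    ribo_name, deoxy_name = NOMENCLATURE[base.lower()][kind]
    name = deoxy_name if deoxy else ribo_name
    if name is None:
        form = "deoxy" if deoxy else "ribo"
        raise KeyError(f"Table 8-1 lists no {form} {kind} of {base}")
    return name


print(common_name("adenine", "nucleotide", deoxy=True))   # deoxyadenylate
print(common_name("thymine", "nucleoside", deoxy=False))  # ribothymidine
```

Raising an error for unlisted forms, rather than guessing a name, keeps the lookup faithful to what the table actually states.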
KEY CONVENTION Although DNA and RNA seem to have two distinguishing features (different pentoses, and the presence of uracil in RNA and thymine in DNA), it is the pentoses that uniquely define the identity of a nucleic acid. If the nucleic acid contains 2′-deoxy-D-ribose, it is DNA by definition, even if it contains uracil. Similarly, if the nucleic acid contains D-ribose, it is RNA, regardless of its base composition. The presence of uracil or thymine is not a defining characteristic.

Figure 8-4a gives the structures and names of the four major deoxyribonucleotides, the structural units of DNAs, and Figure 8-4b shows the four major ribonucleotides, the structural units of RNAs. Deoxyribonucleotides are also referred to as deoxyribonucleoside 5′-monophosphates or simply deoxynucleotides; ribonucleotides are also called ribonucleoside 5′-monophosphates.
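The Key Convention amounts to a classification rule that looks only at the sugar. The Python sketch below is a hypothetical illustration of that rule (nucleic_acid_type is not a library function, and the accepted spellings of the sugar names are an assumption); the bases argument is deliberately ignored to make the point that uracil or thymine content does not decide the answer.

```python
def nucleic_acid_type(pentose: str, bases: str = "") -> str:
    """Classify a nucleic acid by its pentose alone, per the Key Convention.

    `bases` is accepted only to make the point explicit: it is never
    consulted, because uracil or thymine content does not decide DNA vs. RNA.
    """
    # Normalize the prime symbol and letter case, so that "2′-deoxy-D-ribose"
    # and "2'-deoxy-d-ribose" are treated the same.
    sugar = pentose.replace("\u2032", "'").strip().lower()
    if sugar == "2'-deoxy-d-ribose":
        return "DNA"
    if sugar == "d-ribose":
        return "RNA"
    raise ValueError(f"unrecognized pentose: {pentose!r}")


print(nucleic_acid_type("2′-deoxy-D-ribose", bases="ACGU"))  # DNA, despite the uracil
```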
FIGURE 8-4 Deoxyribonucleotides and ribonucleotides of nucleic acids. All nucleotides are shown in their free form at pH 7.0. The nucleotide units of (a) DNA and (b) RNA are shown. For each nucleotide, the more common name is given, followed by the complete name in parentheses and symbols used to represent them. All abbreviations assume that the phosphate group is at the 5′ position. The nucleoside portion of each molecule is shaded in light red. In this and the following illustrations, the ring carbons are not shown.

Although nucleotides bearing the major purines and pyrimidines are most common, both DNA and RNA also contain some minor bases (Fig. 8-5). In DNA the most common of these are methylated forms of the major bases; in some viral DNAs, certain bases may be hydroxymethylated or glucosylated. Altered or unusual bases in DNA molecules often have roles in regulating or protecting the genetic information. Hundreds of different modified bases are also found in RNAs, especially in rRNAs and tRNAs (see Fig. 8-25 and Fig. 26-22). The modifications are usually introduced by enzymes that act after the RNA or DNA is synthesized.

FIGURE 8-5 Some minor purine and pyrimidine bases, shown as the nucleosides. (a) Minor bases of DNA. 5-Methylcytidine occurs in the DNA of animals and higher plants, N⁶-methyladenosine in bacterial DNA, and 5-hydroxymethylcytidine in the DNA of animals and of bacteria infected with certain bacteriophages. (b) Some minor bases of tRNAs. Inosine contains the base hypoxanthine. Note that pseudouridine, like uridine, contains uracil; the two differ in the point of attachment to the ribose: in uridine, uracil is attached through N-1, the usual attachment point for pyrimidines; in pseudouridine, uracil is attached through C-5.

KEY CONVENTION The nomenclature for the minor bases can be confusing. Like the major bases, many minor bases have common names: hypoxanthine, for example, is shown as its nucleoside inosine in Figure 8-5.
When an atom in the purine ring or the pyrimidine ring is substituted, the usual convention (used here) is simply to indicate the ring position of the substituent by its number, for example, 5-methylcytosine, 7-methylguanine, and 5-hydroxymethylcytosine (shown as the nucleosides in Fig. 8-5). The element to which the substituent is attached (N, C, O) is not identified. The convention changes when the substituted atom is exocyclic (i.e., not within the ring structure), in which case the type of atom is identified, and the ring position to which it is attached is denoted with a superscript. The amino nitrogen attached to C-6 of adenine is N⁶; similarly, the carbonyl oxygen and amino nitrogen at C-6 and C-2 of guanine are O⁶ and N², respectively. Examples of this nomenclature are N⁶-methyladenosine and N²-methylguanosine (Fig. 8-5).
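The two naming rules (ring position only for a ring atom; element plus superscript position for an exocyclic atom) can be expressed as a small string builder. This Python sketch is a hypothetical helper, not a standard nomenclature tool; it uses Unicode superscript digits for the exocyclic position and handles only a single substituent.

```python
from typing import Optional

# Unicode superscript digits, used for the ring position of an exocyclic atom.
_SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def substituted_name(parent: str, substituent: str, position: int,
                     exocyclic_atom: Optional[str] = None) -> str:
    """Name a singly substituted base or nucleoside using the convention above.

    Ring atom substituted: give only the ring position ("5-methylcytosine").
    Exocyclic atom substituted: name the element and write the ring position
    it is attached to as a superscript ("N⁶-methyladenosine").
    """
    if exocyclic_atom is None:
        return f"{position}-{substituent}{parent}"
    superscript = str(position).translate(_SUPERSCRIPT)
    return f"{exocyclic_atom}{superscript}-{substituent}{parent}"


print(substituted_name("cytosine", "methyl", 5))        # 5-methylcytosine
print(substituted_name("adenosine", "methyl", 6, "N"))  # N⁶-methyladenosine
```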
Cells also contain nucleotides with phosphate groups in positions other than on the 5′ carbon (Fig. 8-6). Ribonucleoside 2′,3′-cyclic monophosphates are isolatable intermediates, and ribonucleoside 3′-monophosphates are end products of the hydrolysis of RNA by certain ribonucleases. Other variations are adenosine 3′,5′-cyclic monophosphate (cAMP) and guanosine 3′,5′-cyclic monophosphate (cGMP), considered at the end of this chapter.

FIGURE 8-6 Some adenosine monophosphates. Adenosine 2′-monophosphate, 3′-monophosphate, and 2′,3′-cyclic monophosphate are formed by enzymatic and alkaline hydrolysis of RNA.

Phosphodiester Bonds Link Successive Nucleotides in Nucleic Acids

The successive nucleotides of both DNA and RNA are covalently linked through phosphate-group “bridges,” in which the 5′-phosphate group of one nucleotide unit is joined to the 3′-hydroxyl group of the next nucleotide, creating a phosphodiester linkage (Fig. 8-7). Thus the covalent backbones of nucleic acids consist of alternating phosphate and pentose residues, and the nitrogenous bases may be regarded as side groups joined to the backbone at regular intervals. The backbones of both DNA and RNA are hydrophilic. The hydroxyl groups of the sugar residues form hydrogen bonds with water. The phosphate groups, with a pKa near 0, are completely ionized and negatively charged at pH 7, and the negative charges are generally neutralized by ionic interactions with positive charges on proteins, metal ions, and polyamines.

FIGURE 8-7 Phosphodiester linkages in the covalent backbone of DNA and RNA. The phosphodiester bonds (one of which is shaded in the DNA) link successive nucleotide units. The backbone of alternating pentose and phosphate groups in both types of nucleic acid is highly polar. The 5′ and 3′ ends of the macromolecule may be free or may have an attached phosphoryl group.

KEY CONVENTION All the phosphodiester linkages in DNA and RNA have the same orientation along the chain (Fig. 8-7), giving each linear nucleic acid strand a specific polarity and distinct 5′ and 3′ ends. By definition, the 5′ end lacks a nucleotide attached at the 5′ position, and the 3′ end lacks a nucleotide attached at the 3′ position. Other groups (most often one or more phosphates) may be present on one or both ends. The 5′→3′ orientation of a strand of nucleic acid refers to the ends of the strand and the orientation of individual nucleotides, not the orientation of the individual phosphodiester bonds linking its constituent nucleotides.

The covalent backbone of DNA and RNA is subject to slow, nonenzymatic hydrolysis of the phosphodiester bonds. In the test tube, RNA is hydrolyzed rapidly under alkaline conditions, but DNA is not; the 2′-hydroxyl groups in RNA (absent in DNA) are directly involved in the process. Cyclic 2′,3′-monophosphate nucleotides are the first products of the action of alkali on RNA and are rapidly hydrolyzed further to yield a mixture of 2′- and 3′-nucleoside monophosphates (Fig. 8-8).

FIGURE 8-8 Hydrolysis of RNA under alkaline conditions. The 2′ hydroxyl acts as a nucleophile in an intramolecular displacement. The 2′,3′-cyclic monophosphate derivative is further hydrolyzed to a mixture of 2′- and 3′-monophosphates. DNA, which lacks 2′ hydroxyls, is stable under similar conditions.

The nucleotide sequences of nucleic acids can be represented schematically, as illustrated below by a segment of DNA with five nucleotide units. The phosphate groups are symbolized by Ⓟ, and each deoxyribose is symbolized by a vertical line, from C-1′ at the top to C-5′ at the bottom (but keep in mind that the sugar is always in its closed-ring β-furanose form in nucleic acids). The connecting lines between nucleotides (which pass through Ⓟ) are drawn diagonally from the middle (C-3′) of the deoxyribose of one nucleotide to the bottom (C-5′) of the next. Some simpler representations of this pentadeoxyribonucleotide are pA-C-G-T-AOH, pApCpGpTpA, and pACGTA.
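The three shorthand forms for the pentadeoxyribonucleotide differ only in how many phosphates are written out. The Python sketch below is a hypothetical helper (schematic_notations is an illustrative name) that renders a sequence, given 5′→3′, in each form; it assumes the only terminal groups of interest are a 5′ phosphate and a free 3′ hydroxyl.

```python
def schematic_notations(seq: str, five_prime_phosphate: bool = True) -> dict:
    """Render a strand, given 5'->3', in the shorthand forms shown in the text."""
    seq = seq.upper()
    p = "p" if five_prime_phosphate else ""
    return {
        # Hyphens stand for internal phosphodiesters; "OH" marks the free 3' end.
        "hyphenated": p + "-".join(seq) + "OH",
        # Every phosphate written out: the 5'-terminal one plus one per linkage.
        "explicit": p + "p".join(seq),
        # Most compact form: only the 5'-terminal phosphate is shown.
        "compact": p + seq,
        # A linear strand of n nucleotides has n - 1 phosphodiester linkages.
        "phosphodiester_linkages": max(len(seq) - 1, 0),
    }


for form, text in schematic_notations("ACGTA").items():
    print(form, text)
```

For ACGTA this yields pA-C-G-T-AOH, pApCpGpTpA, and pACGTA, matching the representations above, with four phosphodiester linkages.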
KEY CONVENTION The sequence of a single strand of nucleic acid is always written with the 5′ end at the left and the 3′ end at the right, that is, in the 5′→3′ direction.

A short nucleic acid is referred to as an oligonucleotide. The definition of “short” is somewhat arbitrary, but polymers containing 50 or fewer nucleotides are generally called oligonucleotides. A longer nucleic acid is called a polynucleotide.

The Properties of Nucleotide Bases Affect the Three-Dimensional Structure of Nucleic Acids

Free pyrimidines and purines are weakly basic compounds and thus are called bases. The purines and pyrimidines common in DNA and RNA are aromatic molecules (Fig. 8-2), a property with important consequences for the structure, electron distribution, and light absorption of nucleic acids. Electron delocalization among atoms in the ring gives most of the bonds in the ring partial double-bond character. One result is that pyrimidines are planar molecules and purines are very nearly planar, with a slight pucker. Free pyrimidine and purine bases may exist in two or more tautomeric forms depending on the pH. Uracil, for example, occurs in three readily interconverted tautomers: the lactam, lactim, and double lactim forms (Fig. 8-9). The structures shown in Figure 8-2 are the tautomers that predominate at pH 7.0. All nucleotide bases absorb UV light, and nucleic acids are characterized by a strong absorption at wavelengths near 260 nm (Fig. 8-10).

FIGURE 8-9 Tautomeric forms of uracil. The lactam form predominates at pH 7.0; the other forms become more prominent as pH decreases. The other free pyrimidines and the free purines also have tautomeric forms, but they are more rarely encountered.

FIGURE 8-10 Absorption spectra of the common nucleotides. The spectra are shown as the variation in molar extinction coefficient with wavelength. The molar extinction coefficients at 260 nm and pH 7.0 (ε260) are listed in the table.
The spectra of corresponding ribonucleotides and deoxyribonucleotides, as well as the nucleosides, are essentially identical. For mixtures of nucleotides, a wavelength of 260 nm (dashed vertical line) is used for absorption measurements.

The purine and pyrimidine bases are hydrophobic and relatively insoluble in water at the near-neutral pH of the cell. At acidic or alkaline pH, the bases become charged and their solubility in water increases. Hydrophobic stacking interactions, in which two or more bases are positioned with the planes of their rings parallel (like a stack of coins), are one of two important modes of interaction between bases in nucleic acids. Stacking also involves a combination of van der Waals and dipole-dipole interactions between the bases. Base stacking helps to minimize contact of the bases with water, and base-stacking interactions are very important in stabilizing the three-dimensional structure of nucleic acids, as described later.

The functional groups of pyrimidines and purines are ring nitrogens, carbonyl groups, and exocyclic amino groups. Hydrogen bonds involving the amino and carbonyl groups are the second important mode of interaction between bases, and they permit the complementary association of two (and occasionally three or four) strands of nucleic acid. The most common hydrogen-bonding patterns are those defined by James D. Watson and Francis Crick in 1953, in which A bonds specifically to T (or U) and G bonds to C (Fig. 8-11). These two types of base pairs predominate in double-stranded DNA and RNA, and the tautomers shown in Figure 8-2 are responsible for these patterns.
It is this specific pairing of bases that permits the duplication of genetic information.

FIGURE 8-11 Hydrogen-bonding patterns in the base pairs defined by Watson and Crick. Here, as elsewhere, hydrogen bonds are represented by three blue lines.

SUMMARY 8.1 Some Basic Definitions and Conventions

A nucleotide consists of a nitrogenous base (purine or pyrimidine), a pentose sugar, and one or more phosphate groups. If no phosphate is present, the combination of the pentose sugar and the nitrogenous base is a nucleoside. Nucleic acids are polymers of nucleotides, joined together by phosphodiester linkages between the 5′-hydroxyl group of one pentose and the 3′-hydroxyl group of the next. There are two types of nucleic acid: RNA and DNA. The nucleotides in RNA contain ribose, and the common pyrimidine bases are uracil and cytosine. In DNA, the nucleotides contain 2′-deoxyribose, and the common pyrimidine bases are thymine and cytosine. The primary purines are adenine and guanine in both RNA and DNA. The nitrogenous bases have a hydrophobic character and interact via base-stacking interactions.

8.2 Nucleic Acid Structure

The discovery of the structure of DNA by Watson and Crick in 1953 gave rise to entirely new disciplines and influenced the course of many established ones. In this section we focus on DNA structure, some of the events that led to its discovery, and more recent refinements in our understanding of DNA. We also introduce RNA structure.

As in the case of protein structure (Chapter 4), it is sometimes useful to describe nucleic acid structure in terms of hierarchical levels of complexity (primary, secondary, tertiary). The primary structure of a nucleic acid is its covalent structure and nucleotide sequence. Any regular, stable structure taken up by some or all of the nucleotides in a nucleic acid can be referred to as secondary structure. Most structures considered in the remainder of this chapter fall under the heading of secondary structure.
The complex folding of large chromosomes within eukaryotic chromatin and bacterial nucleoids, or the elaborate folding of large tRNA or rRNA molecules, is generally considered tertiary structure. DNA tertiary structure is discussed in Chapter 24. RNA tertiary structure is considered briefly in this chapter and more thoroughly in Chapter 26.

DNA Is a Double Helix That Stores Genetic Information

DNA was first isolated and characterized by Friedrich Miescher in 1869. He called the phosphorus-containing substance “nuclein.” Not until the 1940s, with the work of Oswald T. Avery, Colin MacLeod, and Maclyn McCarty, was there any compelling evidence that DNA was the genetic material. Avery and his colleagues found that an extract of a virulent strain of the bacterium Streptococcus pneumoniae (one that causes disease in mice) could be used to transform a nonvirulent strain of the same bacterium into a virulent strain. They were able to demonstrate through various chemical tests that it was DNA from the virulent strain (not protein, polysaccharide, or RNA, for example) that carried the genetic information for virulence. Then in 1952, experiments by Alfred D. Hershey and Martha Chase, in which they studied the infection of bacterial cells by a virus (bacteriophage) with radioactively labeled DNA or protein, removed any remaining doubt that DNA, not protein, carried the genetic information.

Another important clue to the structure of DNA came from the work of Erwin Chargaff and his colleagues in the late 1940s. Examining dozens of species, they found that the four nucleotide bases of DNA occur in different ratios in the DNAs of different organisms. However, the base composition is the same in different tissues of the same species, and it does not vary with age, environment, nutritional state, or generation.
Furthermore, regardless of the species, the number of adenosine residues is equal to the number of thymidine residues (that is, A = T), and the number of guanosine residues is equal to the number of cytidine residues (G = C). From these relationships it follows that the sum of the purine residues equals the sum of the pyrimidine residues; that is, A + G = T + C. These quantitative relationships, sometimes called “Chargaff’s rules,” were a key to establishing the three-dimensional structure of DNA.

To shed more light on the structure of DNA, in the early 1950s Rosalind Franklin and Maurice Wilkins used the powerful method of x-ray diffraction (see Fig. 4-30) to analyze DNA fibers. Although fiber diffraction lacks the molecular detail obtainable from crystals, the x-ray diffraction pattern generated from the fibers was informative (Fig. 8-12). The pattern revealed that DNA molecules are helical, with two periodicities along their long axis: a primary one of 3.4 Å and a secondary one of 34 Å. The problem then was to formulate a three-dimensional model of the DNA molecule that could account not only for the x-ray diffraction data but also for the specific A = T and G = C base equivalences discovered by Chargaff and for the other chemical properties of DNA.

FIGURE 8-12 X-ray diffraction pattern of DNA fibers. The spots forming a cross in the center denote a helical structure. The heavy bands at the left and the right arise from the recurring bases.

Rosalind Franklin, 1920–1958
Maurice Wilkins, 1916–2004

James Watson and Francis Crick relied on this accumulated information about DNA to set about deducing its structure. In 1953 they postulated a three-dimensional model of DNA structure that accounted for all the available data. It consists of two helical DNA chains wound around the same axis to form a right-handed double helix. (See Box 4-1 for an explanation of the right- or left-handed sense of a helical structure.)
The hydrophilic backbones of alternating deoxyribose and phosphate groups are on the outside of the double helix, facing the surrounding water. The furanose ring of each deoxyribose is in the C-2′ endo conformation. The purine and pyrimidine bases of both strands are stacked inside the double helix, with their hydrophobic and nearly planar ring structures very close together and perpendicular to the long axis. The offset pairing of the two strands creates a major groove and a minor groove on the surface of the duplex (Fig. 8-13). Each nucleotide base of one strand is paired in the same plane with a base of the other strand. Watson and Crick found that the hydrogen-bonded base pairs illustrated in Figure 8-11, G with C and A with T, are those that fit best within the structure, providing a rationale for Chargaff’s rule that in any DNA, G = C and A = T. It is important to note that three hydrogen bonds can form between G and C, symbolized G≡C, but only two can form between A and T, symbolized A═T. Pairings of bases other than G with C and A with T tend (to varying degrees) to destabilize the double-helical structure.

FIGURE 8-13 Watson-Crick model for the structure of DNA. The original model proposed by Watson and Crick had 10 bp, or 34 Å (3.4 nm), per turn of the helix; subsequent measurements revealed 10.5 bp, or 36 Å (3.6 nm), per turn. (a) Schematic representation, showing dimensions of the helix. (b) Stick representation showing the backbone and stacking of the bases. (c) Space-filling model.

When Watson and Crick constructed their model, they had to decide at the outset whether the strands of DNA should be parallel or antiparallel, that is, whether their 3′,5′-phosphodiester bonds should run in the same or opposite directions. An antiparallel orientation produced the most convincing model, and later work with DNA polymerases (Chapter 25) provided experimental evidence that the strands are indeed antiparallel, a finding ultimately confirmed by x-ray analysis.
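Antiparallel, complementary strands and Chargaff’s equivalences can be checked together in a short Python sketch. The strand ACTACTCAG is a hypothetical example (composition A3T2G1C3), and paired_strand and composition are illustrative names. Because the partner strand runs in the opposite direction, writing it 5′→3′ requires complementing every base and then reversing the order.

```python
from collections import Counter

# Watson-Crick pairing partners for DNA.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def paired_strand(strand: str) -> str:
    """Return the complementary strand, itself written 5'->3'.

    The strands are antiparallel, so the base paired with the input's 5' end
    lies at the partner's 3' end: complement every base, then reverse the order.
    """
    return "".join(PARTNER[base] for base in reversed(strand.upper()))


def composition(seq: str) -> dict:
    """Count A, T, G, and C in a DNA sequence."""
    counts = Counter(seq.upper())
    return {base: counts[base] for base in "ATGC"}


left = "ACTACTCAG"              # hypothetical strand, composition A3 T2 G1 C3
right = paired_strand(left)     # CTGAGTAGT, composition A2 T3 G3 C1
duplex = Counter(left + right)  # both strands of the duplex counted together
print(right, composition(left), composition(right))
print("A = T:", duplex["A"] == duplex["T"], " G = C:", duplex["G"] == duplex["C"])
```

The two strands differ in composition and in sequence when each is read 5′→3′, yet the duplex as a whole satisfies A = T (5 each) and G = C (4 each), as Chargaff’s rules require.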
To account for the periodicities observed in the x-ray diffraction patterns of DNA fibers, Watson and Crick manipulated molecular models to arrive at a structure in which the vertically stacked bases inside the double helix would be 3.4 Å apart; the secondary repeat distance of about 34 Å was accounted for by the presence of 10 base pairs (bp) in each complete turn of the double helix. The structure in aqueous solution differs slightly from that in fibers, having 10.5 bp per helical turn (Fig. 8-13).

As Figure 8-14 shows, the two antiparallel polynucleotide chains of double-helical DNA are not identical in either base sequence or composition. Instead they are complementary to each other. Wherever adenine occurs in one chain, thymine is found in the other; similarly, wherever guanine occurs in one chain, cytosine is found in the other.

FIGURE 8-14 Complementarity of strands in the DNA double helix. The complementary antiparallel strands of DNA follow the pairing rules proposed by Watson and Crick. The base-paired antiparallel strands differ in base composition: the left strand has the composition A3T2G1C3; the right strand has A2T3G3C1. They also differ in sequence when each chain is read in the 5′→3′ direction. Note the base equivalences: A = T and G = C in the duplex.

The DNA double helix, or duplex, is held together by hydrogen bonding between complementary base pairs (Fig. 8-11) and by base-stacking interactions. The complementarity between the DNA strands is attributable to the hydrogen bonding between base pairs; however, the hydrogen bonds do not contribute significantly to the stability of the structure. The double helix is primarily stabilized by metal cations, which shield the negative charges of backbone phosphates, and by base-stacking interactions between successive base pairs. Base-stacking interactions between successive G≡C or C≡G pairs are stronger than those between successive A═T and T═A pairs or adjacent pairs including all four bases.
Because of this, DNA duplexes with higher G≡C content are more stable.

The important features of the double-helical model of DNA structure are now supported by much chemical and biological evidence. Moreover, the model immediately suggested a mechanism for the transmission of genetic information. The essential feature of the model was the complementarity of the two DNA strands. As Watson and Crick were able to see, well before confirmatory data became available, this structure could logically be replicated by (1) separating the two strands and (2) synthesizing a complementary strand for each.
Because nucleotides in each new strand are joined in a sequence specified by the base-pairing rules stated above, each preexisting strand functions as a template to guide the synthesis of one complementary strand (Fig. 8-15). These expectations were experimentally confirmed, inaugurating a revolution in our understanding of biological inheritance.

FIGURE 8-15 Replication of DNA as suggested by Watson and Crick. The preexisting or “parent” strands become separated, and each is the template for biosynthesis of a complementary “daughter” strand (in pink).

DNA Can Occur in Different Three-Dimensional Forms

DNA is a remarkably flexible molecule. Considerable rotation is possible around several types of bonds in the sugar–phosphate (phosphodeoxyribose) backbone, and thermal fluctuation can produce bending, stretching, and unpairing (melting) of the strands. Many significant deviations from the Watson-Crick DNA structure are found in cellular DNA, some or all of which may be important in DNA metabolism. These structural variations generally do not affect the key properties of DNA defined by Watson and Crick: strand complementarity, antiparallel strands, and the requirement for A═T and G≡C base pairs.

Structural variation in DNA reflects three things: the different possible conformations of the deoxyribose, rotation about the contiguous bonds that make up the phosphodeoxyribose backbone (Fig. 8-16a), and free rotation about the C-1′–N-glycosyl bond (Fig. 8-16b). Because of steric constraints, purines in purine nucleotides are restricted to two stable conformations with respect to deoxyribose, called syn and anti (Fig. 8-16b). Pyrimidines are generally restricted to the anti conformation because of steric interference between the sugar and the carbonyl oxygen at C-2 of the pyrimidine.

FIGURE 8-16 Structural variation in DNA. (a) The conformation of a nucleotide in DNA is affected by rotation about seven different bonds. Six of the bonds rotate freely.
The limited rotation about bond 4 gives rise to ring pucker. This conformation is endo or exo, depending on whether the out-of-plane atom (C-2′ or C-3′) is displaced to the same side of the plane as C-5′ or to the opposite side (see Fig. 8-3b). (b) For purine bases in nucleotides, only two conformations with respect to the attached ribose units are sterically permitted: anti or syn. Pyrimidines occur in the anti conformation.

The Watson-Crick structure is also referred to as B-form DNA, or B-DNA. The B form is the most stable structure for a random-sequence DNA molecule under physiological conditions and is therefore the standard point of reference in any study of the properties of DNA. Two structural variants that have been well characterized in crystal structures are the A and Z forms. These three DNA conformations are shown in Figure 8-17, with a summary of their properties.

The A form is favored in many solutions that are relatively devoid of water. The DNA is still arranged in a right-handed double helix, but the helix is wider and the number of base pairs per helical turn is 11, rather than 10.5 as in B-DNA. The plane of the base pairs in A-DNA is tilted about 20° relative to B-DNA base pairs; thus the base pairs in A-DNA are not perfectly perpendicular to the helix axis. These structural changes deepen the major groove while making the minor groove shallower. The reagents used to promote crystallization of DNA tend to dehydrate it, and thus most short DNA molecules tend to crystallize in the A form.

FIGURE 8-17 Comparison of A, B, and Z forms of DNA. Each structure shown here has 36 bp. The riboses and bases are shown in yellow. The phosphodiester backbone is represented as a blue rope. Blue is the color used to represent DNA strands in later chapters. The table summarizes some properties of the three forms of DNA.

Z-form DNA is a more radical departure from the B structure; the most obvious distinction is the left-handed helical rotation.
There are 12 bp per helical turn, and the structure appears more slender and elongated. The DNA backbone takes on a zigzag appearance. Certain nucleotide sequences fold into left-handed Z helices much more readily than others. Prominent examples are sequences in which pyrimidines alternate with purines, especially alternating C and G (that is, in the helix, alternating C≡G and G≡C pairs) or 5-methyl-C and G residues. To form the left-handed helix in Z-DNA, the purine residues flip to the syn conformation, alternating with pyrimidines in the anti conformation. The major groove is barely apparent in Z-DNA, and the minor groove is narrow and deep. Whether A-DNA occurs in cells is uncertain, but there is evidence for some short stretches (tracts) of Z-DNA in both bacteria and eukaryotes. These Z-DNA tracts may play a role (as yet undefined) in regulating the expression of some genes or in genetic recombination. Certain DNA Sequences Adopt Unusual Structures Other sequence-dependent structural variations found in larger chromosomes may affect the function and metabolism of the DNA segments in their immediate vicinity. For example, bends occur in the DNA helix wherever four or more adenosine residues appear sequentially in one strand. Six adenosines in a row produce a bend of about 18°. The bending observed with this and other sequences may be important in the binding of some proteins to DNA. A common type of DNA sequence is a palindrome. A palindrome is a word, phrase, or sentence that is spelled identically when read either forward or backward; two examples are ROTATOR and NURSES RUN. In DNA, the term is applied to regions of DNA with inverted repeats, such that an inverted, self-complementary sequence in one strand is repeated in the opposite orientation in the paired strand, as in Figure 8-18. The self-complementarity within each strand confers the potential to form hairpin or cruciform (cross-shaped) structures (Fig. 8-19).
When the inverted repeat occurs within each individual strand of the DNA, the sequence is called a mirror repeat. Mirror repeats do not have complementary sequences within the same strand and thus cannot form hairpin or cruciform structures. Sequences of these types are found in almost every large DNA molecule and can encompass a few base pairs or thousands. The extent to which palindromes occur as cruciforms in cells is not known, although some cruciform structures have been demonstrated in vivo in Escherichia coli. Self-complementary sequences cause isolated single strands of DNA (or RNA) in solution to fold into complex structures containing multiple hairpins. FIGURE 8-18 Palindromes and mirror repeats. Palindromes are sequences of double-stranded nucleic acids with twofold symmetry. To superimpose one repeat (shaded sequence) on the other, it must be rotated 180° about the horizontal axis and then 180° about the vertical axis, as shown by the colored arrows. A mirror repeat, on the other hand, has a symmetric sequence within each strand. Superimposing one repeat on the other requires only a single 180° rotation about the vertical axis.
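The distinction between a palindrome and a mirror repeat can be stated as a simple string test. The Python sketch below is illustrative only: the function names are hypothetical, a segment is given as one strand written 5′→3′ using the letters A, C, G, and T, and the example sequences are made up. A palindrome equals its own reverse complement (the paired strand read 5′→3′), whereas a mirror repeat equals its own reversal within a single strand.

```python
# Sketch only: classify a short duplex segment, given as one strand written 5'->3'.
# Assumes plain A/C/G/T letters; function names are illustrative, not from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the paired strand, also written 5'->3' (antiparallel)."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(strand: str) -> bool:
    """Inverted repeat: both strands read the same 5'->3', so each strand is
    self-complementary and could fold into a hairpin or cruciform.
    An odd-length segment never qualifies: its central base would have to
    be its own complement."""
    return strand.upper() == reverse_complement(strand)


def is_mirror_repeat(strand: str) -> bool:
    """Mirror repeat: the sequence is symmetric within each strand (reads the
    same forward and backward) and, unlike a palindrome, is not self-complementary."""
    s = strand.upper()
    return s == s[::-1]


if __name__ == "__main__":
    for seq in ("GAATTC", "GATTAG"):
        print(seq, "palindrome:", is_palindrome(seq), "mirror:", is_mirror_repeat(seq))
```

Because a palindrome requires each base to face its complement at the mirrored position, while a mirror repeat requires each base to equal the base at that position, no nonempty segment can satisfy both tests.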
FIGURE 8-19 Hairpins and cruciforms. Palindromic DNA (or RNA) sequences can form alternative structures with intrastrand base pairing. (a) Hairpin structures involve a single DNA or RNA strand. (b) Cruciform structures involve both strands of a duplex DNA. Blue shading highlights asymmetric sequences that can pair with the complementary sequence either in the same strand or in the complementary strand. Several unusual DNA structures are formed from three or even four DNA strands. Nucleotides participating in a Watson-Crick base pair (Fig. 8-11) can form additional hydrogen bonds with a third strand, particularly with functional groups arrayed in the major groove. For example, the guanosine residue of a G≡C nucleotide pair can pair with a cytidine residue (if protonated) on a third strand (Fig. 8-20a); the adenosine of an A═T pair can pair with a thymidine residue. The N-7, O⁶, and N⁶ of purines, the atoms that participate in the hydrogen bonding with a third DNA strand, are often referred to as Hoogsteen positions, and the non-Watson-Crick pairing is called Hoogsteen pairing, after Karst Hoogsteen, who in 1963 first recognized the potential for these unusual pairings. Hoogsteen pairing allows the formation of triplex DNAs. The triplexes shown in Figure 8-20a, b are most stable at low pH because the C≡G·C⁺ triplet requires a protonated cytosine. In the triplex, the pKa of this cytosine is >7.5, altered from its normal value of 4.2. The triplexes also form most readily within long sequences containing only pyrimidines or only purines in a given strand. Some triplex DNAs contain two pyrimidine strands and one purine strand; others contain two purine strands and one pyrimidine strand. FIGURE 8-20 DNA structures containing three or four DNA strands. (a) Base-pairing patterns in one well-characterized form of triplex DNA. The Hoogsteen pair in each case is shown in red.
(b) Triple-helical DNA containing two pyrimidine strands (red and white; sequence TTCCTT) and one purine strand (blue; sequence AAGGAA). The blue and white strands are antiparallel and paired by normal Watson-Crick base-pairing patterns. The third (all-pyrimidine) strand (red) is parallel to the purine strand and paired through non-Watson-Crick hydrogen bonds. The triplex is viewed from the side, with six triplets shown. (c) Base-pairing pattern in the guanosine tetraplex structure. (d) Four successive tetraplets from a G tetraplex structure. (e) Possible variants in the orientation of strands in a G tetraplex. [Data from (b) PDB ID 1BCE, J. L. Asensio et al., Nucleic Acids Res. 26:3677, 1998; (d) PDB ID 244D, G. Laughlan et al., Science 265:520, 1994.] Four DNA strands can also pair to form a tetraplex (quadruplex), but this occurs readily only for DNA sequences with a very high proportion of guanosine residues (Fig. 8-20c, d). The guanosine tetraplex, or G tetraplex, is quite stable over a broad range of conditions. The orientation of strands in the tetraplex can vary as shown in Figure 8-20e. In the DNA of living cells, sites recognized by many sequence-specific DNA-binding proteins (Chapter 28) are arranged as palindromes, and polypyrimidine or polypurine sequences that can form triple helices are found within regions involved in the regulation of expression of some eukaryotic genes. Messenger RNAs Code for Polypeptide Chains We now turn our attention to the expression of the genetic information that DNA contains. Given that the DNA of eukaryotes is largely confined to the nucleus, whereas protein synthesis occurs on ribosomes in the cytoplasm, some molecule other than DNA must carry the genetic message from the nucleus to the cytoplasm.
As early as the 1950s, RNA was considered the logical candidate: RNA is found in both the nucleus and the cytoplasm, and an increase in protein synthesis is accompanied by an increase in the amount of cytoplasmic RNA and an increase in its rate of turnover. These and other observations led several researchers to suggest that RNA carries genetic information from DNA to the protein-synthesizing machinery of the ribosome. In 1961, François Jacob and Jacques Monod presented a unified (and essentially correct) picture of many aspects of this process. They proposed the name “messenger RNA” (mRNA) for that portion of the total cellular RNA carrying the genetic information from DNA to the ribosomes.
The mRNAs are formed on a DNA template by the process of transcription. Once they reach the ribosomes, the messengers provide the templates that specify amino acid sequences in polypeptide chains. Although mRNAs from different genes can vary greatly in length, the mRNAs from a particular gene generally have a defined size. In bacteria and archaea, a single mRNA molecule may code for one or several polypeptide chains. If it carries the code for only one polypeptide, the mRNA is monocistronic; if it codes for two or more different polypeptides, the mRNA is polycistronic. In eukaryotes, most mRNAs are monocistronic. (For the purposes of this discussion, “cistron” refers to a gene. The term itself has historical roots in the science of genetics, and its formal genetic definition is beyond the scope of this text.) The minimum length of an mRNA is set in part by the length of the polypeptide chain for which it codes. For example, a polypeptide chain of 100 amino acid residues requires an RNA coding sequence of at least 300 nucleotides, because each amino acid is coded by a nucleotide triplet (this and other details of protein synthesis are discussed in Chapter 27). However, mRNAs transcribed from DNA are always somewhat longer than the length needed simply to code for a polypeptide sequence (or sequences). The additional, noncoding RNA includes sequences required to begin and end translation by the ribosome, as well as regulatory sequences. Figure 8-21 summarizes the general structure of bacterial mRNAs. FIGURE 8-21 Bacterial mRNA. Schematic diagrams show (a) monocistronic and (b) polycistronic mRNAs of bacteria. Red segments represent RNA coding for a gene product; gray segments represent noncoding RNA. In the polycistronic transcript, noncoding RNA separates the three genes. Many RNAs Have More Complex Three-Dimensional Structures Messenger RNA is only one of several classes of cellular RNA. 
Transfer RNAs are adapter molecules that act in protein synthesis; covalently linked to an amino acid at one end, each tRNA pairs with the mRNA in such a way that amino acids are joined to a growing polypeptide in the correct sequence. Ribosomal RNAs are components of ribosomes. There is also a wide variety of noncoding RNAs, including some (called ribozymes) that have enzymatic activity. All the RNAs are considered in detail in Chapter 26. The diverse and often complex functions of these RNAs reflect a diversity of structure much richer than that observed in DNA molecules. The product of transcription of DNA is always single-stranded RNA. The single strand tends to assume a right-handed helical conformation dominated by base-stacking interactions (Fig. 8-22), which are stronger between two purines than between a purine and a pyrimidine or between two pyrimidines. The purine-purine interaction is so strong that a pyrimidine separating two purines is often displaced from the stacking pattern so that the purines can interact. Any self-complementary sequences in the molecule trigger folding into structures with more complexity. RNA can base-pair with complementary regions of either RNA or DNA. Base pairing matches the pattern for DNA: G pairs with C and A pairs with U (or with the occasional T residue in some RNAs). One difference is that base pairing between G and U residues is allowed in RNA (see Fig. 8-24) when complementary sequences in two single strands of RNA (or within a single strand of RNA that folds back on itself to align the residues) pair with each other. The paired strands in RNA or RNA-DNA duplexes are antiparallel, as in DNA. FIGURE 8-22 Typical right-handed stacking pattern of single-stranded RNA. The bases are shown in yellow, the phosphorus atoms in orange, and the riboses and phosphate oxygens in green. Green is used to represent RNA strands in succeeding chapters, just as blue is used for DNA.
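A minimal sketch of these pairing rules, assuming both strands are written 5′→3′ and contain only A, C, G, and U. The function names are hypothetical, and the model checks base identity only: it ignores stacking, loops, and duplex stability. Antiparallel alignment is handled by pairing each base of one strand with the base at the mirrored position of the other.

```python
# Sketch of the pairing rules stated above for RNA: G-C and A-U pairs,
# plus G-U pairs when RNA segments pair, in antiparallel orientation.
# Function names are illustrative; stacking, loops, and energetics are ignored.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
G_U = {("G", "U"), ("U", "G")}


def can_pair(x: str, y: str, allow_gu: bool = True) -> bool:
    """True if bases x and y can form a pair under the rules above."""
    pair = (x.upper(), y.upper())
    return pair in WATSON_CRICK or (allow_gu and pair in G_U)


def duplex_pairs(top: str, bottom: str, allow_gu: bool = True) -> list[bool]:
    """Align two equal-length strands, both written 5'->3', antiparallel:
    base i of `top` faces base len-1-i of `bottom`."""
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return [can_pair(a, b, allow_gu) for a, b in zip(top, reversed(bottom))]


if __name__ == "__main__":
    print(duplex_pairs("GGCAU", "AUGUC"))                  # G-U pair allowed
    print(duplex_pairs("GGCAU", "AUGUC", allow_gu=False))  # Watson-Crick pairs only
```

Passing `allow_gu=False` gives the stricter, DNA-like rule set, which makes it easy to see which positions depend on a G–U pair.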
When two strands of RNA with perfectly complementary sequences are paired, the predominant double-stranded structure is an A-form right-handed double helix. However, strands of RNA that are perfectly paired over long regions of sequence are uncommon. The three-dimensional structures of many RNAs, like those of proteins, are complex and unique. Weak interactions, especially base-stacking interactions, help stabilize RNA structures, just as they do in DNA. Z-form helices have been made in the laboratory (under very high-salt or high-temperature conditions). The B form of RNA has not been observed. Breaks in the regular A-form helix caused by mismatched or unmatched bases in one or both strands are common and result in bulges or internal loops (Fig. 8-23). Hairpin loops form between nearby self-complementary (palindromic) sequences. Extensive base-paired helical segments are formed in many RNAs (Fig. 8-24), and the resulting hairpins are the most common type of secondary structure in RNA. Specific short base sequences (such as UUCG) are often found at the ends of RNA hairpins and are known to form particularly tight and stable loops. Such sequences may act as starting points for the folding of an RNA molecule into its precise three-dimensional structure. Other contributions are made by hydrogen bonds that are not part of standard Watson-Crick base pairs. For example, the 2′-hydroxyl group of ribose can hydrogen-bond with other groups. Some of these properties are evident in the tertiary structure of the phenylalanine transfer RNA of yeast (the tRNA responsible for inserting Phe residues into polypeptides) and in two RNA enzymes, or ribozymes, whose functions, like those of protein enzymes, depend on their three-dimensional structures (Fig. 8-25). FIGURE 8-23 Secondary structure of RNAs. (a) Bulge, internal loop, and hairpin loop. (b) The paired regions generally have an A-form right-handed helix, as shown for a hairpin.
The single G–U base pair is identified with a green dot. [(b) Data from PDB ID 1GID, J. H. Cate et al., Science 273:1678, 1996.] FIGURE 8-24 Base-paired helical structures in an RNA. Shown here are (a) the secondary structure and (b) the three-dimensional structure of the P RNA component of the RNase P of Thermotoga maritima. RNase P, which also contains a protein component (not shown), functions in the processing of transfer RNAs. A complexed tRNA is also shown in (b). Separate C (catalytic) and S (specificity) domains are denoted with yellow and light red backbones in both images. The blue dots in (a) indicate non-Watson-Crick G–U base pairs (boxed inset). Note that G–U base pairs are allowed only when presynthesized strands of RNA fold up or anneal with each other. [(a) Information from N. J. Reiter et al., Nature 468:784, 2010, Fig. 2a. (b) Data from PDB ID 3Q1R, N. J. Reiter et al., Nature 468:784, 2010.] FIGURE 8-25 Three-dimensional structure in RNA. (a) Three-dimensional structure of phenylalanine tRNA of yeast. Some unusual base-pairing patterns found in this tRNA are shown in the numbered insets. Note, among the insets, a hydrogen bond with a ribose 2′-hydroxyl group and a hydrogen bond with the oxygen of a ribose phosphodiester (both shown in red). (b) A hammerhead ribozyme (so named because the secondary structure at the active site looks like the head of a hammer), derived from certain plant viruses. Ribozymes, or RNA enzymes, catalyze a variety of reactions, primarily in RNA metabolism and protein synthesis. The complex three-dimensional structures of these RNAs reflect the complexity inherent in catalysis, as described for protein enzymes in Chapter 6. (c) A segment of RNA known as an intron, from the ciliated protozoan Tetrahymena thermophila. This intron (a ribozyme) catalyzes its own excision from between exons in an RNA strand (discussed in Chapter 26). [Data from (a) PDB ID 1TRA, E. Westhof and M.
Sundaralingam, Biochemistry 25:4868, 1986; (b) PDB ID 1MME, W. G. Scott et al., Cell 81:991, 1995; (c) PDB ID 1GRZ, B. L. Golden et al., Science 282:259, 1998.] The analysis of RNA structure and the relationship between its structure and its function remains a robust field of inquiry that has many of the same complexities as the analysis of protein structure. The importance of understanding RNA structure grows as we become increasingly aware of the large number of functional roles for RNA molecules. SUMMARY 8.2 Nucleic Acid Structure Many lines of evidence show that DNA bears genetic information. Some of the earliest evidence came from the Avery-MacLeod-McCarty experiment, which showed that DNA isolated from one bacterial strain can enter and transform the cells of another strain, endowing the second strain with some of the inheritable characteristics of the donor. The Hershey-Chase experiment showed that the DNA of a bacterial virus, but not its protein coat, carries the genetic message for replication of the virus in a host cell. Putting together the available data, Watson and Crick postulated that native DNA consists of two antiparallel chains in a right-handed double-helical arrangement. Complementary base pairs, A═T and G≡C, are formed by hydrogen bonding between chains in the helix. The base pairs are stacked perpendicular to the long axis of the double helix, 3.4 Å apart, with 10.5 bp per turn. DNA can exist in several structural forms. Two variations of the Watson-Crick form, or B-DNA, are A- and Z-DNA. Some sequence-dependent structural variations cause bends in the DNA molecule. DNA strands with appropriate sequences can form hairpin or cruciform structures or triplex or tetraplex DNA. Messenger RNA transfers genetic information from DNA to ribosomes for protein synthesis. Transfer RNA and ribosomal RNA are also involved in protein synthesis.
RNA can be structurally complex; single RNA strands can fold into hairpins, double-stranded regions, or complex loops. Additional noncoding RNAs have a variety of special functions. 8.3 Nucleic Acid Chemistry
The role of DNA as a repository of genetic information depends in part on its inherent stability. The chemical transformations that do occur are generally very slow in the absence of an enzyme catalyst. The long-term storage of information without alteration is so important to a cell, however, that even very slow reactions that alter DNA structure can be physiologically significant. Processes such as carcinogenesis and aging may be intimately linked to slowly accumulating, irreversible alterations of DNA. Other, nondestructive alterations also occur and are essential to function, such as the strand separation that must precede DNA replication or transcription. In addition to providing insights into physiological processes, our understanding of nucleic acid chemistry has given us a powerful array of technologies that have applications in molecular biology, medicine, agriculture, and forensic science. We now examine the chemical properties of DNA and a few of these technologies. Double-Helical DNA and RNA Can Be Denatured Solutions of carefully isolated, native DNA are highly viscous at pH 7.0 and room temperature (25 °C). When such a solution is subjected to extremes of pH or to temperatures above 80 °C, its viscosity decreases sharply, indicating that the DNA has undergone a physical change. Just as heat and extremes of pH denature globular proteins, they also cause denaturation, or melting, of double-helical DNA. Disruption of the hydrogen bonds between paired bases and of base-stacking interactions causes unwinding of the double helix to form two single strands, completely separate from each other along the entire length or part of the length (partial denaturation) of the molecule. No covalent bonds in the DNA are broken (Fig. 8-26).
FIGURE 8-26 Reversible denaturation and annealing (renaturation) of DNA. When the temperature or pH is returned to the range in which most organisms live, the unwound segments of the two strands spontaneously rewind, or anneal, to yield the intact duplex (Fig. 8-26). However, if the two strands are completely separated, renaturation occurs in two steps. In the first, relatively slow step, the two strands “find” each other by random collisions and form a short segment of complementary double helix. The second step is much faster: the remaining unpaired bases successively come into register as base pairs, and the two strands “zipper” themselves together to form the double helix. The close interaction between stacked bases in a nucleic acid has the effect of decreasing its absorption of UV light relative to that of a solution with the same concentration of free nucleotides, and the absorption is decreased further when two complementary nucleic acid strands are paired. This is called the hypochromic effect. Denaturation of a double-stranded nucleic acid produces the opposite result: an increase in absorption called the hyperchromic effect. The transition from double-stranded DNA to the denatured, single-stranded form can thus be detected by monitoring UV absorption at 260 nm. Viral or bacterial DNA molecules in solution denature when they are heated slowly (Fig. 8-27). Each species of DNA has a characteristic denaturation temperature, or melting point (tm; formally, the temperature at which half the DNA is present as separated single strands): the higher its content of G≡C base pairs, the higher the melting point of the DNA. This is primarily because, as we saw earlier, G≡C base pairs make greater contributions to base stacking than do A═T base pairs. Thus, the melting point of a DNA molecule, determined under fixed conditions of pH and ionic strength, can yield an estimate of its base composition. 
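The two measurements described above can be combined in a short calculation: read tm off an A260 melting curve as the half-denatured point, then convert tm to an approximate G+C content. This Python sketch is an illustration under stated assumptions. The function names are hypothetical, the melting curve in the usage example is synthetic, and the conversion uses the linear relation commonly quoted from the Marmur and Doty data cited for Figure 8-27b, tm ≈ 69.3 + 0.41 × (% G+C) °C. That fit applies only to the buffer and ionic strength used in those measurements (standard saline-citrate).

```python
import math

# Sketch: estimate tm from a melting curve read at 260 nm, then base composition.
# Assumptions (not from the text): readings are ordered by increasing temperature,
# and tm = 69.3 + 0.41 * (%G+C), the linear fit commonly quoted from Marmur and
# Doty's data in standard saline-citrate buffer.


def fraction_denatured(a260: list[float]) -> list[float]:
    """Rescale absorbances so the paired baseline is 0 and the fully
    denatured plateau is 1 (the hyperchromic effect)."""
    lo, hi = min(a260), max(a260)
    if hi == lo:
        raise ValueError("no hyperchromic change in the data")
    return [(a - lo) / (hi - lo) for a in a260]


def melting_point(temps: list[float], a260: list[float]) -> float:
    """tm = temperature at which half the DNA is denatured, found by linear
    interpolation between the two readings that bracket 0.5."""
    f = fraction_denatured(a260)
    for i in range(len(f) - 1):
        if f[i] <= 0.5 <= f[i + 1] and f[i + 1] > f[i]:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (0.5 - f[i]) * (t1 - t0) / (f[i + 1] - f[i])
    raise ValueError("curve never crosses the half-denatured point")


def gc_percent_from_tm(tm: float, intercept: float = 69.3, slope: float = 0.41) -> float:
    """Invert the assumed linear relation tm = intercept + slope * (%G+C)."""
    return (tm - intercept) / slope


if __name__ == "__main__":
    # Synthetic (made-up) sigmoidal melting curve centered at 90 degrees C.
    temps = [75 + 0.5 * i for i in range(61)]
    a260 = [1.0 + 0.37 / (1 + math.exp(-(t - 90) / 1.5)) for t in temps]
    tm = melting_point(temps, a260)
    print(f"tm ~ {tm:.1f} C, G+C ~ {gc_percent_from_tm(tm):.0f}%")
```

Linear interpolation is the simplest choice here. With noisy experimental data, fitting a sigmoid to the whole curve would usually give a more robust midpoint.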
If denaturation conditions are carefully controlled, regions that are rich in A═T base pairs will denature while most of the DNA remains double-stranded. Such denatured regions (called bubbles) can be visualized with electron microscopy (Fig. 8-28). In the strand separation of DNA that occurs in vivo during processes such as DNA replication and transcription, the site where strand separation is initiated is often rich in A═T base pairs, as we shall see.
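As a rough illustration of how A═T-rich regions can be located in a sequence, the sketch below slides a fixed window along one strand and reports windows whose A+T fraction meets a threshold. Counting A and T on one strand is enough, because every A or T there belongs to an A═T pair. The window size, threshold, function name, and example sequence are arbitrary assumptions. The scan flags candidate regions only; it does not predict where bubbles actually form.

```python
# Sketch: flag windows rich in A=T pairs, the regions the text says tend to
# denature first. Window size and threshold are arbitrary illustrative choices;
# real bubble positions also depend on temperature, ionic strength, and stacking.


def at_rich_windows(seq: str, window: int = 20, threshold: float = 0.75) -> list[tuple[int, float]]:
    """Return (0-based start, A+T fraction) for each window at or above threshold."""
    s = seq.upper()
    if not 1 <= window <= len(s):
        raise ValueError("window must be between 1 and len(seq)")
    is_at = [1 if base in "AT" else 0 for base in s]
    count = sum(is_at[:window])  # A+T count in the first window
    hits = []
    for start in range(len(s) - window + 1):
        if start > 0:
            # Slide one position: add the base entering, drop the base leaving.
            count += is_at[start + window - 1] - is_at[start - 1]
        frac = count / window
        if frac >= threshold:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    demo = "GCGCGC" + "ATATAT" + "GCGCGC"  # hypothetical sequence
    print(at_rich_windows(demo, window=6, threshold=1.0))
```

Updating the count incrementally keeps the scan linear in sequence length, which matters for genome-sized inputs.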
FIGURE 8-27 Heat denaturation of DNA. (a) The denaturation, or melting, curves of two DNA specimens. The temperature at the midpoint of the transition (tm) is the melting point; it depends on pH and ionic strength and on the size and base composition of the DNA. (b) Relationship between tm and the G+C content of a DNA. [(b) Data from J. Marmur and P. Doty, J. Mol. Biol. 5:109, 1962.] FIGURE 8-28 Partially denatured DNA. This DNA was partially denatured, then fixed to prevent renaturation during sample preparation. Although the shadowing method used to visualize the DNA in this electron micrograph obliterates many details, single-stranded and double-stranded regions are readily distinguishable. The arrows point to some single-stranded bubbles where denaturation has occurred. The regions that denature are highly reproducible and are rich in A═T base pairs. Duplexes of two RNA strands or of one RNA strand and one DNA strand (RNA-DNA hybrids) can also be denatured. Notably, RNA duplexes are more stable to heat denaturation than DNA duplexes. At neutral pH, denaturation of a double-helical RNA often requires temperatures at least 20 °C higher than those required for denaturation of a DNA molecule with a comparable sequence, assuming that the strands in each molecule are perfectly complementary. The stability of an RNA-DNA hybrid is generally intermediate between that of RNA and DNA duplexes. The physical basis for these differences in thermal stability is not known. WORKED EXAMPLE 8-1 DNA Base Pairs and DNA Stability In samples of DNA isolated from two unidentified species of bacteria, X and Y, adenine makes up 32% and 17%, respectively, of the total bases. What relative proportions of adenine, guanine, thymine, and cytosine would you expect to find in the two DNA samples? What assumptions have you made? One of these species was isolated from a hot spring (64 °C). Which species is most likely the thermophilic bacterium, and why?
SOLUTION: For any double-helical DNA, A = T and G = C. The DNA from species X has 32% A and therefore must contain 32% T. This accounts for 64% of the bases and leaves 36% of the bases in G≡C pairs: 18% G and 18% C. The sample from species Y, with 17% A, must contain 17% T, accounting for 34% of the bases. The remaining 66% of the bases are thus equally distributed as 33% G and 33% C. This calculation is based on the assumption that both DNA molecules are double-stranded. The higher the G+C content of a DNA molecule, the higher the melting temperature. Species Y, having the DNA with the higher G+C content (66%), most likely is the thermophilic bacterium; its DNA has a higher melting temperature and thus is more stable at the temperature of the hot spring. Nucleotides and Nucleic Acids Undergo Nonenzymatic Transformations
Purines and pyrimidines, along with the nucleotides of which they are a part, undergo spontaneous alterations in their covalent structure. These reactions are generally very slow, but they are physiologically significant because of the cell’s very low tolerance for alterations in its genetic information. Alterations in DNA structure that produce permanent changes in the genetic information encoded therein are called mutations. In higher organisms, much evidence suggests an intimate link between the accumulation of mutations in an individual and the processes of aging and carcinogenesis. Several nucleotide bases undergo spontaneous loss of their exocyclic amino groups (deamination) (Fig. 8-29a). For example, under typical cellular conditions, deamination of cytosine (in DNA) to uracil occurs in about one of every 10⁷ cytidine residues in 24 hours. This rate of deamination corresponds to about 100 spontaneous events per day, on average, in a mammalian cell. Deamination of adenine and guanine occurs at about 1/100th this rate. FIGURE 8-29 Some well-characterized nonenzymatic reactions of nucleotides. (a) Deamination reactions. Only the base is shown. (b) Depurination, in which a purine is lost by hydrolysis of the N-β-glycosyl bond. Loss of pyrimidines through a similar reaction occurs, but much more slowly. The resulting lesion, in which the deoxyribose is present but the base is not, is called an abasic site or an AP site (apurinic site or, rarely, apyrimidinic site). The deoxyribose remaining after depurination is readily converted from the β-furanose to the aldehyde form (see Fig. 8-3), further destabilizing the DNA at this position. More nonenzymatic reactions are illustrated in Figures 8-30 and 8-31. The slow cytosine deamination reaction seems innocuous enough, but it is almost certainly the reason why DNA contains thymine rather than uracil.
The product of cytosine deamination (uracil) is readily recognized as foreign in DNA and is removed by a repair system (Chapter 25). If DNA normally contained uracil, recognition of uracils resulting from cytosine deamination would be more difficult, and unrepaired uracils would lead to permanent sequence changes as they were paired with adenines during replication. Cytosine deamination would gradually lead to a decrease in G≡C base pairs and an increase in A═U base pairs in the DNA of all cells. Over the millennia, cytosine deamination could eliminate G≡C base pairs and the genetic code that depends on them. Establishing thymine as one of the four bases in DNA may well have been one of the crucial turning points in evolution, making the long-term storage of genetic information possible. Another important reaction in deoxyribonucleotides is the hydrolysis of the N-β-glycosyl bond between the base and the pentose. The base is lost, creating a DNA lesion called an AP (apurinic, apyrimidinic) site or abasic site (Fig. 8-29b). Purines are lost at a higher rate than pyrimidines. As many as one in 10⁵ purines (10,000 per mammalian cell) are lost from DNA every 24 hours under typical cellular conditions. Depurination of ribonucleotides and RNA is much slower and less physiologically significant. In the test tube, loss of purines can be accelerated by dilute acid. Incubation of DNA at pH 3 causes selective removal of the purine bases, resulting in a derivative called apurinic acid. Other reactions are promoted by radiation. UV light induces the condensation of two ethylene groups to form a cyclobutane ring. In the cell, the same reaction between adjacent pyrimidine bases in nucleic acids forms cyclobutane pyrimidine dimers. This happens most frequently between adjacent thymidine residues on the same DNA strand (Fig. 8-30). A second type of pyrimidine dimer, called a 6-4 photoproduct, is also formed during UV irradiation.
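Because UV-induced dimers join neighboring pyrimidines on the same strand, candidate sites can be enumerated by scanning each strand for adjacent C or T residues. This Python sketch uses hypothetical function names and a made-up sequence. It lists possible sites for cyclobutane dimers or 6-4 photoproducts but does not estimate how often each forms; the text notes only that adjacent thymidines react most frequently.

```python
# Sketch: list adjacent-pyrimidine (C/T) pairs on each DNA strand, the sites
# where UV light can join neighboring pyrimidines into dimers.
# It only enumerates candidate sites; it does not model reaction likelihood.
PYRIMIDINES = frozenset("CT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dipyrimidine_sites(strand: str) -> list[tuple[int, str]]:
    """(0-based position, dinucleotide) for each adjacent pyrimidine pair,
    reading the strand 5'->3'. Overlapping sites (e.g., in TTT) are all listed."""
    s = strand.upper()
    return [(i, s[i:i + 2]) for i in range(len(s) - 1)
            if s[i] in PYRIMIDINES and s[i + 1] in PYRIMIDINES]


def both_strands(top: str) -> dict[str, list[tuple[int, str]]]:
    """Scan the given strand and its complement (also written 5'->3'),
    since dimers form between neighbors on the same strand."""
    bottom = top.upper().translate(COMPLEMENT)[::-1]
    return {"top": dipyrimidine_sites(top), "bottom": dipyrimidine_sites(bottom)}


if __name__ == "__main__":
    print(both_strands("GATTCAAG"))  # hypothetical sequence
```

Scanning the complementary strand separately matters: a run of adenines on one strand, which offers no dimer site there, faces a run of thymidines on the other.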
Ionizing radiation (x-rays and gamma rays) can cause ring opening and fragmentation of bases as well as breaks in the covalent backbone of nucleic acids. FIGURE 8-30 Formation of pyrimidine dimers induced by UV light. (a) One type of reaction (on the left) results in the formation of a cyclobutyl ring involving C-5 and C-6 of adjacent pyrimidine residues. An alternative reaction (on the right) results in a 6-4 photoproduct, with a linkage between C-6 of one pyrimidine and C-4 of its neighbor. (b) Formation of a cyclobutane pyrimidine dimer introduces a bend or kink into the DNA. [(b) Data from PDB ID 1TTD, K. McAteer et al., J. Mol. Biol. 282:1013, 1998.] Virtually all forms of life are exposed to energy-rich radiation capable of causing chemical changes in DNA. Near-UV radiation (with wavelengths of 200 to 400 nm), which makes up a significant portion of the solar spectrum, is known to cause pyrimidine dimer formation and other chemical changes in the DNA of bacteria and of human skin cells. We are subjected to a constant field of ionizing radiation in the form of cosmic rays, which can penetrate deep into the earth, as well as radiation emitted from radioactive elements, such as radium, plutonium, uranium, radon, ¹⁴C, and ³H. X-rays used in medical and dental examinations and in radiation therapy of cancer and other diseases are another form of ionizing radiation. It is estimated that UV and ionizing radiations are responsible for about 10% of all DNA damage caused by environmental agents. DNA also may be damaged by reactive chemicals introduced into the environment as products of industrial activity. Such products may not be injurious per se but may be metabolized by cells into forms that are. There are two prominent classes of such agents (Fig. 8-31): (1) deaminating agents, particularly nitrous acid (HNO₂) or compounds that can be metabolized to nitrous acid or nitrites, and (2) alkylating agents.
FIGURE 8-31 Chemical agents that cause DNA damage. (a) Precursors of nitrous acid, which promotes deamination reactions. (b) Alkylating agents. Most generate modified nucleotides nonenzymatically. Nitrous acid, formed from organic precursors such as nitrosamines and from nitrite and nitrate salts, is a potent accelerator of the deamination of bases. Bisulfite has similar effects. Both agents are used as preservatives in processed foods to prevent the growth of toxic bacteria. They do not seem to increase cancer risks significantly when used in this way, perhaps because they are used in only small amounts and make only a minor contribution to the overall levels of DNA damage. (The potential health risk from food spoilage if these preservatives were not used is much greater.) Alkylating agents can alter certain bases of DNA. For example, the highly reactive chemical dimethylsulfate (Fig. 8-31b) can methylate a guanine to yield O⁶-methylguanine, which cannot base-pair with cytosine. Some alkylation of bases is a normal part of the regulation of gene expression. The enzymatic methylation of certain bases using S-adenosylmethionine is one example discussed below. The most important source of mutagenic alterations in DNA is oxidative damage. Reactive oxygen species such as hydrogen peroxide, hydroxyl radicals, and superoxide radicals arise during irradiation or (more commonly) as a byproduct of aerobic metabolism. These species damage DNA through any of a large, complex group of reactions, ranging from oxidation of deoxyribose and base moieties to strand breaks. Of these species, the hydroxyl radicals are responsible for most oxidative DNA damage. Cells have an elaborate defense system to destroy reactive oxygen species, including enzymes such as catalase and superoxide dismutase that convert reactive oxygen species to harmless products. A fraction of these oxidants inevitably escape cellular defenses, however, and are able to damage DNA.
Accurate estimates for the extent of this damage are not yet available, but every day the DNA of each human cell is subjected to thousands of damaging oxidative reactions. This is merely a sampling of the best-understood reactions that damage DNA. Many carcinogenic compounds in food, water, and air exert their cancer-causing effects by modifying bases in DNA. Nevertheless, the integrity of DNA as a polymer is better maintained than that of either RNA or protein, because DNA is the only macromolecule that has the benefit of extensive biochemical repair systems. These repair processes (described in Chapter 25) greatly lessen the impact of damage to DNA. Some Bases of DNA Are Methylated Certain nucleotide bases in DNA molecules are enzymatically methylated. Adenine and cytosine are methylated more often than guanine and thymine. Methylation is generally confined to certain sequences or regions of a DNA molecule. In some cases, the function of methylation is well understood; in others, the function remains unclear. All known DNA methylases use S-adenosylmethionine as a methyl group donor (Fig. 8-31b). E. coli has two prominent DNA methylation systems. One serves in a defense role, allowing the cell to distinguish its DNA from foreign DNA by marking its own DNA with methyl groups. The cell can then identify DNA lacking the methyl groups as foreign and destroy it (this is known as a restriction-modification system; see p. 303). The other enzyme system methylates adenosine residues within the sequence (5′)GATC(3′) to N⁶-methyladenosine (Fig. 8-5a). Methyl groups are added by the Dam (DNA adenine methylation) methylase shortly after DNA replication, allowing the cell to distinguish newly replicated DNA from older cellular DNA (see Fig. 25-20). In eukaryotic cells, about 5% of cytidine residues in DNA are methylated to 5-methylcytidine (Fig. 8-5a). Methylation is most common at CpG sequences, producing methyl-CpG symmetrically on both strands of the DNA.
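Because CpG and GATC each read the same 5′→3′ on both strands, a methylation site on one strand is always matched by a site at the same base pairs on the complementary strand. The short Python sketch below illustrates this; the function names and the example sequence are hypothetical and are not part of any methylation-analysis tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq: str, motif: str) -> list[int]:
    """0-based start positions of every occurrence of motif in seq (5'->3')."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]

def is_palindromic(motif: str) -> bool:
    """True if the motif reads the same 5'->3' on both strands."""
    return motif == reverse_complement(motif)

top = "ATCGGATCCGTTACGA"  # hypothetical sequence, written 5'->3'
bottom = reverse_complement(top)
for motif in ("CG", "GATC"):
    print(motif, is_palindromic(motif), find_sites(top, motif), find_sites(bottom, motif))
# CG True [2, 8, 13] [1, 6, 12]
# GATC True [4] [8]
```

A site starting at position i on the top strand maps to position len(seq) − i − len(motif) on the bottom strand (for example, 16 − 2 − 2 = 12), so each CpG or GATC occupies the same base pairs on both strands.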
The extent of methylation of CpG sequences varies by region in large eukaryotic DNA molecules, affecting DNA metabolism and gene expression. The Chemical Synthesis of DNA Has Been Automated An important practical advance in nucleic acid chemistry was the rapid and accurate synthesis of short oligonucleotides of known sequence. The methods were pioneered by H. Gobind Khorana and his colleagues in the 1970s. Refinements by Robert Letsinger and Marvin Caruthers led to the chemistry now in widest use, called the phosphoramidite method (Fig. 8-32). The synthesis is carried out with the growing strand attached to a solid support, using principles similar to those used by Merrifield for peptide synthesis (see Fig. 3-30), and is readily automated. The efficiency of each addition step is very high, allowing the routine synthesis of polymers containing 70 or 80 nucleotides and, in some laboratories, much longer strands. The availability of relatively inexpensive DNA polymers with predesigned sequences revolutionized all areas of biochemistry. FIGURE 8-32 Chemical synthesis of DNA by the phosphoramidite method. Automated DNA synthesis is conceptually similar to the synthesis of polypeptides on a solid support. The oligonucleotide is built up on the solid support (silica), one nucleotide at a time, in a repeated series of chemical reactions with suitably protected nucleotide precursors. The first nucleoside (which will be the 3′ end) is attached to the silica support at the 3′ hydroxyl (through a linking group, R) and is protected at the 5′ hydroxyl with an acid-labile dimethoxytrityl group (DMT). The reactive groups on all bases are also chemically protected. The protecting DMT group is removed by washing the column with acid (the DMT group is colored, so this reaction can be followed spectrophotometrically).
The next nucleotide has a reactive phosphoramidite at its 3′ position: a trivalent phosphite (as opposed to the more oxidized pentavalent phosphate normally present in nucleic acids) with one linked oxygen replaced by an amino group or a substituted amine. In the common variant shown, one of the phosphoramidite oxygens is bonded to the deoxyribose, the other is protected by a cyanoethyl group, and the third position is occupied by a readily displaced diisopropylamino group. Reaction with the immobilized nucleotide forms a 5′,3′ linkage, and the diisopropylamino group is eliminated. The phosphite linkage is then oxidized with iodine to produce a phosphotriester linkage. Reactions 2 through 4 are repeated until all nucleotides are added. At each step, excess nucleotide is removed before addition of the next nucleotide. In the final steps, the remaining protecting groups on the bases and the phosphates are removed, and the oligonucleotide is separated from the solid support and purified. The chemical synthesis of RNA is somewhat more complicated because of the need to protect the 2′ hydroxyl of ribose without adversely affecting the reactivity of the 3′ hydroxyl. Gene Sequences Can Be Amplified with the Polymerase Chain Reaction Genome projects, as described in Chapter 9, have given rise to online databases containing the complete genome sequences of thousands of organisms. This archive of sequence information allows researchers to greatly amplify any DNA segment they might be interested in with the polymerase chain reaction (PCR), a process conceived by Kary Mullis in 1983. Even DNA segments with unknown sequences can be amplified if the sequences flanking them are known. The amplified DNA can then be used for a multitude of purposes, as we shall see. The PCR procedure, shown in Figure 8-33, relies on DNA polymerases, enzymes that synthesize DNA strands from deoxyribonucleoside triphosphates (dNTPs), using a DNA template.
DNA polymerases do not synthesize DNA de novo, but instead must add nucleotides to the 3′ ends of preexisting strands, referred to as primers (see Chapter 25). In PCR, two synthetic oligonucleotides are prepared for use as replication primers that can be extended by a DNA polymerase. These oligonucleotide primers are complementary to sequences on opposite strands of the target DNA, positioned so that their 5′ ends define the ends of the segment to be amplified, and they become part of the amplified sequence. The 3′ ends of the annealed primers are oriented toward each other and positioned to prime DNA synthesis across the targeted DNA segment.
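The primer geometry just described can be expressed as a short Python sketch, under simplifying assumptions: the template is given as one strand written 5′→3′, positions are 0-based, and the function name `design_primers` is hypothetical. The left primer is copied from the given strand (it anneals to the other strand), and the right primer is the reverse complement of the target's last bases, so both primers are reported 5′→3′ with their 3′ ends pointing toward each other.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def design_primers(template: str, start: int, end: int, length: int = 20) -> tuple[str, str]:
    """Return (left, right) primers, both written 5'->3'.

    template:   one strand of the target DNA, written 5'->3'
    start, end: 0-based positions of the first and last base to amplify (inclusive)
    length:     primer length in nucleotides
    """
    if not (0 <= start and end < len(template) and end - start + 1 >= length):
        raise ValueError("target must lie within the template and be at least one primer long")
    # Left primer: same sequence as the given strand; it anneals to the complementary
    # strand and is extended rightward, across the target.
    left = template[start:start + length]
    # Right primer: reverse complement of the target's last bases; it anneals to the
    # given strand and is extended leftward, toward the left primer.
    right = reverse_complement(template[end - length + 1:end + 1])
    return left, right

print(design_primers("GATTACAGGCTTCA", start=0, end=13, length=5))  # ('GATTA', 'TGAAG')
```

The sketch deliberately ignores practical design criteria such as melting temperature, GC content, and primer self-complementarity.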
FIGURE 8-33 Amplification of a DNA segment by the polymerase chain reaction (PCR). The PCR procedure has three steps: DNA strands are separated by heating, then annealed to an excess of short synthetic DNA primers (orange) that flank the region to be amplified (dark blue); new DNA is synthesized by polymerization catalyzed by DNA polymerase. The thermostable Taq DNA polymerase is not denatured by the heating steps. The three steps are repeated for 25 or 30 cycles in an automated process carried out in a small benchtop instrument called a thermocycler. The PCR procedure has an elegant simplicity. Basic PCR requires four components: a DNA sample containing the segment to be amplified, the pair of synthetic oligonucleotide primers, a pool of deoxynucleoside triphosphates, and a DNA polymerase. There are three steps (Fig. 8-33). In step 1, the reaction mixture is heated briefly to denature the DNA, separating the two strands. In step 2, the mixture is cooled so that the primers can anneal to the DNA. The high concentration of primers increases the likelihood that they will anneal to each strand of the denatured DNA before the two DNA strands (present at a much lower concentration) can reanneal to each other. Then, in step 3, the primed segment is replicated selectively by the DNA polymerase, using the pool of dNTPs. The cycle of heating, cooling, and replication is repeated 25 to 30 times over a few hours in an automated process, amplifying the DNA segment between the primers until the sample is large enough to be readily analyzed or cloned (described in Chapter 9). Each replication cycle doubles the number of target DNA segment copies, so the concentration grows exponentially. The flanking DNA sequences increase in number only linearly, so their contribution quickly becomes insignificant. After 20 cycles, the targeted DNA segment has been amplified more than a millionfold (2²⁰); after 30 cycles, more than a billionfold.
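The doubling arithmetic can be checked with a few lines of Python. This is an idealized model: the `efficiency` parameter (the fraction of templates copied per cycle) is an assumption added for illustration, and the figures in the text correspond to perfect doubling (efficiency 1.0).

```python
def pcr_copies(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after `cycles` rounds of PCR.

    Each cycle multiplies the copy number by (1 + efficiency); efficiency = 1.0
    means every template is copied in every cycle (perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles

for n in (20, 30):
    print(f"{n} cycles: {pcr_copies(n):,.0f}-fold")
# 20 cycles: 1,048,576-fold
# 30 cycles: 1,073,741,824-fold
```

With a hypothetical efficiency of 0.9, `pcr_copies(30, efficiency=0.9)` gives roughly 2.3 × 10⁸, a reminder that real yields depend on how completely each cycle copies its templates.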
Step 3 of PCR uses a heat-stable DNA polymerase such as the Taq polymerase, isolated from a thermophilic bacterium (Thermus aquaticus) that thrives in hot springs where temperatures approach the boiling point of water. The Taq polymerase remains active after every heating step (step 1) and does not have to be replenished. This technology is highly sensitive: PCR can detect and amplify just one DNA molecule in almost any type of sample, including some ancient ones. The double-helical structure of DNA is highly stable, but as we have seen, DNA does degrade slowly over time through various nonenzymatic reactions. PCR has allowed the successful cloning of rare, undegraded DNA segments isolated from samples more than 40,000 years old. Investigators have used the technique to clone DNA fragments from the mummified remains of humans and extinct animals, such as the woolly mammoth, creating the research fields of molecular archaeology and molecular paleontology. DNA from burial sites has been amplified by PCR and used to trace ancient human migrations (see Fig. 9-31). Epidemiologists use PCR-amplified DNA samples from human remains to trace the evolution of human pathogenic viruses. Because it can amplify the few strands of DNA that might be present in a sample, PCR is a potent tool in forensic medicine (Box 8-1). It is also being used to detect viral infections and certain types of cancers before they cause symptoms, as well as in the prenatal diagnosis of genetic diseases. BOX 8-1 A Potent Weapon in Forensic Medicine One of the most accurate methods for placing an individual at the scene of a crime is a fingerprint. But with the advent of recombinant DNA technology (see Chapter 9), a much more powerful tool became available: DNA genotyping (also called DNA fingerprinting or DNA profiling).
As first described by English geneticist Alec Jeffreys in 1985, the method is based on sequence polymorphisms, slight sequence differences among individuals (1 in every 1,000 bp, on average). Each difference from the prototype human genome sequence (the first human genome that was sequenced) occurs in some fraction of the human population; every person has some differences from this prototype. Forensic work focuses on differences in the lengths of short tandem repeat (STR) sequences. An STR locus is a specific location on a chromosome where a short DNA sequence (usually 4 bp long) is repeated many times in tandem. The loci most often used in STR genotyping are short, with 4 to 50 repeats (16 to 200 bp for tetranucleotide repeats), and have multiple length variants in the human population. More than 20,000 tetranucleotide STR loci have been characterized in the human genome. And more than a million STRs of all types may be present in the human genome, accounting for about 3% of all human DNA. The length of a particular STR in a given individual can be determined with the aid of the polymerase chain reaction (see Fig. 8-33). The use of PCR also makes the procedure sensitive enough to be applied to the very small samples often collected at crime scenes. The DNA sequences flanking STRs are unique to each STR locus and are identical (except for very rare mutations) in all humans. PCR primers are targeted to this flanking DNA and are designed to amplify the DNA across the STR (Fig. 1a). The length of the PCR product then reflects the length of the STR in that sample. Because each human inherits one chromosome of each chromosome pair from each parent, the STR lengths on the two chromosomes are often different, generating two different STR lengths from one individual. FIGURE 1 (a) STR loci can be analyzed by PCR. Suitable PCR primers (with an attached dye to aid in subsequent detection) are targeted to sequences on each side of the STR, and the region between them is amplified.
If the STR sequences have different lengths on the two chromosomes of an individual’s chromosome pair, two PCR products of different lengths result. (b) The PCR products from amplification of up to 16 STR loci can be run on a single capillary acrylamide gel (a “16-plex” analysis). Determination of which locus corresponds to which signal depends on the color of the fluorescent dye attached to the primers used in the process and on the size range in which the signal appears (the size range can be controlled by which sequences, those closer to or more distant from the STR, are targeted by the designed PCR primers). Fluorescence is given in relative fluorescence units (RFU), as measured against a standard supplied with the kit. [(b) Information from Carol Bingham, Promega Corporation.] The PCR products are subjected to electrophoresis on a very thin polyacrylamide gel in a capillary tube. The resulting bands are converted into a set of peaks that accurately reveal the size of each PCR fragment and thus the length of the STR in the corresponding allele. Analysis of multiple STR loci can yield a profile that is unique to an individual (Fig. 1b). This is typically done with a commercially available kit that includes PCR primers unique to each locus, linked to colored dyes to help distinguish the different PCR products. PCR amplification enables investigators to obtain STR genotypes from less than 1 ng of partially degraded DNA, an amount that can be obtained from a single hair follicle, a small fraction of a drop of blood, a small semen sample, or samples that might be months or even many years old. When good STR genotypes are obtained, the chance of misidentification is less than 1 in 10¹⁸ (a quintillion). The successful forensic use of STR analysis required standardization, first attempted in the United Kingdom in 1995. The U.S. standard, called the Combined DNA Index System (CODIS), established in 1998, was originally based on 13 well-studied STR loci.
These continue to be required in any DNA-typing experiment carried out in the United States (Table 1) and are also used internationally. The amelogenin gene is also used as a marker in the analyses. Present on the human sex chromosomes, this gene has a slightly different length on the X and Y chromosomes. PCR amplification across this gene thus generates different-sized products that can reveal the sex of the DNA donor. By mid-2019, the CODIS database contained more than 18 million STR genotypes and had assisted in nearly 500,000 forensic investigations. As the CODIS database has expanded, the chance for adventitious matches has increased. The CODIS standard was expanded in 2017 to include 20 core loci. The new loci were incorporated with international agreement to ensure compatibility. Loci utilized in commercial kits have since expanded from 16 to 24.

TABLE 1 Properties of the Loci Used for the CODIS Database

| Locus | Chromosome | Repeat motif | Repeat length (range)ᵃ | Number of alleles seenᵇ |
|---|---|---|---|---|
| CSF1PO | 5 | TAGA | 5–16 | 20 |
| FGA | 4 | CTTT | 12.2–51.2 | 80 |
| TH01 | 11 | TCAT | 3–14 | 20 |
| TPOX | 2 | GAAT | 4–16 | 15 |
| VWA | 12 | [TCTG] [TCTA] | 10–25 | 28 |
| D3S1358 | 3 | [TCTG] [TCTA] | 8–21 | 24 |
| D5S818 | 5 | AGAT | 7–18 | 15 |
| D7S820 | 7 | GATA | 5–16 | 30 |
| D8S1179 | 8 | [TCTA] [TCTG] | 7–20 | 17 |
| D13S317 | 13 | TATC | 5–16 | 17 |
| D16S539 | 16 | GATA | 5–16 | 19 |
| D18S51 | 18 | AGAA | 7–39.2 | 51 |
| D21S11 | 21 | [TCTA] [TCTG] | 12–41.2 | 82 |
| Amelogeninᶜ | X, Y | Not applicable | | |

Data from J. M. Butler, Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers, 2nd edn, Elsevier, 2005, p. 96.
ᵃ Repeat lengths observed in the human population. Partial or imperfect repeats can be included in some alleles.
ᵇ Number of different alleles observed as of 2005 in the human population. Careful analysis of a locus in many individuals is a prerequisite to its use in forensic DNA typing.
ᶜ Amelogenin is a gene, of slightly different size on the X and Y chromosomes, that is used to establish gender.
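The relationship between PCR product size and STR length can be sketched in Python. The combined flanking length (`flank_bp`) is a hypothetical, kit-specific value assumed here for illustration, and the sketch assumes a tetranucleotide repeat. The decimal allele names it produces follow the style of the repeat lengths in Table 1, where, for example, 12.2 denotes 12 full repeats plus 2 extra bases of a partial repeat.

```python
def str_allele(product_bp: int, flank_bp: int, unit_bp: int = 4) -> str:
    """Name an STR allele from the measured size of its PCR product.

    product_bp: size of the PCR product, in bp
    flank_bp:   combined length of the primers plus the non-repeat DNA between
                the primers and the STR on both sides (hypothetical, kit-specific)
    unit_bp:    length of the repeat motif (4 for tetranucleotide STRs)
    """
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0:
        raise ValueError("product is shorter than the flanking DNA")
    full, extra = divmod(repeat_bp, unit_bp)
    # Partial repeats are written after a decimal point, e.g. 12.2 = 12 repeats + 2 bp.
    return f"{full}.{extra}" if extra else str(full)

# Hypothetical heterozygote at one locus, assuming 140 bp of flanking DNA:
print([str_allele(size, flank_bp=140) for size in (176, 199)])  # ['9', '14.3']
```

Two products from one locus correspond to the two alleles inherited from the two parents; a single product would indicate that both chromosomes carry alleles of the same length.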
DNA genotyping has been used to both convict and acquit suspects, and to establish paternity with an extraordinary degree of certainty. In the United States, there have been many hundreds of postconviction exonerations based on DNA evidence. The impact of these procedures on court cases will continue to grow as standards are refined and as international STR genotyping databases grow. Even very old mysteries can be solved. In 1996, STR genotyping helped confirm identification of the bones of the last Russian czar and his family, who were assassinated in 1918. WORKED EXAMPLE 8-2 Designing Primers for the Polymerase Chain Reaction You set out to amplify, using PCR, the chromosomal sequence that begins at the G in position 3 of the left-hand segment below and ends at the T in position 14 of the right-hand segment. Only one strand is shown, but keep in mind that it is paired to a complementary strand. TGGTAGGCCGAT – – – [1,000 bp] – – – TAGCTAAGAATCTTTCTCAGAA Design single-stranded oligonucleotide primers to amplify only the sequence between (and including) these two bases. The optimal length for PCR primers is usually 18 to 22 nucleotides. For this example, simply write the first 6 nucleotides of each primer. SOLUTION: Left primer: GTAGGC; right primer: AAGATT. Remember, (1) the two strands of DNA are antiparallel, (2) DNA synthesis proceeds only in the 5′→3′ direction, (3) DNA synthesis must be directed across the region to be amplified, and (4) DNA sequences are always written in the 5′→3′ direction. The sequence given is thus oriented 5′→3′, left to right, even though no orientation guides are provided. The left primer must be complementary to the strand not shown, which is in the opposite orientation. Thus, the left primer begins at the G and is identical to the sequence shown. The right primer must direct DNA synthesis right to left, synthesizing a strand complementary to the strand provided. It will begin with an A complementary to the T, and then continue with additional nucleotides complementary to the strand shown.
Although that sequence is written right to left, it is in the 3′→5′ direction and must be flipped to be in the conventional orientation, written 5′→3′, left to right. Companies that provide PCR primers expect orders to be written in the conventional 5′→3′ direction. Doing otherwise is a common (and expensive) mistake. The Sequences of Long DNA Strands Can Be Determined In its capacity as a repository of information, a DNA molecule’s most important property is its nucleotide sequence. Until the late 1970s, determining the sequence of a nucleic acid containing as few as 5 or 10 nucleotides was very laborious. The development of two techniques in 1977 (one by Allan Maxam and Walter Gilbert, the other by Frederick Sanger) made possible the sequencing of larger DNA molecules. Although the two methods are similar in strategy, Sanger sequencing, also known as dideoxy chain-termination sequencing, is both technically easier and more accurate (Fig. 8-34). FIGURE 8-34 DNA sequencing by the Sanger method. This method makes use of the mechanism of DNA synthesis by DNA polymerases (Chapter 25). (a) DNA polymerases require both a primer (a short oligonucleotide strand), to which nucleotides are added, and a template strand to guide the selection of each new nucleotide. In cells, the 3′-hydroxyl group of the primer reacts with an incoming deoxynucleoside triphosphate (dGTP in this example) to form a new phosphodiester bond. The Sanger sequencing procedure uses dideoxynucleoside triphosphate (ddNTP) analogs to interrupt DNA synthesis. When a ddNTP (ddATP in this example) is inserted in place of a dNTP, strand elongation is halted after the analog is added, because the analog lacks the 3′-hydroxyl group needed for the next step. (b) Dideoxynucleoside triphosphate analogs have —H (red) rather than —OH at the 3′ position of the ribose ring.
(c) The DNA to be sequenced is used as the template strand, and a short primer, radioactively (in the example here) or fluorescently labeled, is annealed to it. The result is a solution containing a mixture of labeled fragments of particular length, each ending with a C residue. The different-sized fragments, separated by electrophoresis, reveal the location of C residues. This procedure is repeated separately for each of the four ddNTPs, and the sequence can be read directly from an autoradiogram of the gel. Because shorter DNA fragments migrate faster, the fragments near the bottom of the gel represent the nucleotide positions closest to the primer (the 5′ end), and the sequence is read (in the 5′→3′ direction) from bottom to top. Note that the sequence obtained is that of the strand complementary to the strand being analyzed. Any protocol for DNA sequencing has two parts. One must first chemically distinguish between G, C, T, and A residues. A second strategy is then needed to determine where the four residues appear in the overall sequence. Sanger sequencing exploited new (at the time) information about the mechanism of DNA synthesis by DNA polymerase to distinguish between the four nucleotides, making use of nucleotide analogs called dideoxynucleoside triphosphates (ddNTPs) to interrupt synthesis specifically at one or another type of nucleotide. Like PCR, Sanger’s method makes use of DNA polymerases and a primer to synthesize a DNA strand complementary to the strand under analysis. Each added deoxynucleotide is complementary, through base pairing, to a base in the template strand. In the reaction catalyzed by DNA polymerase, the 3′-hydroxyl group of the primer reacts with an incoming dNTP to form a new phosphodiester bond (Fig. 8-34a). The ddNTPs interrupt DNA synthesis because they are incorporated opposite their complementary bases in the template strand but lack the 3′-hydroxyl group needed to add the next nucleotide (Fig. 8-34b).
Once a ddNTP is added to a growing strand, that strand cannot be extended further. For instance, to identify C residues, a small amount of ddCTP is added to a reaction system containing a much larger amount of dCTP (along with the other three dNTPs). A competition then occurs every time the DNA polymerase encounters a G in the template strand. Usually, dC is added, and synthesis of the strand continues. Sometimes, ddC will be added instead, and the strand will be terminated at that position. Thus, a small fraction of the synthesized strands are prematurely terminated at every position where dC would normally be added, opposite each template dG. Given the excess of dCTP over ddCTP, the chance that the analog will be incorporated instead of dC is small. But enough ddCTP is present to ensure that some of the strands will be terminated at each G residue in the template. The result is a solution containing a mixture of fragments, each ending with a ddC residue. The fragments differ in length, and locating the C residues then relies on precise electrophoretic methods that allow separation of DNA strands differing in size by only one nucleotide residue. (See Fig. 3-18 for a description of gel electrophoresis.) Note that in most sequencing protocols, the sequence obtained is that of the newly synthesized strand complementary to the template strand being analyzed. When this procedure was first developed, the process was repeated separately for each of the four ddNTPs. Radioactively labeled primers allowed researchers to detect the DNA fragments generated during the DNA synthesis reactions. The sequence of the synthesized DNA strand was read directly from an autoradiogram of the resulting gel (Fig. 8-34c). Because shorter DNA fragments migrate faster, the fragments near the bottom of the gel represented the nucleotide positions closest to the primer (the 5′ end), and the sequence was read (in the 5′→ 3′ direction) from bottom to top. 
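The termination logic of the original four-reaction method can be simulated deterministically in Python. The sketch assumes a template written 5′→3′ with the primer annealed at its 3′ end (primer length not counted), and it lists every fragment length that some strand could reach; in a real reaction, each length is represented by only a small fraction of the strands. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def new_strand(template: str) -> str:
    """Full-length strand the polymerase would synthesize, written 5'->3'."""
    return template.translate(COMPLEMENT)[::-1]

def ddntp_lane(template: str, dd_base: str) -> list[int]:
    """Fragment lengths (beyond the primer) that end in the dideoxy analog of dd_base:
    one length for each position where the new strand incorporates dd_base."""
    return [pos for pos, base in enumerate(new_strand(template), start=1) if base == dd_base]

def read_four_lanes(template: str) -> str:
    """Combine the four separate reactions and read bands from shortest to longest."""
    band_at = {}  # fragment length -> base whose ddNTP terminated it
    for base in "ACGT":
        for length in ddntp_lane(template, base):
            band_at[length] = base
    return "".join(band_at[length] for length in sorted(band_at))

template = "TTGACGCA"  # hypothetical template, written 5'->3'
print(ddntp_lane(template, "C"))  # [3, 6]: the ddCTP lane, one band opposite each template G
print(read_four_lanes(template))  # TGCGTCAA: the complementary strand, read 5'->3'
```

As the text notes, the sequence read from the bands is that of the newly synthesized strand, the complement of the template.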
DNA sequencing was first automated by a variation of the Sanger method, in which each of the four ddNTPs used for a reaction was labeled with a different-colored fluorescent tag (Fig. 8-35). With this technology, all four fluorescent ddNTPs could be introduced into a single reaction. The terminated fragments, each of a different size, could be separated by electrophoresis in a single gel lane. The identity of the residue that terminated each fragment was made evident by its fluorescent color. Researchers could sequence DNA molecules containing thousands of nucleotides in a few hours, and the entire genomes of hundreds of organisms were sequenced in this way. For example, in the Human Genome Project, researchers sequenced all 3.2 × 10⁹ bp of the DNA in a human cell (see Chapter 9) in an effort that spanned nearly a decade and included contributions from dozens of laboratories worldwide. This form of Sanger sequencing is still used for routine analysis of short segments of DNA.
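The single-tube, four-color variant can be sketched as a small stochastic simulation in Python. The dye colors, the per-position chance of ddNTP incorporation (`dd_fraction`), the number of molecules, and the function name are all hypothetical values chosen for illustration. The point is that, with enough molecules, every fragment length is represented, and the color detected at each length identifies the terminating base.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")
DYE = {"A": "green", "C": "blue", "G": "black", "T": "red"}  # hypothetical dye assignment
BASE_FOR_DYE = {color: base for base, color in DYE.items()}

def one_tube_run(template: str, dd_fraction: float = 0.1,
                 molecules: int = 5000, seed: int = 0) -> str:
    """Simulate one reaction containing all four dye-labeled ddNTPs.

    At every position, a growing strand incorporates the dideoxy analog with
    probability dd_fraction and stops; otherwise it continues. Returns the
    sequence read from the band colors, shortest fragment first.
    """
    rng = random.Random(seed)
    new = template.translate(COMPLEMENT)[::-1]  # strand being synthesized, 5'->3'
    bands = {}  # fragment length -> dye color detected at that length
    for _ in range(molecules):
        for length, base in enumerate(new, start=1):
            if rng.random() < dd_fraction:  # ddNTP incorporated: chain terminates
                bands[length] = DYE[base]
                break
    missing = set(range(1, len(new) + 1)) - set(bands)
    if missing:
        raise RuntimeError(f"no fragments of length {sorted(missing)}; use more molecules")
    # Capillary electrophoresis: shorter fragments reach the detector first.
    return "".join(BASE_FOR_DYE[bands[n]] for n in sorted(bands))

print(one_tube_run("GATCCTAGGCTTAACGTGACCTGA"))  # the complementary strand, 5'->3'
```

Strands that never incorporate a ddNTP run to full length and carry no dye, so the sketch simply does not record them.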
FIGURE 8-35 Automation of DNA sequencing reactions. In the Sanger method, each ddNTP can be linked to a fluorescent (dye) molecule that gives the same color to all the fragments terminating in that nucleotide, with a different color for each nucleotide. All four labeled ddNTPs are added to the reaction mix together. The resulting colored DNA fragments are separated by size in an electrophoretic gel in a capillary tube (a refinement of gel electrophoresis that allows faster separations). All fragments of a given length migrate through the capillary gel together in a single band, and the color associated with each band is detected with a laser beam. The DNA sequence is read by identifying the color sequences in the bands as they pass the detector and feeding this information directly to a computer. The amount of fluorescence in each band is represented as a peak in the computer output. [Data from Dr. Lloyd Smith, University of Wisconsin–Madison, Department of Chemistry.] DNA Sequencing Technologies Are Advancing Rapidly The billions of base pairs in a complete human genome can now be sequenced in a day or two, the millions in a bacterial genome in a few hours. With modest expense, a personal genomic sequence can be routinely included in each individual’s medical record. These advances have been made possible by methods sometimes referred to as next-generation, or “next-gen,” sequencing. The sequencing strategies have some similarities to the Sanger method. Innovations have allowed a miniaturization of the procedure, a massive increase in scale, and a corresponding decrease in cost. Two widely used approaches, reversible terminator sequencing and single-molecule real time (SMRT) sequencing, both developed commercially, are described here. In both approaches, large genomes are sequenced by first collecting the DNA from many cells of the organism or individual. The DNA is sheared at random locations to generate fragments of a particular average size.
The individual fragments — many with overlapping sequences — are immobilized on a solid support, and each is sequenced in place. Fluorescent dyes linked to the nucleotides and powerful optical systems that can detect incorporation of each new nucleotide by DNA polymerase allow the sequencing process to be monitored directly. The fluorescent dyes and the design of the methods solve both problems in DNA sequencing at once, identifying each nucleotide and fixing its location in the large sequence. Individual regions of a genome may be sequenced hundreds, even thousands, of times. The entire genomic sequence is reconstructed by computer programs that align the sequences of overlapping fragments. In the reversible terminator sequencing method developed by Illumina, the genomic DNA to be sequenced is sheared so as to generate fragments a few hundred base pairs long. Synthetic oligonucleotides of known sequence are ligated to each end of each fragment, providing a point of reference on every DNA molecule. The individual fragments are then immobilized on a solid surface, and each is amplified in place by PCR to form a tight cluster of identical fragments. The solid surface is part of a channel on a flow cell that allows liquid solutions to stream over the samples. The result is a solid surface just a few centimeters wide with millions of attached DNA clusters, each cluster containing multiple copies of a single DNA sequence derived from a random genomic DNA fragment. To provide a starting point for DNA polymerase, an oligonucleotide primer is then added that is complementary to the oligonucleotides of known sequence ligated to the various fragment ends. All of these millions of clusters are sequenced at the same time, with the data from each cluster captured and stored by a computer. The actual sequencing of each cluster employs reversible terminator sequencing (Fig. 8-36). 
Four different modified deoxynucleotides (A, T, G, and C), each with a particular fluorescent label that identifies the nucleotide by color, are added to the sequencing reaction, along with the DNA polymerase. The labeled nucleotides also incorporate blocking groups attached to their 3′ ends that permit only one nucleotide to be added to each strand per cycle. The polymerase adds the appropriate nucleotide to the strands in each cluster, giving each cluster a color that corresponds to the added nucleotide. Next, lasers excite all the fluorescent labels, and an image of the entire surface reveals the color (and thus the identity) of the base added to each cluster. The fluorescent label and the blocking groups are then chemically or photolytically removed. The surface goes dark until the solution with labeled nucleotides and DNA polymerase is again introduced to the surface, allowing the next nucleotide to be added to each cluster. The sequencing proceeds stepwise. Read lengths obtained with this technology (that is, the length of individual DNA sequences that can be accurately determined) are typically 100 to 300 nucleotides. Read length is limited by constraints on cluster density on the flow cell surface and by small inefficiencies in the PCR reactions needed to amplify each cluster accurately. Accuracy is high, with error rates as low as 0.1%. FIGURE 8-36 Next-generation reversible terminator sequencing. (a) Blocking groups on each fluorescently labeled nucleotide prevent multiple nucleotides from being added in a single cycle. (b) Artist’s rendition of nine successive cycles from one very small part of an Illumina sequencing run. Each colored spot represents the location of a cluster of immobilized identical oligonucleotides affixed to the surface of the flow cell. The white-circled spots represent the same two clusters on the surface over successive cycles, with the sequences indicated. Data are recorded and analyzed digitally.
(c) Typical flow cell used for a next-generation sequencer. Millions of DNA fragments can be sequenced simultaneously in each of the four channels. (d) A dCTP molecule modified with a fluorescent dye and a 3′ blocking group for use in reversible terminator sequencing. Both the dye and the 3′-end-blocking group can be removed, either chemically or photolytically, leaving a free 3′-OH group for addition of the next nucleotide. The modified nucleotides currently used in reversible terminator sequencing are proprietary. (e) Part of the surface of one channel during a sequencing reaction. The relatively short read lengths produced by the Illumina technology are problematic in some situations, such as the sequencing of long stretches of DNA where short sequences are repeated over and over. Pacific Biosciences has pioneered the single-molecule real time (SMRT) sequencing method that allows read lengths averaging up to 30,000 to 40,000 bp (Fig. 8-37). The SMRT technology has a lower throughput, a higher cost, and a higher error rate than the Illumina approach. However, the very long read lengths are essential in some applications, particularly to reconstruct the complete genomes of higher organisms that may contain extensive regions of repeated DNA sequences. They also facilitate the detection of genomic alterations — deletions, duplications, or rearrangements of genomic segments — that arise in some cells, such as those in cancerous tumors (see Box 24-2). FIGURE 8-37 SMRT sequencing. (a) One pore in an SMRT cell. The pore is smaller in diameter than the wavelength of visible light, so that light projected at the bottom penetrates only a short distance into the pore. (b) DNA fragments sequenced by SMRT technology. Hairpin oligonucleotides are ligated to both ends. A primer for DNA synthesis is annealed to one and sometimes both single-stranded regions in the hairpin ends. (c) DNA synthesis by a DNA polymerase immobilized within a SMRT pore. 
A fluorescent dye is attached to the triphosphate of each nucleoside triphosphate used. The dye is released from each nucleotide incorporated into the growing DNA strand, and its wavelength is recorded. SMRT sequencing uses SMRT cells with 150,000 pores, each pore smaller in diameter (~70 nm) than the wavelength of visible light. Attenuated light from an excitation beam penetrates only the lower 20 to 30 nm of each pore, producing an illuminated volume small enough to accommodate just one DNA polymerase molecule (Fig. 8-37a). A single DNA polymerase is immobilized at the bottom of each pore. Genomic DNA is sheared to generate random fragments that are tens of thousands of base pairs in length. Hairpin oligonucleotides are ligated to both ends of each fragment, so that the two DNA strands are joined in one continuous circle (Fig. 8-37b). A primer is annealed to the open single-stranded DNA at one end of the fragment, and the primed fragment is captured by the DNA polymerase in one of the pores to initiate DNA synthesis. Fluorescent nucleotides are introduced, with A, T, G, and C each having a specific colored dye attached to the triphosphate. When a nucleotide binds to the DNA polymerase in the pore, it is held in place long enough to produce a brief fluorescent light pulse that can be read by a detector (Fig. 8-37c). The fluorescent dye is released with the pyrophosphate as each new phosphodiester bond is formed. The error rate of a single pass is high, about 10% to 15%. However, because the DNA template is a continuous circle, one DNA polymerase can replicate the same fragment over and over. The light pulses continue without interruption as new nucleotides are added in real time, and each pore thus generates a movie of light pulses (sometimes several hours long) that corresponds to the repeated sequencing of the DNA fragment bound in that pore. A computer program deletes the known sequences of the hairpin ends.
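The hairpin-removal step just described, together with the consensus that becomes possible when the same circle is read many times, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vendor's software: the adapter string is hypothetical, the raw read is assumed to begin at the start of an insert pass, and the passes are assumed to be already aligned with no insertions or deletions (a real pipeline must align the passes before voting).

```python
from collections import Counter

# Watson-Crick base complements, used to flip passes read from the other strand
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_passes(raw_read: str, adapter: str) -> list[str]:
    """Delete the known hairpin sequence and return the insert passes.

    Consecutive passes around the circular template read opposite strands,
    so every second pass is reverse-complemented onto the first pass's strand.
    Assumes the raw read begins at the start of an insert pass.
    """
    pieces = [piece for piece in raw_read.split(adapter) if piece]
    return [p if i % 2 == 0 else reverse_complement(p) for i, p in enumerate(pieces)]


def majority_consensus(passes: list[str]) -> str:
    """Accept the most common base at each position (passes assumed pre-aligned)."""
    length = min(len(p) for p in passes)
    return "".join(
        Counter(p[i] for p in passes).most_common(1)[0][0] for i in range(length)
    )


# Hypothetical example: a true insert GATTACAGGCT read four times, with one
# error in pass 2 (read from the opposite strand) and one error in pass 3.
ADAPTER = "GGGGAAAA"  # hypothetical hairpin sequence, not a real adapter
PASSES_AS_READ = ["GATTACAGGCT", "AGCCTGAAATC", "GATTACAGCCT", "AGCCTGTAATC"]
RAW_READ = ADAPTER.join(PASSES_AS_READ)

if __name__ == "__main__":
    passes = split_passes(RAW_READ, ADAPTER)
    print(passes)
    print(majority_consensus(passes))  # GATTACAGGCT
```

Because the passes alternate between the two strands of the circle, every second pass is reverse-complemented before voting; with more passes, a random error at any one position becomes increasingly likely to be outvoted.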
Error is reduced by compiling a consensus sequence of the fragment: the many repeated sequencing passes are aligned automatically, and the most common nucleotide signal detected at each position is accepted. Translating the sequences of millions of short DNA fragments into a complex and contiguous genomic sequence requires the computerized alignment of overlapping fragments (Fig. 8-38). The number of times that a particular nucleotide in a genome is sequenced, on average, is referred to as the sequencing depth or sequencing coverage. In many cases, enough random fragments are sequenced that each nucleotide in the genome is sequenced an average of hundreds to thousands of times (100× to 1,000× coverage). Although the coverage of particular nucleotides may vary, a high level of coverage ensures that most sequencing errors will be detected and eliminated. The overlaps allow the computer to trace the sequence through a chromosome, from one overlapping fragment to another, permitting the assembly of long, contiguous sequences called contigs. In a successful genomic sequencing exercise, many contigs can extend over millions of base pairs. FIGURE 8-38 Sequence assembly. In a genomic sequence, each base pair of the genome is usually represented in multiple sequenced fragments, referred to as reads. This schematic shows how the overlaps between reads are used to assemble a contiguous segment of the genomic sequence, or contig. The numbers at the top represent base-pair positions in the genome, relative to an arbitrarily defined reference point. All sequence fragments come from a particular long contig. The reads are represented by horizontal colored bars. DNA strand segments are sequenced at random, with sequences obtained from one strand (5′ to 3′, left to right) or the other strand (5′ to 3′, right to left) represented by blue lines or red lines, respectively.
The coverage bar at the top indicates how many times the sequence at a particular position has appeared in a sequenced read, with higher numbers corresponding to increased quality of the output sequence data. The rapid evolution of DNA sequencing technologies shows no signs of slowing. As costs plummet and sensitivity increases, new applications come online to enhance medicine, forensics, archaeology, and many other fields. We present some of those applications in Chapter 9. SUMMARY 8.3 Nucleic Acid Chemistry Native DNA undergoes reversible unwinding and separation of strands (melting) upon heating or at extremes of pH. DNAs rich in G≡C pairs have higher melting points than DNAs rich in A═T pairs. DNA is a relatively stable polymer. Spontaneous reactions such as deamination of certain bases, hydrolysis of base-sugar N-glycosyl bonds, radiation-induced formation of pyrimidine dimers, and oxidative damage occur at very low rates, yet are important because of a cell’s very low tolerance for changes in its genetic material. DNA is subject to enzymatic modification of nucleotide bases at particular locations. Methylated bases are common. Oligonucleotides of known sequence can be synthesized rapidly and accurately. The polymerase chain reaction (PCR) provides a convenient and rapid method for amplifying segments of DNA if the sequences of the ends of the targeted DNA segment are known. Routine DNA sequencing of genes or short DNA segments is carried out using an automated variation of Sanger dideoxy sequencing. DNA sequences, including entire genomes, can be efficiently determined in hours or days using commercial next-gen sequencing technologies. 8.4 Other Functions of Nucleotides In addition to their roles as the subunits of nucleic acids, nucleotides have cellular functions as energy carriers, components of enzyme cofactors, and chemical messengers.
Nucleotides Carry Chemical Energy in Cells The phosphate group covalently linked at the 5′ hydroxyl of a ribonucleotide may have one or two additional phosphates attached. The resulting molecules are referred to as nucleoside mono-, di-, and triphosphates (Fig. 8-39). Starting from the ribose, the three phosphates are generally labeled α, β, and γ. Hydrolysis of nucleoside triphosphates provides the chemical energy to drive many cellular reactions. Adenosine 5′-triphosphate, ATP, is by far the most widely used nucleoside triphosphate for this purpose, but UTP, GTP, and CTP are also used in some reactions. Nucleoside triphosphates also serve as the activated precursors of DNA and RNA synthesis, as described in Chapters 25 and 26 (see also Fig. 8-34). FIGURE 8-39 Nucleoside phosphates. General structure of the nucleoside 5′-mono-, di-, and triphosphates (NMPs, NDPs, and NTPs) and their standard abbreviations. In the deoxyribonucleoside phosphates (dNMPs, dNDPs, and dNTPs), the pentose is 2′-deoxy-D-ribose. The energy released by hydrolysis of ATP and the other nucleoside triphosphates is accounted for by the structure of the triphosphate group. The bond between the ribose and the α phosphate is an ester linkage. The α,β and β,γ linkages are phosphoanhydrides (Fig. 8-40). Hydrolysis of the ester linkage yields about 14 kJ/mol under standard conditions, whereas hydrolysis of each anhydride bond yields about 30 kJ/mol. ATP hydrolysis often plays an important thermodynamic role in biosynthesis. When coupled to a reaction with a positive free-energy change, ATP hydrolysis shifts the equilibrium of the overall process to favor product formation (recall the relationship between the equilibrium constant and free-energy change described by Eqn 6-3 on p. 182). FIGURE 8-40 The phosphate ester and phosphoanhydride bonds of ATP. Hydrolysis of an anhydride bond yields more energy than hydrolysis of the ester.
A carboxylic acid anhydride and a carboxylic acid ester are shown for comparison. Adenine Nucleotides Are Components of Many Enzyme Cofactors Enzyme cofactors serving a wide range of chemical functions include adenosine as part of their structure (Fig. 8-41). They are unrelated structurally except for the presence of adenosine. In none of these cofactors does the adenosine portion participate directly in the primary function, but removal of adenosine generally results in a drastic reduction of cofactor activities. For example, removal of the adenine nucleotide (3′-phosphoadenosine diphosphate) from acetoacetyl-CoA, the coenzyme A derivative of acetoacetate, reduces its reactivity as a substrate for β-ketoacyl-CoA transferase (an enzyme of lipid metabolism) by a factor of 10⁶. Although this requirement for adenosine has not been investigated in detail, it must involve the binding energy between enzyme and substrate (or cofactor) that is used both in catalysis and in stabilizing the initial enzyme-substrate complex (Chapter 6). In the case of β-ketoacyl-CoA transferase, the nucleotide moiety of coenzyme A seems to be a binding “handle” that helps to pull the substrate (acetoacetyl-CoA) into the active site. Similar roles may be found for the nucleoside portion of other nucleotide cofactors. FIGURE 8-41 Some coenzymes containing adenosine. The adenosine portion is shaded in light red. Coenzyme A (CoA) functions in acyl group transfer reactions; the acyl group (such as the acetyl or acetoacetyl group) is attached to the CoA through a thioester linkage to the β-mercaptoethylamine moiety. NAD⁺ functions in hydride transfers, and FAD, the active form of vitamin B₂ (riboflavin), functions in electron transfers. Another coenzyme incorporating adenosine is 5′-deoxyadenosylcobalamin, the active form of vitamin B₁₂ (see Box 17-2), which participates in intramolecular group transfers between adjacent carbons.
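Both energetic numbers in this section, the ~30 kJ/mol released per phosphoanhydride bond of ATP and the 10⁶-fold loss of reactivity when the adenine nucleotide is removed from acetoacetyl-CoA, can be put on one scale with the standard relation between free energy and an equilibrium constant, ΔG′° = −RT ln K′eq (the relationship the text cites as Eqn 6-3). The Python sketch below converts in both directions at an assumed 25 °C (298 K). Reading the 10⁶ factor as a free-energy difference assumes that it reflects a difference in binding (activation) free energy, an interpretation consistent with, but not stated by, the text; the function names are illustrative.

```python
import math

R_KJ = 8.315e-3  # gas constant, kJ/(mol*K)
T_K = 298.0      # assumed temperature: 25 degrees C


def ratio_from_energy(energy_kj: float, temp_k: float = T_K) -> float:
    """Factor by which an equilibrium constant (or a rate) grows when a process
    is made more favorable by energy_kj kJ/mol: exp(energy / RT)."""
    return math.exp(energy_kj / (R_KJ * temp_k))


def energy_from_ratio(ratio: float, temp_k: float = T_K) -> float:
    """Free-energy difference (kJ/mol) corresponding to a given ratio: RT ln(ratio)."""
    return R_KJ * temp_k * math.log(ratio)


if __name__ == "__main__":
    # Coupling a reaction to hydrolysis of one phosphoanhydride bond (~30 kJ/mol)
    print(f"{ratio_from_energy(30):.2g}")      # 1.8e+05
    # Hydrolysis of the alpha ester linkage (~14 kJ/mol), for comparison
    print(f"{ratio_from_energy(14):.0f}")      # 284
    # Free energy equivalent of a tenfold and a 10^6-fold effect
    print(f"{energy_from_ratio(10):.1f}")      # 5.7
    print(f"{energy_from_ratio(1e6):.1f}")     # 34.2
```

Each tenfold change corresponds to about 5.7 kJ/mol at 25 °C, so a 10⁶-fold effect corresponds to roughly 34 kJ/mol, similar in size to the free energy of hydrolysis of one phosphoanhydride bond of ATP.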
Why is adenosine, rather than some other large molecule, used in these structures? The answer here may involve a form of evolutionary economy. Adenosine is certainly not unique in the amount of potential binding energy it can contribute. The importance of adenosine probably lies not so much in some special chemical characteristic as in the evolutionary advantage of using one compound for multiple roles. Once ATP became the universal source of chemical energy, systems developed to synthesize ATP in greater abundance than the other nucleotides; because it was abundant, it became the logical choice for incorporation into a wide variety of structures. The economy extends to protein structure. A single protein domain that binds adenosine can be used in different enzymes. Such a domain, called a nucleotide-binding fold, is found in many enzymes that bind ATP and nucleotide cofactors. Some Nucleotides Are Regulatory Molecules Cells respond to their environment by taking cues from hormones or other external chemical signals. The interaction of these extracellular chemical signals (“first messengers”) with receptors on the cell surface often leads to the production of second messengers inside the cell, which in turn leads to adaptive changes in the cell interior (Chapter 12). Often, the second messenger is a nucleotide (Fig. 8-42). One of the most common is adenosine 3′,5′-cyclic monophosphate (cyclic AMP, or cAMP), formed from ATP in a reaction catalyzed by adenylyl cyclase, an enzyme associated with the inner face of the plasma membrane. Cyclic AMP serves regulatory functions in virtually every cell outside the plant kingdom. Guanosine 3′,5′-cyclic monophosphate (cGMP) also has regulatory functions in many cells.
FIGURE 8-42 Three regulatory nucleotides. Another regulatory nucleotide, ppGpp (Fig. 8-42), is produced in bacteria in response to a slowdown in protein synthesis during amino acid starvation. This nucleotide inhibits the synthesis of the rRNA and tRNA molecules (see Fig. 28-22) needed for protein synthesis, preventing the unnecessary production of nucleic acids. Adenine Nucleotides Also Serve as Signals ATP and ADP also serve as signaling molecules in many unicellular and multicellular organisms, including humans. In mammals, certain neurons release ATP at synapses. The ATP binds P2X receptors on the postsynaptic cell, triggering changes in membrane potential or the release of an intracellular second messenger that initiates diverse physiological processes, including taste, inflammation, and smooth muscle contraction. One important class of ATP receptors that mediate the sensation of pain is an obvious target for drug development. Extracellular ADP is a signaling molecule that acts through P2Y receptors in sensitive cell types. By preventing ADP from binding the P2Y receptors of platelets, the drug clopidogrel (Plavix) inhibits undesirable blood clotting in patients with cardiac disease. Signaling pathways are discussed in more detail in Chapter 12. SUMMARY 8.4 Other Functions of Nucleotides ATP is the central carrier of chemical energy in cells. The presence of an adenosine moiety in a variety of enzyme cofactors may be related to binding-energy requirements. Cyclic AMP, formed from ATP in a reaction catalyzed by adenylyl cyclase, is a common second messenger produced in response to hormones and other chemical signals. ATP and ADP serve as neurotransmitters in a variety of signaling pathways. Chapter Review KEY TERMS Terms in bold are defined in the glossary. 
deoxyribonucleic acid (DNA); ribonucleic acid (RNA); gene; ribosomal RNA (rRNA); messenger RNA (mRNA); transfer RNA (tRNA); nucleotide; nucleoside; pyrimidine; purine; deoxyribonucleotides; ribonucleotide; phosphodiester linkage; 5′ end; 3′ end; oligonucleotide; polynucleotide; tautomers; base pair; major groove; minor groove; B-form DNA; A-form DNA; Z-form DNA; palindrome; hairpin; cruciform; triplex DNA; G tetraplex; transcription; monocistronic mRNA; polycistronic mRNA; mutation; polymerase chain reaction (PCR); DNA polymerases; primer; Sanger sequencing; sequence polymorphisms; short tandem repeat (STR); reversible terminator sequencing; single-molecule real-time (SMRT) sequencing; sequencing depth; contig; nucleotide-binding fold; second messenger; adenosine 3′,5′-cyclic monophosphate (cyclic AMP, cAMP)
PROBLEMS 1. Nucleotide Structure Which positions in the purine ring of a purine nucleotide in DNA have the potential to form hydrogen bonds but are not involved in Watson-Crick base pairing? 2. Base Sequence of Complementary DNA Strands One strand of a double-helical DNA has the sequence (5′)GCGCAATATTTCTCAAAATATTGCGC(3′). Write the base sequence of the complementary strand. What special type of sequence is contained in this DNA segment? Does the double-stranded DNA have the potential to form any alternative structures? 3. DNA of the Human Body If completely unraveled, all of a human’s DNA would be able to reach a distance of nearly 3.2 × 10⁵ km, the distance from Earth to the moon. Given that each base pair in a DNA helix extends a distance of 3.4 Å, calculate the number of base pairs found within the entirety of a human’s DNA. 4. Nucleic Acids A damaged tetranucleotide structure is shown below. (a) Name each of the nucleotides (or the type of damaged site as appropriate), proceeding from top left to lower right. (b) Indicate which end (upper left or lower right) is the 3′ end and which is the 5′ end. (c) Is this tetranucleotide DNA or RNA? 5.
Distinction between DNA Structure and RNA Structure Secondary structures called hairpins may form at palindromic sequences in single strands of either RNA or DNA. The fully base-paired portions of hairpins form helices. How do RNA hairpins differ from DNA hairpins? 6. Nucleotide Chemistry The cells of many eukaryotic organisms have highly specialized systems that specifically repair G–T mismatches in DNA. The mismatch is repaired to form a G≡C, not A═T, base pair. This G–T mismatch repair mechanism occurs in addition to a more general system that repairs virtually all mismatches. Suggest why cells might require a specialized system to repair G–T mismatches. 7. Denaturation of Nucleic Acids A duplex DNA oligonucleotide in which one of the strands has the sequence TAATACGACTCACTATAGGG has a melting temperature (tm) of 59 °C. If an RNA duplex oligonucleotide of identical sequence (substituting U for T) is constructed, will its melting temperature be higher or lower? 8. Spontaneous DNA Damage Hydrolysis of the N-glycosyl bond between deoxyribose and a purine in DNA creates an apurinic (AP) site. An AP site is more thermodynamically destabilizing to a DNA molecule than is a mismatched base pair. Examine the structure of an AP site (see Fig. 8-29b) and describe some chemical consequences of base loss. 9. Prediction of Nucleic Acid Structure from Its Sequence A part of a sequenced chromosome has the sequence (on one strand) ATTGCATCCGCGCGTGCGCGCGCGATCCCGTTACTTTCCG. What is the longest part of this sequence that is likely to take up the Z conformation? 10. Nucleic Acid Identity Explain how RNA nucleotides differ from DNA nucleotides. 11. Nucleic Acid Structure Explain why the absorption of UV light by double-stranded DNA increases (the hyperchromic effect) when the DNA is denatured. 12. Solubility of the Components of DNA Draw the structures of deoxyribose, guanine, and phosphate and rate their relative solubilities in water (most soluble to least soluble).
How are these solubilities consistent with the three-dimensional structure of double-stranded DNA? 13. Polymerase Chain Reaction An investigator has one strand of a chromosomal DNA whose sequence is shown. She wants to use polymerase chain reaction (PCR) to amplify and isolate the DNA fragment defined by the segment shown in boldface. Her first step is to design two PCR primers, each 20 nucleotides long, that can be used to amplify this DNA segment. The final PCR product generated from the primers should include no sequences outside the segment in boldface. 5′–––AATGCCGTCAGCCGATCTGCCTCGAGTCAATCGAT GCTGGTAACTTGGGGTATAAAGCTTACCCATGGTATCGTAGT TAGATTGATTGTTAGGTTCTTAGGTTTAGGTTTCTGGTATTG GTTTAGGGTCTTTGATGCTATTAATTGTTTGGTTTTGATTTG GTCTTTATATGGTTTATGTTTTAAGCCGGGTTTTGTCTGG GATGGTTCGTCTGATGTGCGCGTAGCGTGCGGCG–––3′ What are the sequences of the investigator’s forward primer and reverse primer? Recall that the forward primer binds to the strand of DNA running in the 3′ to 5′ direction, whereas the reverse primer binds to the opposite strand. 14. DNA Sequencing Reagents Indicate which of the modified cytidine nucleotide triphosphates shown might be used for each procedure: (a) Classical Sanger sequencing (b) Automated Sanger sequencing (c) Next-generation DNA sequencing (Illumina). Linked fluorescent dyes, where present, are highlighted.
15. Genomic Sequencing In large-genome sequencing projects, the initial data usually reveal gaps between contigs where no sequence information has been obtained. To close the gaps, DNA primers complementary to the 5′-ending strand at the end of each contig are especially useful. Explain how researchers could use these primers to close the gaps between contigs. 16. Next-Generation Sequencing In reversible terminator sequencing, how would the sequencing process be affected if the 3′-end-blocking group of each nucleotide were replaced with the 3′-H present in the dideoxynucleotides used in Sanger sequencing? 17. Sanger Sequencing Logic In the Sanger (dideoxy) method for DNA sequencing, researchers add a small amount of a dideoxynucleoside triphosphate, such as ddCTP, to the sequencing reaction along with a larger amount of the corresponding deoxynucleoside triphosphate, such as dCTP. What result would researchers observe if they omitted dCTP from the sequencing reaction? 18. DNA Sequencing A researcher used the Sanger method to sequence the DNA fragment shown. The red asterisk indicates a fluorescent label. She reacted a sample of the DNA with DNA polymerase and each of the four nucleotide mixtures (in an appropriate buffer) listed. Some of the mixtures included dideoxynucleotides (ddNTPs) in relatively small amounts. 1. dATP, dTTP, dCTP, dGTP, ddTTP 2. dATP, dTTP, dCTP, dGTP, ddGTP 3. dATP, dCTP, dGTP, ddTTP 4. dATP, dTTP, dCTP, dGTP The researcher then separated the resulting DNA by electrophoresis on a polyacrylamide gel and located the fluorescent bands on the gel. The image of the gel shows the band pattern resulting from nucleotide mixture 1. Assuming that all mixtures were run on the same gel, what did the remaining lanes of the gel look like? 19. Snake Venom Phosphodiesterase An exonuclease is an enzyme that sequentially cleaves nucleotides from the end of a polynucleotide strand.
Snake venom phosphodiesterase, which hydrolyzes nucleotides from the 3′ end of any oligonucleotide with a free 3′-hydroxyl group, cleaves between the 3′ hydroxyl of the ribose or deoxyribose and the phosphoryl group of the next nucleotide. It acts on single-stranded DNA or RNA and has no base specificity. This enzyme was used in sequence determination experiments before the development of modern nucleic acid sequencing techniques. What are the products of partial digestion by snake venom phosphodiesterase of an oligonucleotide with the sequence (5′)GCGCCAUUGC(3′)—OH? 20. Preserving DNA in Bacterial Endospores Bacterial endospores form when the environment is no longer conducive to active cell metabolism. The soil bacterium Bacillus subtilis, for example, begins the process of sporulation when one or more nutrients are depleted. The end product is a small, metabolically dormant structure that can survive almost indefinitely with no detectable metabolism. Spores have mechanisms to prevent accumulation of potentially lethal mutations in their DNA over periods of dormancy that can exceed 1,000 years. B. subtilis spores are much more resistant than are the organism’s growing cells to heat, UV radiation, and oxidizing agents, all of which promote mutations. a. One factor that prevents potential DNA damage in spores is their greatly decreased water content. How would this affect some types of mutations? b. Endospores have a category of proteins called small acid-soluble proteins (SASPs) that bind to their DNA, preventing formation of cyclobutane-type dimers. What causes cyclobutane dimers, and why do bacterial endospores need mechanisms to prevent their formation? 21. Oligonucleotide Synthesis As shown in the scheme of Figure 8-34, oligonucleotide synthesis involves adding modified bases, one at a time, to a growing chain. The modified bases contain an activated 3′ hydroxyl and have a dimethoxytrityl (DMT) group attached to the 5′ hydroxyl.
What is the function of the DMT group on the incoming base? BIOCHEMISTRY ONLINE 22. The Structure of DNA Elucidation of the three-dimensional structure of DNA helped researchers understand how this molecule conveys information that can be faithfully replicated from one generation to the next. To see the secondary structure of double-stranded DNA, go to the Protein Data Bank website (www.rcsb.org). Use the PDB identifiers provided in parts (a) and (b) below to retrieve the structure summary for a double-stranded DNA segment. View the 3D structure using JSmol. The viewer select menu is below the right corner of the image box. Once in JSmol, you will need to use both the display menus on the screen and the scripting controls in the JSmol menu. Access the JSmol menu by clicking on the JSmol logo in the lower right corner of the image screen. Refer to the JSmol help links as needed. a. Access PDB ID 141D, a highly conserved, repeated DNA sequence from the end of the genome of HIV-1 (the virus that causes AIDS). Set the Style to Ball and Stick. Then use the scripting controls to color by element (Color > Atoms > By Scheme > Element (CPK)). Identify the sugar–phosphate backbone for each strand of the DNA duplex. Locate and identify individual bases. Identify the 5′ end of each strand. Locate the major and minor grooves. Is this a right- or left-handed helix? b. Access PDB ID 145D, a DNA with the Z conformation. Set the Style to Ball and Stick. Then use the scripting controls to color by element (Main Menu > Color > Atoms > By Scheme > Element (CPK)). Identify the sugar–phosphate backbone for each strand of the DNA duplex. Is this a right- or left-handed helix? c. To fully appreciate the secondary structure of DNA, view the molecules in stereo. From the scripting control Main Menu select Style > Stereographic > Cross-eyed viewing or Wall-eyed viewing. (If you have stereographic glasses available, select the appropriate option.) You will see two images of the DNA molecule.
Sit with your nose approximately 10 inches from the screen and focus on the tip of your nose (cross-eyed) or on the opposite edges of the screen (wall-eyed). In the background you should see three images of the DNA helix. Shift your focus to the middle image, which should appear three-dimensional. (Note that only one of the authors can make this work.) DATA ANALYSIS PROBLEM 23. Chargaff’s Studies of DNA Structure The main findings of Erwin Chargaff and his coworkers (“Chargaff’s rules”) are summarized on page 270. In this problem, you will examine the data Chargaff collected in support of his conclusions. In one paper, Chargaff (1950) described his analytical methods and some early results. Briefly, he treated DNA samples with acid to remove the bases, separated the bases by paper chromatography, and measured the amount of each base with UV spectroscopy. His results are shown in the three tables below. The molar ratio is the ratio of the number of moles of each base in the sample to the number of moles of phosphate in the sample; this gives the fraction of the total number of bases represented by each particular base. The recovery is the sum of all four bases (the sum of the molar ratios); full recovery of all bases in the DNA would give a recovery of 1.0.

Molar ratios in ox DNA
              Thymus                     Spleen            Liver
Base          Prep. 1  Prep. 2  Prep. 3  Prep. 1  Prep. 2  Prep. 1
Adenine       0.26     0.28     0.30     0.25     0.26     0.26
Guanine       0.21     0.24     0.22     0.20     0.21     0.20
Cytosine      0.16     0.18     0.17     0.15     0.17     –
Thymine       0.25     0.24     0.25     0.24     0.24     –
Recovery      0.88     0.94     0.94     0.84     0.88     –

Molar ratios in human DNA
              Sperm             Thymus   Liver
Base          Prep. 1  Prep. 2  Prep. 1  Normal   Carcinoma
Adenine       0.29     0.27     0.28     0.27     0.27
Guanine       0.18     0.17     0.19     0.19     0.18
Cytosine      0.18     0.18     0.16     –        0.15
Thymine       0.31     0.30     0.28     –        0.27
Recovery      0.96     0.92     0.91     –        0.87

Molar ratios in DNA of microorganisms
              Yeast             Avian tubercle bacilli
Base          Prep. 1  Prep. 2  Prep. 1
Adenine       0.24     0.30     0.12
Guanine       0.14     0.18     0.28
Cytosine      0.13     0.15     0.26
Thymine       0.25     0.29     0.11
Recovery      0.76     0.92     0.77

a. Based on these data, Chargaff concluded that “no differences in composition have so far been found in DNA from different tissues of the same species.” However, a skeptic looking at the data might say, “They certainly look different to me!” If you were Chargaff, how would you use the data to change the skeptic’s mind? b. The base composition of DNA from normal and cancerous liver cells (hepatocarcinoma) was not distinguishably different. Would you expect Chargaff’s technique to be capable of detecting a difference between the DNA of normal and cancerous cells? Explain your reasoning. As you might expect, Chargaff’s data were not completely convincing. He went on to improve his techniques, as described in his 1951 paper, in which he reported molar ratios of bases in DNA from a variety of organisms.

Source                          A:G    T:C    A:T    G:C    Purine:pyrimidine
Ox                              1.29   1.43   1.04   1.00   1.1
Human                           1.56   1.75   1.00   1.00   1.0
Hen                             1.45   1.29   1.06   0.91   0.99
Salmon                          1.43   1.43   1.02   1.02   1.02
Wheat                           1.22   1.18   1.00   0.97   0.99
Yeast                           1.67   1.92   1.03   1.20   1.0
Haemophilus influenzae type c   1.74   1.54   1.07   0.91   1.0
E. coli K-12                    1.05   0.95   1.09   0.99   1.0
Avian tubercle bacillus         0.4    0.4    1.09   1.08   1.1
Serratia marcescens             0.7    0.7    0.95   0.86   0.9
Bacillus schatz                 0.7    0.6    1.12   0.89   1.0

c. According to Chargaff, “The base composition of DNA generally varies from one species to another.” Provide an argument, based on the data presented so far, that supports this conclusion. d. According to Chargaff’s rules, “In all cellular DNAs, regardless of the species, … A + G = T + C.” Provide an argument, based on the data presented, that supports this conclusion.

References
Chargaff, E. 1950. Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia 6:201–209.
Chargaff, E. 1951. Structure and function of nucleic acids as cell constituents. Fed. Proc. 10:654–659.
Stems are from the chapter Problems section; correct choices are drawn from Abbreviated Solutions to Problems (Appendix B) in the same edition.
1. Nucleotide Structure Which positions in the purine ring of a purine nucleotide in DNA have the potential to form hydrogen bonds but are not involved in Watson-Crick base pairing?
2. Base Sequence of Complementary DNA Strands One strand of a double-helical DNA has the sequence (5′)GCG CAATATTTCTCAAAATATTGCGC(3′). Write the base sequence of the complementary strand. What special type of sequence is contained in this DNA segment? Does the double- stranded DNA have the potential to form any alternative structures?
3. DNA of the Human Body If completely unraveled, all of a human’s DNA would be able to reach a distance of nearly 3.2× 105 km, the distance from Earth to the moon. Given that each base pair in a DNA helix extends a distance of 3.4 Å, calculate the number of base pairs found within the entirety of a human’s DNA.
4. Nucleic Acids A damaged tetranucleotide structure is shown below. (a) Name each of the nucleotides (or the type of damaged site as appropriate), proceeding from top le to lower right. (b) Indicate which end (upper le or lower right) is the 3′ end and which is the 5′ end. (c) Is this tetranucleotide DNA or RNA?
5. Distinction between DNA Structure and RNA Structure Secondary structures called hairpins may form at palindromic sequences in single strands of either RNA or DNA. The fully base-paired portions of hairpins form helices. How do RNA hairpins differ from DNA hairpins?
6. Nucleotide Chemistry The cells of many eukaryotic organisms have highly specialized systems that specifically repair G–T mismatches in DNA. The mismatch is repaired to form a G≡C, not A═T, base pair. This G–T mismatch repair mechanism occurs in addition to a more general system that repairs virtually all mismatches. Suggest why cells might require a specialized system to repair G–T mismatches.
7. Denaturation of Nucleic Acids A duplex DNA oligonucleotide in which one of the strands has the sequence TAATACGACT CACTATAGGG has a melting temperature (tm) of 59 °C. If an RNA duplex oligonucleotide of identical sequence (substituting U for T) is constructed, will its melting temperature be higher or lower?
8. Spontaneous DNA Damage Hydrolysis of the N-glycosyl bond between deoxyribose and a purine in DNA creates an apurinic (AP) site. An AP site is more thermodynamically destabilizing to a DNA molecule than is a mismatched base pair. Examine the structure of an AP site (see Fig. 8-29b) and describe some chemical consequences of base loss.
9. Prediction of Nucleic Acid Structure from Its Sequence A part of a sequenced chromosome has the sequence (on one strand) ATTGCATCCGCGCGTGCGCGCGCGATCCCGT TACTTTCCG. What is the longest part of this sequence that is likely to take up the Z conformation?
10. Nucleic Acid Identity Explain how RNA nucleotides differ from DNA nucleotides.
11. Nucleic Acid Structure Explain why the absorption of UV light by double-stranded DNA increases (the hyperchromic effect) when the DNA is denatured.
12. Solubility of the Components of DNA Draw the structures of deoxyribose, guanine, and phosphate and rate their relative solubilities in water (most soluble to least soluble). How are these solubilities consistent with the three- dimensional structure of double-stranded DNA?
13. Polymerase Chain Reaction An investigator has one strand of a chromosomal DNA whose sequence is shown. She wants to use polymerase chain reaction (PCR) to amplify and isolate the DNA fragment defined by the segment shown in boldface. Her first step is to design two PCR primers, each 20 nucleotides long, that can be used to amplify this DNA segment. The final PCR product generated from the primers should include no sequences outside the segment in boldface. 5′–––AATGCCGTCAGCCGATCTGCCTCGAGTCAATCGAT GCTGGTAACTTGGGGTATAAAGCTTACCCATGGTATCGTAGT TAGATTGATTGTTAGGTTCTTAGGTTTAGGTTTCTGGTATTG GTTTAGGGTCTTTGATGCTATTAATTGTTTGGTTTTGATTTG GTCTTTATATGGTTTATGTTTTAAGCCGGGTTTTGTCTGG GATGGTTCGTCTGATGTGCGCGTAGCGTGCGGCG–––3′ What are the sequences of the investigator’s forward primer and reverse primer? Recall that the forward primer binds to the strand of DNA running in the 3′ to 5′ direction, whereas the reverse primer binds to the opposite strand.
14. DNA Sequencing Reagents Indicate which of the modified cytidine nucleoside triphosphates shown might be used for each procedure: (a) classical Sanger sequencing; (b) automated Sanger sequencing; (c) next-generation DNA sequencing (Illumina). Linked fluorescent dyes, where present, are highlighted.
15. Genomic Sequencing In large-genome sequencing projects, the initial data usually reveal gaps between contigs where no sequence information has been obtained. To close the gaps, DNA primers complementary to the 5′-ending strand at the end of each contig are especially useful. Explain how researchers could use these primers to close the gaps between contigs.
16. Next-Generation Sequencing In reversible terminator sequencing, how would the sequencing process be affected if the 3′-end-blocking group of each nucleotide were replaced with the 3′-H present in the dideoxynucleotides used in Sanger sequencing?
17. Sanger Sequencing Logic In the Sanger (dideoxy) method for DNA sequencing, researchers add a small amount of a dideoxynucleoside triphosphate, such as ddCTP, to the sequencing reaction along with a larger amount of the corresponding deoxynucleoside triphosphate, such as dCTP. What result would researchers observe if they omitted dCTP from the sequencing reaction?
18. DNA Sequencing A researcher used the Sanger method to sequence the DNA fragment shown. The red asterisk indicates a fluorescent label. She reacted a sample of the DNA with DNA polymerase and each of the four nucleotide mixtures (in an appropriate buffer) listed. Some of the mixtures included dideoxynucleotides (ddNTPs) in relatively small amounts.
(1) dATP, dTTP, dCTP, dGTP, ddTTP
(2) dATP, dTTP, dCTP, dGTP, ddGTP
(3) dATP, dCTP, dGTP, ddTTP
(4) dATP, dTTP, dCTP, dGTP
The researcher then separated the resulting DNA by electrophoresis on a polyacrylamide gel and located the fluorescent bands on the gel. The image of the gel shows the band pattern resulting from nucleotide mixture 1. Assuming that all mixtures were run on the same gel, what did the remaining lanes of the gel look like?
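The lane-by-lane reasoning can be made explicit with a small Python model. It walks along the strand that DNA polymerase would synthesize (the complement of the template, read 5′→3′ from the labeled primer) and records where labeled chains stop: at every position whose base has a ddNTP in the mix, and, if the matching dNTP is missing, at the point beyond which no chain can be extended. This is a simplified sketch under stated assumptions; `sanger_products` is a hypothetical name, and the example strand is made up rather than taken from the figure.

```python
def sanger_products(new_strand: str, mix: set[str]) -> list[int]:
    """Predict lengths (nucleotides added to the labeled primer) of the products.

    new_strand: sequence the polymerase would synthesize, 5'->3' (hypothetical input).
    mix: nucleotides present, e.g. {"dATP", "dTTP", "dCTP", "dGTP", "ddTTP"}.
    Simplified model: every position whose base has a ddNTP in the mix yields a
    terminated product; chains continue past it only if the matching dNTP is present.
    """
    lengths = set()
    for i, base in enumerate(new_strand.upper(), start=1):
        has_dd = f"dd{base}TP" in mix  # chain-terminating analog available
        has_d = f"d{base}TP" in mix    # normal, extendable nucleotide available
        if has_dd:
            lengths.add(i)             # some chains end here with a dideoxy residue
        if not has_d:
            if not has_dd:
                lengths.add(i - 1)     # no nucleotide for this base: synthesis stalls
            return sorted(lengths)     # no chain can be extended past this position
    lengths.add(len(new_strand))       # full-length (run-off) product
    return sorted(lengths)
```

For a hypothetical synthesized strand `ATGTCA`, `sanger_products("ATGTCA", {"dATP", "dTTP", "dCTP", "dGTP", "ddTTP"})` returns `[2, 4, 6]`: chains ending at each T plus the full-length product. Dropping dTTP from that mix leaves only `[2]`, because every chain stops at the first T. Shorter products migrate farther on the gel.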
23. Snake Venom Phosphodiesterase An exonuclease is an enzyme that sequentially cleaves nucleotides from the end of a polynucleotide strand. Snake venom phosphodiesterase, which hydrolyzes nucleotides from the 3′ end of any oligonucleotide with a free 3′-hydroxyl group, cleaves between the 3′ hydroxyl of the ribose or deoxyribose and the phosphoryl group of the next nucleotide. It acts on single-stranded DNA or RNA and has no base specificity. This enzyme was used in sequence determination experiments before the development of modern nucleic acid sequencing techniques. What are the products of partial digestion by snake venom phosphodiesterase of an oligonucleotide with the sequence (5′)GCGCCAUUGC(3′)—OH?
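The cleavage rule can be enumerated in Python. Assuming the enzyme removes one residue at a time from the 3′ end and that a partial digest contains chains of every intermediate length, the products are the 3′-truncated oligonucleotides (each with a free 3′-OH) plus the released nucleoside 5′-monophosphates, written pN. The function name is hypothetical, and the sketch does not model the relative amounts of each product.

```python
def snake_venom_partial_digest(oligo: str) -> tuple[list[str], list[str]]:
    """List products of partial 3'->5' digestion by snake venom phosphodiesterase.

    oligo: sequence written 5'->3', with a free 3'-OH.
    Returns (shortened oligonucleotides, distinct released mononucleotides).
    Cleaving between the 3'-OH of one residue and the phosphoryl group of the
    next leaves each shortened chain with a free 3'-OH and releases each removed
    residue as a nucleoside 5'-monophosphate (pN). The 5'-terminal residue is
    never released as pN, because it carries no phosphate on its 5' side here.
    """
    fragments = [oligo[:n] for n in range(len(oligo) - 1, 0, -1)]
    released = sorted({"p" + base for base in oligo[1:]})
    return fragments, released
```

Applied to the oligonucleotide in the problem, the first list runs from the 9-mer down to the single 5′-terminal residue, and the second lists each kind of 5′-mononucleotide that can be released.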
24. Preserving DNA in Bacterial Endospores Bacterial endospores form when the environment is no longer conducive to active cell metabolism. The soil bacterium Bacillus subtilis, for example, begins the process of sporulation when one or more nutrients are depleted. The end product is a small, metabolically dormant structure that can survive almost indefinitely with no detectable metabolism. Spores have mechanisms to prevent accumulation of potentially lethal mutations in their DNA over periods of dormancy that can exceed 1,000 years. B. subtilis spores are much more resistant than are the organism’s growing cells to heat, UV radiation, and oxidizing agents, all of which promote mutations. a. One factor that prevents potential DNA damage in spores is their greatly decreased water content. How would this affect some types of mutations? b. Endospores have a category of proteins called small acid-soluble proteins (SASPs) that bind to their DNA, preventing formation of cyclobutane-type dimers. What causes cyclobutane dimers, and why do bacterial endospores need mechanisms to prevent their formation?
25. Oligonucleotide Synthesis As shown in the scheme of Figure 8-34, oligonucleotide synthesis involves adding modified nucleosides, one at a time, to a growing chain. The modified nucleosides contain an activated 3′ hydroxyl and have a dimethoxytrityl (DMT) group attached to the 5′ hydroxyl. What is the function of the DMT group on the incoming nucleoside?
BIOCHEMISTRY ONLINE