28.4 Nucleic Acids and DNA

Learning Objectives

By the end of this section, you will be able to:

  • Identify the different molecules that combine to form nucleotides
  • Identify the two types of nucleic acids and the function of each type
  • Describe how nucleotides are linked together to form nucleic acids
  • Describe the secondary structure of DNA and the importance of complementary base pairing
  • Describe how a new copy of DNA is synthesized
  • Describe how RNA is synthesized from DNA
  • Identify the different types of RNA and the function of each type of RNA
  • Describe the characteristics of the genetic code
  • Describe how a protein is synthesized from mRNA
  • Describe the causes of genetic mutations and how they lead to genetic diseases

The Key to Heredity

The blueprint for the reproduction and the maintenance of each organism is found in the nuclei of its cells, concentrated in elongated, threadlike structures called chromosomes. These complex structures, consisting of DNA and proteins, contain the basic units of heredity, called genes. The number of chromosomes (and genes) varies with each species. Human body cells have 23 pairs of chromosomes having 20,000–40,000 different genes.

Sperm and egg cells contain only a single copy of each chromosome; that is, they contain only one member of each chromosome pair. Thus, in sexual reproduction, the entire complement of chromosomes is achieved only when an egg and sperm combine. A new individual receives half its hereditary material from each parent. Calling the unit of heredity a “gene” merely gives it a name. But what really are genes and how is the information they contain expressed? One definition of a gene is that it is a segment of DNA that constitutes the code for a specific polypeptide. If genes are segments of DNA, we need to learn more about the structure and physiological function of DNA. We begin by looking at the small molecules needed to form DNA and RNA (ribonucleic acid)—the nucleotides.

Spotlight on Everyday Chemistry: The Birth of Genetic Engineering

An image showing a vial of insulin
Figure 28.4a. A vial of insulin. It has been given a trade name, Actrapid, by the manufacturer. (Photo by Mr Hyde, PDM)

 

Following the initial isolation of insulin in 1921, diabetic patients could be treated with insulin obtained from the pancreases of cattle and pigs. Unfortunately, some patients developed an allergic reaction to this insulin because its amino acid sequence was not identical to that of human insulin. In the 1970s, an intense research effort began that eventually led to the production of genetically engineered human insulin—the first genetically engineered product to be approved for medical use. To accomplish this feat, researchers first had to determine how insulin is made in the body and then find a way of causing the same process to occur in nonhuman organisms, such as bacteria or yeast cells.

Nucleotides

The repeating, or monomer, units that are linked together to form nucleic acids are known as nucleotides. The deoxyribonucleic acid (DNA) of a typical mammalian cell contains about 3 × 10⁹ nucleotides. Nucleotides can be further broken down to phosphoric acid (H₃PO₄), a pentose sugar (a sugar with five carbon atoms), and a nitrogenous base (a base containing nitrogen atoms).

[latex]\mathrm{nucleic\: acids \underset{down\: into}{\xrightarrow{can\: be\: broken}} nucleotides \underset{down\: into}{\xrightarrow{can\: be\: broken}} H_3PO_4 + nitrogenous\: base + pentose\: sugar}[/latex]

If the pentose sugar is ribose, the nucleotide is more specifically referred to as a ribonucleotide, and the resulting nucleic acid is ribonucleic acid (RNA). If the sugar is 2-deoxyribose, the nucleotide is a deoxyribonucleotide, and the nucleic acid is DNA, as shown in Figure 28.4b.

 

Backbone structures of both beta-ribose (on the left) with 4 OH groups attached and beta-2-deoxyribose (on the right) with only 3 OH groups attached.
Figure 28.4b. Backbone structure of both ribose and deoxyribose (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

The nitrogenous bases found in nucleotides are classified as pyrimidines or purines. Pyrimidines are heterocyclic amines with two nitrogen atoms in a six-member ring and include uracil, thymine, and cytosine. Purines are heterocyclic amines consisting of a pyrimidine ring fused to a five-member ring with two nitrogen atoms. Adenine and guanine are the major purines found in nucleic acids (Figure 28.4c.).

Molecular structures of pyrimidine, uracil, thymine, cytosine, purine, adenine and guanine.
Figure 28.4c. The nitrogenous bases found in DNA and RNA (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

The formation of a bond between C1′ of the pentose sugar and N1 of the pyrimidine base or N9 of the purine base joins the pentose sugar to the nitrogenous base. In the formation of this bond, a molecule of water is removed. Table 28.4a. summarizes the similarities and differences in the composition of nucleotides in DNA and RNA. The numbering convention is that primed numbers designate the atoms of the pentose ring, and unprimed numbers designate the atoms of the purine or pyrimidine ring.

Table 28.4a. Composition of Nucleotides in DNA and RNA
Composition | DNA | RNA
purine bases | adenine and guanine | adenine and guanine
pyrimidine bases | cytosine and thymine | cytosine and uracil
pentose sugar | 2-deoxyribose | ribose
inorganic acid | phosphoric acid (H₃PO₄) | phosphoric acid (H₃PO₄)

Source: “19.1: Nucleotides” In Basics of GOB Chemistry (Ball et al.), CC BY-NC-SA 4.0.

The names and structures of the major ribonucleotides and one of the deoxyribonucleotides are given in Figure 28.4d.

Top shows three molecular structures of pyrimidine nucleotides: CMP, UMP and dTMP. The bottom shows two molecular structures of purine nucleotides: AMP and GMP.
Figure 28.4d. The Pyrimidine and Purine Nucleotides (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

Exercise 28.4a

Identify some of the main functional groups found in the structures of Figure 28.4d.

Check Your Answers:[1]

Source: Exercise 28.4a by Samantha Sullivan Sauer, licensed under CC BY-NC 4.0

Apart from being the monomer units of DNA and RNA, the nucleotides and some of their derivatives have other functions as well. Adenosine diphosphate (ADP) and adenosine triphosphate (ATP), shown in Figure 28.4e., have a role in cell metabolism. Moreover, a number of coenzymes, including flavin adenine dinucleotide (FAD), nicotinamide adenine dinucleotide (NAD+), and coenzyme A, contain adenine nucleotides as structural components.

Two molecular structures. On the left is adenosine diphosphate (ADP) and on the right is adenosine triphosphate (ATP)
Figure 28.4e. Structures of Two Important Adenine-Containing Nucleotides (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

Nucleic Acid Structure

Nucleic acids are large polymers formed by linking nucleotides together and are found in every cell. Deoxyribonucleic acid (DNA) is the nucleic acid that stores genetic information. If all the DNA in a typical mammalian cell were stretched out end to end, it would extend more than 2 m. Ribonucleic acid (RNA) is the nucleic acid responsible for using the genetic information encoded in DNA to produce the thousands of proteins found in living organisms.

Primary Structure of Nucleic Acids

Nucleotides are joined together through the phosphate group of one nucleotide connecting in an ester linkage to the OH group on the third carbon atom of the sugar unit of a second nucleotide. This unit joins to a third nucleotide, and the process is repeated to produce a long nucleic acid chain (Figure 28.4f.). The backbone of the chain consists of alternating phosphate and sugar units (2-deoxyribose in DNA and ribose in RNA). The purine and pyrimidine bases branch off this backbone. Each phosphate group has one acidic hydrogen atom that is ionized at physiological pH. This is why these compounds are known as nucleic acids.

The molecular structure of a DNA segment showing molecules linked together. The molecules are in order from the 5' to 3' end starting with guanine followed by thymine, adenine and ending with cytosine.
Figure 28.4f. Structure of a Segment of DNA. A similar segment of RNA would have OH groups on each C2′, and uracil would replace thymine (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

Like proteins, nucleic acids have a primary structure that is defined as the sequence of their nucleotides. Unlike proteins, which have 20 different kinds of amino acids, there are only 4 different kinds of nucleotides in nucleic acids. For amino acid sequences in proteins, the convention is to write the amino acids in order starting with the N-terminal amino acid. In writing nucleotide sequences for nucleic acids, the convention is to write the nucleotides (usually using the one-letter abbreviations for the bases, shown in Figure 28.4f.) starting with the nucleotide having a free phosphate group, which is known as the 5′ end, and indicate the nucleotides in order. For DNA, a lowercase d is often written in front of the sequence to indicate that the monomers are deoxyribonucleotides. The final nucleotide has a free OH group on the 3′ carbon atom and is called the 3′ end. The sequence of nucleotides in the DNA segment shown in Figure 28.4f. would be written 5′-dG-dT-dA-dC-3′, which is often further abbreviated to dGTAC or just GTAC.

Secondary Structure of DNA

The three-dimensional structure of DNA was the subject of an intensive research effort in the late 1940s to early 1950s. Initial work revealed that the polymer had a regular repeating structure. In 1950, Erwin Chargaff of Columbia University showed that the molar amount of adenine (A) in DNA was always equal to that of thymine (T). Similarly, he showed that the molar amount of guanine (G) was the same as that of cytosine (C). Chargaff drew no conclusions from his work, but others soon did.

At Cambridge University in 1953, James D. Watson and Francis Crick announced that they had a model for the secondary structure of DNA. Using the information from Chargaff’s experiments (as well as other experiments) and data from the X-ray studies of Rosalind Franklin (which involved sophisticated chemistry, physics, and mathematics), Watson and Crick worked with models that were not unlike a child’s construction set. They finally concluded that DNA is composed of two nucleic acid chains running antiparallel to one another, that is, side by side with the 5′ end of one chain next to the 3′ end of the other. Moreover, as their model showed, the two chains are twisted to form a double helix, a structure that can be compared to a spiral staircase, with the phosphate and sugar groups (the backbone of the nucleic acid polymer) representing the outside edges of the staircase. The purine and pyrimidine bases face the inside of the helix, with guanine always opposite cytosine and adenine always opposite thymine. These specific base pairs, referred to as complementary bases, are the steps, or treads, in our staircase analogy (Figure 28.4g.).

Two structures of DNA. On the left: (a) a computer-generated model of the DNA double helix. On the right (b) shows the double helix with its complementary bases
Figure 28.4g. DNA Double Helix. (a) This represents a computer-generated model of the DNA double helix. (b) This represents a schematic representation of the double helix, showing the complementary bases. (Credit: Introduction to Chemistry: GOB (v. 1.0), edited by (Ball et al.), CC BY-NC-SA 4.0)

The structure proposed by Watson and Crick provided clues to the mechanisms by which cells are able to divide into two identical, functioning daughter cells; how genetic data are passed to new generations; and even how proteins are built to required specifications. All these abilities depend on the pairing of complementary bases. Figure 28.4h. shows the two sets of base pairs and illustrates two things. First, a pyrimidine is paired with a purine in each case, so that the long dimensions of both pairs are identical (1.08 nm).

 

Two structures showing complementary base pairing. On the left (a) thymine and adenine; On the right: (b) cytosine and guanine
Figure 28.4h. Complementary Base Pairing. Complementary bases engage in hydrogen bonding with one another: (a) thymine and adenine; (b) cytosine and guanine (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

If two pyrimidines were paired or two purines were paired, the two pyrimidines would take up less space than a purine and a pyrimidine, and the two purines would take up more space, as illustrated in Figure 28.4i. If these pairings were ever to occur, the structure of DNA would be like a staircase made with stairs of different widths. For the two strands of the double helix to fit neatly, a pyrimidine must always be paired with a purine. The second thing you should notice in Figure 28.4h. is that the correct pairing enables the formation of three hydrogen bonds between guanine and cytosine and two between adenine and thymine. The additive contribution of this hydrogen bonding imparts great stability to the DNA double helix.

An image showing the difference in widths of possible base pairs. The top image shows the width between two pyrimidines, the middle images shows the width between a purine and a pyrimidine and the bottom image shows the width between two purines.
Figure 28.4i. Difference in Widths of Possible Base Pairs (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).
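
The additive nature of this hydrogen bonding can be illustrated with a short calculation. The following Python sketch is only an illustration: the function name `count_hydrogen_bonds` is hypothetical, and it assumes a fully paired duplex in which every base on the given strand sits opposite its complementary base.

```python
# Hydrogen bonds per complementary base pair, as stated in the text:
# three between guanine and cytosine, two between adenine and thymine.
H_BONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in a duplex, given one of its two strands.

    Assumes the partner strand is perfectly complementary, so each base
    in `strand` contributes exactly one base pair.
    """
    return sum(H_BONDS_PER_PAIR[base] for base in strand.upper())


# The four-nucleotide segment GTAC from Figure 28.4f.
print(count_hydrogen_bonds("GTAC"))  # 10
```

A strand rich in G and C therefore forms more hydrogen bonds with its partner than an A- and T-rich strand of the same length.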

Infographic 28.4a. summarizes the chemical structure of DNA including the backbone, bases, hydrogen bonding, and formation of proteins from DNA and RNA.

Infographic 28.4a. Read more about “What makes up the Chemical Structure of DNA?” by Andy Brunning / Compound Interest, CC BY-NC-ND, or access a text-based summary of infographic 28.4a.

Spotlight on Everyday Chemistry: Scientist Rosalind Franklin

Rosalind Franklin was instrumental in determining the structure of DNA.  Read more about her and this discovery.

 

Infographic 28.4b. Read more about “Today in Chemistry History – Rosalind Franklin and the structure of DNA” by Andy Brunning / Compound Interest, CC BY-NC-ND, or access a text-based summary of infographic 28.4b.

Expressing Genetic Information

We previously stated that deoxyribonucleic acid (DNA) stores genetic information, while ribonucleic acid (RNA) is responsible for transmitting or expressing genetic information by directing the synthesis of thousands of proteins found in living organisms. But how do the nucleic acids perform these functions? Three processes are required: (1) replication, in which new copies of DNA are made; (2) transcription, in which a segment of DNA is used to produce RNA; and (3) translation, in which the information in RNA is translated into a protein sequence.

Replication

New cells are continuously forming in the body through the process of cell division. For this to happen, the DNA in a dividing cell must be copied in a process known as replication. The complementary base pairing of the double helix provides a ready model for how genetic replication occurs. If the two chains of the double helix are pulled apart, disrupting the hydrogen bonding between base pairs, each chain can act as a template, or pattern, for the synthesis of a new complementary DNA chain.

The nucleus contains all the necessary enzymes, proteins, and nucleotides required for this synthesis. A short segment of DNA is “unzipped,” so that the two strands in the segment are separated to serve as templates for new DNA. DNA polymerase, an enzyme, recognizes each base in a template strand and matches it to the complementary base in a free nucleotide. The enzyme then catalyzes the formation of an ester bond between the 5′ phosphate group of the nucleotide and the 3′ OH end of the new, growing DNA chain. In this way, each strand of the original DNA molecule is used to produce a duplicate of its former partner (Figure 28.4j.). Whatever information was encoded in the original DNA double helix is now contained in each replicate helix. When the cell divides, each daughter cell gets one of these replicates and thus all of the information that was originally possessed by the parent cell.

A detailed diagram of DNA replication showing an unzippered DNA strand with the various nucleotides attached along with the DNA polymerase attaching and making a copy of the strand.
Figure 28.4j. A Schematic Diagram of DNA Replication. DNA replication occurs by the sequential unzipping of segments of the double helix. Each new nucleotide is brought into position by DNA polymerase and is added to the growing strand by the formation of a phosphate ester bond. Thus, two double helixes form from one, and each consists of one old strand and one new strand, an outcome called semiconservative replication. This representation is simplified; many more proteins are involved in replication (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

Example 28.4a

A segment of one strand from a DNA molecule has the sequence 5′‑TCCATGAGTTGA‑3′. What is the sequence of nucleotides in the opposite, or complementary, DNA chain?

Solution

Knowing that the two strands are antiparallel and that T base pairs with A, while C base pairs with G, the sequence of the complementary strand will be 3′‑AGGTACTCAACT‑5′ (which can also be written in the conventional 5′ to 3′ direction as 5′‑TCAACTCATGGA‑3′).
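
The reasoning in this solution can be sketched in a few lines of Python. This is a minimal illustration rather than a standard tool: the names `DNA_COMPLEMENT` and `complementary_strand` are hypothetical, and the input is assumed to contain only the letters A, C, G, and T.

```python
# Pairing rules for DNA: A with T, and C with G.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'.

    The input strand is also read 5' to 3'. Pairing each base gives the
    partner strand in the 3' to 5' direction, so the result is reversed
    to follow the usual 5' to 3' writing convention.
    """
    paired_3_to_5 = strand_5_to_3.upper().translate(DNA_COMPLEMENT)
    return paired_3_to_5[::-1]


given = "TCCATGAGTTGA"
print(given.translate(DNA_COMPLEMENT))  # AGGTACTCAACT (read 3' to 5')
print(complementary_strand(given))      # TCAACTCATGGA (read 5' to 3')
```

Applying the function twice returns the original sequence, which mirrors the way each strand of the double helix can serve as the template for the other.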

What do we mean when we say information is encoded in the DNA molecule? An organism’s DNA can be compared to a book containing directions for assembling a model airplane or for knitting a sweater. Letters of the alphabet are arranged into words, and these words direct the individual to perform certain operations with specific materials. If all the directions are followed correctly, a model airplane or sweater is produced.

In DNA, the particular sequences of nucleotides along the chains encode the directions for building an organism. Just as saw means one thing in English and was means another, the sequence of bases CGT means one thing, and TGC means something different. Although there are only four letters—the four nucleotides—in the genetic code of DNA, their sequencing along the DNA strands can vary so widely that information storage is essentially unlimited.

Transcription

For the hereditary information in DNA to be useful, it must be “expressed,” that is, used to direct the growth and functioning of an organism. The first step in the processes that constitute DNA expression is the synthesis of RNA, by a template mechanism that is in many ways analogous to DNA replication. Because the RNA that is synthesized is a complementary copy of information contained in DNA, RNA synthesis is referred to as transcription. There are three key differences between replication and transcription:

  1. RNA molecules are much shorter than DNA molecules; only a portion of one DNA strand is copied or transcribed to make an RNA molecule.
  2. RNA is built from ribonucleotides rather than deoxyribonucleotides.
  3. The newly synthesized RNA strand does not remain associated with the DNA sequence it was transcribed from.

The DNA sequence that is transcribed to make RNA is called the template strand, while the complementary sequence on the other DNA strand is called the coding or informational strand. To initiate RNA synthesis, the two DNA strands unwind at specific sites along the DNA molecule. Ribonucleotides are attracted to the uncoiling region of the DNA molecule, beginning at the 3′ end of the template strand, according to the rules of base pairing. Thymine in DNA calls for adenine in RNA, cytosine specifies guanine, guanine calls for cytosine, and adenine requires uracil. RNA polymerase, an enzyme, binds the complementary ribonucleotide and catalyzes the formation of the ester linkage between ribonucleotides, a reaction very similar to that catalyzed by DNA polymerase (Figure 28.4k.). Synthesis of the RNA strand takes place in the 5′ to 3′ direction, antiparallel to the template strand. Only a short segment of the RNA molecule is hydrogen-bonded to the template strand at any time during transcription. When transcription is completed, the RNA is released, and the DNA helix re-forms. The nucleotide sequence of the RNA strand formed during transcription is identical to that of the corresponding coding strand of the DNA, except that U replaces T.

A detailed diagram of RNA transcription from a DNA template showing the template of the DNA strand, the various nucleotides, RNA polymerase in action and the RNA nucleotides formed.
Figure 28.4k. A Schematic Diagram of RNA Transcription from a DNA Template. The representation of RNA polymerase is proportionately much smaller than the actual molecule, which encompasses about 50 nucleotides at a time (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

Example 28.4b

A portion of the template strand of a gene has the sequence 5′‑TCCATGAGTTGA‑3′. What is the sequence of nucleotides in the RNA that is formed from this template?

Solution

Four things must be remembered in answering this question: (1) the DNA strand and the RNA strand being synthesized are antiparallel; (2) RNA is synthesized in a 5′ to 3′ direction, so transcription begins at the 3′ end of the template strand; (3) ribonucleotides are used in place of deoxyribonucleotides; and (4) thymine (T) base pairs with adenine (A), A base pairs with uracil (U; in RNA), and cytosine (C) base pairs with guanine (G). The sequence is determined to be 3′‑AGGUACUCAACU‑5′ (can also be written as 5′‑UCAACUCAUGGA‑3′).
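
The four points in this solution can be combined in a short Python sketch. The names `DNA_TO_RNA` and `transcribe` are hypothetical and chosen only for illustration; the input is assumed to be a template strand written 5′ to 3′ using the letters A, C, G, and T.

```python
# Pairing rules for transcription: each DNA template base specifies
# the RNA base that pairs with it (A -> U, C -> G, G -> C, T -> A).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA made from a DNA template strand, written 5' to 3'.

    Pairing each template base gives the RNA in the 3' to 5' direction
    (the strands are antiparallel), so the result is reversed to follow
    the 5' to 3' writing convention.
    """
    rna_3_to_5 = template_5_to_3.upper().translate(DNA_TO_RNA)
    return rna_3_to_5[::-1]


template = "TCCATGAGTTGA"
print(template.translate(DNA_TO_RNA))  # AGGUACUCAACU (read 3' to 5')
print(transcribe(template))            # UCAACUCAUGGA (read 5' to 3')
```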

Three types of RNA are formed during transcription: messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). These three types of RNA differ in function, size, and percentage of the total cell RNA (Table 28.4b.). mRNA makes up only a small percent of the total amount of RNA within the cell, primarily because each molecule of mRNA exists for a relatively short time; it is continuously being degraded and resynthesized. The molecular dimensions of the mRNA molecule vary according to the amount of genetic information a given molecule contains. After transcription, which takes place in the nucleus, the mRNA passes into the cytoplasm, carrying the genetic message from DNA to the ribosomes, the sites of protein synthesis.

Table 28.4b. Properties of Cellular RNA in Escherichia coli
Type | Function | Approximate Number of Nucleotides | Percentage of Total Cell RNA
mRNA | codes for proteins | 100–6,000 | ~3
rRNA | component of ribosomes | 120–2,900 | 83
tRNA | adapter molecule that brings the amino acid to the ribosome | 75–90 | 14

Source: “19.3: Replication and Expression of Genetic Information” In Basics of GOB Chemistry (Ball et al.), CC BY-NC-SA 4.0.

Ribosomes are cellular substructures where proteins are synthesized. They contain about 65% rRNA and 35% protein, held together by numerous noncovalent interactions, such as hydrogen bonding, in an overall structure consisting of two globular particles of unequal size.

Molecules of tRNA, which bring amino acids (one at a time) to the ribosomes for the construction of proteins, differ from one another in the kinds of amino acid each is specifically designed to carry (Figure 28.4l.). A set of three nucleotides, known as a codon, on the mRNA determines which kind of tRNA will add its amino acid to the growing chain. Each of the 20 amino acids found in proteins has at least one corresponding kind of tRNA, and most amino acids have more than one.

Transfer RNA shown in three forms: flat highlighting each atom and nucleic acid, in 3D showing anticodon, and in space filling model with colour coded anticodon ends.
Figure 28.4l. Transfer RNA. (a) In the two-dimensional structure of a yeast tRNA molecule for phenylalanine, the amino acid binds to the acceptor stem located at the 3′ end of the tRNA primary sequence. (The nucleotides that are not specifically identified here are slightly altered analogs of the four common ribonucleotides A, U, C, and G.) (b) In the three-dimensional structure of yeast phenylalanine tRNA, note that the anticodon loop is at the bottom and the acceptor stem is at the top right. (c) This shows a space-filling model of the tRNA (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

The two-dimensional structure of a tRNA molecule has three distinctive loops, reminiscent of a cloverleaf (Figure 28.4l.). On one loop is a sequence of three nucleotides that varies for each kind of tRNA. This triplet, called the anticodon, is complementary to and pairs with the codon on the mRNA. At the opposite end of the molecule is the acceptor stem, where the amino acid is attached.

Spotlight on Everyday Chemistry: Genome Editing

The 2020 Nobel Prize in Chemistry was awarded to scientists who developed a method of genome editing.

 

Infographic 28.4c. Read more about “The 2020 Nobel Prize in Chemistry: Using genetic scissors to edit the genome” by Andy Brunning / Compound Interest, CC BY-NC-ND, or access a text-based summary of infographic 28.4c.

Protein Synthesis

One of the definitions of a gene is as follows: a segment of deoxyribonucleic acid (DNA) carrying the code for a specific polypeptide. Each molecule of messenger RNA (mRNA) is a transcribed copy of a gene that is used by a cell for synthesizing a polypeptide chain. If a protein contains two or more different polypeptide chains, each chain is coded by a different gene. We turn now to the question of how the sequence of nucleotides in a molecule of ribonucleic acid (RNA) is translated into an amino acid sequence.

How can a molecule containing just 4 different nucleotides specify the sequence of the 20 amino acids that occur in proteins? If each nucleotide coded for 1 amino acid, then obviously the nucleic acids could code for only 4 amino acids. What if amino acids were coded for by groups of 2 nucleotides? There are 4², or 16, different combinations of 2 nucleotides (AA, AU, AC, AG, UU, and so on). Such a code is more extensive but still not adequate to code for 20 amino acids. However, if the nucleotides are arranged in groups of 3, the number of different possible combinations is 4³, or 64. Here we have a code that is extensive enough to direct the synthesis of the primary structure of a protein molecule.
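
The counting argument above can be checked by listing every possible group of nucleotides. The Python sketch below is only an illustration; the helper name `possible_words` is hypothetical.

```python
from itertools import product

BASES = "UCAG"  # the four nucleotides found in RNA


def possible_words(length: int) -> list[str]:
    """List every ordered group of `length` nucleotides."""
    return ["".join(group) for group in product(BASES, repeat=length)]


for length in (1, 2, 3):
    # Prints 4, 16, and 64 groups for lengths 1, 2, and 3.
    print(length, len(possible_words(length)))
```

Only groups of three nucleotides give at least 20 distinct combinations, which is why a triplet is the shortest group that can specify all 20 amino acids.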

Watch Translation (mRNA to protein) | Biomolecules | MCAT | Khan Academy on YouTube (14 mins)

Video source: Khan Academy. (2016, June 7). Translation (mRNA to protein) | Biomolecules | MCAT | Khan Academy. [Video]. YouTube.

The genetic code can therefore be described as the identification of each group of three nucleotides and its particular amino acid. The sequence of these triplet groups in the mRNA dictates the sequence of the amino acids in the protein. Each individual three-nucleotide coding unit, as we have seen, is called a codon. Protein synthesis is accomplished by orderly interactions between mRNA and the other ribonucleic acids (transfer RNA [tRNA] and ribosomal RNA [rRNA]), the ribosome, and more than 100 enzymes. The mRNA formed in the nucleus during transcription is transported across the nuclear membrane into the cytoplasm to the ribosomes—carrying with it the genetic instructions. The process in which the information encoded in the mRNA is used to direct the sequencing of amino acids and thus ultimately to synthesize a protein is referred to as translation.

Before an amino acid can be incorporated into a polypeptide chain, it must be attached to its unique tRNA. This crucial process requires an enzyme known as aminoacyl-tRNA synthetase (Figure 28.4m.). There is a specific aminoacyl-tRNA synthetase for each amino acid. This high degree of specificity is vital to the incorporation of the correct amino acid into a protein.

Reaction showing the binding of an amino acid (in red) to its tRNA (in black) with the catalyst aminoacyl-tRNA and synthetase to produce tRNA with bound amino acid.
Figure 28.4m. Binding of an Amino Acid to Its tRNA (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

After the amino acid molecule has been bound to its tRNA carrier, protein synthesis can take place. Figures 28.4n. through 28.4q. depict a schematic stepwise representation of this all-important process.

The elongation steps in protein synthesis. In this image the growing polypeptide chain is attached to the activated tRNA molecule
Figure 28.4n. The Elongation Steps in Protein Synthesis – Protein synthesis is already in progress at the ribosome. The growing polypeptide chain is attached to the tRNA that brought in the previous amino acid (in this illustration, Cys) (Credit: Introduction to Chemistry: General, Organic, and Biological (v. 1.0), edited by (Ball et al.) CC BY-NC-SA 4.0).

The elongation steps in protein synthesis. In this image, the amino acid Phe is incorporated into the polypeptide chain through a peptide linkage between the carboxyl group of Cys and the amino group of Phe
Figure 28.4o. The Elongation Steps in Protein Synthesis – An activated tRNA, which has the anticodon AAA, binds to the ribosome next to the previously bound tRNA and interacts with the mRNA molecule through base pairing of the codon and anticodon. The amino acid Phe is being incorporated into the polypeptide chain by the formation of a peptide linkage between the carboxyl group of Cys and the amino group of Phe. This reaction is catalyzed by the enzyme peptidyl transferase, a component of the ribosome. (Credit: Introduction to Chemistry: General, Organic, and Biological (v. 1.0), edited by (Ball et al.) CC BY-NC-SA 4.0)

The elongation steps in protein synthesis. In this image, the Cys-Phe linkage is now complete, and the growing polypeptide chain remains attached to the activated tRNA molecule for Phe.
Figure 28.4p. The Elongation Steps in Protein Synthesis – The Cys-Phe linkage is now complete, and the growing polypeptide chain remains attached to the tRNA for Phe. (Credit: Introduction to Chemistry: General, Organic, and Biological (v. 1.0), edited by (Ball et al.) CC BY-NC-SA 4.0)

The elongation steps in protein synthesis. In this image, the ribosome moves to the right along the mRNA strand. This shift brings the next codon, GUC, into its correct position on the surface of the ribosome.
Figure 28.4q. The Elongation Steps in Protein Synthesis – The ribosome moves to the right along the mRNA strand. This shift brings the next codon, GUC, into its correct position on the surface of the ribosome. Note that an activated tRNA molecule, containing the next amino acid to be attached to the chain, is moving to the ribosome. The steps shown in Figures 28.4o. through 28.4q. are repeated until the ribosome reaches a stop codon. (Credit: Introduction to Chemistry: General, Organic, and Biological (v. 1.0), edited by (Ball et al.) CC BY-NC-SA 4.0)

Early experimenters were faced with the task of determining which of the 64 possible codons stood for each of the 20 amino acids. The cracking of the genetic code was the joint accomplishment of several well-known geneticists, notably Har Gobind Khorana, Marshall Nirenberg, Philip Leder, and Severo Ochoa, from 1961 to 1964. The genetic dictionary they compiled, summarized in Figure 28.4r., shows that 61 codons code for amino acids, and 3 codons serve as signals for the termination of polypeptide synthesis (much like the period at the end of a sentence). Notice that only methionine (AUG) and tryptophan (UGG) have single codons. All other amino acids have two or more codons.

The genetic code showing the first base (on the left), second base (on the top) and third base (on the right) nucleotides coming together to create a unique DNA amino acid sequence.
Figure 28.4r. The Genetic Code (Credit: Introduction to Chemistry: General, Organic, and Biological (v. 1.0), edited by (Ball et al.) CC BY-NC-SA 4.0)

Example 28.4c

A portion of an mRNA molecule has the sequence 5′‑AUGCCACGAGUUGAC‑3′. What amino acid sequence does this code for?

Solution

Use Figure 28.4r to determine what amino acid each set of three nucleotides (codon) codes for. Remember that the sequence is read starting from the 5′ end and that a protein is synthesized starting with the N-terminal amino acid. The sequence 5′‑AUGCCACGAGUUGAC‑3′ codes for met-pro-arg-val-asp.
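
The lookup described in this solution can also be carried out programmatically. The Python sketch below is an illustration only: the names `CODON_TABLE`, `THREE_LETTER`, and `translate` are hypothetical, the table is the standard genetic code written compactly with one-letter amino acid symbols (`*` marks a termination codon), and the input is assumed to be an uppercase mRNA sequence that is already in the correct reading frame.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in the order produced by
# product(BASES, repeat=3): UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def translate(mrna: str) -> str:
    """Read an mRNA 5' to 3', one codon at a time, until a stop codon."""
    residues = []
    for start in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[start:start + 3]]
        if residue == "*":  # termination codon: stop adding amino acids
            break
        residues.append(THREE_LETTER[residue])
    return "-".join(residues)


print(translate("AUGCCACGAGUUGAC"))  # Met-Pro-Arg-Val-Asp
```

The first amino acid in the output corresponds to the N-terminal end of the polypeptide, matching the convention used in the solution.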

Further study of the genetic code revealed the following characteristics:

  1. The code is virtually universal; animal, plant, and bacterial cells use the same codons to specify each amino acid (with a few exceptions).
  2. The code is “degenerate”; in all but two cases (methionine and tryptophan), more than one triplet codes for a given amino acid.
  3. The first two bases of each codon are most significant; the third base often varies. This suggests that a change in the third base by a mutation may still permit the correct incorporation of a given amino acid into a protein. The third base is sometimes called the “wobble” base.
  4. The code is continuous and nonoverlapping; there are no nucleotides between codons, and adjacent codons do not overlap.
  5. The three termination codons are read by special proteins called release factors, which signal the end of the translation process.
  6. The codon AUG codes for methionine and is also the initiation codon. Thus methionine is the first amino acid in each newly synthesized polypeptide. This first amino acid is usually removed enzymatically before the polypeptide chain is completed; the vast majority of polypeptides do not begin with methionine.
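
Several of these characteristics can be checked directly against the codon table. The sketch below is an illustration under stated assumptions: it rebuilds the standard genetic code from a compact string (one-letter amino acid symbols, `*` for termination), and the helper name `synonymous_fraction` is hypothetical. It counts the termination codons, lists the amino acids that have a single codon, and estimates how often a single-base change at each codon position leaves the amino acid unchanged.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
single_codon = sorted(aa for aa, n in codons_per_residue.items() if n == 1)


def synonymous_fraction(position):
    """Fraction of single-base changes at `position` (0, 1, or 2) in
    amino acid codons that leave the encoded amino acid unchanged."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only consider codons that specify an amino acid
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


print(64 - len(stop_codons), "amino acid codons; stop codons:", stop_codons)
print("single-codon amino acids:", single_codon)  # M = methionine, W = tryptophan
for position in range(3):
    print("base", position + 1, round(synonymous_fraction(position), 2))
```

In this sketch more than half of the third-base changes are silent, while no second-base change is, which is the pattern that the "wobble" description in item 3 summarizes.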

Mutations and Genetic Diseases

We have seen that the sequence of nucleotides in a cell’s deoxyribonucleic acid (DNA) ultimately determines the sequence of amino acids in the proteins made by the cell and is thus critical for the proper functioning of the cell. On rare occasions, however, the nucleotide sequence in DNA may be modified, either spontaneously (by errors during replication, occurring approximately once for every 10 billion nucleotides) or from exposure to heat, radiation, or certain chemicals. Any chemical or physical change that alters the nucleotide sequence in DNA is called a mutation. When a mutation occurs in an egg or sperm cell that then produces a living organism, the mutation is present in every cell of that organism and can be passed on to its offspring.

Common types of mutations include substitution (a different nucleotide is substituted), insertion (the addition of a new nucleotide), and deletion (the loss of a nucleotide). These changes within DNA are called point mutations because only one nucleotide is substituted, added, or deleted (Figure 28.4s). An insertion or deletion results in a frame-shift that changes the reading of subsequent codons and therefore alters the entire amino acid sequence that follows the mutation. For this reason, insertions and deletions are usually more harmful than a substitution, in which only a single amino acid is altered.

This image shows three types of point mutations (substitution, insertion, and deletion), each compared with the normal sequence.
Figure 28.4s. Three Types of Point Mutations (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).
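
The effect of a frame-shift can be seen by splitting a sequence into triplets before and after each kind of point mutation. The following Python sketch is an illustration only: the 15-base sequence and the helper names `codons` and `changed_codons` are hypothetical, and the mutation position was chosen arbitrarily.

```python
from itertools import zip_longest


def codons(seq):
    """Split a sequence into complete triplets, starting at the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def changed_codons(normal, mutant):
    """Count codon positions that differ (a missing codon counts as changed)."""
    pairs = zip_longest(codons(normal), codons(mutant))
    return sum(a != b for a, b in pairs)


normal = "ATGCCACGAGTTGAC"                     # hypothetical coding sequence
substitution = normal[:4] + "T" + normal[5:]   # C replaced by T at index 4
insertion = normal[:4] + "G" + normal[4:]      # G added before index 4
deletion = normal[:4] + normal[5:]             # base at index 4 removed

for name, mutant in [("substitution", substitution),
                     ("insertion", insertion),
                     ("deletion", deletion)]:
    print(name, codons(mutant), changed_codons(normal, mutant))
```

With these inputs the substitution alters one codon, whereas the insertion and the deletion each alter every codon from the mutation onward (four of the five).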

The chemical or physical agents that cause mutations are called mutagens. Examples of physical mutagens are ultraviolet (UV) and gamma radiation. Radiation exerts its mutagenic effect either directly or by creating free radicals that in turn have mutagenic effects. Radiation and free radicals can lead to the formation of bonds between nitrogenous bases in DNA. For example, exposure to UV light can result in the formation of a covalent bond between two adjacent thymines on a DNA strand, producing a thymine dimer (Figure 28.4t). If not repaired, the dimer prevents the formation of the double helix at the point where it occurs. The genetic disease xeroderma pigmentosum is caused by a lack of the enzyme that cuts out the thymine dimers in damaged DNA. Individuals affected by this condition are abnormally sensitive to light and are more prone to skin cancer than unaffected individuals.

This image is an example of radiation damage to DNA caused by the formation of a thymine dimer. On the left: (a) the structure of the thymine dimer. On the right: (b) the region of the DNA strand damaged by the thymine dimer, shown as a disconnected portion of the strand that stops DNA replication.
Figure 28.4t. An Example of Radiation Damage to DNA. (a) The thymine dimer is formed by the action of UV light. (b) When a defect in the double strand is produced by the thymine dimer, this defect temporarily stops DNA replication, but the dimer can be removed, and the region can be repaired by an enzyme repair system (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

Sometimes gene mutations are beneficial, but most are detrimental. For example, if a point mutation occurs at a crucial position in a DNA sequence, the affected protein will lack biological activity, perhaps resulting in the death of a cell. In such cases the altered DNA sequence is lost and will not be copied into daughter cells. Nonlethal mutations in an egg or sperm cell may lead to metabolic abnormalities or hereditary diseases. Such diseases are called inborn errors of metabolism or genetic diseases. A partial listing of genetic diseases is presented in Table 28.4c, and two specific diseases are discussed below. In most cases, the defective gene results in a failure to synthesize a particular enzyme.

Table 28.4c. Some Representative Genetic Diseases in Humans and the Protein or Enzyme Responsible

| Disease | Responsible Protein or Enzyme |
| --- | --- |
| alkaptonuria | homogentisic acid oxidase |
| galactosemia | galactose 1-phosphate uridyl transferase, galactokinase, or UDP galactose epimerase |
| Gaucher disease | glucocerebrosidase |
| gout and Lesch-Nyhan syndrome | hypoxanthine-guanine phosphoribosyl transferase |
| hemophilia | antihemophilic factor (factor VIII) or Christmas factor (factor IX) |
| homocystinuria | cystathionine synthetase |
| maple syrup urine disease | branched chain α-keto acid dehydrogenase complex |
| McArdle syndrome | muscle phosphorylase |
| Niemann-Pick disease | sphingomyelinase |
| phenylketonuria (PKU) | phenylalanine hydroxylase |
| sickle cell anemia | hemoglobin |
| Tay-Sachs disease | hexosaminidase A |
| tyrosinemia | fumarylacetoacetate hydrolase or tyrosine aminotransferase |
| von Gierke disease | glucose 6-phosphatase |
| Wilson disease | Wilson disease protein |

Source: “19.5: Mutations and Genetic Diseases” In Basics of GOB Chemistry (Ball et al.), CC BY-NC-SA 4.0.

Phenylketonuria (PKU), as seen in the table above, results from the absence of the enzyme phenylalanine hydroxylase. Without this enzyme, a person cannot convert phenylalanine to tyrosine, which is the precursor of the neurotransmitters dopamine and norepinephrine as well as the skin pigment melanin (Figure 28.4u).

A reaction showing phenylalanine converted to tyrosine by the enzyme phenylalanine hydroxylase
Figure 28.4u. Normally, phenylalanine is converted to tyrosine by the enzyme phenylalanine hydroxylase. People with PKU lack this enzyme and thus cannot convert phenylalanine to tyrosine, which is needed for the production of neurotransmitters such as dopamine and norepinephrine and the pigment melanin (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

When this reaction cannot occur, phenylalanine accumulates and is then converted to higher than normal quantities of phenylpyruvate. The disease acquired its name from the high levels of phenylpyruvate (a phenyl ketone) in urine. Excessive amounts of phenylpyruvate impair normal brain development, which causes severe intellectual disability (Figure 28.4v).

A reaction showing phenylalanine along with the catalyst transaminase creating phenylpyruvate
Figure 28.4v. A buildup of phenylalanine in those with PKU results in the accumulation of phenylpyruvate, which inhibits normal brain development (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

PKU may be diagnosed by assaying a sample of blood or urine for phenylalanine or one of its metabolites. Medical authorities recommend testing every newborn’s blood for phenylalanine between 24 h and 3 weeks after birth. If the condition is detected, intellectual disability can be prevented by immediately placing the infant on a diet containing little or no phenylalanine. Because phenylalanine is plentiful in naturally produced proteins, the low-phenylalanine diet depends on a synthetic protein substitute plus very small measured amounts of naturally produced foods. Before dietary treatment was introduced in the early 1960s, severe intellectual disability was a common outcome for children with PKU: 85% of patients with PKU had an intelligence quotient (IQ) less than 40, and 37% had IQ scores below 10. Since the introduction of dietary treatments, however, over 95% of children with PKU have developed normal or near-normal intelligence. The incidence of PKU in newborns is about 1 in 12,000 in North America. Every state in the United States has mandated that screening for PKU be provided to all newborns.

Several genetic diseases are collectively categorized as lipid-storage diseases. Lipids are constantly being synthesized and broken down in the body, so if the enzymes that catalyze lipid degradation are missing, the lipids tend to accumulate and cause a variety of medical problems. When a genetic mutation occurs in the gene for the enzyme hexosaminidase A, for example, gangliosides cannot be degraded but accumulate in brain tissue, causing the ganglion cells of the brain to become greatly enlarged and nonfunctional. This genetic disease, known as Tay-Sachs disease, leads to a regression in development, dementia, paralysis, and blindness, with death usually occurring before the age of three. There is currently no treatment, but Tay-Sachs disease can be diagnosed in a fetus by assaying the amniotic fluid (amniocentesis) for hexosaminidase A. A blood test can identify Tay-Sachs carriers—people who inherit a defective gene from only one rather than both parents—because they produce only half the normal amount of hexosaminidase A, although they do not exhibit symptoms of the disease.

Recombinant DNA Technology

More than 3,000 human diseases have been shown to have a genetic component, caused or in some way modulated by the person’s genetic composition. Moreover, in the last decade or so, researchers have succeeded in identifying many of the genes and even mutations that are responsible for specific genetic diseases. Now scientists have found ways of identifying and isolating genes that have specific biological functions and placing those genes in another organism, such as a bacterium, which can be easily grown in culture. With these techniques, known as recombinant DNA technology, the ability to cure many serious genetic diseases appears to be within our grasp.

Isolating the specific gene or genes that cause a particular genetic disease is a monumental task. One reason for the difficulty is the enormous amount of DNA in a cell, only a minute portion of which contains the gene sequence. Thus, the first task is to obtain smaller pieces of DNA that can be more easily handled. Fortunately, researchers can use restriction enzymes (also known as restriction endonucleases), discovered in 1970, which cut DNA at specific, known nucleotide sequences, yielding shorter DNA fragments. For example, the restriction enzyme EcoRI recognizes the nucleotide sequence shown in Figure 28.4w and cuts both DNA strands as indicated there.

EcoRI is shown to cut DNA at a specific genetic sequence
Figure 28.4w. EcoRI is a restriction endonuclease that always cuts DNA at a specific genetic sequence, as shown here (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).
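
Cutting at a fixed recognition sequence is straightforward to model on a single strand. The Python sketch below is illustrative only. It assumes the commonly cited EcoRI specificity, which is not spelled out in the text above: the site 5′-GAATTC-3′, cut between the G and the first A. The input strand and the function names `digest` and `reverse_complement` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def digest(dna, site="GAATTC", cut_offset=1):
    """Cut one strand `cut_offset` bases into every occurrence of `site`.

    The defaults are an assumption about EcoRI (G^AATTC), not taken
    from the figure.
    """
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


strand = "TTGAATTCAAGGAATTCGG"  # hypothetical sequence with two EcoRI sites
print(digest(strand))                            # ['TTG', 'AATTCAAGG', 'AATTCGG']
print(reverse_complement("GAATTC") == "GAATTC")  # True
```

Because the site reads the same on both strands, the enzyme makes the same staggered cut on each strand, which leaves short single-stranded overhangs on the resulting fragments.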

Once a DNA strand has been fragmented, it must be cloned; that is, multiple identical copies of each DNA fragment are produced to make sure there are sufficient amounts of each to detect and manipulate in the laboratory. Cloning is accomplished by inserting the individual DNA fragments into phages (bacterial viruses) that can enter bacterial cells and be replicated. When a bacterial cell infected by the modified phage is placed in an appropriate culture medium, it forms a colony of cells, all containing copies of the original DNA fragment. This technique is used to produce many bacterial colonies, each containing a different DNA fragment. The result is a DNA library, a collection of bacterial colonies that together contain the entire genome of a particular organism.

The next task is to screen the DNA library to determine which bacterial colony (or colonies) has incorporated the DNA fragment containing the desired gene. A short piece of DNA, known as a hybridization probe, which has a nucleotide sequence complementary to a known sequence in the gene, is synthesized, and a radioactive phosphate group is added to it as a “tag.” You might be wondering how researchers are able to prepare such a probe if the gene has not yet been isolated. One way is to use a segment of the desired gene isolated from another organism. An alternative method depends on knowing all or part of the amino acid sequence of the protein produced by the gene of interest: the amino acid sequence is used to deduce an approximate nucleotide sequence for the gene, and this nucleotide sequence is then produced synthetically. (The amino acid sequence used is carefully chosen to include, if possible, many amino acids such as methionine and tryptophan, which have only a single codon each.)
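
The reason for favoring methionine and tryptophan can be made concrete by counting how many different nucleotide sequences could encode a given stretch of amino acids. The sketch below is an illustration: the codon counts are those of the standard genetic code, while the two four-residue peptides and the function name `possible_sequences` are hypothetical.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code.
CODON_COUNTS = {
    "leu": 6, "ser": 6, "arg": 6,
    "ala": 4, "gly": 4, "pro": 4, "thr": 4, "val": 4,
    "ile": 3,
    "asn": 2, "asp": 2, "cys": 2, "gln": 2, "glu": 2,
    "his": 2, "lys": 2, "phe": 2, "tyr": 2,
    "met": 1, "trp": 1,
}


def possible_sequences(peptide):
    """Number of distinct nucleotide sequences that could encode `peptide`."""
    return prod(CODON_COUNTS[residue] for residue in peptide.split("-"))


print(possible_sequences("met-trp-lys-asp"))  # 1 * 1 * 2 * 2 = 4
print(possible_sequences("leu-ser-arg-ala"))  # 6 * 6 * 6 * 4 = 864
```

A stretch rich in single-codon amino acids leaves far fewer candidate sequences, so a synthetic probe based on it is more likely to match the actual gene.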

After a probe identifies a colony containing the desired gene, the DNA fragment is clipped out, again using restriction enzymes, and spliced into another replicating entity, usually a plasmid. Plasmids are mini-chromosomes found in many bacteria, such as Escherichia coli (E. coli). The recombined plasmid is then inserted into the host organism (usually the bacterium E. coli), where it directs production of the desired protein (Figure 28.4x).

A detailed flow chart diagram outlining the process of cloning: 1. Place cells in detergent to break them open; 2. Centrifuge to separate the plasmids from the chromosomal DNA; 3. Obtain the desired gene sequence using a specific restriction enzyme; 4. Cut the plasmid with the same restriction enzyme used to cut the DNA containing the desired gene sequence; 5. Combine the DNA fragment with the plasmid and use DNA ligase to seal the DNA segment into place within the plasmid; 6. Place the recombinant plasmid in a solution of calcium chloride containing E. coli. Upon heating, the bacterial cell membrane becomes permeable, allowing the plasmid to enter; 7. As the bacterium reproduces by dividing, the plasmids are replicated. The gene sequences are transcribed into mRNA, and the mRNA is used to produce specific proteins.
Figure 28.4x. Flow chart diagram outlining the process of cloning (credit: Intro Chem: GOB (v. 1.0), CC BY-NC-SA 3.0).

Proponents of recombinant DNA research are excited about its great potential benefits. An example is the production of human growth hormone, which is used to treat children who fail to grow properly. Formerly, human growth hormone was available only in tiny amounts obtained from cadavers. Now it is readily available through recombinant DNA technology. Another gene that has been cloned is the gene for epidermal growth factor, which stimulates the growth of skin cells and can be used to speed the healing of burns and other skin wounds. Recombinant techniques are also a powerful research tool, providing enormous aid to scientists as they map and sequence genes and determine the functions of different segments of an organism’s DNA.

In addition to advancements in the ongoing treatment of genetic diseases, recombinant DNA technology may actually lead to cures. When appropriate genes are successfully inserted into E. coli, the bacteria can become miniature pharmaceutical factories, producing great quantities of insulin for people with diabetes, clotting factor for people with hemophilia, missing enzymes, hormones, vitamins, antibodies, vaccines, and so on. Recent accomplishments include the production in E. coli of recombinant DNA molecules containing synthetic genes for tissue plasminogen activator, a clot-dissolving enzyme that can rescue heart attack victims, as well as the production of vaccines against hepatitis B (humans) and hoof-and-mouth disease (cattle).

In addition to E. coli, scientists have used other bacteria, as well as yeast and other fungi, in gene-splicing experiments. Plant molecular biologists use a bacterial plasmid to introduce genes for several foreign proteins (including animal proteins) into plants. The bacterium is Agrobacterium tumefaciens, which can cause tumors in many plants but can be treated so that its tumor-causing ability is eliminated. One practical application of its plasmids would be to enhance a plant’s nutritional value by transferring into it the gene necessary for the synthesis of an amino acid in which the plant is normally deficient (for example, transferring the gene for methionine synthesis into pinto beans, which normally do not synthesize high levels of methionine).

Restriction enzymes have been isolated from a number of bacteria and are named after the bacterium of origin. EcoRI is a restriction enzyme obtained from the R strain of E. coli. The roman numeral I indicates that it was the first restriction enzyme obtained from this strain of bacteria.

Attribution & References

Except where otherwise noted, portions of this page were written by Gregory A. Anderson, while others were adapted by Gregory A. Anderson and Samantha Sullivan Sauer from “19: Nucleic Acids”, “19.1: Nucleotides”, “19.2: Nucleic Acid Structure”, “19.3: Replication and Expression of Genetic Information”, “19.4: Protein Synthesis and the Genetic Code”, and “19.5: Mutations and Genetic Diseases” in Basics of General, Organic, and Biological Chemistry (Ball et al.) by David W. Ball, John W. Hill, and Rhonda J. Scott via LibreTexts, CC BY-NC-SA 4.0. / A LibreTexts version of Introduction to Chemistry: GOB (v. 1.0), CC BY-NC 3.0. / Pages were combined and content edited to improve flow and student understanding.

  1. In addition to the alkane and phosphate components, CMP has alcohol, ether, amide, and amine groups. UMP has alcohol, ether, and amide groups. dTMP has alcohol, ether, and amide groups. AMP has alcohol, ether, alkene, and amine groups. GMP has alcohol, ether, alkene, amine, and amide groups.

License

Organic and Biochemistry Supplement to Enhanced Introductory College Chemistry Copyright © 2024 by Gregory Anderson; Caryn Fahey; Adrienne Richards; Samantha Sullivan Sauer; David Wegman; and Jen Booth is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.