DNA to Protein
An interactive journey through the molecular machinery of life — from double helix to folded protein.
Follow the flow of information in a cell: DNA → RNA → Protein. Read each section, explore the interactive animations, then test yourself with 40 active recall questions and 25 hard MCQs.
DNA Structure
The blueprint of life — nucleotides, base pairing, and the double helix. Understanding DNA's architecture is the foundation for everything in molecular biology.
What Is DNA?
DNA (deoxyribonucleic acid) is the molecule that stores genetic information in all living cells. Think of DNA as a twisted ladder — the sides of the ladder are the sugar-phosphate backbone, and the rungs are pairs of nitrogenous bases held together by hydrogen bonds. The entire structure twists into the famous double helix first described by Watson and Crick in 1953.
DNA resides primarily in the nucleus of eukaryotic cells (with a small amount in mitochondria). It is remarkably long — if you stretched out all the DNA from a single human cell, it would be approximately 2 metres long, yet it is coiled and compacted into a nucleus just 6 micrometres across.
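The "2 metres" figure can be checked with a quick back-of-envelope calculation. The sketch below assumes two values that are not stated in the text: roughly 6.4 billion base pairs in a diploid human nucleus, and a rise of about 0.34 nm per base pair in B-form DNA.

```python
# Back-of-envelope check of the "approximately 2 metres" figure.
# Assumed inputs (not given in the text above):
#   ~6.4e9 base pairs in a diploid human nucleus
#   ~0.34 nm of helix length per base pair (B-form DNA)
BASE_PAIRS_DIPLOID = 6.4e9
RISE_PER_BP_M = 0.34e-9      # metres per base pair
NUCLEUS_DIAMETER_M = 6e-6    # 6 micrometres, as quoted in the text

total_length_m = BASE_PAIRS_DIPLOID * RISE_PER_BP_M
fold_compaction = total_length_m / NUCLEUS_DIAMETER_M

print(f"Total DNA length: {total_length_m:.2f} m")
print(f"Length relative to nucleus diameter: {fold_compaction:,.0f}x")
```

Under these assumptions the total comes out at a little over 2 m, several hundred thousand times the diameter of the nucleus that contains it.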
The Nucleotide — The Building Block
DNA is a polymer (long chain) of repeating subunits called nucleotides. Each nucleotide has exactly three components:
1. A phosphate group (PO₄) — the negatively charged "connector" that links one nucleotide to the next via phosphodiester bonds, creating the backbone.
2. A 5-carbon sugar called deoxyribose — the "D" in DNA. The carbons are numbered 1' through 5'. The 5' carbon connects to the phosphate, and the 3' carbon provides the —OH group where the next nucleotide attaches. This numbering gives DNA its directionality.
3. A nitrogenous base — the information-carrying part. DNA has four bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). The specific sequence of these bases encodes all genetic instructions.
Each nucleotide = Phosphate (P) + Sugar (pentagon) + Base.
Purines vs Pyrimidines
Purines = Adenine (A) and Guanine (G) — double-ring structure (larger bases).
Pyrimidines = Cytosine (C) and Thymine (T) — single-ring structure (smaller bases). In RNA, Uracil (U) replaces Thymine.
A purine always pairs with a pyrimidine, keeping the helix width constant.
PURe As Gold → Purines are A and G (double-ring). Everything else (C, T, U) is a pyrimidine (single-ring).
Base Pairing Rules — Chargaff's Rules
Adenine (A) pairs with Thymine (T) — joined by 2 hydrogen bonds. In RNA, A pairs with Uracil (U).
Guanine (G) pairs with Cytosine (C) — joined by 3 hydrogen bonds. G-C pairs are stronger than A-T pairs.
The strands run in opposite directions — one reads 5'→3' and the other 3'→5'. This is called antiparallel.
A-T: Apple Tree (2 H-bonds) | G-C: Good Car (3 H-bonds — stronger).
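The pairing rules and the antiparallel arrangement can be sketched in a few lines of Python. The function names here are illustrative, not part of any library.

```python
# Watson-Crick pairing for DNA: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Pair each base with its partner, position by position.

    If the input is written 5'->3', the result reads 3'->5'.
    """
    return "".join(COMPLEMENT[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]

top = "ATGGCC"                      # written 5'->3'
print(complement(top))              # partner strand, read 3'->5'
print(reverse_complement(top))      # same partner strand, read 5'->3'
```

Because the strands are antiparallel, the partner of 5'-ATGGCC-3' is 3'-TACCGG-5', which is the same molecule as 5'-GGCCAT-3'.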
Why G-C content matters in pharmacy: DNA regions rich in G-C base pairs are harder to denature because each G-C pair has 3 hydrogen bonds versus 2 for A-T. This is directly relevant in PCR-based diagnostic tests — primers must have the right melting temperature (Tm), which depends on G-C content.
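To make the link between G-C content and melting temperature concrete, here is a minimal sketch using the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). This rule is not from the text above; it is a rough estimate intended only for short oligonucleotides, and real primer design tools use more detailed thermodynamic models.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C (Wallace rule).

    Tm = 2 * (A + T) + 4 * (G + C). A rule of thumb for short
    primers only, used here purely for illustration.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

for primer in ("ATATATATATAT", "GCGCGCGCGCGC"):
    print(primer, f"GC={gc_content(primer):.0%}", f"Tm~{wallace_tm(primer)} C")
```

Two primers of the same length can differ substantially in estimated Tm purely because of their G-C content.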
Intercalating agents: Anticancer drugs like doxorubicin and daunorubicin work by inserting between stacked base pairs in the DNA double helix, distorting the helix and preventing replication and transcription.
DNA vs RNA — Key Differences
| Feature | DNA | RNA |
|---|---|---|
| Sugar | Deoxyribose | Ribose (has —OH at 2') |
| Bases | A, T, G, C | A, U, G, C |
| Structure | Double-stranded helix | Usually single-stranded |
| Location | Nucleus, mitochondria | Nucleus and cytoplasm |
| Function | Long-term genetic storage | Messenger, transfer, ribosomal, regulatory |
| Stability | Very stable | Less stable (2'-OH susceptible to hydrolysis) |
Why RNA is less stable, and why that matters for mRNA vaccines: RNA's extra hydroxyl group at the 2' position makes it chemically less stable than DNA. This is one reason mRNA vaccines require cold or ultra-cold storage. Pharmaceutical scientists use modified nucleosides (like pseudouridine) to reduce innate immune recognition and improve translation, and lipid nanoparticle encapsulation to protect the fragile mRNA from degradation and deliver it into cells.
The Central Dogma
The flow of genetic information: DNA → RNA → Protein. This is the single most important concept in molecular biology.
What Is the Central Dogma?
The Central Dogma, first articulated by Francis Crick in 1958, describes the fundamental flow of genetic information: DNA is transcribed into RNA, and RNA is translated into Protein.
01 — Replication
DNA → DNA. Before a cell divides, it copies its entire genome. Semi-conservative — each new helix has one parent and one new strand.
02 — Transcription
DNA → mRNA. RNA polymerase copies a gene's DNA sequence into messenger RNA in the nucleus.
03 — Translation
mRNA → Protein. Ribosomes read mRNA in three-nucleotide codons, each specifying an amino acid.
Why Is It Called a "Dogma"?
The core idea is that sequence information flows from nucleic acid to protein, never from protein back to nucleic acid. The simplified DNA → RNA → Protein arrow does have exceptions, most notably reverse transcriptase in retroviruses like HIV, which copies RNA back into DNA, but the principle remains foundational.
Reverse transcriptase and HIV treatment: HIV uses reverse transcriptase to convert its RNA genome into DNA. This is the target for NRTIs (zidovudine, tenofovir, emtricitabine) and NNRTIs (efavirenz).
Gene expression and drug targets: Most drugs target proteins produced by the Central Dogma pipeline. Newer therapeutics like antisense oligonucleotides (ASOs) and siRNA target mRNA before translation.
"Don't Really Party" → DNA (Replication), RNA (Transcription), Protein (Translation).
DNA Replication
How the cell copies its entire genome before division — semi-conservative, bidirectional, and astonishingly accurate.
Overview: Semi-Conservative Replication
DNA replication is semi-conservative — each new double helix contains one parent strand and one newly synthesised daughter strand, demonstrated by the Meselson-Stahl experiment (1958).
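Replication begins at origins of replication. Human cells have thousands of origins; they do not all fire at once but are activated at different times throughout S phase, which lets the whole genome be copied within hours.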
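The semi-conservative model can be illustrated with a toy simulation of the Meselson-Stahl logic: start with one duplex whose strands are both "heavy" (H), then replicate in "light" (L) medium. The function names and the H/L labels are illustrative choices, not taken from the text.

```python
def replicate(duplexes):
    """Semi-conservative step: every old strand templates one new light strand."""
    return [(strand, "L") for duplex in duplexes for strand in duplex]

def simulate(generations: int):
    """Return (hybrid, light) duplex counts after the given number of rounds."""
    population = [("H", "H")]  # one fully heavy parental duplex
    for _ in range(generations):
        population = replicate(population)
    hybrid = sum(1 for d in population if set(d) == {"H", "L"})
    light = sum(1 for d in population if set(d) == {"L"})
    return hybrid, light

for n in range(1, 4):
    hybrid, light = simulate(n)
    print(f"generation {n}: hybrid={hybrid}, light={light}")
```

After one round every duplex is hybrid; in later rounds the two original heavy strands persist in exactly two hybrid duplexes while the number of fully light duplexes grows.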
Key Enzymes and Their Roles
Topoisomerase inhibitors as anticancer drugs: Irinotecan, topotecan (topo I inhibitors) and etoposide, doxorubicin (topo II inhibitors) trap topoisomerase on DNA, creating permanent breaks that trigger apoptosis.
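Fluoroquinolone antibiotics: Ciprofloxacin and levofloxacin target bacterial DNA gyrase (a type II topoisomerase) and topoisomerase IV. Human topoisomerases are far less sensitive to these drugs, which gives selective toxicity.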
Leading Strand vs Lagging Strand
Leading strand: Built 5'→3' continuously in the same direction as fork movement. Only one primer needed.
Lagging strand: Built in short Okazaki fragments going away from the fork. Each fragment needs its own primer. DNA Pol I removes primers; ligase seals the fragments.
Why 5'→3' Only?
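DNA polymerase can only add nucleotides to the free 3'-OH group at the end of a growing strand, so synthesis always proceeds 5'→3'. The energy comes from cleaving the incoming nucleotide's 5' triphosphate, which releases pyrophosphate.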
Leading = Continuous (same direction as fork). Lagging = Fragments (Okazaki, joined by ligase). Both 5'→3'.
Transcription
Copying DNA into messenger RNA — the first step in gene expression, occurring in the nucleus.
Template Strand vs Coding Strand
Template strand (antisense): Read by RNA polymerase, runs 3'→5'. mRNA is complementary to it.
Coding strand (sense): Same sequence as mRNA (T→U), runs 5'→3'. NOT read by RNA polymerase.
Key rule: To convert coding strand to mRNA, just replace every T with U. Do not take the complement.
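The two routes to the same mRNA can be sketched as follows. The function names are illustrative, and the sketch assumes the coding strand is supplied 5'→3' and the template strand is supplied 3'→5', matching the conventions above.

```python
# RNA base that pairs with each DNA template base.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def mrna_from_coding(coding_5to3: str) -> str:
    """Coding strand (5'->3') to mRNA: swap T for U, nothing else."""
    return coding_5to3.upper().replace("T", "U")

def mrna_from_template(template_3to5: str) -> str:
    """Template strand written 3'->5' to mRNA written 5'->3'.

    Pairing base by base along a 3'->5' template yields the mRNA
    already in 5'->3' order, so no reversal is needed here.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3to5.upper())

coding = "ATGGCC"     # 5'-ATGGCC-3'
template = "TACCGG"   # 3'-TACCGG-5' (its antiparallel partner)
print(mrna_from_coding(coding))
print(mrna_from_template(template))
```

Both calls give the same mRNA, which is the point of the key rule: the coding strand already carries the mRNA sequence, apart from T versus U.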
The Three Stages of Transcription
1. Initiation: RNA polymerase binds to the promoter. DNA unwinds locally (~12–15 bp) to form a transcription bubble.
2. Elongation: RNA polymerase moves along the template (3'→5') and synthesises mRNA 5'→3', ~40 nt/sec.
3. Termination: RNA polymerase reaches a terminator sequence, detaches, and releases pre-mRNA.
Rifampicin: Binds the β-subunit of bacterial RNA polymerase and blocks extension of the nascent RNA beyond the first few nucleotides, so transcription stalls just after initiation. Cornerstone of TB treatment.
α-Amanitin: Death cap mushroom toxin that inhibits human RNA polymerase II → acute liver failure.
Template = The one read (3'→5'). Coding = The one that codes (5'→3', same as mRNA except T→U).
mRNA Processing
Introns, exons, splicing, and post-transcriptional modifications — turning raw pre-mRNA into mature messenger.
The Three Modifications — CPS
1. 5' Cap: A modified guanine (m⁷G) added to the 5' end. Protects from degradation, required for ribosome recognition.
2. 3' Poly-A Tail: ~100–250 adenine nucleotides added to the 3' end. Protects from degradation, aids nuclear export.
3. Splicing: The spliceosome (snRNPs) removes introns and joins exons. Result: mature mRNA.
Cap (5'), Poly-A tail (3'), Splicing (introns out). EXons = EXpressed. INtrons = INterruptions.
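Splicing can be pictured as cutting out intron coordinates and joining what remains. The sketch below uses a made-up toy pre-mRNA and hypothetical exon coordinates; the only biological detail it borrows is that most introns begin with GU and end with AG. The 5' cap is a chemical modification rather than a sequence letter, so it is not modelled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments; exons are 0-based, half-open (start, end) spans."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def add_poly_a(mrna: str, tail_length: int = 150) -> str:
    """Append a poly-A tail (the default length is an arbitrary choice
    within the ~100-250 range quoted above)."""
    return mrna + "A" * tail_length

# Toy transcript: exon 1 + intron (GU...AG) + exon 2. Sequences are invented.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "CGAUAA"
pre_mrna = exon1 + intron + exon2
exon_spans = [(0, 6), (18, 24)]

spliced = splice(pre_mrna, exon_spans)
mature = add_poly_a(spliced, tail_length=10)  # short tail just for display
print(spliced)
print(mature)
```

Passing a different list of exon spans to the same function is a simple way to picture alternative splicing: same pre-mRNA, different mature message.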
Alternative Splicing
The same pre-mRNA can be spliced differently to produce different proteins from a single gene. This is how ~20,000 genes produce 100,000+ proteins.
Spinraza (nusinersen): An ASO drug for spinal muscular atrophy (SMA) that corrects SMN2 pre-mRNA splicing so exon 7 is included, producing functional SMN protein.
mRNA vaccine design: Poly-A tail length (100–120 adenines) is optimised to maximise mRNA stability and protein production.
Translation
Reading the mRNA code to build a protein — the final step of gene expression, at ribosomes in the cytoplasm.
Key Players
mRNA: Carries the genetic code. Read 5'→3'.
Ribosome: Three sites: A-site (incoming tRNA), P-site (growing chain), E-site (exit).
tRNA: Adapter molecule — carries amino acid, has anticodon complementary to mRNA codon.
Three Stages
1. Initiation: The small subunit, mRNA, and initiator tRNA (carrying Met, paired to AUG) assemble; the large subunit then joins.
2. Elongation: Charged tRNA enters A-site → peptide bond → translocation. ~15–20 aa/sec.
3. Termination: Stop codon (UAA, UAG, UGA) → release factors → polypeptide released.
AUG = "Always Use for Go!" — Methionine. Stops: UAA, UAG, UGA — "U Are Annoying, U Are Gone, U Go Away!"
Antibiotics targeting the bacterial ribosome:
Tetracyclines — block tRNA binding at 30S A-site.
Aminoglycosides — cause mRNA misreading at 30S.
Macrolides — block translocation at 50S.
Chloramphenicol — inhibits peptide bond formation at 50S.
Linezolid — prevents initiation complex at 50S.
Worked Translation Examples
Step-by-step from DNA coding strand to protein — practice these by hand before reading the answers.
DNA coding strand: 5'-ATGGCCCGAGGGGCTCGCATAACAGCG-3'
Step 1 — Transcribe (T→U): 5'-AUGGCCCGAGGGGCUCGCAUAACAGCG-3'
Step 2 — Codons from AUG: AUG-GCC-CGA-GGG-GCU-CGC-AUA-ACA-GCG
Step 3 — Translate: Met-Ala-Arg-Gly-Ala-Arg-Ile-Thr-Ala
Protein: M-A-R-G-A-R-I-T-A
DNA coding strand: 5'-ATGGTTCCAATTGCGATA-3'
Transcribe: 5'-AUGGUUCCAAUUGCGAUA-3'
Codons: AUG-GUU-CCA-AUU-GCG-AUA
Translate: Met-Val-Pro-Ile-Ala-Ile = MVPIAI
Common mistakes to avoid:
1. Forgetting T→U when going from DNA to mRNA.
2. Starting codons from the wrong position: always start from the first AUG.
3. Confusing template strand with coding strand: if given the coding strand, do NOT complement; just swap T→U.
4. Using DNA bases in the codon table: the codon table uses RNA bases (U, not T).
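The worked examples can be reproduced with a short script. This is a minimal sketch, not a full translation tool: it assumes the input is a coding strand written 5'→3', starts at the first AUG, and stops at the first in-frame stop codon or when the sequence runs out. The compact amino-acid string is the standard genetic code laid out in U, C, A, G order.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UUU, UUC, UUA, UUG, UCU... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand: str) -> str:
    """Coding strand to mRNA: replace T with U (no complementing)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon ('*') or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

for dna in ("ATGGCCCGAGGGGCTCGCATAACAGCG", "ATGGTTCCAATTGCGATA"):
    mrna = transcribe(dna)
    print(dna, "->", mrna, "->", translate(mrna))
```

Running it on the two coding strands above gives the same one-letter proteins as the hand-worked answers.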
The Genetic Code
64 codons, 20 amino acids — the universal dictionary of life.
Properties
Universal: Same codons = same amino acids in virtually all organisms.
Degenerate: 64 codons for 20 amino acids — most amino acids have multiple codons.
Non-overlapping: Each nucleotide belongs to exactly one codon.
Start codon: AUG = Methionine. Stop codons: UAA, UAG, UGA.
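Degeneracy is easy to see by counting how many codons map to each amino acid. The sketch below rebuilds the standard codon table from a compact string (U, C, A, G order) and tallies it; variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per one-letter amino acid; '*' marks stop codons.
codons_per_residue = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

for residue, count in sorted(codons_per_residue.items(), key=lambda kv: -kv[1]):
    print(residue, count)
print("Stop codons:", stop_codons)
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have only one, which is why the code is described as degenerate but not ambiguous.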
Wobble Position
The third position of a codon allows relaxed base pairing. This is why many amino acids have multiple codons differing only at position 3. Silent mutations at the wobble position are generally harmless.
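One way to see the effect of third-position changes is to test whether a single-base substitution changes the encoded amino acid. This is a sketch at the level of the codon table only; it does not model tRNA wobble pairing itself, and the helper names are illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def is_silent(codon: str, mutant: str) -> bool:
    """True if both codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[codon] == CODON_TABLE[mutant]

def fourfold_degenerate_prefixes() -> list[str]:
    """Two-base prefixes where any third base gives the same amino acid."""
    return [
        a + b
        for a, b in product(BASES, repeat=2)
        if len({CODON_TABLE[a + b + third] for third in BASES}) == 1
    ]

print(is_silent("GCU", "GCG"))   # alanine either way
print(is_silent("AUG", "AUA"))   # Met -> Ile, not silent
print(fourfold_degenerate_prefixes())
```

Half of the sixteen two-base prefixes tolerate any base at position 3, but not every third-position change is silent, as the AUG to AUA case shows.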
Codon optimisation in biopharmaceuticals: Modifying codon usage to match host organism's preferences dramatically increases protein yield for therapeutic proteins (insulin, antibodies). The amino acid sequence stays the same — only synonymous codons are swapped.
Miniprep — Practical 1
Extracting plasmid DNA from bacteria using alkaline lysis — a fundamental molecular biology technique.
What Is a Miniprep?
A quick, small-scale method for extracting and purifying plasmid DNA from bacterial cells using alkaline lysis.
Solution 1 — Resuspend
Components: Tris buffer + EDTA + RNase A.
Resuspends the bacterial pellet into a uniform suspension.
Solution 2 — Lyse
Components: NaOH + SDS.
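Denatures DNA and proteins and dissolves the cell membrane. Do not vortex (shearing fragments the chromosomal DNA, which then co-purifies with the plasmid) and do not leave for more than 5 min (prolonged exposure to alkali can irreversibly denature the plasmid).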
Solution 3 — Neutralise
Components: Potassium acetate (acidic).
Plasmid DNA (small, circular) renatures and stays in solution. Chromosomal DNA (huge) tangles and precipitates.
Resuspend, Lyse, Neutralise. Plasmid survives (small + circular). Chromosomal DNA tangles (huge).
Plasmid DNA in gene therapy and DNA vaccines: The same miniprep principles (scaled up) are used to produce plasmid DNA for gene therapy and as templates for mRNA vaccines. Quality control ensures purity, correct supercoiling, and endotoxin-free product.
Key Comparison Tables
Side-by-side comparisons essential for exam success.
Replication vs Transcription vs Translation
| Feature | Replication | Transcription | Translation |
|---|---|---|---|
| Product | DNA | mRNA | Protein |
| Template | Both DNA strands | Template strand (3'→5') | mRNA (5'→3') |
| Main Enzyme | DNA Pol III (bacteria); DNA Pol δ/ε (eukaryotes) | RNA Polymerase | Ribosome |
| Direction | 5'→3' | 5'→3' | N→C terminus |
| Location | Nucleus | Nucleus | Cytoplasm |
| Building Blocks | dNTPs | NTPs | Amino acids (20) |
| Start Signal | Origin of replication | Promoter | AUG |
| Stop Signal | Termination seq. | Terminator seq. | UAA/UAG/UGA |
| Primer Required? | Yes (RNA) | No | No |
| Proofreading? | Yes (3'→5' exonuclease) | No | Synthetases check |
Template Strand vs Coding Strand
| Feature | Template Strand | Coding Strand |
|---|---|---|
| Other Names | Antisense, non-coding | Sense, non-template |
| Direction | 3'→5' (read by RNA Pol) | 5'→3' |
| Relation to mRNA | Complementary | Same sequence (T→U) |
| In Exam | Given → complement (with U in place of T), keeping the strands antiparallel, to get mRNA | Given → just replace T→U |
Mnemonics & Memory Aids
Quick-fire memory tools — review these the night before your exam.
"Don't Really Party" → DNA, RNA, Protein
A-T: Apple Tree (2 bonds) | G-C: Good Car (3 bonds)
EXons = EXpressed. INtrons = INterruptions.
Cap, Poly-A, Splicing.
AUG = Always Use for Go! Stops: UAA, UAG, UGA.
Leading = Continuous. Lagging = Fragments (Okazaki).
Template read 3'→5'; mRNA built 5'→3'.
Resuspend, Lyse, Neutralise.
Active Recall — 40 Questions
Cover the answers. Write yours first. Then reveal and self-assess.
Hard MCQ Exam — 25 Questions
Exam-level multiple choice testing application, analysis, and clinical reasoning.