Molecular Biology of the Cell · Part 1 of 2 · 10 MCQs per part
Part 1 of 2
DNA Structure, the Nucleosome, and Chromatin Packaging
The roughly two meters of DNA in a human cell must fit into a nucleus only about 6 μm across, and in mitotic chromosomes it is compacted about 10,000-fold. Yet specific genes must remain accessible to the transcription machinery at any moment. Understanding how DNA is packaged helps explain gene regulation, inheritance, and the origins of cancer.
4.1 The DNA Double Helix
DNA is a double helix composed of two antiparallel polynucleotide strands wound around each other. Each strand is a polymer of deoxyribonucleotides linked by 3′–5′ phosphodiester bonds. The four bases — adenine (A), thymine (T), guanine (G), and cytosine (C) — project inward and pair specifically: A with T (two hydrogen bonds) and G with C (three hydrogen bonds). This complementary base pairing means the sequence of one strand dictates the other, enabling faithful replication and transcription.
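The pairing rules and the antiparallel orientation together determine the partner strand, so one strand's sequence can be converted to the other mechanically. The following is a minimal Python sketch with illustrative function names (not from any library). It writes the partner strand 5′→3′ and tallies inter-strand hydrogen bonds using the two-versus-three rule above, and it assumes the input contains only A, T, G, and C.

```python
# Watson-Crick partners: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Inter-strand hydrogen bonds contributed by each base pair.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5'->3'.

    The strands are antiparallel, so the partner is read in the
    opposite direction: complement each base, then reverse the order.
    Raises KeyError for characters other than A, T, G, C.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count inter-strand hydrogen bonds: 2 per A-T pair, 3 per G-C pair."""
    return sum(H_BONDS[base] for base in strand.upper())


print(reverse_complement("ATGCGT"))  # ACGCAT
print(hydrogen_bonds("ATGCGT"))      # 15
```

Applying `reverse_complement` twice returns the original sequence, mirroring the fact that either strand fully specifies the other.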
The helix has a major groove and a minor groove, which proteins exploit for sequence-specific DNA recognition. Under physiological conditions, DNA adopts the B-form helix with ~10.5 base pairs per turn and a rise of ~3.4 Å per base pair.
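The two B-form parameters are enough for quick geometric estimates. This short sketch multiplies them out. The diploid genome size of about 6.2 × 10⁹ bp is a rounded assumption rather than a figure from this text, and the length estimate ignores all packaging.

```python
RISE_NM = 0.34      # rise per base pair in B-form DNA (3.4 Å)
BP_PER_TURN = 10.5  # base pairs per helical turn in B-form DNA


def helix_length_nm(n_bp: float) -> float:
    """Contour length of fully extended B-form DNA, in nanometers."""
    return n_bp * RISE_NM


def helical_turns(n_bp: float) -> float:
    """Number of helical turns made by n_bp base pairs."""
    return n_bp / BP_PER_TURN


pitch_nm = BP_PER_TURN * RISE_NM  # length of one full turn
# Assumed, rounded diploid human genome size; convert nm -> m.
genome_m = helix_length_nm(6.2e9) * 1e-9

print(f"pitch ~ {pitch_nm:.2f} nm")  # pitch ~ 3.57 nm
print(f"genome ~ {genome_m:.1f} m")  # genome ~ 2.1 m
print(helical_turns(147))            # 14.0
```

The ~2 m result matches the length quoted at the start of this part, and 147 bp corresponds to about 14 turns of free helix.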
Key term
Complementary base pairing
The specific hydrogen bonding between A–T and G–C pairs that holds the two strands of a DNA double helix together and underlies the fidelity of replication and transcription; the same principle, with U in place of T, governs codon–anticodon pairing in translation.
4.2 Nucleosomes: First Level of Compaction
DNA in eukaryotes wraps around histone octamers to form nucleosomes — the fundamental repeating unit of chromatin. Each octamer contains two copies each of histones H2A, H2B, H3, and H4. About 147 bp of DNA makes ~1.65 left-handed turns around the histone core. Adjacent nucleosomes are connected by linker DNA (10–80 bp) and can be compacted further with the help of linker histone H1.
This arrangement reduces DNA length by about 7-fold. The N-terminal histone tails protrude from the nucleosome surface and are subject to extensive post-translational modifications (acetylation, methylation, phosphorylation, ubiquitylation) that regulate chromatin structure and gene expression.
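The relationship between core DNA, linker DNA, and the nucleosome repeat length is simple bookkeeping. The sketch below assumes a ~200-bp repeat (a typical value, not a fixed constant) and a rounded diploid genome size of ~6.2 × 10⁹ bp (also an assumption). It treats nucleosomes as evenly spaced, which real chromatin is not.

```python
CORE_BP = 147  # DNA wrapped around one histone octamer


def linker_length(repeat_bp: int) -> int:
    """Linker DNA per nucleosome = repeat length minus the 147-bp core."""
    if repeat_bp < CORE_BP:
        raise ValueError("repeat length cannot be shorter than the core DNA")
    return repeat_bp - CORE_BP


def nucleosome_count(genome_bp: float, repeat_bp: int) -> float:
    """Approximate nucleosome number, assuming uniform spacing."""
    return genome_bp / repeat_bp


print(linker_length(200))                     # 53
print(f"{nucleosome_count(6.2e9, 200):.1e}")  # 3.1e+07
```

With linkers of 10–80 bp, the same arithmetic gives repeat lengths of about 157 to 227 bp.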
Key term
Nucleosome
The basic structural unit of chromatin: 147 bp of DNA wound ~1.65 turns around an octamer of histone proteins (two each of H2A, H2B, H3, H4), reducing DNA length ~7-fold.
4.3 Higher-Order Chromatin Structure
Nucleosomal arrays compact further into a 30-nm fiber (though its exact structure remains debated), and then into larger loops, domains, and ultimately the highly condensed chromosomes visible at mitosis. Chromatin exists in two functional states: euchromatin — relatively open, gene-rich, and transcriptionally active — and heterochromatin — densely packed, gene-poor, and transcriptionally silent. Heterochromatin is further divided into constitutive heterochromatin (e.g., centromeres, telomeres) and facultative heterochromatin (e.g., the inactive X chromosome).
Pause & Recall
Histone modifications do not alter the DNA sequence. How, then, can they be inherited through cell division?
Histone modifications constitute epigenetic information. Enzymes that write modifications (e.g., histone methyltransferases) can be recruited to daughter nucleosomes by modified parental histones, propagating the modification pattern after replication. Some modifications recruit chromatin-modifying complexes that reinforce the same state, creating a self-sustaining epigenetic mark without changing the DNA sequence.
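The reader–writer logic can be illustrated with a deliberately simplified toy model, not a description of any real enzyme. Nucleosomes are booleans (marked or unmarked); replication hands this daughter the parental histones at every other position (real segregation is random); and a hypothetical writer marks any unmarked nucleosome next to a marked one unless a boundary position blocks it.

```python
from __future__ import annotations


def replicate(marks: list[bool]) -> list[bool]:
    """Toy replication: parental histones (with their marks) land at even
    positions; odd positions get newly assembled, unmarked histones."""
    return [marked and i % 2 == 0 for i, marked in enumerate(marks)]


def read_write(marks: list[bool], barriers: set[int]) -> list[bool]:
    """One pass of a hypothetical reader-writer: an unmarked nucleosome
    adjacent to a marked one becomes marked, unless it is a barrier."""
    out = list(marks)
    for i, marked in enumerate(marks):
        if marked or i in barriers:
            continue
        left = i > 0 and marks[i - 1]
        right = i + 1 < len(marks) and marks[i + 1]
        if left or right:
            out[i] = True
    return out


parent = [True] * 6 + [False] * 6  # silenced domain, then an unmarked domain
daughter = replicate(parent)       # marks diluted to every other nucleosome
restored = read_write(daughter, barriers={6})

print(restored == parent)                              # True
print(read_write(restored, barriers={6}) == parent)    # True: pattern is stable
print(read_write(restored, barriers=set()) == parent)  # False: mark spreads
```

Without the boundary, the mark creeps into the neighboring domain, which is one reason boundary elements matter for keeping such chromatin states confined.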
Practice Questions — Part 1
1. In the DNA double helix, which bases pair with each other?
Adenine pairs with thymine via two hydrogen bonds, and guanine pairs with cytosine via three hydrogen bonds. This complementary base pairing holds the two antiparallel strands together in the double helix and accounts for Chargaff's rules (A = T and G = C in double-stranded DNA).
2. How many base pairs of DNA wrap around a single histone octamer?
Approximately 147 bp of DNA wraps ~1.65 turns around the histone octamer in a nucleosome. About 200 bp is the total repeat length including linker DNA, but the core particle itself contains ~147 bp.
3. The histone octamer in a nucleosome core particle is composed of:
The histone octamer contains two copies each of the four core histones: H2A, H2B, H3, and H4, giving eight histone molecules total. Linker histone H1 is not part of the core octamer but associates with the linker DNA.
4. Euchromatin differs from heterochromatin in that euchromatin is:
Euchromatin is less condensed, accessible to transcription factors and RNA polymerase, and contains most actively expressed genes. Heterochromatin is densely packed, transcriptionally repressed, and includes constitutive regions like centromeres and telomeres.
5. Which groove of the DNA double helix is primarily used for sequence-specific protein recognition?
The major groove exposes more of the chemical groups on the base pairs (particularly H-bond donors and acceptors) and is wider, making it the primary site for sequence-specific recognition by transcription factors and other DNA-binding proteins. The minor groove is narrower and less informative for sequence discrimination.
6. The two strands of the DNA double helix run in what orientation relative to each other?
The two strands are antiparallel: one runs 5′ to 3′ in one direction while the complementary strand runs 5′ to 3′ in the opposite direction (3′ to 5′ relative to the first strand). This antiparallel arrangement is essential for proper base pairing geometry.
7. Trimethylation of histone H3 lysine 9 (H3K9me3) is most associated with:
Trimethylation of histone H3 lysine 9 (H3K9me3) is a hallmark of constitutive heterochromatin. It recruits HP1 (heterochromatin protein 1), which compacts chromatin and promotes gene silencing. By contrast, H3K4me3 and H3K27ac are marks of active transcription.
8. Linker histone H1 functions primarily to:
Linker histone H1 binds to the linker DNA between nucleosome cores and to the DNA entering/exiting each nucleosome. This stabilizes higher-order chromatin folding and promotes compaction into the 30-nm fiber, generally repressing transcription.
9. The B-form DNA helix has approximately how many base pairs per turn?
B-form DNA (the predominant form under physiological conditions) has ~10.5 base pairs per helical turn, with a rise of ~3.4 Å per base pair, giving a pitch of ~35 Å per turn. The 3.4 Å figure refers to the rise per base pair, not the number of bp per turn.
10. Centromeres are an example of:
Centromeres and telomeres are constitutive heterochromatin — they are permanently condensed in all cell types regardless of developmental state. Facultative heterochromatin (like the inactive X) is condensed only in specific cell types or developmental stages.
Section B · Recall Questions · Part 1
B1
Describe the key structural features of the DNA double helix.
Sample answer: DNA is a double helix of two antiparallel polynucleotide strands. Bases face inward and pair specifically (A–T via 2 H-bonds, G–C via 3 H-bonds). Phosphodiester bonds link nucleotides 3′→5′ within each strand. The helix has major and minor grooves. Under physiological conditions it adopts B-form with ~10.5 bp/turn.
B2
What is a nucleosome and how does it contribute to DNA compaction?
Sample answer: A nucleosome consists of ~147 bp of DNA wound ~1.65 turns around an octamer of histone proteins (two each of H2A, H2B, H3, and H4). This first level of compaction reduces DNA length ~7-fold. Nucleosomes are connected by linker DNA and can be further compacted into 30-nm fibers.
B3
Name two types of histone modifications and describe how they generally affect gene expression.
Sample answer: Histone acetylation (e.g., H3K27ac) neutralizes positive charge on lysines, reducing DNA–histone interaction and generally opening chromatin to activate transcription. Histone methylation has context-dependent effects: H3K4me3 marks active promoters, while H3K9me3 and H3K27me3 are repressive marks that compact chromatin and silence genes.
B4
Distinguish between constitutive and facultative heterochromatin with one example of each.
Sample answer: Constitutive heterochromatin is permanently condensed in all cell types (e.g., centromeres, telomeres — repetitive sequences permanently silenced). Facultative heterochromatin is condensed only in certain cell types or developmental stages (e.g., the inactive X chromosome in female mammals, where one X is silenced by Xist RNA and Polycomb complexes).
B5
Explain why the major groove of DNA is important for protein–DNA recognition.
Sample answer: The major groove is wider than the minor groove and exposes more chemical groups (H-bond donors, acceptors, and hydrophobic methyl groups) on the edges of the base pairs. This richer chemical "code" allows proteins to distinguish between different DNA sequences without unwinding the helix, enabling sequence-specific binding by transcription factors, restriction enzymes, and other regulatory proteins.
B6
What is DNA methylation at CpG dinucleotides and how does it relate to gene silencing?
Sample answer: DNA methyltransferases add a methyl group to the 5-carbon of cytosine in CpG dinucleotides. CpG islands in promoters are normally unmethylated; when they become methylated, they are bound by methyl-CpG-binding proteins (e.g., MeCP2), which in turn recruit histone deacetylases that compact chromatin and silence transcription. After replication, DNMT1 recognizes hemimethylated CpG sites and methylates the new strand, so methylation is maintained as a heritable epigenetic mark.
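CpG islands are usually identified computationally by GC content and by how close their CpG frequency comes to what base composition alone predicts. The sketch below computes the observed/expected CpG ratio, (CpG count × length) / (C count × G count), and applies commonly used thresholds after Gardiner-Garden and Frommer (length ≥ 200 bp, GC > 50%, ratio > 0.6). Treat the thresholds as an assumption, since later methods use stricter cutoffs; the function names are illustrative.

```python
from __future__ import annotations


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one strand."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpGs if C and G were placed independently: c * g / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str) -> bool:
    """Apply assumed island thresholds: >= 200 bp, GC > 0.5, obs/exp > 0.6."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= 200 and gc > 0.5 and obs_exp > 0.6


print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("CAG" * 70))  # False: GC-rich but no CpGs
```

Methylated cytosines tend to mutate to thymine over evolutionary time, so CpG dinucleotides are depleted across most of the genome; this is why unmethylated islands stand out by this ratio.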
B7
Why are topoisomerases needed during DNA replication and transcription?
Sample answer: Opening the double helix for replication or transcription creates torsional stress: overwinding (positive supercoiling) ahead of the moving polymerase and underwinding (negative supercoiling) behind it. Topoisomerases relieve this stress. Type I topoisomerases transiently cut and rejoin one strand. Type II topoisomerases transiently cut both strands and pass another duplex through the break; this relaxes supercoils and also untangles (decatenates) replicated sister chromosomes so they can segregate. Bacterial DNA gyrase is a type II enzyme that can also introduce negative supercoils.
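Supercoiling is usually quantified with the linking number Lk (Lk = Tw + Wr, twist plus writhe) of a closed or anchored DNA domain. Relaxed B-form DNA has Lk₀ = N / 10.5 for N base pairs, the superhelical density is σ = (Lk − Lk₀) / Lk₀, and each type I event changes Lk by 1 while each type II event changes it by 2. The plasmid size and Lk below are hypothetical values chosen to give round numbers.

```python
BP_PER_TURN = 10.5  # relaxed B-form DNA


def relaxed_linking_number(n_bp: int) -> float:
    """Lk0 for a relaxed closed circle of n_bp base pairs."""
    return n_bp / BP_PER_TURN


def superhelical_density(lk: int, n_bp: int) -> float:
    """sigma = (Lk - Lk0) / Lk0; negative values mean underwound DNA."""
    lk0 = relaxed_linking_number(n_bp)
    return (lk - lk0) / lk0


def events_to_relax(lk: int, n_bp: int, step: int) -> float:
    """Strand-passage events needed to return Lk to Lk0.

    step = 1 for type I topoisomerases, 2 for type II.
    """
    return abs(lk - relaxed_linking_number(n_bp)) / step


N_BP, LK = 5250, 480  # hypothetical plasmid: Lk0 = 500, so 20 turns underwound
print(superhelical_density(LK, N_BP))  # -0.04
print(events_to_relax(LK, N_BP, 1))    # 20.0
print(events_to_relax(LK, N_BP, 2))    # 10.0
```

A negative σ corresponds to the underwinding left behind a moving polymerase, and a positive σ to the overwinding ahead of it.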
B8
What are telomeres and why are they essential for chromosome integrity?
Sample answer: Telomeres are repetitive DNA sequences (TTAGGG in humans) at chromosome ends, bound by the shelterin protein complex. They protect chromosome ends from being recognized as DNA double-strand breaks and prevent end-to-end fusions. Because the lagging strand cannot be completely replicated at the very end (the end-replication problem), telomeres shorten with each division unless telomerase adds new repeats; critically short telomeres trigger senescence or apoptosis.
B9
What is the function of the centromere during cell division?
Sample answer: The centromere is a specialized chromatin region that nucleates the kinetochore, a multiprotein complex that attaches to spindle microtubules. This attachment allows chromosomes to be pulled to opposite poles during mitosis and meiosis, ensuring accurate chromosome segregation. Centromeric chromatin is marked by nucleosomes containing the histone variant CENP-A in place of H3.
B10
State Chargaff's rules and explain what they imply about DNA structure.
Sample answer: Chargaff's rules state that in double-stranded DNA, [A] = [T] and [G] = [C], so purines (A + G) equal pyrimidines (T + C); the ratio (A + T)/(G + C), however, varies between species. These equalities imply that adenine pairs with thymine and guanine pairs with cytosine in a double-stranded structure, directly supporting the complementary base-pairing model of the double helix.
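The distinction between a single strand and a duplex is easy to check numerically. In this sketch (illustrative helper names), an arbitrary single strand does not obey Chargaff's rules, but the duplex formed with its complementary strand always does, because every A on one strand is matched by a T on the other.

```python
from collections import Counter

# Translation table mapping each base to its Watson-Crick partner.
PAIR = str.maketrans("ATGC", "TACG")


def base_counts(seq: str) -> Counter:
    """Count each base in a sequence."""
    return Counter(seq.upper())


def duplex_counts(strand: str) -> Counter:
    """Base counts summed over a strand and its complementary strand."""
    partner = strand.upper().translate(PAIR)[::-1]  # reverse complement
    return base_counts(strand) + base_counts(partner)


single = "AAAGC"
duplex = duplex_counts(single)
print(base_counts(single)["A"], base_counts(single)["T"])      # 3 0
print(duplex["A"] == duplex["T"], duplex["G"] == duplex["C"])  # True True
```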
Section C · Critical Thinking · Part 1
Analyze and apply concepts; compare your reasoning to the sample answer.
C1
A drug inhibits histone deacetylases (HDACs). Predict the effect on chromatin structure and gene expression, and explain why HDAC inhibitors are investigated as anticancer agents.
Sample answer: HDACs remove acetyl groups from histone lysines, re-establishing positive charge that tightens DNA–histone interaction and represses transcription. Inhibiting HDACs leads to hyperacetylation, more open chromatin, and increased transcription of many genes — including tumor-suppressor genes that cancer cells silence. This can restore anti-proliferative pathways, induce differentiation, or trigger apoptosis in cancer cells, explaining the therapeutic interest.
C2
How can a differentiated cell (e.g., a liver cell) maintain its identity through hundreds of cell divisions without changing its DNA sequence?
Sample answer: Cell identity is maintained through epigenetic inheritance of chromatin states. DNA methylation patterns are copied after replication by DNMT1 (maintenance methyltransferase). Histone modifications are re-established on daughter nucleosomes by "reader-writer" complexes: modified histones recruit the same modifying enzyme to neighboring nucleosomes. Polycomb and Trithorax complexes maintain repressed and active states, respectively, through cell division, perpetuating gene expression patterns without DNA sequence changes.
C3
Why can't transcription factors simply bind to their target sequences in nucleosomal DNA without assistance? What machinery helps them gain access?
Sample answer: When DNA is wrapped around nucleosomes, the major groove faces inward or is sterically occluded by histone proteins, blocking most transcription factors from accessing their binding sites. ATP-dependent chromatin remodeling complexes (e.g., SWI/SNF, ISWI, CHD families) use ATP hydrolysis to slide, eject, or restructure nucleosomes, exposing regulatory DNA. Pioneer transcription factors (e.g., FOXA1) are an exception — they can bind nucleosomal DNA and recruit remodeling complexes.
C4
Most cancer cells reactivate telomerase. Explain why this is essential for tumor progression and why normal somatic cells suppress telomerase.
Sample answer: Each round of DNA replication shortens telomeres because the lagging strand cannot replicate the very end. Normal somatic cells suppress telomerase, so their telomeres shorten with each division, eventually triggering replicative senescence — a tumor-suppressor mechanism limiting cell proliferation. Cancer cells that reactivate telomerase (or use ALT — alternative lengthening of telomeres) maintain telomere length, allowing unlimited division. This reactivation is a hallmark of cancer; ~85–90% of human cancers show elevated telomerase activity, making it a therapeutic target.
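The counting argument behind replicative senescence can be sketched with a simple linear model. All parameters here are hypothetical illustrations (starting length, bp lost per division, a "critical" length, and bp added back by telomerase); real telomere dynamics are not linear and vary between cell types.

```python
from __future__ import annotations

import math


def divisions_until_senescence(
    start_bp: int,
    loss_per_division: int,
    critical_bp: int,
    telomerase_gain: int = 0,
) -> int | None:
    """Divisions before telomeres reach critical_bp under a linear model.

    Returns None when telomerase offsets the loss, i.e. the telomere
    never shortens to the critical length in this model.
    """
    if start_bp <= critical_bp:
        return 0
    net_loss = loss_per_division - telomerase_gain
    if net_loss <= 0:
        return None
    # Smallest k with start_bp - k * net_loss <= critical_bp
    return math.ceil((start_bp - critical_bp) / net_loss)


print(divisions_until_senescence(10_000, 100, 5_000))                       # 50
print(divisions_until_senescence(10_000, 100, 5_000, telomerase_gain=100))  # None
```

The second call mirrors telomerase reactivation in cancer cells: once net loss reaches zero, the division limit disappears.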
C5
What are topologically associating domains (TADs) and why is their disruption clinically significant?
Sample answer: TADs are self-interacting chromatin domains (~100 kb–1 Mb) within which enhancers and promoters preferentially contact each other. TAD boundaries are demarcated by CTCF-binding sites and cohesin. Disruption of TAD boundaries by deletions, inversions, or boundary mutations can bring enhancers from one TAD into contact with genes in an adjacent TAD, causing aberrant gene activation. This "enhancer hijacking" has been found in developmental disorders (e.g., limb malformations when a rearrangement places a limb enhancer next to a gene it does not normally regulate) and in cancers (where rearrangements can place strong enhancers next to oncogenes).
Section D · Interactive Questions · Part 1
D1
How many hydrogen bonds hold an A–T base pair together? (number)
D2
How many base pairs of DNA are wrapped around each histone octamer? (number)
D3
What histone variant replaces H3 at centromeres? (one word, all caps)
D4
The human telomere repeat sequence is TTAGGG. How many guanines are in each repeat? (number)
D5
What enzyme adds telomeric repeats to chromosome ends? (one word)
You've covered DNA structure and chromatin packaging. Part 2 explores how genomes are organized — the types of sequences they contain, how gene families arise, and what transposable elements reveal about genome evolution.
Part 2 of 2
Genome Organization, Gene Families, and Transposable Elements
The haploid human genome contains ~3.2 billion base pairs, but only ~1.5% encodes proteins. The rest, once dismissed as "junk," turns out to include regulatory elements, structural sequences, and the remnants of millions of transposable element insertions that shaped the genome over evolutionary time.
4.4 Genome Organization and Repetitive DNA
Eukaryotic genomes contain a mix of unique sequences (single-copy genes), moderately repetitive sequences (gene families, tandem repeats such as rRNA and histone genes), and highly repetitive sequences (satellite DNA at centromeres/telomeres, which reassociates very rapidly in renaturation experiments — Cot analysis). In humans, about 50% of the genome derives from transposable elements.
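Cot analysis rests on second-order reassociation kinetics. For an idealized single component, the fraction still single-stranded is C/C₀ = 1 / (1 + k·C₀t), so half the DNA has reassociated at C₀t½ = 1/k. A sequence present in r copies finds a partner roughly r times faster, so its C₀t½ is about r-fold lower. The rate constant below is a hypothetical value, and real curves are broader than this ideal.

```python
def fraction_single_stranded(cot: float, k: float) -> float:
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k: float) -> float:
    """C0t at which half of the DNA has reassociated (1/k)."""
    return 1.0 / k


K_SINGLE_COPY = 1e-3  # hypothetical rate constant for unique DNA (M^-1 s^-1)

for copies in (1, 1_000, 1_000_000):
    k = copies * K_SINGLE_COPY  # r-fold repetition -> r-fold faster pairing
    print(f"{copies:>9} copies: C0t1/2 = {cot_half(k):g} M*s")
```

Satellite DNA, with its very high copy number, therefore reassociates at C₀t values orders of magnitude below those of single-copy DNA.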
C-value paradox: genome size (C-value) does not correlate with organismal complexity. Onions have ~5× more DNA than humans. This reflects differences in the amount of non-coding and repetitive DNA, not gene number.
Key term
C-value paradox
The lack of correlation between genome size (C-value, the haploid DNA content) and organismal complexity, explained by variable amounts of non-coding repetitive sequences across species.
4.5 Gene Families and Genome Duplication
Many genes exist in related copies called gene families that arose by gene duplication and divergence. Examples include the globin family (α- and β-globin clusters), Hox genes, and immunoglobulin genes. After duplication, one copy can evolve a new function (neofunctionalization), or the two copies can divide the ancestral functions between them (subfunctionalization). Some duplicates become pseudogenes: nonfunctional remnants carrying mutations that disable the gene.
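The possible fates of a duplicate pair can be stated precisely by treating each gene's functions as a set. The toy classifier below is illustrative only: the function labels are hypothetical, and real cases are often mixtures of these outcomes. "Nonfunctionalization" is the term commonly used for the pseudogene outcome.

```python
from __future__ import annotations


def classify_duplicate_fate(ancestral: set[str], copy1: set[str], copy2: set[str]) -> str:
    """Toy classification of duplicate-gene fates from sets of functions."""
    if not copy1 or not copy2:
        return "nonfunctionalization"  # one copy became a pseudogene
    novel = (copy1 | copy2) - ancestral
    if novel and (ancestral <= copy1 or ancestral <= copy2):
        return "neofunctionalization"  # old role kept, a new one gained
    if not novel and (copy1 | copy2) == ancestral and copy1 != ancestral and copy2 != ancestral:
        return "subfunctionalization"  # ancestral roles split between copies
    if copy1 == copy2 == ancestral:
        return "redundant"             # both copies still do everything
    return "mixed"


ANCESTRAL = {"liver", "brain"}  # hypothetical expression domains
print(classify_duplicate_fate(ANCESTRAL, {"liver", "brain"}, {"kidney"}))  # neofunctionalization
print(classify_duplicate_fate(ANCESTRAL, {"liver"}, {"brain"}))            # subfunctionalization
print(classify_duplicate_fate(ANCESTRAL, {"liver", "brain"}, set()))       # nonfunctionalization
```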
4.6 Transposable Elements
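Transposable elements (TEs) are DNA sequences that can move within the genome. In humans, however, most TE copies are inactive remnants: truncated or mutated elements, or non-autonomous elements that depend on machinery encoded elsewhere. TEs are classified as: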
Class I (retrotransposons): move via an RNA intermediate ("copy-and-paste"). LINEs (Long Interspersed Elements, especially LINE-1) and SINEs (Short Interspersed Elements, especially Alu) are the most abundant human TEs.
Class II (DNA transposons): move via "cut-and-paste" using transposase, without an RNA intermediate. These are largely inactive in humans.
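The practical difference between the two classes is what happens to copy number. In this sketch the genome is a hypothetical list of segment labels. A Class I event leaves the source element in place and inserts a copy; a Class II event excises the element and reinserts it elsewhere. In the cut-and-paste case, the target index refers to the list after excision, a simplifying choice.

```python
from __future__ import annotations


def copy_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """Class I (retrotransposon): the source stays; a new copy is inserted."""
    element = genome[source]
    return genome[:target] + [element] + genome[target:]


def cut_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """Class II (DNA transposon): excise the element, reinsert it elsewhere.

    `target` is an index into the genome after excision.
    """
    element = genome[source]
    remaining = genome[:source] + genome[source + 1:]
    return remaining[:target] + [element] + remaining[target:]


genome = ["geneA", "TE", "geneB", "geneC"]
after_class_I = copy_and_paste(genome, source=1, target=3)
after_class_II = cut_and_paste(genome, source=1, target=2)
print(after_class_I)   # ['geneA', 'TE', 'geneB', 'TE', 'geneC']
print(after_class_II)  # ['geneA', 'geneB', 'TE', 'geneC']
```

Repeated copy-and-paste events ratchet copy number upward, which is consistent with retrotransposons dominating the human TE landscape.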
TEs have shaped genomes by creating mutations, new regulatory elements, novel exons (exon shuffling), and species-specific gene expression patterns. Active retrotransposition can cause disease (e.g., LINE-1 insertions in cancer).
Pause & Recall
What is the difference between a gene family and a pseudogene?
A gene family consists of related, functional genes that arose by duplication and divergence, each producing a protein with similar but not identical function (e.g., α- and β-globins). A pseudogene is a nonfunctional copy of a gene that has accumulated mutations (stop codons, frameshifts, lack of regulatory sequences) that prevent it from being expressed or producing functional protein. Pseudogenes represent evolutionary dead ends, while gene family members are retained by selection.
Practice Questions — Part 2
1. The C-value paradox refers to the observation that:
The C-value paradox is the observation that genome size (C-value = haploid DNA content) does not correlate with organismal complexity. Salamanders and some plants have far more DNA than humans, mainly due to differences in the amount of non-coding and repetitive DNA rather than gene number.
2. LINEs and SINEs are classified as:
LINEs (e.g., LINE-1) and SINEs (e.g., Alu) are Class I retrotransposons that replicate via a "copy-and-paste" mechanism: the element is transcribed to RNA, then reverse-transcribed to DNA and inserted at a new genomic location. Together they make up roughly one-third of the human genome; transposable elements of all classes account for nearly half.
3. A pseudogene differs from a functional gene family member in that it:
Pseudogenes are gene copies that have been inactivated by mutations such as premature stop codons, frameshifts, disrupted splice sites, or loss of a promoter. They share sequence similarity with their functional relatives but cannot produce a functional protein. The human genome contains ~13,000 pseudogenes.
4. Gene duplication followed by mutation of one copy to acquire a new function is termed:
Neofunctionalization occurs when one copy of a duplicated gene accumulates mutations conferring a novel function while the other copy maintains the original function. Subfunctionalization is when the ancestral function is partitioned between the two copies. Both allow gene family diversification after duplication.
5. What fraction of the human genome consists of protein-coding sequences?
Only approximately 1.5% of the human genome encodes proteins. The remaining ~98.5% includes introns, regulatory sequences, non-coding RNAs, transposable elements, and other non-protein-coding sequences. However, much of this non-coding DNA is functionally important for gene regulation.
6. The globin gene family in humans is an example of:
The human globin family — including α-, β-, γ-, δ-, and ε-globin genes — arose through repeated gene duplication and divergence over evolutionary time. They are clustered in two chromosomal loci, show tissue- and developmental stage-specific expression, and all encode oxygen-carrying proteins.
7. Highly repetitive satellite DNA is characterized by:
Satellite DNA consists of short sequences (typically 5–200 bp) repeated in long tandem arrays at centromeres and telomeres. It is named "satellite" because it forms distinct bands (satellites) in CsCl density-gradient centrifugation. These sequences are non-coding and constitute constitutive heterochromatin.
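Satellite bands separate because buoyant density in CsCl depends on base composition. A widely cited empirical relation is ρ = 1.660 + 0.098 × (G+C fraction) g/cm³. The sketch applies it to two hypothetical GC contents, one for bulk DNA and one for an AT-rich repeat; actual satellites may be lighter or heavier than bulk DNA depending on their sequence.

```python
def buoyant_density(gc_fraction: float) -> float:
    """Empirical CsCl buoyant density (g/cm^3) from the G+C fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return 1.660 + 0.098 * gc_fraction


bulk = buoyant_density(0.41)       # hypothetical bulk genomic GC content
satellite = buoyant_density(0.30)  # hypothetical AT-rich satellite
print(f"bulk {bulk:.4f}, satellite {satellite:.4f} g/cm^3")  # bulk 1.7002, satellite 1.6894 g/cm^3
```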
8. Transposable elements have contributed to genome evolution by:
TEs can insert into or near genes, creating new promoters or enhancers, contributing to exon shuffling, generating duplications or deletions through ectopic recombination, and producing mutations. These TE-mediated events have profoundly shaped genome architecture and gene regulation over evolutionary time.
9. The most abundant short repetitive element in the human genome (~11% of human DNA) is:
Alu elements are the most abundant SINE in the human genome, with over 1 million copies comprising ~11% of the genome. They are ~300 bp and are related to the 7SL RNA gene. LINE-1 is the most abundant LINE (~17–20% of the genome) but is a much longer element. Alu elements are among the most active current retrotransposons in humans.
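Copy number, element length, and genome fraction are linked by simple arithmetic, which is a useful sanity check on figures like these. The sketch assumes a haploid genome of ~3.1 × 10⁹ bp, ~1.1 million Alu copies of ~300 bp, and ~500,000 LINE-1 copies covering ~17% of the genome; all are rounded approximations.

```python
GENOME_BP = 3.1e9  # assumed haploid human genome size (rounded)


def genome_fraction(copies: float, mean_length_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Fraction of the genome occupied by an element family."""
    return copies * mean_length_bp / genome_bp


def mean_copy_length(fraction: float, copies: float, genome_bp: float = GENOME_BP) -> float:
    """Average length per copy implied by a genome fraction."""
    return fraction * genome_bp / copies


print(f"Alu: {genome_fraction(1.1e6, 300):.1%}")                 # Alu: 10.6%
print(f"LINE-1: {mean_copy_length(0.17, 5e5):.0f} bp per copy")  # LINE-1: 1054 bp per copy
```

The implied average LINE-1 copy (about 1 kb) is far shorter than a full-length ~6-kb element, reflecting the fact that most LINE-1 copies in the genome are 5′-truncated.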
10. Which best describes a processed pseudogene?
Processed pseudogenes arise when an mRNA is reverse-transcribed and inserted back into the genome (by retrotransposon machinery). Because they derive from processed mRNA, they lack introns and typically lack a promoter, making them nonfunctional. They often have a poly-A tail and are flanked by target-site duplications.
Section B · Recall Questions · Part 2
B1
Describe how a retrotransposon moves within the genome.
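Sample answer: A retrotransposon is transcribed into RNA, which is then reverse-transcribed into DNA by a reverse transcriptase encoded by the element, and the new copy is inserted at a different genomic location. In LTR retrotransposons, reverse transcription produces a double-stranded cDNA that an element-encoded integrase inserts into the genome. LINEs instead use target-primed reverse transcription: the LINE-encoded endonuclease nicks the target DNA, and the reverse transcriptase copies the RNA directly at the insertion site. This "copy-and-paste" mechanism increases copy number. LINEs encode their own reverse transcriptase; SINEs such as Alu are non-autonomous and use LINE machinery.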
B2
How do multigene families arise and what evolutionary advantage do they provide?
Sample answer: Multigene families arise through gene duplication (via unequal crossover, retrotransposition, or whole-genome duplication) followed by divergence of the copies. One copy can mutate freely while the other maintains the original function, allowing the evolution of related proteins with new or modified roles. This confers versatility: e.g., globin family members are expressed differentially during development (embryonic, fetal, adult) and carry oxygen with slightly different affinities.
B3
What are Hox genes and why is their chromosomal organization significant?
Sample answer: Hox genes encode homeodomain transcription factors that specify positional identity along the anterior-posterior body axis during development. Humans have four Hox clusters (HOXA–D) on four different chromosomes, totaling 39 genes. Their order on the chromosome corresponds to their expression domain along the body axis (collinearity) — a conserved arrangement from flies to humans that reflects their coordinated regulation during development.
B4
Why are rRNA genes present in hundreds of copies in most eukaryotic genomes?
Sample answer: Actively growing cells need enormous numbers of ribosomes; a typical mammalian cell contains millions. Each ribosome contains several rRNA molecules, and because rRNA is itself the final gene product, there is no translation step to amplify each gene's output, as there is for protein-coding mRNAs. Since RNA polymerase can transcribe each gene only at a limited rate, hundreds of tandemly repeated rRNA genes (in nucleolar organizer regions) are needed to produce enough rRNA to meet biosynthetic demand. This is an example of increasing gene dosage by tandem duplication.
B5
What are long non-coding RNAs (lncRNAs) and give one example of their functional role.
Sample answer: Long non-coding RNAs (lncRNAs) are RNA transcripts >200 nucleotides that do not encode proteins. They function in diverse regulatory roles: scaffolding chromatin-modifying complexes, acting as enhancer RNAs, decoys for miRNA, or structural components. Xist is a classic example: it is transcribed from the inactive X chromosome, coats it in cis, and recruits Polycomb repressive complexes to silence the chromosome, enabling X-chromosome inactivation in female mammals.
B6
How can unequal crossing over between tandem repeats lead to copy-number variation and disease?
Sample answer: When homologous chromosomes (or sister chromatids) misalign at repetitive sequences during meiosis, crossover between non-equivalent positions produces one chromosome with extra copies and one with fewer copies of the repeated segment, a form of copy-number variation (CNV). If the segment contains a gene, gene dosage changes. Example: unequal crossover between repeats flanking the PMP22 gene produces a duplication that causes Charcot-Marie-Tooth disease type 1A, while the reciprocal deletion causes hereditary neuropathy with liability to pressure palsies.
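The reciprocal products of a misaligned exchange can be modeled with lists of repeat units. In this toy sketch (hypothetical unit labels), an exchange at the same position on both chromosomes preserves copy number, while an exchange offset by one unit yields one chromosome with an extra unit and one with a unit missing; the total is conserved.

```python
from __future__ import annotations


def crossover(
    chrom_a: list[str], chrom_b: list[str], cut_a: int, cut_b: int
) -> tuple[list[str], list[str]]:
    """Exchange the segments after cut_a (on A) and cut_b (on B).

    Equal cut positions model a correctly aligned crossover; unequal
    positions model misalignment between repeat units.
    """
    recombinant_1 = chrom_a[:cut_a] + chrom_b[cut_b:]
    recombinant_2 = chrom_b[:cut_b] + chrom_a[cut_a:]
    return recombinant_1, recombinant_2


repeats = ["R1", "R2", "R3"]                    # three tandem repeat units
aligned = crossover(repeats, repeats, 2, 2)     # copy number unchanged
misaligned = crossover(repeats, repeats, 2, 1)  # off by one unit
print([len(c) for c in aligned])     # [3, 3]
print([len(c) for c in misaligned])  # [4, 2]
```

If each unit carried a dosage-sensitive gene, the [4, 2] outcome would correspond to the duplication and reciprocal deletion described above.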
B7
What is genomic imprinting and why does it violate Mendelian expectations?
Sample answer: Genomic imprinting is the epigenetic silencing of one allele of a gene depending on which parent it came from, so only the maternal or only the paternal copy is expressed. This departs from Mendelian expectations, under which the two alleles are functionally equivalent regardless of parental origin. Imprinting is mediated by differential DNA methylation and histone modifications established in the germline. Example: IGF2 (insulin-like growth factor 2) is expressed only from the paternal allele; H19 is expressed only from the maternal allele.
B8
What is a single-nucleotide polymorphism (SNP) and how is it used in genomic studies?
Sample answer: A SNP is a position in the genome where a single nucleotide differs between individuals, with the less common allele present at a frequency above 1% in a population. The human genome contains ~10 million common SNPs. SNPs are used as markers in genome-wide association studies (GWAS) to identify chromosomal regions associated with disease traits, since nearby SNPs tend to be inherited together (linkage disequilibrium). They are also used in ancestry analysis, pharmacogenomics, and forensics.
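Linkage disequilibrium between two biallelic SNPs is commonly summarized by D = p_AB − p_A·p_B and r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)], where p_AB is the frequency of haplotypes carrying allele A at the first SNP and allele B at the second. The haplotype frequencies below are hypothetical. An r² near 1 is what lets a genotyped marker SNP stand in for an ungenotyped neighbor in a GWAS.

```python
from __future__ import annotations


def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A and allele B
    p_a, p_b: allele frequencies of A and B (strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical: every haplotype with A also carries B (complete association).
d, r2 = linkage_disequilibrium(0.6, 0.6, 0.6)
print(round(d, 3), round(r2, 3))  # 0.24 1.0
# Hypothetical: alleles combine at random (linkage equilibrium).
print(linkage_disequilibrium(0.36, 0.6, 0.6))
```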
B9
How does whole-genome duplication (polyploidy) differ from single-gene duplication in its evolutionary consequences?
Sample answer: Whole-genome duplication instantly doubles every gene in the genome, providing two copies of all genes and regulatory sequences simultaneously. Because interacting genes are duplicated together, their relative dosage is preserved, which can enable diversification of entire biochemical pathways while the original functions are maintained. WGD is especially common in plants and has occurred in vertebrate evolution (the 2R hypothesis: two rounds of WGD in early vertebrate ancestors). Single-gene duplication affects only one gene and can upset the balance with its interaction partners, while WGD reshapes the entire regulatory network at once.
B10
What is synteny and what does conserved synteny between species tell us?
Sample answer: Synteny refers to genes lying on the same chromosome; conserved synteny refers to the preservation of gene content and order in corresponding chromosomal regions of different species. Conserved synteny indicates that a region has remained largely intact since the species diverged from a common ancestor. It (1) may reflect co-regulation or functional linkage maintained by selection, although it can also simply reflect too little time for rearrangement; (2) helps identify orthologous regions and infer gene function across species; and (3) reveals, where it is interrupted, evolutionary chromosomal rearrangements such as inversions, translocations, and fusions.
Section C · Critical Thinking · Part 2
Analyze and apply concepts; compare your reasoning to the sample answer.
C1
How can retrotransposon reactivation in somatic cells contribute to cancer, and what normally suppresses retrotransposon activity?
Sample answer: Active retrotransposons (especially LINE-1) can insert into genes, disrupting tumor suppressors or activating oncogenes. The resulting genomic instability drives cancer progression. Normally, retrotransposons are silenced by: (1) DNA methylation of their promoters; (2) piRNA-mediated silencing in germline cells; (3) histone H3K9me3 heterochromatin marks. In many cancers, global DNA hypomethylation reactivates TEs, increasing mutational load and chromosomal instability.
C2
A genome-wide association study (GWAS) identifies a SNP in an intergenic region associated with type 2 diabetes. Why might a non-coding SNP affect disease risk?
Sample answer: Non-coding SNPs can affect disease risk by altering regulatory elements. (1) A SNP in an enhancer can change transcription factor binding affinity, altering expression of a nearby or even distant gene. (2) A SNP in an intergenic lncRNA or miRNA gene can alter that RNA's function. (3) More generally, non-coding SNPs within genes, such as those in splice sites or 3′ UTRs, can affect RNA processing or stability. The GWAS signal may also be in linkage disequilibrium with a causal variant elsewhere. Most GWAS hits (~90%) fall in non-coding regions, underscoring the importance of regulatory variation in complex disease.
C3
A deletion on chromosome 15q11-13 causes Prader-Willi syndrome if inherited from the father but Angelman syndrome if inherited from the mother. Explain this using imprinting.
Sample answer: The 15q11-13 region contains imprinted genes. Several genes (including SNRPN) are expressed only from the paternal allele, while UBE3A is expressed from the maternal allele in neurons because the paternal UBE3A copy is silenced there. If the deletion is on the paternal chromosome, maternal UBE3A is still expressed but the paternally expressed genes are lost, causing Prader-Willi syndrome. If the deletion is on the maternal chromosome, neurons lose UBE3A expression entirely because the remaining paternal copy is silent, causing Angelman syndrome. The same chromosomal deletion therefore produces two distinct syndromes depending on its parental origin.
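The parent-of-origin logic can be written as a small lookup. This is a deliberately simplified sketch: it tracks only SNRPN (standing in for the paternally expressed genes of the region) and neuronal UBE3A, and it assumes a single deletion on one parental chromosome.

```python
from __future__ import annotations

# Simplified model: the single active allele of each gene in neurons.
ACTIVE_ALLELE = {"SNRPN": "paternal", "UBE3A": "maternal"}


def expressed_genes(deleted_on: str | None) -> set[str]:
    """Genes still expressed when 15q11-q13 is deleted on one chromosome.

    deleted_on: "paternal", "maternal", or None (no deletion).
    """
    return {gene for gene, active in ACTIVE_ALLELE.items() if active != deleted_on}


def predicted_syndrome(deleted_on: str | None) -> str:
    """Map the lost gene activity to the syndrome in this toy model."""
    lost = set(ACTIVE_ALLELE) - expressed_genes(deleted_on)
    if "SNRPN" in lost:
        return "Prader-Willi syndrome"
    if "UBE3A" in lost:
        return "Angelman syndrome"
    return "no deletion phenotype"


for origin in ("paternal", "maternal", None):
    print(origin, "->", predicted_syndrome(origin))
```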
C4
Explain why copy-number variations (CNVs) can have more severe phenotypic effects than point mutations in the same gene.
Sample answer: CNVs typically span kilobases to megabases and can simultaneously affect multiple genes, non-coding regulatory elements, and entire pathways. A deletion CNV removes an entire gene copy together with its regulatory elements, whereas many point mutations only partially reduce function; a duplication raises gene dosage (e.g., from two copies to three), which can be just as disruptive for dosage-sensitive genes. Moreover, CNVs can disrupt TAD boundaries and long-range enhancer–promoter contacts, affecting expression of multiple neighboring genes. Point mutations in one gene typically affect only that gene's function, whereas CNVs have broader genomic consequences.
C5
The Human Genome Project produced a reference genome, but it does not represent the full genomic diversity of humans. What are the limitations of a single reference genome?
Sample answer: A single reference genome: (1) doesn't capture the ~0.1% sequence variation between individuals (~3 million SNPs and thousands of CNVs per person); (2) lacks sequences present in some individuals and populations, a gap that is largest for populations with high genetic diversity, such as African populations; (3) assembles highly repetitive regions and some structural variants poorly; and (4) is a haploid mosaic of DNA from several donors, so it does not represent either haplotype of any real individual. Pangenome efforts such as the Human Pangenome Reference Consortium aim to build a reference graph incorporating haplotypes from hundreds of diverse individuals to better capture human genomic diversity and improve variant interpretation.
Section D · Interactive Questions · Part 2
D1
Approximately what percentage of the human genome is made up of transposable elements? (round number, use %)
D2
How many Hox gene clusters are found in humans? (number)
D3
LINE-1 encodes a reverse transcriptase. What class of transposable element does LINE-1 belong to? (Class I or Class II)
D4
What non-coding RNA is responsible for X-chromosome inactivation in female mammals? (four letters)
D5
In genomic imprinting, IGF2 is expressed only from which parental allele? (maternal or paternal)