M W F; 10:10 - 11:00 pm

 Molecular Biology

 Douglas W. Smith

York 2722

 BIMM 100

 5254 Muir Biology Building

Fall, 2000  

 x42620; dsmith@ucsd.edu

 

 

| BIMM100 | Syllabus | Sections / Off Hrs | Grading Policy | DNASYSTEM |
| Lectures | Journal Articles | Study Qs | Lab Techniques | Exams |

 


 

 

5. Genome Sequencing

Outline:

A. Methods for DNA Sequencing
1. Maxam-Gilbert Chemical DNA Sequencing
2. Sanger dideoxy DNA Sequencing
3. Automated DNA Sequencing with Fluorescent Labels
B. Genome Sequencing
1. Sequencing is limited to 500-700 bp
2. DNA Sequence Assembly - Contigs
3. Methodologies for DNA Sequence Assembly
a. Shotgun Sequencing
b. Clone Contig Assembly
c. Directed Shotgun Assembly

 

 

A. Methods for DNA Sequencing

1. Maxam-Gilbert Chemical Sequencing ... [Brown, Box 4.1]
no in vitro DNA polymerase reaction

1. Use DNA, ds or ss, with radioactive label at one end ONLY

2. In at least 4 separate reactions, treat the DNA with base-specific chemicals that cleave the DNA strand at that base
Example: DMS (dimethyl sulfate) for G's, hydrazine for pyrimidines

3. Get a nested set of labeled DNA fragments ...

4. Analyse as a ladder on a DNA sequencing gel:
a. Polyacrylamide denaturing gel ... resolution of short ssDNA fragments
b. Denaturing gel: 8 M urea ... keep DNA denatured during electrophoresis
c. Analyse only the labeled DNA fragments via Autoradiography or Fluorescence analysis

5. Read the DNA sequence from bottom of gel to top by examination or "reading"
of the ladder of DNA bands ... gives sequence 5' to 3' (5' -> 3')

 

 


2. Sanger dideoxy DNA Sequencing ... [Brown, Fig 4.2]
relies on properties of DNA Polymerase:

1. Have a DNA template and a DNA primer
Cloning vehicles often have universal primer sites for sequencing into DNA cloned into the MCS (Multiple Cloning Site) or Polylinker ...[Brown, Fig 4.5]

2. Run 4 separate polymerization reactions, each containing all 4 dNTPs plus one of the four dideoxynucleoside triphosphates (ddNTPs): ddGTP, ddATP, ddTTP, or ddCTP ...
To assay the product DNA, one of the 4 dNTPs is radioactively labeled ...
or the Primer is labeled, radioactively or fluorescently ...
or the ddNTP is labeled fluorescently (see below)
Dideoxy means 3'-H (in place of the normal 3'-OH) as well as 2'-H
When a ddNTP is incorporated, it acts as a Chain Terminator:
DNA synthesis stops since the growing DNA strand no longer has a 3'-OH Primer Terminus

3. Thus, get as reaction products, a nested set of fragments, each terminated at one of the four bases: G in the ddGTP reaction, A in the ddATP reaction, ...

4. When "run" on a DNA sequencing gel (polyacrylamide, with DNA denatured),
the "nested set" of fragments forms a ladder of DNA bands corresponding to the positions of the bases ...

5. Read the DNA sequence by reading the 4 lanes, one for each base, from bottom up, to correspond to 5' -> 3' sequence
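
The Sanger steps above can be sketched as a toy Python simulation. This is only an illustration, not part of the lecture: the function names `sanger_lanes` and `read_gel` and the example template are made up, and the primer length is ignored, so fragment lengths count only newly added nucleotides.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_lanes(template_5to3):
    """Return the new strand (5' -> 3') and the fragment lengths seen in each ddNTP lane."""
    # DNA polymerase copies the template, so the new strand is its reverse complement.
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template_5to3))
    lanes = {"G": [], "A": [], "T": [], "C": []}
    for length, base in enumerate(new_strand, start=1):
        # A chain terminated at this position ends in `base`,
        # so a band of this length appears in the dd<base>TP lane.
        lanes[base].append(length)
    return new_strand, lanes

def read_gel(lanes):
    """Read the ladder from the bottom (shortest fragment) to the top (longest)."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

new_strand, lanes = sanger_lanes("GATTCAGC")  # hypothetical template
print(lanes)
print(read_gel(lanes))
```

Reading the bands in order of increasing fragment length recovers the newly synthesized strand 5' -> 3', which is the complement of the template.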

 



3. Automated DNA Sequencing with Fluorescent Labels ... [Brown, Fig 4.7]
use Sanger dideoxy sequencing ...
But with either fluorescent primers or fluorescent dideoxy chain terminators

Use different fluor for each of the four nucleotide types ...
This permits analysis of ALL FOUR nucleotide reactions for a given DNA sample in ONE LANE of the sequencing gel ... => 4-fold increase in analysis capability per gel run ...

Most recently: capillary gel electrophoresis is used in the Perkin Elmer - ABI 3700 automated sequencing machines rather than slab gel electrophoresis ... thus, have separate thin capillary gel for each DNA sample ...
Advantages:
1. Better resolution, no running over from one lane to another ...
2. Separation of bands occurs much faster ... => 10- to 15-fold increase in speed ...
Both advantages are very important for Celera sequencing of the human genome ...

 

 

B. Genome Sequencing

1. The Major Problem in DNA Sequencing:
can only Sequence 500-700 nucleotides from a given DNA sample (!?!?)

This is due to convergence of the DNA bands ... [Brown, Fig 4.1]
That is, the percent size difference between bands of 9 and 10 nucleotides is 10% ...
BUT this percent size difference between bands of 99 and 100 nucleotides is only 1% !!
... thus: bands corresponding to 99 and 100 nucleotides are 10-fold closer to each other than the bands corresponding to DNA fragments of length 9 and 10 nucleotides ...
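
The arithmetic behind this band convergence can be checked with a short Python sketch. The function name `percent_size_difference` is made up for illustration; it simply computes the one-nucleotide difference as a percentage of the longer fragment, as in the 9 vs 10 and 99 vs 100 examples above.

```python
def percent_size_difference(n):
    """Percent length difference between fragments of n - 1 and n nucleotides."""
    return 100.0 * 1 / n

for n in (10, 100, 500, 700):
    pct = percent_size_difference(n)
    print(f"{n - 1} vs {n} nt: {pct:.2f}% difference")
```

By 500 nucleotides the difference is down to 0.2%, which suggests why adjacent bands can no longer be resolved much beyond 500-700 nucleotides.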

Thus, in Genome Sequencing, or in Sequencing of any DNA molecule much longer than 500-700 nucleotides, one must obtain the sequence of many overlapping ~500 bp fragments and then join these together by determining how they overlap each other ...

 

2. DNA Sequence Assembly - Joining of Overlapping Sequences to form Contigs

When overlapping Sequences are properly joined to form a single Sequence, this single sequence is called a Contig.
In Sequence Assembly for an entire Genome, ultimately one should end up with a single Contig for each Chromosome, since each Chromosome is composed of a single DNA molecule.

In practice, this is VERY difficult, due to repetitive DNA sequences ...
Repetitive DNA sequences present two major problems for Sequence Assembly:

1. Number of Repeat Copies:
if the length of a repetitive DNA sequence region is long compared to the sequenced DNA length of ~500 bp, it is nearly impossible to determine how many copies of the DNA repeat there are ...

2. Correct Assembly: ... [Brown, Fig 2.2]
if such a long repetitive DNA sequence region is present at several sites on a Genome, then it is nearly impossible to determine what DNA sequences should be properly joined on either side of each repetitive DNA sequence region ...
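
The Correct Assembly problem can be illustrated with a small Python sketch. Everything here is a made-up toy: the repeat, the unique segments, and the read lengths are invented and far shorter than real ones. Two different "genomes" that carry the same repeat at three sites give exactly the same set of reads whenever the reads are too short to span the repeat.

```python
def read_set(genome, read_len):
    """All possible reads of a fixed length (every window along the genome)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

REPEAT = "ACGTACGTAC"                      # hypothetical 10-nt repeat
A, B, C, D = "TTG", "GGA", "CCT", "AAG"    # hypothetical unique segments

genome_1 = A + REPEAT + B + REPEAT + C + REPEAT + D
genome_2 = A + REPEAT + C + REPEAT + B + REPEAT + D   # B and C swapped

# Reads shorter than the repeat cannot tell the two arrangements apart.
print(read_set(genome_1, 8) == read_set(genome_2, 8))
# Reads long enough to span the repeat plus flanking bases can.
print(read_set(genome_1, 12) == read_set(genome_2, 12))
```

With 8-nt reads both genomes yield identical read sets, so no assembly program could decide which order of B and C is correct. Only reads longer than the repeat carry the information that links the unique sequence on one side to the unique sequence on the other.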

For these reasons (and a few others), in Genome Sequencing of Genomes from higher Eukaryotes, sequence is not obtained for much repetitive DNA and a given chromosome sequence will be present in several Contigs in the "final assembly" ...

 

3. Methodologies for DNA Sequence Assembly

There are three main methodologies (with variations on a theme):
a. Shotgun Sequencing ...
b. Clone Contig Assembly ...
c. Directed Shotgun Assembly ...

a. Shotgun Sequencing ... [Brown, Fig 2.1, 4.10]

One obtains a high redundancy of sequencing of a given long DNA: 10 - 15 fold
One then uses computer programs to find the correct overlaps and join individual sequence reads into long Contigs ...
To do this uniquely and correctly, one needs 1) little repetitive DNA, and 2) overlaps between reads of 20 - 40 nucleotides ... such overlaps necessitate the high degree of redundancy ...
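
What such a computer program does can be sketched with a simple greedy overlap merge in Python. This is an assumption-laden toy, not the algorithm of any particular assembly program: the names `overlap` and `greedy_assemble` are made up, the reads are error-free and all from one strand, and the minimum overlap is set to 4 nucleotides only because the example reads are tiny (the notes above call for 20 - 40).

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that matches a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to join; each entry is a contig
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

# Hypothetical reads taken from one short, repeat-free sequence.
reads = ["TCGAATCCGT", "AGCTTAGGCA", "GAATCCGTGA", "AGGCATCGAA"]
print(greedy_assemble(reads, min_len=4))
```

If a read were missing so that two neighbours overlapped by fewer than `min_len` nucleotides, the function would return two contigs separated by a gap, which is the situation addressed under Closure of Gaps.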

Closure of Gaps:
One will still have some Gaps that need closing ... this is done via:
1) Use of a Second Clone Library ... often using a different Cloning Vehicle
This will often yield a Clone which will cover the Gap ... [Brown, Fig 4.11]
2) Use of Directed Sequencing:
From this Clone, obtain initial ~500 bp of sequence from one end.
Then use this sequence to construct an oligonucleotide to use as primer to extend the sequence further into the clone: Internal Primer ... [Brown, Fig 4.11]
Continue doing this until one has walked across the Gap, thereby closing the Gap ...

 

b. Clone Contig Assembly ...

First generate a collection of Mapped Clone Fragments:
1) Examples: YACs, BACs, PACs, Cosmids which are mapped relative to each other, forming a set of overlapping large cloned fragments ... often with a high degree of redundancy: 5 - 15 fold.
2) From among these, choose a minimum set of overlapping clones ... this is sometimes called a minimum tiling clone set
3) For each of these overlapping clones, do Shotgun Sequencing:
Re-clone or subclone each large overlapping clone as small sequencing fragments
Sequence these, ultimately forming a single contig corresponding to the sequence of each large clone
4) Join the Contigs for the large Clones together via overlapping Sequence and knowledge of the Map of the Clones, yielding the final Sequence of the entire DNA molecule, e.g., a chromosome
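
Choosing a minimum tiling clone set (step 2 above) can be sketched as a greedy interval cover in Python. This is a minimal sketch under stated assumptions: the clone names and map coordinates are invented, the function name `minimum_tiling_path` is hypothetical, and real projects would also require a minimum overlap between neighbouring clones so that their contigs can be joined.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """clones maps clone name -> (start, end) position on the clone map."""
    path = []
    covered_to = region_start
    while covered_to < region_end:
        # Among clones that begin inside the covered region,
        # pick the one that extends furthest to the right.
        candidates = [(end, name) for name, (start, end) in clones.items()
                      if start <= covered_to < end]
        if not candidates:
            raise ValueError(f"gap in the clone map at position {covered_to}")
        covered_to, name = max(candidates)
        path.append(name)
    return path

# Hypothetical BAC map, coordinates in kb.
bacs = {
    "BAC1": (0, 150), "BAC2": (40, 180), "BAC3": (100, 260),
    "BAC4": (140, 300), "BAC5": (250, 400), "BAC6": (280, 390),
}
print(minimum_tiling_path(bacs, 0, 400))
```

Here three of the six mapped clones are enough to cover the region, so only those three would be subcloned and shotgun sequenced.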

 

How is the Collection of Mapped Clone Fragments Generated?

1. Generate a Genome Library as YACs, BACs, PACs, Cosmids, etc
2. Locate Markers to specific Clones in the Library.
These Markers can be Genetic markers, e.g. genes, or physical DNA markers, e.g. restriction sites (R.sites), STSs, RFLPs, etc ... [Brown, Fig 4.16]
3. Identify Overlapping Clones by identifying pairs of Clones uniquely containing the same Markers, e.g. genes or shared R.fragments ... [Brown, Fig 4.16]
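
Step 3 can be sketched in Python as a search for pairs of clones that share at least one marker. The clone names, marker names, and the function name `overlapping_clones` are all hypothetical, and the sketch assumes each marker occurs at a single site in the genome.

```python
from itertools import combinations

def overlapping_clones(clone_markers):
    """Return {(clone_a, clone_b): shared markers} for every pair sharing a marker."""
    overlaps = {}
    for a, b in combinations(sorted(clone_markers), 2):
        shared = clone_markers[a] & clone_markers[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps

# Hypothetical marker content of four clones from a library.
library = {
    "cloneA": {"STS1", "STS2"},
    "cloneB": {"STS2", "STS3"},
    "cloneC": {"STS3", "STS4"},
    "cloneD": {"STS9"},
}
print(overlapping_clones(library))
```

In this toy library cloneA, cloneB, and cloneC form a chain of overlapping clones, while cloneD shares no marker with the others and stays unplaced.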

When these Markers are STSs, they can serve as physical anchors in the sequence assembly process ... i.e., one knows the position of specific sequence from the position of the STS ... and the sequence assembly must have this sequence in this position to be correct assembly ...

Chromosome Walking:
One can also determine overlapping clones without specific markers by hybridizing one clone DNA to the DNA of other clones ... [Brown, Fig 4.13] ...
However, use of STSs to provide anchors is very desirable with very large DNA fragments ...

 

c. Directed Shotgun Assembly ...

This is Shotgun DNA Sequencing and Assembly of very large genomes, e.g. Drosophila or human, coupled with the use of anchored DNA markers, e.g. STSs

Example: Celera approach to Sequencing the Human Genome

Three genomic DNA libraries are used for sequencing:
1) Plasmid library with ~2 kb inserts
2) Library with ~ 10 kb inserts ... different Cloning Vehicle used ... 10 kb is large compared with most Repetitive DNA regions in the human genome, thereby avoiding much of this problem
3) BAC clone library with ~250 kb inserts ...

DNA Sequencing is done on both ends of the inserts present in each of these clones ... [see Brown, Fig 4.5A]
Computerized Assembly of sequence into Contigs is greatly helped by the fact that the distance between the sequences of each of the two ends is nearly constant, at 2 kb or 10 kb or 250 kb: these are called Sequence Pairs
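
How Sequence Pairs constrain an assembly can be sketched with a simple distance check in Python. The function name `pair_is_consistent` and the 20% tolerance are assumptions made for illustration, not values from the lecture or from Celera's software.

```python
def pair_is_consistent(pos_first, pos_second, insert_size, tolerance=0.2):
    """True if two end reads sit about one insert length apart in the assembly.

    pos_first, pos_second: positions of the two end reads in a contig (bp)
    insert_size: expected insert length of the library (e.g. 2000 or 10000 bp)
    tolerance: allowed fractional deviation (an assumed value)
    """
    distance = abs(pos_second - pos_first)
    return abs(distance - insert_size) <= tolerance * insert_size

# Hypothetical placements of end reads from the ~2 kb and ~10 kb libraries.
print(pair_is_consistent(1000, 3050, insert_size=2000))    # about 2 kb apart
print(pair_is_consistent(1000, 9000, insert_size=2000))    # far too distant
print(pair_is_consistent(0, 10500, insert_size=10000))     # about 10 kb apart
```

A pair that fails the check signals a likely misassembly, for example two copies of a repeat that were wrongly joined.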

The Sequences from the BAC clone ends are used as follows:
1. they become STSs
2. these STSs are used to map the BACs and the STSs
3. the mapped STSs become Anchored Sequences for the subsequent Assembly into Contigs

Note the difference here from Clone Contig Assembly:
1) In Clone Contig Assembly, mapping is done first, a minimum set of YACs or BACs is determined, and then each of these YACs or BACs is subjected to Shotgun Sequencing.
2) Here, in Directed Shotgun Assembly, the entire genome is subjected to Shotgun Sequencing via the two smaller insert libraries, the 2 kb and 10 kb insert libraries.

 

These methodologies then yield the ultimate Physical Map of the Genome: its DNA Sequence

This DNA Sequence is then the ultimate foundation information about the organism ...
with this information, one knows, in principle, the complete enzymatic and molecular capabilities of the organism ... what it can do ... and what it cannot do ...






 

If you have problems or comments, send email to Doug Smith