| M W F; 10:10 - 11:00 pm | Molecular Biology | Douglas W. Smith |
| York 2722 | BIMM 100 | 5254 Muir Biology Building |
| Fall, 2000 | | x42620; dsmith@ucsd.edu |
| BIMM100 | Syllabus | Sections / Off Hrs | Grading Policy | DNASYSTEM |
| Lectures | Journal Articles | Study Qs | Lab Techniques | Exams |
Readings: Brown, 15: 392 - 412
Outline:
Genomes change via Mutation and Recombination
But ... how can one learn about the past, and about how evolution has taken place?
One has only the present ... plus some paleobiology evidence
Answer: genome comparisons ... the topic of Phylogenetics
A. Molecular Phylogenetics
Initial efforts at Classification of Species based on Morphological Characters
Character States: all the possible values of a Character found among individuals of different species
Phylogeny: this Classification provided not only similarities between species but also their evolutionary relationships.
More recently: Molecular Characters ...
Types of Molecular Characters:
1) Immunological tests ... used since 1905 ... quantitative, but relatively few characters
2) DNA sequence ... early: hybridization assays, eg Cot curves
3) Protein sequence ... early: electrophoretic properties, eg normal vs Sickle Cell hemoglobin
With development of, first, Protein sequencing procedures, and, then, DNA sequencing procedures ... and most recently: whole genome sequencing ... comes:
Complete Genome Comparisons !!
B. DNA and Protein Sequences:
Advantages to DNA or Protein sequences:
1) well-defined and unambiguous Character States: the 4 or 20 residues
2) easy quantitation: use of Mathematical Analyses and Statistical Tests
3) abundant and "complete" information: the complete Genetic Information, and all products of Gene Expression, can be analysed ... know also what is not present ...
DNA vs Protein Sequence information:
1) DNA advantages:
a. in coding regions, synonymous changes (same amino acid) yield DNA Sequence changes ... but not Protein Sequence changes ... [Brown, Fig 15.2]
b. in both noncoding AND coding regions, have RFLPs, SNPs, SSLPs, ...
2) Protein advantages:
a. Alphabet of 20 amino acids vs alphabet of 4 nucleotides for DNA: assuming equal residue frequencies, the chance of a random match at any one position is about 5% (1 in 20) for Protein, vs 25% (1 in 4) for DNA ...
b. Amino acid properties are essential to the properties of the resulting protein ... the structure of DNA is unrelated to the properties of the resulting protein ... such properties are critically important in evolutionary processes ...
c. Evolutionary processes related to domain duplication or shuffling, to evolution of new genes, to 3D structure limitations ... all are related directly to Protein Sequence, not directly to DNA
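The point about synonymous changes can be illustrated with a small sketch. The codon table below is only a partial excerpt of the standard genetic code, and the example sequences and the function names `translate` and `count_differences` are made up for illustration:

```python
# Partial excerpt of the standard genetic code (only the codons used here).
CODON_TABLE = {
    "CTT": "L", "CTC": "L",   # Leucine
    "GAA": "E", "GAG": "E",   # Glutamate
    "GTG": "V",               # Valine
}

def translate(dna):
    """Translate a DNA coding sequence, codon by codon, into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def count_differences(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

reference = "CTTGAG"    # Leu-Glu
synonymous = "CTCGAA"   # two DNA changes, same amino acids
missense = "CTTGTG"     # one DNA change, Glu -> Val

for name, seq in [("synonymous", synonymous), ("missense", missense)]:
    dna_diff = count_differences(reference, seq)
    prot_diff = count_differences(translate(reference), translate(seq))
    print(name, "DNA differences:", dna_diff, "protein differences:", prot_diff)
```

The synonymous variant is visible only at the DNA level, whereas the GAG to GTG change (Glu to Val, the kind of change found in Sickle Cell hemoglobin) shows up in both the DNA and the Protein comparison.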
C. Background and Terminology:
1) Phenetics vs Cladistics: ... [Brown, Box 15.1]
Phenetics: examine as many character states as possible ...
Cladistics: do the same as Phenetics, but recognize that some Characters are more important than others ...
Both Phenetics and Cladistics use quantitation and mathematical rigor to define these concepts ...
Example: Comparison of Protein Sequence for a given Protein, eg Cytochrome C
1. Characters: each amino acid at each position ... possibility of 1 of 20 at each position ... or Gap ... Phenetics and Cladistics would consider all of these ...
2. Unequal value to different Amino Acids:
... Phenetics would place the same value on each amino acid at each position ...
... Cladistics recognizes that some amino acids, and some positions, are of greater importance than others, eg Tryptophan and Cysteine are in general more important than Valine or Alanine ...
2) Trees: ... [Brown, Fig 15.1, 15.3]
Used to define the similarity and evolutionary relationships between species ...
Usually drawn in 2 dimensions, with the Branching Pattern in one dimension and Time in the other ...
Terms: ... [Brown, Fig 15.3]
Internal nodes
Branches
External nodes ... or OTU (Operational Taxonomic Unit)
Rooted tree vs NonRooted tree
Outgroup ... [Brown, Fig 15.4]
3) Orthologues: homologous genes, or proteins encoded by these genes, present in different species and descended from a single gene in their common ancestor
Thus, external nodes of a given tree with a root represent orthologues ...
D. Multiple Sequence Alignments lead to Trees
1. Multiple Sequence Alignments:
Align DNA or Protein sequences using a Scoring Matrix to correlate Characters
Assign penalties for relative Gaps or relative Insertion/Deletions: InDels
Given a Scoring Matrix, rigorous mathematical theory exists for Sequence Alignment: Dynamic Programming ... (Needleman and Wunsch)
Dynamic Programming is, however, computationally intensive ... so one usually uses faster approximation methods: BLAST or FASTA
These are mathematically rigorous as long as there are no Indels ...
But even with Indels one gets good results ...
Indels often occur in loop regions in Proteins ... which often have little to do with the function of the protein ... and hence are of less importance in determination of evolutionary relationships ...
As a result: one often is concerned mainly with highly conserved regions having few if any indels ... these highly conserved regions in Proteins are called Blocks ...
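The Dynamic Programming approach of Needleman and Wunsch mentioned above can be sketched as follows. This is a minimal sketch, assuming a very simple scoring scheme (match +1, mismatch -1, gap -2 per position) in place of a full Scoring Matrix; the function name and the parameter values are illustrative only:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

The table has (length of a + 1) x (length of b + 1) cells, which is why the method becomes expensive for long sequences or database searches.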
A Dot Matrix representation is excellent for display of:
1) the main region of similarity between two sequences
2) direct repeat regions between two sequences
3) inverted repeat regions between two sequences
... [Brown, Fig 15.9]
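A Dot Matrix is simple to compute. The sketch below, with an invented sequence and an illustrative function name, marks every pair of identical residues with `*`; it compares a sequence containing a direct repeat with itself, so the repeat shows up as diagonals parallel to the main diagonal:

```python
def dot_matrix(seq_x, seq_y):
    """Return one text row per residue of seq_y; '*' marks identical residues."""
    return ["".join("*" if x == y else "." for x in seq_x) for y in seq_y]

seq = "GATTACAGATTACA"   # invented sequence: GATTACA repeated twice (direct repeat)
print("  " + seq)
for residue, row in zip(seq, dot_matrix(seq, seq)):
    print(residue, row)
```

This sketch scores single residues only, so short DNA sequences give many chance dots; requiring several matches within a short window before placing a dot is one way to reduce that background.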
The Scoring Matrix is used to quantitate the Degree of Similarity between the Sequences.
BUT how does one proceed from Degree of Similarity to determining
if the two Sequences are in fact Homologues of each other ???
Similarity: quantitative measure of how close the sequences are to each other.
Homology: two sequences are homologues if they share a common ancestor
Thus: Similarity is a quantitative measure ... two sequences can be 30% similar to each other
But: Homology is a qualitative measure ... two sequences are either Homologues ... or they aren't
In other words: Homology is like Pregnancy ... you either are ... or you are not ...
One is never 65% pregnant ... so too: two sequences are not 65% homologous to each other ...
2. Trees from Multiple Sequence Alignments:
a. Distance Matrix:
The Scoring Matrix provides a score at each position of the alignment between each pair of Sequences, and addition of these positional scores across the entire Pairwise Alignment provides a Score between each pair of Sequences.
This Score is inversely related to the Evolutionary Distance between each pair of Sequences, and hence can be converted into a relative Evolutionary Distance. The totality of these relative Evolutionary Distances gives a matrix of distances between each pair of sequences; this matrix is sometimes called a Distance Matrix.
A simplified way of determining a Distance Matrix is shown in Brown, Fig 15.10.
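A Distance Matrix of this simplified kind can be sketched as follows. Note the assumption: instead of converting Scoring Matrix scores into distances, this sketch simply uses the fraction of aligned positions that differ, ignoring positions with a gap; the alignment and the function names are invented for illustration:

```python
def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    Assumes the two rows share at least one ungapped position.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Symmetric matrix of pairwise distances for a list of aligned rows."""
    return [[p_distance(a, b) for b in alignment] for a in alignment]

alignment = ["ACGTACGT",
             "ACGTACGA",
             "ACCTACGA"]
for row in distance_matrix(alignment):
    print(row)
```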
b. Unrooted Tree:
Once the Distance Matrix is determined, one of several algorithms can be used to determine the Tree Topology (the way the tree looks; the Branches and Internal Nodes), and the lengths of each of the Branches of the Tree.
An example of such an algorithm is the Neighbor Joining algorithm, a simplified example of such being shown in Brown, Fig 15.11.
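The Neighbor Joining procedure can be sketched in a few lines. This is a minimal sketch, not a replacement for the established programs: the distances are an invented example, the internal node labels (`node0`, `node1`, ...) and the function name are hypothetical, and ties between equally good pairs are broken by taking the first pair found:

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from a distance matrix.

    names: list of OTU labels; dist: square matrix (list of lists).
    Returns a list of branches as (node, node, branch_length) tuples.
    """
    nodes = list(names)
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair with the smallest Q value.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = "node%d" % next_id   # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))   # final branch joins the last two nodes
    return edges

names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
for edge in neighbor_joining(names, dist):
    print(edge)
```

The result is both the Tree Topology (which nodes are joined) and the Branch lengths; for 5 OTUs the unrooted tree has 7 branches.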
c. Rooted vs Unrooted Tree:
The Tree so determined is an unrooted tree; for example, see: ... [Brown, Fig 15.3A]
That is, no information is available from the Multiple Sequence Alignment concerning which sequence is the most ancestral sequence, ie the least related to all other sequences.
If one has an Outgroup, then this Outgroup sequence can be used to determine a Rooted Tree
An Outgroup sequence is one which is known from additional outside information to be the least related to all other sequences; for example, see: ... [Brown, Fig 15.4]
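Rooting with an Outgroup can be sketched as follows, assuming the unrooted tree is given as a list of branches (node, node, length) and that the Outgroup is one of the external nodes. The tree, the labels, and the function name are invented; branch lengths are ignored, and the result is written in parenthesis notation with the root placed on the Outgroup branch:

```python
def root_with_outgroup(edges, outgroup):
    """Return a rooted tree, in parenthesis notation, rooted on the outgroup branch."""
    # Build an adjacency list from the unrooted branch list.
    adj = {}
    for a, b, _ in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    def subtree(node, parent):
        # Everything reachable from node without going back through parent.
        children = [c for c in adj[node] if c != parent]
        if not children:
            return node                      # external node (OTU)
        return "(" + ",".join(subtree(c, node) for c in children) + ")"

    (neighbor,) = adj[outgroup]              # an external node has exactly one neighbor
    return "(" + outgroup + "," + subtree(neighbor, outgroup) + ");"

# Invented unrooted tree: OTUs A, B, C plus an outgroup O.
unrooted = [("A", "n1", 1.0), ("B", "n1", 1.0), ("n1", "n2", 1.0),
            ("C", "n2", 1.0), ("O", "n2", 1.0)]
print(root_with_outgroup(unrooted, "O"))
```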
3. Quality of the Tree - Bootstrapping
One is often concerned with how reliable the resulting tree really is ...
How good are all the Internal Nodes? How reliable are the Branch Lengths?
Related questions are:
How robust is the tree?
How different would the sequences need to be to generate alternative trees?
Would these alternative trees have the same topology (same structure of internal nodes) and just have different branch lengths?
Or would these alternative trees have different topologies, indicating a different scheme of relatedness among the sequences?
One measure of this reliability is the Bootstrap process; for example, see: ... [Brown, Fig 15.12]
In a Bootstrap, one creates a new Multiple Sequence Alignment from the original by randomly sampling complete columns of the Multiple Sequence Alignment, with replacement ... each sampled column keeps its original alignment, but some columns appear more than once in the new Alignment and others not at all, so the Tree generating programs see a different set of columns each time ...
There are many ways of generating these new Alignments, although usually the resulting new Alignment has the same number of columns as the original, ie the sequences maintain their original length (this is NOT true for Brown, Fig 15.12, where the new alignment is HALF as long as the original ...)
In practice, this Bootstrapping process is repeated many times, eg 1000 times, generating the new Multiple Sequence Alignments via a random number generator ... a Tree is built from each new Alignment, and the Trees are compared with each other ...
One counts in how many of these 1000 Bootstrap trees each of the Internal Nodes of the original Tree is present ... and the Bootstrap results are presented as these numbers ...
Thus an Internal Node that appeared in 238 of the Bootstrap trees would get the number 238 ... and an Internal Node that appeared in 998 of the Bootstrap trees would get the number 998 ...
And one would conclude that the Node that appeared in 998 of the Bootstrap trees was a robust and reliable Internal Node, whereas the Internal Node that only came up in 238 of the Bootstrap trees was a highly questionable Node ...
Nodes of Bootstrap values 500 or more (out of 1000 runs) are usually considered reliable ...
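The Bootstrap can be sketched as follows, assuming the usual procedure in which columns are drawn at random with replacement so that the new Alignment has the same length as the original. To keep the sketch short, a toy stand-in replaces the Tree generating program: `closest_pair` just reports which two sequences are most similar, and the Bootstrap value counts how often that grouping recurs. The alignment and all function names are invented for illustration:

```python
import random

def bootstrap_columns(alignment, rng):
    """Resample whole alignment columns with replacement, keeping the length."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def closest_pair(alignment):
    """Toy stand-in for a tree builder: indices of the two most similar rows."""
    best = None
    for i in range(len(alignment)):
        for j in range(i + 1, len(alignment)):
            mismatches = sum(a != b for a, b in zip(alignment[i], alignment[j]))
            if best is None or mismatches < best[0]:
                best = (mismatches, frozenset((i, j)))
    return best[1]

def bootstrap_support(alignment, replicates=1000, seed=0):
    """Count in how many replicates the original closest pair is recovered."""
    rng = random.Random(seed)
    original = closest_pair(alignment)
    hits = sum(closest_pair(bootstrap_columns(alignment, rng)) == original
               for _ in range(replicates))
    return original, hits

alignment = ["ACGTACGTAC",
             "ACGTACGTAA",
             "TTGTACCTAC",
             "TTGAACCTGC"]
pair, hits = bootstrap_support(alignment)
print(sorted(pair), hits, "out of 1000")
```

With a real Tree generating program in place of `closest_pair`, the same loop would be run once per Internal Node grouping, giving one Bootstrap value per Node.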
4. Software
Computer software is available for all of these Comparative Sequence Analysis operations; some of it is public domain, some is available directly on the Web, and some must be purchased. Computer platforms are usually Macintosh, PC, or Unix computers.
Some of these are mentioned in Brown, Tech Note 15.1
Links to some of these can be found from the Multiple Sequence Analysis section of the UCSD DNASYSTEM Web site.
E. Application of Phylogenetics to Human Prehistory
Deduction of Origins of Modern Humans and of Migratory Pathways ...
Genes Analysed:
Must show variability of appropriate amount ... neither too much (too many multiple changes at each position) ... nor too little (too many positions not changed at all ...)
That is, the genes must be Polymorphic in the populations studied ...
For humans, have three main possibilities:
1) Multiallelic Genes ... many alleles in the population, each of slightly different sequence
2) Microsatellites ... evolution via Replication Slippage ... [Brown, Box 14.4]
3) Mitochondrial DNA ... lacks many of the DNA Repair systems, so Mutations accumulate rapidly
Other possibilities exist now with the Human Genome Project, eg RFLPs and SNPs
Mitochondrial DNA changes from human to human are an example of a Haplotype: the totality of allelic changes or variations in a group of closely linked genes, eg those present on mitochondrial DNA, such that they are all most often inherited together ...
Archeological evidence for earliest Humans: Australopithecus ... [Brown, Fig 14.20]
Homo sapiens evolved from Homo erectus ...
Earliest archeological evidence: in Olduvai Gorge in Tanzania, Africa ...
Which Humanoid Species migrated 'Out of Africa' ??
Two theories: ... [Brown, Fig 15.20]
1) Homo erectus, before evolution of Homo sapiens: multiregional evolution hypothesis
2) Homo erectus and Homo sapiens, after evolution of Homo sapiens from Homo erectus: Out of Africa hypothesis ... both species migrated from Africa ...
RFLP analysis of Mitochondrial DNA: supportive of Out of Africa hypothesis ... tree: [Brown, Fig 15.21]
But: controversial ...
1) data actually support alternative trees ... no data on tree robustness were given ...
2) more extensive, recent data support alternative roots ...
3) nuclear data support alternative roots ...
Out of Africa hypothesis also supported by one Neanderthal DNA analysis: ... [Brown, Research Brief 15.1]
Patterns of human migrations into the Old World and into the New World also controversial ... [Brown, Fig 15.22, 15.23, 15.24]
If you have problems or comments, send email to Doug
Smith