Friday 11 June 2010

Introduction to Phylogeny part 4

Still sourcing from Evolution 2nd edition.

"Alignment must be used to establish the homology of sequences"
Descendants inherit traits through genes. The history of descent is written in the DNA sequences. The molecular sequence is a simple string (character data). Characters are positions in the sequence, and the character states are the nucleotides at those positions. This is a simple way to compare DNA, but it assumes that the DNA strands being compared are homologous. There are two problems with this assumption:
1) With only four nucleotides, it is highly probable that two sequences will have the same nucleotide at a position just due to mutation, not because of shared ancestry.
2) In all but the most conserved genes, random bits of DNA have been inserted and deleted as the species diverged from their common ancestor. This causes orthologous gene sequences to differ in length. Individual nucleotides can also be added or removed within the sequence (base pair mutation).
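
The first problem can be checked with a quick sketch. This is only an illustration, assuming the four nucleotides are equally likely and independent at every position (real genomes are not this tidy). The function names are made up for this post. Two completely unrelated random sequences should still match at roughly one position in four.

```python
import random


def fraction_identical(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def random_sequence(length, rng):
    """Random DNA string; assumes equal, independent base frequencies."""
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)  # fixed seed so the run is repeatable
unrelated_a = random_sequence(10_000, rng)
unrelated_b = random_sequence(10_000, rng)

# Expected to land near 0.25 even though the sequences share no ancestry.
print(fraction_identical(unrelated_a, unrelated_b))
```

So a raw match at one position says very little on its own, which is why similarity has to be judged over many positions.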

To make the sequence homology consistent, a gap must be added where a deletion has happened, and where there has been an insertion a gap must be added to the other sequence. This is done so that the majority of the rest of the sequence should be homologous. Once the sequences are the same length we can compare nucleotides position by position to scan for changes (mutations). The alignment is done based on assumptions about the ancestral sequence from which the derived sequence is thought to descend. Thus alignment depends on assumptions about phylogeny and homology. Though algorithms exist to perform alignment automatically, they can be unreliable, and as such many alignments are done manually.
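
As a rough sketch of what an automatic alignment does, here is a minimal global alignment in Python using the Needleman-Wunsch dynamic programming method. The book does not name a specific algorithm, so this choice is mine, and the scoring values (match, mismatch, gap) and function names are arbitrary assumptions for illustration. It inserts "-" gaps so both sequences end up the same length, then counts substitutions and gaps.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return one optimal global alignment as two gapped strings."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two characters
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def count_differences(aligned_a, aligned_b):
    """Count (substitutions, gap positions) between two aligned strings."""
    pairs = list(zip(aligned_a, aligned_b))
    substitutions = sum(1 for a, b in pairs if a != b and "-" not in (a, b))
    gaps = sum(1 for a, b in pairs if "-" in (a, b))
    return substitutions, gaps


# Made-up sequences: the second one has lost a single T.
aligned_a, aligned_b = global_align("ACGTTGCA", "ACGTGCA")
print(aligned_a)
print(aligned_b)
print(count_differences(aligned_a, aligned_b))
```

Changing the scoring values can change where the gaps land, which is a small-scale version of the point above: the alignment you get depends on the assumptions you feed in.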

Even after a sequence has been aligned, homoplasy is common (sequences that are similar for reasons other than recent shared ancestry). Homoplasy can be reduced but not eliminated by careful selection of character traits. This is one of the reasons that maximum likelihood is preferred, as parsimony can be tricked by homoplasy.

"The neutral model and the molecular clock"
Dating of a clade starts with a fossil that approximates the divergence of the branches. Many systematic methods then assume that molecular mutation happens at about the same rate in each branch. The important part of this assumption is the regular rate of change in the nucleotides.

The fact that nucleotides change isn't controversial, but the idea of the molecular clock (the idea that they mutate at the same rate) is. However, the idea of the molecular clock is important because it can help to estimate the amount of time since the species diverged. The first problem with the molecular clock is that though rates may be the same within groups, they differ between groups (mammals vs bacteria). Second, generation time differs between species (human vs bacteria). Third, the assumption is too "clock-like": it would mean that each branch would have the same number of mutations and branches, which is not the case. For these reasons many studies try to avoid using the molecular clock unless forced to do so by a lack of dated fossil evidence.
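
For what it's worth, the arithmetic behind a strict molecular clock is short. The sketch below assumes a single constant rate on every branch, which is exactly the assumption criticised above. The function names and all the numbers are invented for illustration, not taken from the book. The factor of 2 is there because both lineages keep accumulating changes after they split.

```python
def calibrate_rate(distance, fossil_age_myr):
    """Rate per site per million years along one lineage.

    distance: substitutions per site between two species whose split
    is dated by a fossil at fossil_age_myr million years ago.
    """
    return distance / (2.0 * fossil_age_myr)


def divergence_time(distance, rate):
    """Estimated time since divergence (million years) under a strict clock."""
    return distance / (2.0 * rate)


# Hypothetical calibration pair: 0.10 substitutions per site, fossil at 50 Myr.
rate = calibrate_rate(0.10, 50.0)

# Hypothetical undated pair: 0.04 substitutions per site.
print(divergence_time(0.04, rate))
```

If the real rates differ between the calibration pair and the undated pair, the estimate is off by the same proportion, which is why the clock is treated with caution.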

Using the right molecule for the right job
Because molecules decay or change at different rates, you need to use one suited to the time period you are measuring. Uranium is used to date things like the Earth and Moon, while carbon is used for archaeological studies. In the same way, mitochondrial DNA is useful for a few million years but not beyond. For deep-time DNA analysis we need highly conserved DNA sequences, which can go back 1500 million years.

The genealogy of genes can differ from the phylogeny of species.
But I think that is out of scope for this project.
