
Overview

First, the sequences are aligned with code2aln (Stocsits 2003). This sequence alignment programme tries to keep gaps in groups of three within coding sequences (CDSs) and joins coding regions smoothly onto non-coding regions.
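To illustrate what "gaps in groups of three" means in practice, the following minimal sketch checks whether every gap run in an aligned coding sequence has a length divisible by three. It is not part of code2aln; the function names `gap_runs` and `gaps_preserve_frame` are hypothetical, and the gap character is assumed to be `-`.

```python
import re


def gap_runs(aligned_seq, gap="-"):
    """Return the lengths of all maximal runs of the gap character."""
    pattern = re.escape(gap) + "+"
    return [len(m.group()) for m in re.finditer(pattern, aligned_seq)]


def gaps_preserve_frame(aligned_cds, gap="-"):
    """True if every gap run has a length that is a multiple of three."""
    return all(n % 3 == 0 for n in gap_runs(aligned_cds, gap))


# Gap runs of 3 and 6 keep the reading frame; a run of 2 does not.
in_frame = gaps_preserve_frame("ATG---GCC------TAA")
out_of_frame = gaps_preserve_frame("ATG--GCCTAA")
```

A check like this only applies to the coding part of an alignment; gaps in non-coding regions are not constrained to multiples of three.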

Next, an unrooted phylogenetic tree is built with ednaml (maximum likelihood) or ednapars (maximum parsimony). These programmes are part of the PHYLIP package (Felsenstein 2004), repackaged into the EMBASSY package in EMBOSS (Rice et al. 2000). The tree is used to select a list of sequence pairs tracing round the outside of the tree (Figure 2). Conservation scores are calculated with mlrgd for each pair and summed over the tree. This set of pairwise comparisons covers each branch of the tree precisely twice, so no branch is given more weight than another. In general, the set of pairs selected in this way is not unique, since branches of the tree may be flipped into different places without changing the phylogeny. Note that these default trees are built with a simple non-coding evolutionary model; if desired, the user may instead input their own list of pairs.
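The pair selection can be sketched as follows. This is an illustration under stated assumptions, not the code used by mlrgd: the tree is a hypothetical six-leaf example stored as an adjacency dictionary, the names `leaf_order`, `path_edges` and `coverage` are invented here, and the sketch assumes the tour closes by pairing the last leaf with the first. With that closing pair, each branch in this toy tree is crossed exactly twice.

```python
from collections import Counter

# Hypothetical unrooted tree: leaves s1..s6, internal nodes A..D.
tree = {
    "s1": ["A"], "s2": ["A"], "s3": ["B"],
    "s4": ["C"], "s5": ["D"], "s6": ["D"],
    "A": ["s1", "s2", "B"], "B": ["A", "s3", "C"],
    "C": ["B", "s4", "D"], "D": ["C", "s5", "s6"],
}


def leaf_order(tree, start):
    """Depth-first walk from `start`; return leaves in the order met."""
    order, stack, seen = [], [start], {start}
    while stack:
        node = stack.pop()
        if len(tree[node]) == 1:          # a leaf has a single neighbour
            order.append(node)
        for nb in reversed(tree[node]):   # reversed so the first neighbour pops first
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return order


def path_edges(tree, a, b):
    """Return the branches on the unique path between nodes a and b."""
    parent, stack = {a: None}, [a]
    while stack:
        node = stack.pop()
        for nb in tree[node]:
            if nb not in parent:
                parent[nb] = node
                stack.append(nb)
    edges, node = [], b
    while parent[node] is not None:
        edges.append(frozenset((node, parent[node])))
        node = parent[node]
    return edges


order = leaf_order(tree, "A")
# Consecutive leaves, plus the assumed closing pair (last leaf, first leaf).
pairs = [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]

# Count how often each branch is crossed by the selected pairs.
coverage = Counter(e for a, b in pairs for e in path_edges(tree, a, b))
```

Reordering the neighbours of an internal node in `tree` changes `order`, and hence `pairs`, without changing the phylogeny, which is why the selected set of pairs is not unique.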

Figure 2: Example phylogenetic tree. For this tree, the sequence pairs used by mlrgd would be sequence 1 - sequence 2, sequence 2 - sequence 3, sequence 3 - sequence 4, sequence 4 - sequence 5 and sequence 5 - sequence 6.

For each sequence pair $S_1$-$S_2$, mlrgd finds the best-fitting sequence divergence $t$ and, optionally, the best-fitting synonymous:nonsynonymous weighting $V$ (see $\S$12.3). Using these values of $t$ and $V$, mlrgd calculates, for each nucleotide in $S_1$, the expected number of mutations (a number between 0 and 1) and the observed number of mutations (either 0 or 1). These values are summed over all pairs to give the conservation plots.
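The expected-versus-observed bookkeeping can be sketched as below. This is a deliberately simplified stand-in, not mlrgd's model: it assumes the Jukes-Cantor (JC69) substitution model, ignores the weighting $V$ and any codon context, and therefore assigns the same expected value to every ungapped site of a pair. All function names are hypothetical, and `fit_t` assumes at least one ungapped column and fewer than 75% differing sites.

```python
import math


def jc69_p_diff(t):
    """Probability that a site differs after divergence t under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def fit_t(s1, s2, gap="-"):
    """Maximum-likelihood JC69 divergence for one aligned pair."""
    sites = [(a, b) for a, b in zip(s1, s2) if a != gap and b != gap]
    p = sum(a != b for a, b in sites) / len(sites)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pair_scores(s1, s2, gap="-"):
    """Per-column expected (0..1) and observed (0 or 1) mutations."""
    t = fit_t(s1, s2, gap)
    expected, observed = [], []
    for a, b in zip(s1, s2):
        if a == gap or b == gap:          # gapped columns contribute nothing
            expected.append(0.0)
            observed.append(0)
        else:
            expected.append(jc69_p_diff(t))
            observed.append(int(a != b))
    return expected, observed


def conservation_profile(alignment, pairs):
    """Sum expected and observed mutations per column over all pairs."""
    n = len(next(iter(alignment.values())))
    exp_total, obs_total = [0.0] * n, [0] * n
    for a, b in pairs:
        e, o = pair_scores(alignment[a], alignment[b])
        for i in range(n):
            exp_total[i] += e[i]
            obs_total[i] += o[i]
    return exp_total, obs_total


# Toy alignment and pair list, invented for illustration.
alignment = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "ACGAACGTAT"}
pairs = [("s1", "s2"), ("s2", "s3"), ("s3", "s1")]
exp_total, obs_total = conservation_profile(alignment, pairs)
```

Columns where the summed observed count falls below the summed expected count are more conserved than the fitted divergence predicts; plotting the two totals along the alignment gives a profile in the spirit of the conservation plots described above.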


aef 2007-12-10