
Amino acid substitution matrix

In our model, two probabilities are modelled separately, by the nucleotide and amino acid matrices respectively: the probability that a nucleotide mutation occurs at the DNA level, and the probability that the mutation is accepted (i.e. the mutant protein is functional) at the protein level. In contrast, the widely used BLOSUM (Henikoff & Henikoff 1992) and PAM (Dayhoff et al. 1978) matrices combine both effects in a single matrix.

In the PAM matrices, the amino acid substitution frequencies observed at small $t$ are extrapolated to larger $t$. This is a serious shortcoming. In reality, at small $t$ a mutating sequence is constrained to resemble the original sequence at both the nucleotide and amino acid levels, whereas at large $t$ it is constrained to resemble the original only at the amino acid level. The BLOSUM matrices, on the other hand, are in effect calculated for a series of $t$ values (BLOSUM100, BLOSUM95, ..., BLOSUM35), with lower indices corresponding to more divergent sequences. By choosing a low-index BLOSUM matrix (namely BLOSUM40) as our default amino acid distance matrix $\mathbf{A}$, we minimize the effect of the nucleotide mutation constraint relative to the amino acid acceptability constraint.
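The PAM extrapolation criticised above can be sketched as follows. Dayhoff et al. obtained PAM matrices for larger distances by raising the one-step (PAM1) mutation probability matrix to a power. The toy below does the same with a hypothetical 3-state matrix, not real PAM data; the function names `matmul` and `extrapolate` are illustrative only. The point is that every larger-$t$ matrix is computed solely from the one-step matrix. Whatever structure that matrix carries, here a transition that is impossible in one step because it would need two nucleotide changes, is therefore the only input to the matrix at every $t$.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def extrapolate(m1, t):
    """Return the t-step matrix m1**t for a one-step row-stochastic m1 (t >= 1)."""
    result = m1
    for _ in range(t - 1):
        result = matmul(result, m1)
    return result


# Hypothetical one-step matrix (rows: from-state, columns: to-state).
# States 0 and 2 are assumed to differ by two nucleotide changes,
# so 0 -> 2 has zero probability in a single step.
m1 = [
    [0.98, 0.02, 0.00],
    [0.01, 0.98, 0.01],
    [0.00, 0.02, 0.98],
]

m2 = extrapolate(m1, 2)
m250 = extrapolate(m1, 250)
print(m2[0][2])                        # about 0.0002: reached only via state 1
print([round(x, 3) for x in m250[0]])  # row 0 after 250 extrapolated steps
```

Nothing in `extrapolate` uses information beyond `m1`. Any change in the balance between nucleotide-level and amino-acid-level constraints at large $t$ cannot be represented by this construction.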

We use the scaled observed frequencies (the $q_{ij}/e_{ij}$ values of Henikoff & Henikoff) rather than log-odds scores. We treat $\frac{q_{ij}/e_{ij}}{q_{ii}/e_{ii}}$ as the probability of acceptance of the amino acid substitution $X_i \rightarrow X_j$ relative to that of $X_i \rightarrow X_i$ (for $j = i$ this ratio is unity). The $V$ parameter ($\S$12.3) scales the off-diagonal terms of $\mathbf{A}$ relative to the diagonal terms; the default value $V = 1$ gives the original BLOSUM40 matrix. Stop codons are also included, with the acceptabilities of mutations between stops and non-stops set to zero.
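A minimal sketch of how $\mathbf{A}$ could be assembled from a table of $q_{ij}/e_{ij}$ ratios. It rests on several assumptions:

* The function name `acceptance_matrix`, the list-of-rows input layout and the toy ratios are hypothetical. They are not BLOSUM40 values.
* $V$ is assumed to act as a plain multiplier on the off-diagonal terms. The text states only that it scales them relative to the diagonal.
* The stop-to-stop acceptability is set to 1, by analogy with $X_i \rightarrow X_i$. The text only fixes the stop/non-stop entries at zero.

```python
STOP = "*"  # symbol used here for the stop "amino acid" (assumption)


def acceptance_matrix(ratio, alphabet, v=1.0):
    """Build an acceptability matrix from Henikoff-style q_ij/e_ij ratios.

    ratio[i][j] holds q_ij/e_ij for residues alphabet[i] and alphabet[j]
    (stops excluded). Each entry is divided by its row's diagonal term,
    so A[i][i] == 1; off-diagonal entries are then multiplied by v
    (assumed form of the V parameter). A stop row/column is appended
    with zero acceptability to or from every non-stop residue.
    """
    n = len(alphabet)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            rel = ratio[i][j] / ratio[i][i]  # (q_ij/e_ij) / (q_ii/e_ii)
            a[i][j] = rel if i == j else v * rel
    a[n][n] = 1.0  # stop -> stop: assumed, not stated in the text
    return a, list(alphabet) + [STOP]


# Made-up ratios for two residues, for illustration only.
a, labels = acceptance_matrix([[4.0, 1.0], [1.0, 2.0]], ["A", "G"])
print(labels)  # ['A', 'G', '*']
for row in a:
    print(row)
# [1.0, 0.25, 0.0]
# [0.5, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```

Because each row is normalised by its own diagonal term, the resulting matrix need not be symmetric even when the input ratios are. With the multiplicative form assumed here, `v=1.0` leaves the normalised ratios unchanged, matching the statement that $V = 1$ reproduces the original matrix.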


aef 2007-12-10