Next: Amino acid substitution matrix Up: Substitution matrices Previous: Nucleotide substitution matrix   Contents

Codon usage table

By default, we use a null codon usage table (CUT), i.e. equal codon frequencies. For typical viral genomes, with their large number of overlapping CDSs and other constrained features, it is not clear that a CUT generated directly from the viral genome will be representative of mutation probabilities. The host species CUT may be more appropriate. In the case of human viruses, however, the host CUT is not strongly biased, and so we use a null CUT for simplicity (see also simulations in Firth & Brown 2005). In addition, using a non-null CUT means that four-fold degenerate sites in CDSs are no longer strictly degenerate in terms of substitution probabilities.

More generally, a non-null CUT may be incorporated as follows. Suppose we have the codon GGU. Mutations in the 3rd position are synonymous (all code for gly) and their relative frequencies are controlled by the nucleotide mutation matrix $\mathbf{Q}$. However, we also wish to preserve codon bias as a sequence mutates. Since we are always working from an initial known amino acid, we must use relative (instead of absolute) codon frequencies, but each frequency must be multiplied by the degeneracy of the corresponding amino acid. Otherwise, for example, $\mathrm{?UG} \rightarrow \mathrm{CUG}$ (leu) will be a factor of six less probable than $\mathrm{?UG} \rightarrow \mathrm{AUG}$ (met) simply because there are six codons for leu but only one for met. In addition, codon usage statistics implicitly include any nucleotide bias and, conversely, any nucleotide bias described by $\mathbf{Q}$ will automatically lead to a codon bias. Hence the nucleotide equilibrium frequencies must be factored out by dividing each codon usage value by $\pi_i \pi_j \pi_k$, where $i$, $j$, $k$ are the nucleotides at the first, second and third positions of the codon. See Firth & Brown (2005) for scripts to produce appropriate CUTs from standard absolute or relative frequency CUTs.
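
The adjustment described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the scripts of Firth & Brown (2005): the function name `adjusted_cut`, the dictionary-based input format, the use of the standard genetic code, and the treatment of the three stop codons as one class of degeneracy three are all assumptions made here. For each codon it computes (relative frequency within the amino acid) $\times$ (degeneracy of the amino acid) $/ (\pi_i \pi_j \pi_k)$.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# (first base varies slowest, third base fastest); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def adjusted_cut(codon_counts, pi):
    """Return degeneracy-scaled, nucleotide-bias-corrected codon weights.

    codon_counts: dict mapping all 64 RNA codons to absolute counts
                  (or absolute frequencies); hypothetical input format.
    pi:           dict mapping each of U, C, A, G to its equilibrium frequency.
    """
    # Total usage per amino acid, needed to form relative frequencies.
    totals = Counter()
    for codon, aa in CODE.items():
        totals[aa] += codon_counts[codon]

    # Number of codons per amino acid (stops counted as one class: assumption).
    degeneracy = Counter(CODE.values())

    weights = {}
    for codon, aa in CODE.items():
        # Relative frequency of this codon among its synonymous codons.
        relative = codon_counts[codon] / totals[aa] if totals[aa] else 0.0
        i, j, k = codon
        # Scale by degeneracy, then factor out the nucleotide bias.
        weights[codon] = relative * degeneracy[aa] / (pi[i] * pi[j] * pi[k])
    return weights


if __name__ == "__main__":
    null_counts = {codon: 1.0 for codon in CODE}   # null CUT
    uniform_pi = {base: 0.25 for base in BASES}    # no nucleotide bias
    w = adjusted_cut(null_counts, uniform_pi)
    print(w["CUG"], w["AUG"])
```

With a null CUT and uniform nucleotide frequencies, every codon receives the same weight, so CUG (leu) and AUG (met) are treated equally; without the degeneracy factor, CUG would come out a factor of six smaller.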


aef 2007-12-10