CDS-plotcon: Programme for Detecting Enhanced Conservation in Coding Sequences

Note: This is old unpublished software that is available for legacy purposes, but should generally be avoided. Try SynPlot2 instead.

Summary: This is a suite of software for producing conservation plots for an input group of homologous sequences (either aligned or unaligned). The novel aspect of this software is that the observed conservation is compared with a 'null model' of the 'expected' sequence evolution in non-coding, single-coding or multiply-coding regions (as appropriate, given the input sequence annotation). The basic output is a sliding-window p-value plot (with a user-defined window size), giving the probability that the conservation in the window would be as great as or greater than that observed if the 'null model' were true. The output conservation plots can be used to identify 'unusually' conserved regions. Conservation plots are produced for 'all sites' and for '4-fold degenerate sites'. Comparing these plots can help to distinguish regions that are unusually conserved due to constraints on the encoded amino acids from regions that are unusually conserved due to constraints on the primary sequence (e.g. regulatory regions).
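To illustrate what a sliding-window p-value of this kind means, here is a minimal Python sketch. It is not the CDS-plotcon null model: it assumes, purely for illustration, that the number of mutations in each alignment column is an independent Poisson variable whose mean is the 'expected' value for that column, so that the window total is also Poisson. The function names poisson_cdf and window_p_values are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Poisson(lam), with k a non-negative integer."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for j in range(1, k + 1):
        term *= lam / j    # P(X = j) from P(X = j - 1)
        total += term
    return min(total, 1.0)


def window_p_values(observed, expected, window):
    """One p-value per window position (linear genome, no wrap-around).

    observed: integer mutation counts per column.
    expected: expected mutation counts per column under the null model.
    ASSUMPTION: columns are independent and Poisson distributed, so the
    p-value is the chance of seeing as few or fewer mutations in the
    window, i.e. conservation as great as or greater than that observed.
    """
    p_values = []
    for start in range(len(observed) - window + 1):
        obs = sum(observed[start:start + window])
        lam = sum(expected[start:start + window])
        p_values.append(poisson_cdf(obs, lam))
    return p_values


if __name__ == "__main__":
    print(window_p_values([0, 0, 0, 2], [1.0, 1.0, 1.0, 1.0], window=3))
```

A small p-value marks a window with fewer mutations than the null expectation would suggest, which is the sense in which a region is 'unusually' conserved.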

You can enter your sequences into the online form or download the programmes to run locally.

Please use the following login details if requested. Note that these allow access only to the public parts of this site. If you get an 'access denied' error, you are probably trying to access a non-public part. Please contact me (aef24 at cam.ac.uk). [image: user/passwd]




Summary:

A powerful technique for locating functional elements in genomes is to look for conserved columns in multiple sequence alignments. However, it is difficult to use this method to detect additional functional elements within protein-coding sequences (CDSs), since many columns in CDSs show conservation due to constraints on the encoded protein. It is possible to look for conserved columns at four-fold degenerate sites (some, but not all, third nucleotide positions in codons), but this leaves out information from at least two thirds of columns and is much more difficult within overlapping genes (common in viruses).
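As a concrete illustration of what a four-fold degenerate site is, the following Python sketch lists the third codon positions in a single CDS at which any of the four nucleotides encodes the same amino acid. It assumes the standard genetic code and a single reading frame starting at the first nucleotide; the function names are hypothetical and this is not taken from the CDS-plotcon source.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def fourfold_prefixes():
    """Two-letter codon prefixes whose third position is four-fold degenerate."""
    prefixes = set()
    for a in BASES:
        for b in BASES:
            if len({CODON_TABLE[a + b + c] for c in BASES}) == 1:
                prefixes.add(a + b)
    return prefixes


def fourfold_sites(cds):
    """0-based indices of four-fold degenerate third positions in one CDS.

    ASSUMPTION: the CDS is in frame from index 0 and uses the standard code.
    """
    prefixes = fourfold_prefixes()
    cds = cds.upper().replace("U", "T")
    return [i + 2 for i in range(0, len(cds) - 2, 3) if cds[i:i + 2] in prefixes]


if __name__ == "__main__":
    print(sorted(fourfold_prefixes()))
    print(fourfold_sites("ATGGCTTAA"))
```

Only eight of the sixteen codon prefixes qualify, which is why restricting an analysis to these sites discards most alignment columns.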

The software package CDS-plotcon is specifically designed to search for conserved functional elements within CDSs. It uses an average model of the expected mutation patterns within CDSs, incorporating a nucleotide mutation matrix, an amino acid substitution matrix, a sequence divergence parameter t, a mean synonymous:nonsynonymous substitution ratio V and a phylogenetic tree; it can handle up to three overlapping CDSs in different reading frames. Using this model, it calculates the expected number of mutations across the alignment in each column and compares this with the observed number of mutations. The results are plotted along the genome, and optionally passed through a sliding window (clipped) mean filter (output files; example plot).
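The filtering step described above can be sketched in Python as follows. The exact definition of the 'clipped' mean used by CDS-plotcon is not given here, so this sketch assumes that each per-column score (observed minus expected mutations) is first limited to the range [-clip, +clip] and then averaged over a centred window. The function name and parameters are hypothetical.

```python
def clipped_running_mean(observed, expected, window, clip, circular=False):
    """Running mean of clipped (observed - expected) scores per column.

    ASSUMPTION: 'clipped' means each score is limited to [-clip, +clip]
    before averaging. window should be odd so that it is centred.
    For a linear genome the window is truncated at the ends; for a
    circular genome it wraps around.
    """
    scores = [o - e for o, e in zip(observed, expected)]
    clipped = [max(-clip, min(clip, s)) for s in scores]
    n = len(clipped)
    half = window // 2
    smoothed = []
    for i in range(n):
        if circular:
            values = [clipped[(i + k) % n] for k in range(-half, half + 1)]
        else:
            values = clipped[max(0, i - half):i + half + 1]
        smoothed.append(sum(values) / len(values))
    return smoothed


if __name__ == "__main__":
    obs = [1, 12, 1, 0, 1]
    exp = [1, 2, 1, 1, 1]
    print(clipped_running_mean(obs, exp, window=3, clip=2))
    print(clipped_running_mean(obs, exp, window=3, clip=2, circular=True))
```

Clipping keeps a single extreme column from dominating its window, and negative values in the smoothed output mark stretches with fewer mutations than expected.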

Particularly conserved regions may indicate non-coding functional elements, new CDSs, or more-conserved regions within proteins (e.g. motifs). The software also produces conservation plots for four-fold degenerate sites, which may be used to help distinguish these alternatives. CDS-plotcon should also be used in conjunction with complementary programmes (e.g. RNA structure prediction programmes).

As well as running the core conservation-calculating programme, the package also aligns the input sequences (with code2aln), calculates a phylogenetic tree (with PHYLIP) and produces conservation-score plots. The user may alter many parameters, including the parameters for fitting t and V, the running-mean window sizes and clipping levels, whether the genome is circular, and the sequence range to analyse.

CDS-plotcon is particularly useful for analysing virus genomes, where CDSs (sometimes several) that overlap conserved non-coding features are common, and where many sequenced genomes with a reasonable range of divergences are often available.



Notes: