Statistics of randomized library construction

We have investigated the statistics associated with constructing and sampling large, randomized protein-encoding libraries. Using fairly simple statistics, we have written a number of algorithms for estimating the diversity in libraries generated by the most commonly used randomization methods (listed below). The programmes are available through a web interface or by downloading the source code:

| Web interface | Programmes and information | Brief description |
|---|---|---|
| GLUE | GLUE | Broadly applicable to any randomization technique where all (DNA) daughter variants are equally likely: oligonucleotide-directed randomization, site-saturation mutagenesis, MAX randomization, synthetic shuffling, etc. |
| GLUE-IT | - | GLUE-Including Translation: estimate amino acid diversity in libraries with up to 6 variable codons. |
| | | Calculate amino acids encoded by a user-specified, partially or fully randomized codon. |
| | | Find optimal XYZ codons to encode selected amino acids. |
| PEDEL | PEDEL | Programme for Estimating Diversity in Error-prone PCR Libraries. |
| PEDEL-AA | - | Estimate the number of distinct protein variants encoded by an error-prone PCR library. |
| DRIVeR | DRIVeR | Diversity Resulting from In Vitro Recombination (i.e. DNA shuffling and StEP PCR). |
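
As a rough illustration of the kind of calculation involved, the sketch below covers two of the tasks described above: estimating how many distinct variants a library contains when all variants are equally likely, and listing the amino acids encoded by a randomized codon. It is not the GLUE or GLUE-IT source code. The function names `expected_distinct` and `encoded_amino_acids` are hypothetical, the diversity estimate uses the standard Poisson approximation V(1 - e^(-L/V)) for L clones drawn from V equiprobable variants, and the standard genetic code is assumed. The actual programmes may differ in their details.

```python
import math
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered T, C, A, G at each position; '*' is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def expected_distinct(num_variants: float, library_size: float) -> float:
    """Expected number of distinct variants in a library of `library_size`
    clones sampled from `num_variants` equally likely variants
    (Poisson approximation; an assumption of this sketch)."""
    return -num_variants * math.expm1(-library_size / num_variants)


def encoded_amino_acids(codon: str) -> Counter:
    """Count how many of the DNA codons covered by a degenerate codon
    (e.g. 'NNK') encode each amino acid or stop ('*')."""
    choices = [IUPAC[base] for base in codon.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*choices))


if __name__ == "__main__":
    counts = encoded_amino_acids("NNK")
    print("NNK codons:", sum(counts.values()), "symbols:", len(counts))
    # Three NNK codons give 32**3 equally likely DNA variants.
    v = 32 ** 3
    for size in (v, 3 * v, 10 * v):
        print(size, round(expected_distinct(v, size)))
```

Under this approximation, a library with as many clones as there are possible variants is expected to contain only about 63% of them, which is why libraries are usually oversampled several-fold.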

Mathematical notes for GLUE, PEDEL and DRIVeR: pdf, ps.
Notes on the PEDEL-AA algorithms: html.
