PEDEL: Programme for Estimating Diversity in Error-prone PCR Libraries

Note: This page is a work in progress; currently only the Poisson distribution option works.

Version 2008-01-20/06:37:22.

The programme is introduced in Wayne M. Patrick, Andrew E. Firth, Jonathan M. Blackburn, 2003, User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries, Protein Engineering, 16, 451-457, and Andrew E. Firth, Wayne M. Patrick, 2005, Statistics of protein library construction, Bioinformatics, 21, 3314-3315.

Problem: Given a library of L sequences, each a variant of a parent sequence of N nucleotides into which random point mutations have been introduced, we wish to calculate the expected number of distinct sequences in the library. The calculation typically assumes L > 10, N > 5, and a mean number of mutations per sequence m < 0.1 × N.
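To make the problem concrete, here is a minimal sketch of one way to estimate this number under the Poisson option. It is an illustration under stated assumptions and not necessarily the server's exact implementation: the function name expected_distinct is hypothetical, the number of mutations per sequence is assumed to be Poisson distributed with mean m, and all variants carrying exactly x mutations are assumed to be equally likely (each of x positions can change to any of 3 other nucleotides).

```python
import math

def expected_distinct(L, N, m):
    """Hypothetical helper: estimate the expected number of distinct
    sequences in a library of L variants of an N-nucleotide parent,
    with a mean of m point mutations per sequence (Poisson assumed)."""
    if not (L > 0 and N > 0 and m > 0):
        raise ValueError("L, N and m must all be positive")
    total = 0.0
    for x in range(N + 1):
        # Expected number of library members with exactly x mutations;
        # the Poisson probability is evaluated in log space.
        log_p = -m + x * math.log(m) - math.lgamma(x + 1)
        L_x = L * math.exp(log_p)
        if x > m and L_x < 1e-12:
            break  # the remaining Poisson tail is negligible
        if L_x == 0.0:
            continue
        # log of the number of possible variants with exactly x mutations:
        # V_x = C(N, x) * 3**x (assumption: all are equally likely).
        log_V = (math.lgamma(N + 1) - math.lgamma(x + 1)
                 - math.lgamma(N - x + 1) + x * math.log(3))
        if log_V - math.log(L_x) > 30:
            # Far more possible variants than sampled sequences:
            # essentially every sampled sequence is distinct.
            total += L_x
        else:
            # Expected number of distinct variants when L_x sequences are
            # drawn uniformly from V_x possibilities.
            V_x = math.exp(log_V)
            total += -V_x * math.expm1(-L_x / V_x)
    return total

if __name__ == "__main__":
    # Example inputs chosen for illustration only.
    print(expected_distinct(L=1e6, N=1000, m=5))
```

The sketch sums, over each mutation count x, the expected number of distinct variants among the sequences that carry exactly x mutations. Working with logarithms avoids overflow in C(N, x) × 3^x for long sequences.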

Click here for a worked example.

Click here for some caveats.

See also:

Plot and tabulate more detailed statistics (e.g. the expected number of sequences, expected number of distinct sequences, and number of possible sequences, with exactly 1, 2, 3, ... mutations).
Calculate and plot the expected number of distinct sequences in a library for a range of mutation rates.
Calculate and plot the expected number of distinct sequences in a library for a range of library sizes.
Calculate and plot the expected number of distinct sequences in a library for a range of sequence lengths.