**Note:** This page is a work in progress; currently, only the Poisson distribution option works.

Version 2008-01-20/06:37:22.

The programme is introduced in Wayne M. Patrick, Andrew E. Firth, Jonathan M. Blackburn, 2003, User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries, *Protein Engineering*, **16**, 451-457, and Andrew E. Firth, Wayne M. Patrick, 2005, Statistics of protein library construction, *Bioinformatics*, **21**, 3314-3315.

**Problem:** Given a library of *L* sequences, each a variant of a parent sequence of *N* nucleotides into which random point mutations have been introduced, we wish to calculate the expected number of distinct sequences in the library. (We typically assume *L* > 10, *N* > 5, and a mean number of mutations per sequence *m* < 0.1 × *N*.)
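Under the Poisson option, the calculation can be set up roughly as follows. This is a sketch under stated assumptions, not necessarily the programme's exact procedure. The number of mutations per sequence is taken to be Poisson-distributed with mean *m*, and each of the 3 alternative bases at each of the *N* positions is assumed equally likely, so that all variants carrying the same number of mutations are equally probable. The symbols *L_x* (expected number of library members with exactly *x* mutations), *V_x* (number of possible variants with exactly *x* mutations) and *C_x* (expected number of distinct variants among them) are notation introduced here. *C_x* uses the standard occupancy approximation for *L_x* draws spread uniformly over *V_x* possibilities.

```latex
\begin{aligned}
L_x &= L\,\frac{e^{-m}\, m^{x}}{x!}, \\
V_x &= \binom{N}{x}\, 3^{x}, \\
C_x &= V_x\left(1 - e^{-L_x/V_x}\right), \\
\langle \text{distinct sequences} \rangle &= \sum_{x=0}^{N} C_x .
\end{aligned}
```

For *x* = 0, *V_0* = 1, so the unmutated parent contributes at most one distinct sequence. When *V_x* is much larger than *L_x*, *C_x* ≈ *L_x*, meaning almost every member of that mutation class is unique.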

**See also:**

- Plot and tabulate more detailed statistics (e.g. the expected number of sequences, expected number of distinct sequences, and number of possible sequences, with exactly 1, 2, 3, ... mutations).
- Calculate and plot the expected number of distinct sequences in a library for a range of mutation rates.
- Calculate and plot the expected number of distinct sequences in a library for a range of library sizes.
- Calculate and plot the expected number of distinct sequences in a library for a range of sequence lengths.
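A minimal Python sketch of the per-mutation-class table described in the first item above, followed by a sweep over mutation rate, may help clarify what these calculations involve. It is hypothetical: the function names `poisson_library_stats` and `expected_distinct`, the cut-off `tol`, and the parameter values in the usage lines are illustrative and are not part of the programme. It assumes Poisson-distributed mutation counts with mean `m` and equally likely substitutions, so the `V_x = C(N, x) * 3**x` variants with exactly `x` mutations are equally probable. The expected number of distinct variants in each class is taken as the occupancy estimate `V_x * (1 - exp(-L_x / V_x))`, which may differ in detail from the programme's own algorithm. It requires Python 3.8+ for `math.comb`.

```python
import math


def poisson_library_stats(L, N, m, tol=1e-12):
    """Hypothetical helper: rows of (x, L_x, V_x, C_x) for x = 0, 1, 2, ...

    L_x: expected number of library members with exactly x mutations (Poisson, mean m)
    V_x: number of possible variants with exactly x mutations, C(N, x) * 3**x
    C_x: expected number of distinct variants among them, V_x * (1 - exp(-L_x / V_x))
    """
    rows = []
    p = math.exp(-m)  # Poisson probability of 0 mutations
    for x in range(N + 1):
        if x > 0:
            p *= m / x  # P(x) = P(x - 1) * m / x, avoiding huge factorials
        L_x = L * p
        V_x = math.comb(N, x) * 3 ** x  # exact integer, may be astronomically large
        if L_x > 0:
            # r = L_x / V_x computed in log space so V_x never has to fit in a float
            log_V = (math.lgamma(N + 1) - math.lgamma(x + 1)
                     - math.lgamma(N - x + 1) + x * math.log(3))
            r = math.exp(math.log(L_x) - log_V)
            # V_x * (1 - e^-r) rewritten as L_x * (1 - e^-r) / r; equals L_x if r underflows
            C_x = L_x * -math.expm1(-r) / r if r > 0 else L_x
        else:
            C_x = 0.0
        rows.append((x, L_x, V_x, C_x))
        if x > m and L_x < tol:
            break  # past the mean and the remaining Poisson tail is negligible
    return rows


def expected_distinct(L, N, m):
    """Hypothetical helper: expected number of distinct sequences, the sum of C_x."""
    return sum(C_x for _, _, _, C_x in poisson_library_stats(L, N, m))


# Usage (illustrative values): per-class table, then a sweep over mutation rate.
for x, L_x, V_x, C_x in poisson_library_stats(L=10**6, N=750, m=3.0)[:6]:
    print(f"{x:2d} {L_x:12.1f} {V_x:>18d} {C_x:12.1f}")
for m in (1.0, 2.0, 4.0, 8.0):
    print(m, round(expected_distinct(10**6, 750, m)))
```

The ratio `L_x / V_x` is evaluated in log space so that very large `V_x` values never need converting to floating point. The loop also stops once it is past the mean and the Poisson tail is negligible. Sweeping `L` or `N` instead of `m` works the same way.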