Version 2010-07-16/06:00:09.

The programme is introduced in Wayne M. Patrick, Andrew E. Firth, Jonathan M. Blackburn, 2003, User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries, *Protein Engineering*, **16**, 451-457, and Andrew E. Firth, Wayne M. Patrick, 2005, Statistics of protein library construction, *Bioinformatics*, **21**, 3314-3315.


**Problem:** Given a library of *L* sequences, where each sequence is chosen at random from a set of *V* equiprobable variants, we wish to calculate the expected number of distinct (i.e. unique) sequences represented in the library. Alternatively, given a set of *V* equiprobable variants, we wish to calculate the library size *L* necessary to obtain a given percentage completeness, or to have a given probability of being 100% complete. The calculations typically assume *V* >> 1 (e.g. *V* > 10).
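The three quantities in the problem statement can be illustrated with a minimal Python sketch. It is not the programme's own code: the function names are hypothetical, and the formulas are the standard results for sampling with replacement from *V* equiprobable variants. The expected number of distinct sequences is taken as *V*(1 - (1 - 1/*V*)^*L*), and the probability of 100% completeness uses the Poisson approximation exp(-*V* exp(-*L*/*V*)), which is only reasonable for *V* >> 1.

```python
import math


def expected_distinct(L, V):
    """Expected number of distinct variants among L random draws from V
    equiprobable variants: V * (1 - (1 - 1/V)**L). Requires V > 1."""
    # log1p/expm1 keep the result accurate when V is very large.
    return -V * math.expm1(L * math.log1p(-1.0 / V))


def library_size_for_completeness(F, V):
    """Library size L at which the expected fractional completeness
    is F (0 < F < 1). Requires V > 1."""
    return math.log1p(-F) / math.log1p(-1.0 / V)


def prob_fully_complete(L, V):
    """Approximate probability that every one of the V variants is present.
    Assumption: the number of missing variants is treated as Poisson
    with mean V * exp(-L / V), which needs V >> 1."""
    return math.exp(-V * math.exp(-L / V))


def library_size_for_full_completeness(P, V):
    """Approximate library size L giving probability P (0 < P < 1) of
    100% completeness, obtained by inverting prob_fully_complete."""
    return V * math.log(-V / math.log(P))


if __name__ == "__main__":
    V = 1000  # illustrative number of variants
    print(expected_distinct(3000, V))
    print(library_size_for_completeness(0.95, V))
    print(library_size_for_full_completeness(0.95, V))
```

For *V* >> 1 the exact expression is close to the simpler form *V*(1 - exp(-*L*/*V*)), so a library of *L* = 3*V* sequences is expected to be roughly 95% complete.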
