Version 2010-07-16/06:00:09.
The programme is introduced in Wayne M. Patrick, Andrew E. Firth, Jonathan M. Blackburn, 2003, User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries, Protein Engineering, 16, 451-457, and Andrew E. Firth, Wayne M. Patrick, 2005, Statistics of protein library construction, Bioinformatics, 21, 3314-3315.
Problem: Given a library of L sequences, where each sequence is chosen at random from a set of V equiprobable variants, we wish to calculate the expected number of distinct (i.e. unique) sequences represented in the library. Alternatively, given a set of V equiprobable variants, we wish to calculate the library size L needed to reach a given percentage completeness, or to achieve a given probability that the library is 100% complete. (The calculations typically assume V >> 1, e.g. V > 10.)
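The three questions above can be illustrated with the standard occupancy formulas. The following is a minimal Python sketch under stated assumptions. It is not code from the programme, and the function names are hypothetical. The expected number of distinct variants uses the exact expression V(1 - (1 - 1/V)^L). The two inverse calculations use large-V (Poisson) approximations, which is an assumption on our part: the fraction of variants present is taken as about 1 - exp(-L/V), and the probability that every variant is present is taken as about exp(-V exp(-L/V)).

```python
import math


def expected_distinct(L: float, V: int) -> float:
    """Expected number of distinct variants among L random draws from V
    equiprobable variants (exact occupancy formula)."""
    return V * (1.0 - (1.0 - 1.0 / V) ** L)


def completeness(L: float, V: int) -> float:
    """Expected fraction of the V variants represented in the library."""
    return expected_distinct(L, V) / V


def library_size_for_completeness(c: float, V: int) -> float:
    """Approximate L giving expected completeness c (0 < c < 1).

    Assumes V >> 1, so c ~ 1 - exp(-L/V), hence L ~ -V ln(1 - c).
    """
    return -V * math.log(1.0 - c)


def prob_fully_complete(L: float, V: int) -> float:
    """Approximate probability that all V variants are present.

    Assumes V >> 1 (Poisson approximation): P ~ exp(-V exp(-L/V)).
    """
    return math.exp(-V * math.exp(-L / V))


def library_size_for_full_completeness(P: float, V: int) -> float:
    """Approximate L giving probability P (0 < P < 1) of 100% completeness.

    Inverts P ~ exp(-V exp(-L/V)) to L ~ V ln(V / -ln P).
    """
    return V * math.log(V / -math.log(P))


if __name__ == "__main__":
    V = 1000
    print(expected_distinct(V, V))                      # roughly 0.632 * V
    print(library_size_for_completeness(0.95, V))       # roughly 3.0 * V
    print(library_size_for_full_completeness(0.95, V))
```

For example, with V = 1000, a library of L = 1000 is expected to contain about 632 distinct variants, and reaching 95% expected completeness needs roughly L = 3000 (about 3V). In practice the returned library sizes would be rounded up to whole clones.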