- Download pedel.cxx and pedel.batch.cxx (for calculating the expected number of distinct sequences in an epPCR library).
- Download stats.batch.cxx (for calculating sub-library composition).
- Download the Monte Carlo simulation programmes pedel_mc_run and pedel_mc.cxx.

These two programmes calculate the expected number of distinct sequences in an epPCR library. In

Compile the programmes as follows (replace 'gcc' by an appropriate alternative, e.g. 'c++' or 'g++', if you're using a different C++ compiler):

Run the programmes as follows:

where

Currently

line in

This programme outputs to screen (html format) and to

Statistics:

- *x* = exact number of mutations per sequence.
- *Px* = Poisson probability of *x* mutations, given *m*.
- *Lx* = expected number of sequences in library with exactly *x* mutations.
- *Vx* = number of possible sequences with exactly *x* mutations.
- *Cx* = expected number of distinct sequences in the sub-library comprising sequences with exactly *x* mutations.
- *Cx/Vx* = completeness of sub-library.
- *Lx - Cx* = number of redundant sequences in sub-library.
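The statistics listed above can be sketched in a few lines of C++. This is an illustrative sketch only, not the code in stats.batch.cxx. It assumes that *L* is the library size, *N* the sequence length in nucleotides and *m* the mean number of mutations per sequence, and it assumes the formulas *Px* = e^(-*m*) *m*^*x* / *x*!, *Lx* = *L* *Px*, *Vx* = C(*N*,*x*) 3^*x* and *Cx* = *Vx* (1 - e^(-*Lx*/*Vx*)). The names `SubLibraryStats` and `subLibraryStats` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One row of the per-x statistics table (hypothetical struct name).
struct SubLibraryStats {
    int x;      // exact number of mutations per sequence
    double Px;  // Poisson probability of x mutations, given m
    double Lx;  // expected number of sequences with exactly x mutations
    double Vx;  // number of possible sequences with exactly x mutations
    double Cx;  // expected number of distinct sequences in the sub-library
};

// Assumed formulas (not taken from the original programmes):
//   Px = exp(-m) * m^x / x!
//   Lx = L * Px
//   Vx = C(N, x) * 3^x      (choose x sites, 3 alternative bases at each)
//   Cx = Vx * (1 - exp(-Lx / Vx))
std::vector<SubLibraryStats> subLibraryStats(double L, int N, double m, int xmax) {
    assert(L >= 0 && N > 0 && m > 0 && xmax >= 0 && xmax <= N);
    std::vector<SubLibraryStats> rows;
    for (int x = 0; x <= xmax; ++x) {
        // Work with logarithms so that factorials do not overflow early.
        double logPx = -m + x * std::log(m) - std::lgamma(x + 1.0);
        double logVx = std::lgamma(N + 1.0) - std::lgamma(x + 1.0)
                     - std::lgamma(N - x + 1.0) + x * std::log(3.0);
        double Px = std::exp(logPx);
        double Lx = L * Px;
        double Vx = std::exp(logVx);  // may overflow to infinity for large N, x
        double ratio = Lx / Vx;       // 0 when Vx is infinite
        // For a tiny ratio, 1 - exp(-ratio) is ratio to first order, so Cx = Lx.
        double Cx = (ratio < 1e-12) ? Lx : -Vx * std::expm1(-ratio);
        rows.push_back({x, Px, Lx, Vx, Cx});
    }
    return rows;
}
```

Calling `subLibraryStats(1e6, 300, 2.0, 10)` would return one row for each *x* from 0 to 10. The completeness *Cx/Vx* and the redundancy *Lx - Cx* follow directly from each row.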

Run the programme as follows:

where

The programmes

The programmes are much slower than

The programmes are mainly useful as a sanity check on

There are two programmes:

to make it into an executable.

Run the programme as follows:

where

The programme outputs to screen the mean and standard deviation of the number of distinct sequences per library, and similar statistics for the sub-libraries comprising those sequences with exactly

Statistics for the first simulated library. Columns:

1)

2) number of sequences in the library with exactly

3) expected number for Poisson distribution.

Simulated sequences in the first simulated library. Columns:

1) number of mutations in the sequence,

2) the sequence (0 = unmutated nucleotide; 1,2,3 = mutated nucleotide).

List of the number of distinct sequences in each of the simulated libraries.
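The kind of simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in pedel_mc.cxx. It assumes that the number of mutations in each sequence is Poisson distributed with mean *m*, that mutations fall on distinct, uniformly chosen sites, and that each mutated site takes one of the three alternative nucleotides with equal probability. Sequences use the same encoding as the output above (0 = unmutated nucleotide; 1,2,3 = mutated nucleotide). The names `simulateDistinct`, `simulateMany` and `MeanSd` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <set>
#include <string>

// Simulate one library of L sequences of length N and return the number
// of distinct sequences in it.
std::size_t simulateDistinct(int L, int N, double m, std::mt19937& rng) {
    assert(L > 0 && N > 0 && m > 0);
    std::poisson_distribution<int> nMut(m);
    std::uniform_int_distribution<int> site(0, N - 1);
    std::uniform_int_distribution<int> base(1, 3);
    std::set<std::string> seen;
    for (int i = 0; i < L; ++i) {
        std::string seq(N, '0');         // start from the unmutated parent
        int k = std::min(nMut(rng), N);  // cannot mutate more than N sites
        int placed = 0;
        while (placed < k) {
            int pos = site(rng);
            if (seq[pos] != '0') continue;  // site already mutated, redraw
            seq[pos] = static_cast<char>('0' + base(rng));
            ++placed;
        }
        seen.insert(seq);
    }
    return seen.size();
}

struct MeanSd { double mean; double sd; };

// Mean and sample standard deviation of the number of distinct sequences
// over nLib simulated libraries.
MeanSd simulateMany(int nLib, int L, int N, double m, unsigned seed) {
    assert(nLib > 0);
    std::mt19937 rng(seed);
    double sum = 0.0, sumSq = 0.0;
    for (int i = 0; i < nLib; ++i) {
        double d = static_cast<double>(simulateDistinct(L, N, m, rng));
        sum += d;
        sumSq += d * d;
    }
    double mean = sum / nLib;
    double var = (nLib > 1) ? (sumSq - nLib * mean * mean) / (nLib - 1) : 0.0;
    return {mean, std::sqrt(std::max(var, 0.0))};
}
```

Storing every sequence as a string in a `std::set` keeps the sketch short but is memory hungry, so it only suits small sequence lengths and library sizes.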

Sometimes just running

I thought that the

Current limits are maximum sequence length = 1000 and maximum library size = 100000. You can change these by editing the

lines in

- You must agree to the Terms of Usage before using any of this software.
- If you use this software for publications, please cite Wayne M. Patrick, Andrew E. Firth and Jonathan M. Blackburn, 2003, User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries, *Protein Engineering*, 16, 451-457 **or** Andrew E. Firth and Wayne M. Patrick, 2005, Statistics of protein library construction, *Bioinformatics*, 21, 3314-3315.
- If you seem to be getting bizarre results, check that none of the limitations on *L*, *N*, *m* etc. have been violated (see the maths notes).
- All corrections and notifications of bugs are gratefully received.
- Queries or comments to Andrew Firth (aef24cam.ac.uk).
- AEF gratefully acknowledges funding from the Foundation for Research, Science and Technology, grant number UOOX0304.