- Download driver.cxx and driver-batch.cxx (for calculating the expected number of distinct sequences in a library constructed by *in vitro* recombination of two highly homologous sequences).
- Download the Monte Carlo simulation programme driver_mc.cxx.


Click here for some warnings.

These programmes are for calculating the expected number of distinct sequences in a library generated by random crossovers between two near-identical sequences. In

Note that you may, more or less, consider your sequence either as a sequence of nucleotides with a few variable nucleotides or as a sequence of codons with a few variable codons.

Compile the programmes as follows (replace 'gcc' by an appropriate alternative, e.g. 'c++' or 'g++', if you're using a different C++ compiler):

Before running the programmes, you will need to make a file listing the variable positions. The first line lists the number of variable positions. The remaining lines list the positions. These must be in numerical order. Click here for an example position file.

Run the programmes as follows:

where

1) coordinates of each interval between variable positions,

2) length of the interval,

3) the mean expected number of crossovers in the interval,

4) the probability for an even number of crossovers in the interval,

5) the probability for an odd number of crossovers in the interval.
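As an illustration of how columns 3 to 5 can relate to one another, the sketch below computes the even and odd probabilities from the mean. It assumes the number of crossovers in an interval is Poisson distributed with mean `mu`. That is an assumption made for this example, not something stated here about the programmes. Under it, P(even) = (1 + exp(-2 mu))/2 and P(odd) = (1 - exp(-2 mu))/2. The names `Parity` and `crossover_parity` are hypothetical.

```cpp
#include <cmath>

// Probabilities of an even / odd number of crossovers in one interval.
struct Parity {
    double even;
    double odd;
};

// Assumption for this sketch: the crossover count in the interval is
// Poisson distributed with mean mu (mu >= 0).
Parity crossover_parity(double mu) {
    const double e = std::exp(-2.0 * mu);
    return { 0.5 * (1.0 + e), 0.5 * (1.0 - e) };
}
```

An odd number of crossovers means the two variable positions flanking the interval come from different parents, and an even number (including zero) means they come from the same parent. As `mu` grows, both probabilities tend to 1/2.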

1) true mean number of crossovers per sequence,

2) observed mean number of crossovers per sequence,

3-12) expected number of distinct sequences for different library sizes.
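To make the "expected number of distinct sequences" columns more concrete, here is a minimal sketch of one way such a figure can be computed. It is not the DRIVeR source, and it rests on labelled assumptions: each daughter sequence is an independent draw, either parent is equally likely at the first variable position, and the parent switches between consecutive variable positions with the odd-crossover probability for that interval. A variant with probability p is then present in a library of size L with probability 1 - (1 - p)^L, and the expectation is the sum over all 2^k variants. The function name `expected_distinct` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// p_odd[i]: probability of an odd number of crossovers between variable
// positions i and i+1, so k variable positions give k-1 entries.
// Returns the expected number of distinct daughter sequences among
// library_size independent draws (see the assumptions in the text).
double expected_distinct(const std::vector<double>& p_odd, double library_size) {
    const std::size_t k = p_odd.size() + 1;               // variable positions
    const std::size_t n_variants = std::size_t(1) << k;   // fine for small k only

    double total = 0.0;
    for (std::size_t v = 0; v < n_variants; ++v) {
        // Bit i of v says which parent supplies variable position i.
        double p = 0.5;  // assumed: either parent equally likely at position 0
        for (std::size_t i = 0; i + 1 < k; ++i) {
            const bool switched = ((v >> i) & 1u) != ((v >> (i + 1)) & 1u);
            p *= switched ? p_odd[i] : 1.0 - p_odd[i];
        }
        // 1 - (1 - p)^L, written with log1p/expm1 to stay accurate for tiny p.
        total += -std::expm1(library_size * std::log1p(-p));
    }
    return total;
}
```

With two variable positions and an odd-crossover probability of 0.5, all four variants have probability 0.25, so a library of two sequences gives 4 × (1 - 0.75²) = 1.75 expected distinct sequences. The loop visits 2^k variants, so this brute-force form is only practical for a modest number of variable positions.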

The library sizes (columns) range from

Currently the maximum number of variable positions is limited to 20 (in

lines in the programmes, and recompiling. Note that

If you get a

This programme does a full Monte Carlo simulation for the DRIVeR scenario. It may be useful for checking the analytic calculations used in

Compile the programme as follows (replace 'gcc' by an appropriate alternative, e.g. 'c++' or 'g++', if you're using a different C++ compiler):

Before running the programme, you will need to make a file listing the variable positions. The first line lists the number of variable positions. The remaining lines list the positions. These must be in numerical order. Click here for an example position file.

Run the programme as follows:

where

The programme outputs to screen the mean and standard deviation of the number of distinct daughter sequences per simulated library. For the final simulated library only, the programme outputs to the file

The

Current limits are maximum number of simulated libraries = 100000, maximum sequence length = 2000, maximum library size = 1000000, and maximum number of variable positions = 12. You can change these by editing the

lines in

- You must agree to the Terms of Usage before using any of this software.
- If you use this software for publications, please cite Wayne M. Patrick, Andrew E. Firth and Jonathan M. Blackburn, 2003, User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries, *Protein Engineering*, 16, 451-457 **or** Andrew E. Firth and Wayne M. Patrick, 2005, Statistics of protein library construction, *Bioinformatics*, 21, 3314-3315.
- If you seem to be getting bizarre results, check that none of the limitations on *L*, *N*, *m* etc. have been violated (see the maths notes).
- All corrections and notifications of bugs are gratefully received.
- Queries or comments to Andrew Firth (aef24cam.ac.uk).
- AEF gratefully acknowledges funding from the Foundation for Research, Science and Technology, grant number UOOX0304.