ecoPCR: in silico PCR

ecoPCR performs an in silico PCR that preserves the taxonomic information of the selected sequences and lets you specify various conditions for the in silico amplification.

In addition to its options, the command requires two arguments: the two primers.

References

Bellemain E, Carlsen T, Brochmann C, Coissac E, Taberlet P, Kauserud H (2010) ITS as an environmental DNA barcode for fungi: an in silico approach reveals potential PCR biases. BMC Microbiology, 10, 189.

Ficetola GF, Coissac E, Zundel S, Riaz T, Shehzad W, Bessiere J, Taberlet P, Pompanon F (2010) An in silico approach for the evaluation of DNA barcodes. BMC Genomics, 11, 434.

ecoPCR specific options

-d <filename>

Filename containing the database used for the in silico PCR. The database must be in the ecoPCR format (see obiconvert).

Warning

This option is compulsory.

-e <INTEGER>

Maximum number of errors (mismatches) allowed per primer (default: 0). See Example 2 for how to forbid mismatches at the 3’ end of the primers.
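To make the -e threshold concrete, here is a minimal Python sketch of how mismatches between a primer (possibly containing IUPAC codes) and a candidate target site could be counted. It is an illustration only, assuming an ungapped, position-by-position comparison with the site written in the primer's orientation; it is not ecoPCR's own matching code, and the names `IUPAC`, `count_mismatches` and `primer_binds` are hypothetical.

```python
# Hypothetical illustration of ungapped, IUPAC-aware mismatch counting.
# This is NOT ecoPCR's implementation; it only shows what
# "number of mismatches per primer" means for the -e option.

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where the site base is not allowed by the primer code.

    Assumes `site` is written in the same orientation as `primer`
    and contains only A, C, G or T.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC[p]
    )


def primer_binds(primer: str, site: str, max_errors: int = 0) -> bool:
    """Accept the site if the mismatch count does not exceed max_errors (cf. -e)."""
    return count_mismatches(primer, site) <= max_errors


if __name__ == "__main__":
    primer = "TYTGTCTGSTTRATTSCG"   # second primer of Example 1
    site = "ATTGTCTGCTTAATTGCG"     # hypothetical target site
    print(count_mismatches(primer, site))  # 1 (first base: T expected, A found)
    print(primer_binds(primer, site))      # False with the default of 0 errors
    print(primer_binds(primer, site, 3))   # True, as with -e 3
```

Ambiguous positions such as Y, S or R count as matches whenever the site carries one of the bases they stand for, which is why only the first base is counted as a mismatch here.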

-l <INTEGER>

Minimum length of the in silico amplified DNA fragment, excluding primers.

-L <INTEGER>

Maximum length of the in silico amplified DNA fragment, excluding primers.

-r <TAXID>

Only the sequence records belonging to the taxonomic group identified by TAXID are considered for the in silico PCR. The TAXID is an integer that can be found either in the NCBI taxonomic database or by using the ecofind program.

-i <TAXID>

The sequences of the taxonomic group identified by its TAXID are not considered for the in silico PCR.

-c

Treats the database sequences as circular (e.g., mitochondrial or chloroplast DNA).

-D <INTEGER>

Keeps the specified number of nucleotides on each side of each in silico amplified sequence, here understood as the amplified DNA fragment plus the two target sequences of the primers.
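The arithmetic behind -D can be sketched as a slice widened by the flank size on both sides. The Python function below is hypothetical: it assumes 0-based, half-open coordinates for the amplified region (primer sites included), truncates the flanks at the ends of linear sequences and wraps around the origin for circular ones (cf. -c). How ecoPCR itself handles sequence ends is not described here, so treat those boundary rules as assumptions.

```python
def extract_with_flanks(seq: str, start: int, end: int, flank: int,
                        circular: bool = False) -> str:
    """Return the amplicon seq[start:end] extended by `flank` bases on each side.

    `start` and `end` are hypothetical 0-based, half-open coordinates of the
    amplified region *including* both primer target sites. For linear
    sequences the flanks are truncated at the sequence ends; for circular
    sequences (cf. -c) indices wrap around the origin, and `end` may exceed
    len(seq) to describe a region spanning the origin. Both behaviours are
    assumptions of this sketch.
    """
    n = len(seq)
    if flank < 0 or not 0 <= start < end:
        raise ValueError("invalid coordinates or flank size")
    if circular:
        width = (end - start) + 2 * flank
        if width > n:
            raise ValueError("window longer than the circular sequence")
        # Modular indexing walks across the origin of the circular molecule.
        return "".join(seq[i % n] for i in range(start - flank, end + flank))
    if end > n:
        raise ValueError("end beyond the end of a linear sequence")
    # Linear case: clamp the widened window to the sequence boundaries.
    return seq[max(0, start - flank):min(n, end + flank)]


if __name__ == "__main__":
    seq = "AAACCCGGGTTT"
    print(extract_with_flanks(seq, 3, 6, 2))                 # AACCCGG
    print(extract_with_flanks(seq, 0, 3, 2))                 # AAACC (truncated at the 5' end)
    print(extract_with_flanks(seq, 0, 3, 2, circular=True))  # TTAAACC (wraps around)
```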

-k

Prints the kingdom of the in silico amplified sequences in the output (default: the superkingdom is printed).

-m <1|2>

Defines the method used for estimating the Tm (melting temperature) between the primers and their corresponding target sequences (default: 1).

1 SantaLucia method (SantaLucia J (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. PNAS, 95, 1460-1465).

2 Owczarzy method (Owczarzy R, Vallone PM, Gallo FJ et al. (1997) Predicting sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers, 44, 217-239).

-a <FLOAT>

Salt concentration (in M) used for estimating the Tm (default: 0.05 M).

-h

Print help.

Output file

The output file describes the properties of the in silico amplified sequences in 20 columns, with ‘|’ as separator:

column 1: sequence identification in the reference database (= accession number when using EMBL or GenBank for building the reference database)

column 2: length of the original sequence

column 3: scientific name as indicated in the reference database

column 4: taxonomic rank as indicated in the reference database

column 5: taxid of the species

column 6: scientific name of the species

column 7: taxid of the genus

column 8: genus name

column 9: taxid of the family

column 10: family name

column 11: taxid of the superkingdom (or of the kingdom if the -k option is set)

column 12: superkingdom name (or kingdom name if the -k option is set)

column 13: strand (D or R, corresponding to direct or reverse, respectively)

column 14: target sequence of the first primer

column 15: number of mismatches for the first primer

column 16: target sequence of the second primer

column 17: number of mismatches for the second primer

column 18: length of the amplified fragment (excluding primers)

column 19: sequence

column 20: definition
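For downstream analysis, the ‘|’-separated columns can be loaded with a few lines of Python. The sketch below is a hypothetical reader, not part of OBITools: the short column names are labels chosen here, it assumes any header lines start with ‘#’, and it expects exactly the 20 columns listed above (a version of ecoPCR that writes extra columns would need a different layout).

```python
from typing import Dict, Iterator, TextIO, Union

# Hypothetical short names for the 20 columns listed above, in order.
COLUMNS = [
    "seq_id", "seq_length", "scientific_name", "rank",
    "species_taxid", "species_name", "genus_taxid", "genus_name",
    "family_taxid", "family_name", "superkingdom_taxid", "superkingdom_name",
    "strand", "primer1_target", "primer1_mismatches",
    "primer2_target", "primer2_mismatches", "amplicon_length",
    "sequence", "definition",
]

# Columns converted to integers; taxids and names are kept as strings
# because missing ranks may not be encoded as numbers.
INT_COLUMNS = ("seq_length", "primer1_mismatches",
               "primer2_mismatches", "amplicon_length")


def parse_ecopcr(handle: TextIO) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield one dict per data line of an ecoPCR output file."""
    for line in handle:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and header lines starting with '#' are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        # maxsplit keeps any '|' inside the last (definition) column intact.
        fields = [f.strip() for f in line.split("|", len(COLUMNS) - 1)]
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        record: Dict[str, Union[str, int]] = dict(zip(COLUMNS, fields))
        for name in INT_COLUMNS:
            record[name] = int(record[name])
        yield record
```

Splitting with a maximum of 19 separators is a deliberate choice: the free-text definition is the last column, so any ‘|’ it contains stays in that field. As a usage example, `collections.Counter(r["family_name"] for r in parse_ecopcr(open("mysequences.ecopcr")))` counts the amplified sequences per family.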

Examples

Example 1:

>  ecoPCR -d mydatabase -e 3 -l 50 -L 500 \
   TCACAGACCTGTTATTGC TYTGTCTGSTTRATTSCG > mysequences.ecopcr

Launches an in silico PCR on mydatabase (see obiconvert for a description of the database format), allowing at most three mismatches per primer. The minimum and maximum amplified sequence lengths (excluding primers) are 50 bp and 500 bp, respectively. The primers used are TCACAGACCTGTTATTGC and TYTGTCTGSTTRATTSCG (IUPAC ambiguity codes can be used). They amplify a short portion of the nuclear 18S gene. The results are saved in the mysequences.ecopcr file.

Example 2:

> ecoPCR -d mydatabase -e 2 -l 80 -L 120 -D 50 -r 7742 \
  TTAGATACCCCACTATG#C# TAGAACAGGCTCCTCTA#G# > mysequences.ecopcr

Launches an in silico PCR on mydatabase (see obiconvert for a description of the database format), allowing at most two mismatches per primer but requiring a perfect match on the last two nucleotides at the 3’ end of each primer (a perfect match is enforced by adding a ‘#’ after the nucleotide concerned). The minimum and maximum amplified sequence lengths (excluding primers) are 80 bp and 120 bp, respectively. The -D option keeps 50 nucleotides on each side of each in silico amplified sequence (the amplified DNA fragment plus the two target sequences of the primers). The primers used are TTAGATACCCCACTATGC and TAGAACAGGCTCCTCTAG. They amplify a short portion of the mitochondrial 12S gene. The -r option restricts the search to vertebrates (7742 is the taxid of vertebrates). The results are saved in the mysequences.ecopcr file.
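The ‘#’ notation combines with -e as two separate rules: marked positions must match, and the remaining mismatches must stay within the -e limit. The Python sketch below illustrates that logic under stated assumptions (ungapped comparison, target site written in the primer's orientation); `parse_primer` and `site_accepted` are hypothetical helpers, not ecoPCR functions.

```python
from typing import FrozenSet, Tuple

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_primer(spec: str) -> Tuple[str, FrozenSet[int]]:
    """Split a primer written with '#' markers into its plain sequence and
    the 0-based positions that must match perfectly."""
    bases = []
    strict = set()
    for ch in spec.upper():
        if ch == "#":
            if not bases:
                raise ValueError("'#' must follow a nucleotide")
            # '#' marks the nucleotide immediately before it.
            strict.add(len(bases) - 1)
        else:
            bases.append(ch)
    return "".join(bases), frozenset(strict)


def site_accepted(spec: str, site: str, max_errors: int) -> bool:
    """Accept a target site if no '#'-marked position mismatches and the
    total mismatch count does not exceed max_errors (cf. -e)."""
    primer, strict = parse_primer(spec)
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    mismatches = [i for i, (p, s) in enumerate(zip(primer, site.upper()))
                  if s not in IUPAC[p]]
    if strict.intersection(mismatches):
        return False
    return len(mismatches) <= max_errors


if __name__ == "__main__":
    spec = "TTAGATACCCCACTATG#C#"  # first primer of Example 2
    print(parse_primer(spec))      # ('TTAGATACCCCACTATGC', frozenset({16, 17}))
    print(site_accepted(spec, "AAAGATACCCCACTATGC", 2))  # True: 2 mismatches at the 5' end
    print(site_accepted(spec, "TTAGATACCCCACTATGA", 2))  # False: mismatch on a '#' position
```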

ecoPCR used sequence attributes