ecoPrimers: new barcode markers and primers
Authors: Eric Coissac <eric.coissac@metabarcoding.org> and Tiayyba Riaz <tiayyba.riaz@metabarcoding.org>
ecoPrimers designs the most efficient barcode markers and primers, based on a set of reference sequence records and according to specified parameters.
Reference
Riaz T, Shehzad W, Viari A, Pompanon F, Taberlet P, Coissac E (2011) ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic Acids Research, 39, e145.
ecoPrimers specific options
-d <filename>
File containing the reference sequence records used for designing the barcode markers and primers (see obiconvert for a description of the database format).
Warning
This option is compulsory.
-e <INTEGER>
Maximum number of errors (mismatches) allowed per primer (default: 0).
-l <INTEGER>
Minimum length of the barcode, excluding primers.
-L <INTEGER>
Maximum length of the barcode, excluding primers.
-r <TAXID>
Defines the example sequence records (example dataset). Only the sequences of the taxonomic group identified by its TAXID are taken into account for designing the barcodes and the primers. The TAXID is an integer that can be found either in the NCBI taxonomic database or using the ecofind program.
-i <TAXID>
Defines the counterexample sequence records (counterexample dataset). The barcodes and primers are selected so as to avoid the counterexample taxonomic group identified by its TAXID.
-E <TAXID>
Defines a counterexample taxonomic group (identified by its TAXID) within the example dataset.
-c
Considers that the sequences of the database are circular (e.g. mitochondrial or chloroplast DNA).
-3 <INTEGER>
Defines the number of nucleotides on the 3’ end of the primers that must have a strict match with their target sequences.
-q <FLOAT>
Defines the strict matching quorum, i.e. the proportion of the sequence records in which a strict match between the primers and their targets occurs (default: 0.7).
-s <FLOAT>
Defines the sensitivity quorum, i.e. the proportion of the example sequence records that must fulfill the specified parameters for designing the barcodes and the primers.
-x <FLOAT>
Defines the false positive quorum, i.e. the maximum proportion of the counterexample sequence records that may fulfill the specified parameters for designing the barcodes and the primers.
-t <TAXONOMIC_LEVEL>
Defines the taxonomic level that is considered for evaluating the barcodes and primers in the output of ecoPrimers. The default taxonomic level is the species level. When using a taxonomic database built from NCBI taxonomy dump files, the other possible taxonomic levels are genus, family, order, class, phylum, kingdom, and superkingdom.
-D
Sets the double strand mode.
-S
Sets the single strand mode.
-O <INTEGER>
Sets the primer length (default: 18).
-m <1|2>
Defines the method used for estimating the Tm (melting temperature) between the primers and their corresponding target sequences (default: 1).
1 SantaLucia method (SantaLucia J (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. PNAS, 95, 1460-1465).
2 Owczarzy method (Owczarzy R, Vallone PM, Gallo FJ et al. (1997) Predicting sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers, 44, 217-239).
-a <FLOAT>
Salt concentration used for estimating the Tm (default: 0.05).
-U
Forbids multiple matches of a primer on the same sequence record.
-R <TEXT>
Defines the reference sequence by indicating its identifier in the database.
-A
Prints the list of all identifiers of sequence records in the database.
-f
Removes the data mining step during strict primer identification.
-v
Stores a statistics file about memory usage during strict primer identification.
-h
Prints help.
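The matching options above (-e, -3, -c and the strict matching quorum -q) can be illustrated with a short Python sketch. This is not the ecoPrimers implementation: the function names are hypothetical, only the forward strand is searched, primers are handled one at a time rather than as pairs, and IUPAC ambiguity codes are ignored. It only shows, under those assumptions, what each option constrains.

```python
def primer_matches(primer, site, max_errors=0, strict_3prime=0):
    """Return True if `site` is an acceptable binding site for `primer`.

    max_errors    mirrors -e: maximum number of mismatches allowed.
    strict_3prime mirrors -3: number of 3' end positions that must match exactly.
    (Hypothetical helper, not part of ecoPrimers.)
    """
    if len(primer) != len(site):
        return False
    # Positions (0-based, 5' to 3') where primer and site differ.
    mismatches = [i for i, (p, s) in enumerate(zip(primer, site)) if p != s]
    if len(mismatches) > max_errors:
        return False
    # The last `strict_3prime` positions of the primer must be mismatch-free.
    first_strict = len(primer) - strict_3prime
    return all(i < first_strict for i in mismatches)


def find_sites(primer, sequence, max_errors=0, strict_3prime=0, circular=False):
    """List the 0-based start positions where `primer` matches `sequence`."""
    n, k = len(sequence), len(primer)
    if circular:
        # Mirrors -c: append the first k-1 bases so that sites spanning
        # the origin of a circular genome are also examined.
        searched = sequence + sequence[:k - 1]
        last_start = n - 1
    else:
        searched = sequence
        last_start = n - k
    return [
        start
        for start in range(last_start + 1)
        if primer_matches(primer, searched[start:start + k],
                          max_errors, strict_3prime)
    ]


def quorum_reached(primer, records, quorum, **match_options):
    """True if the proportion of records with at least one site is >= quorum.

    With max_errors=0 this resembles the strict matching quorum (-q),
    applied here to a single primer for simplicity.
    """
    if not records:
        return False
    hits = sum(1 for seq in records if find_sites(primer, seq, **match_options))
    return hits / len(records) >= quorum


seq = "GATTACAGGCT"  # toy sequence, made up for illustration
print(find_sites("ACAG", seq))                                # exact match
print(find_sites("ACTG", seq, max_errors=1))                  # one mismatch tolerated
print(find_sites("ACTG", seq, max_errors=1, strict_3prime=2)) # mismatch too close to the 3' end
print(find_sites("CTGA", seq, circular=True))                 # site spans the origin
```

In this toy case the single mismatch of ACTG falls within the last two positions of the primer, so the site is accepted with -e 1 alone but rejected once a strict 3’ end of two nucleotides is required.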
Output file
The output file contains several columns, with ‘|’ as separator, and describes the characteristics of each barcode and its associated primers.
column 1: serial number
column 2: sequence of primer 1
column 3: sequence of primer 2
column 4: Tm (melting temperature) of primer 1, without mismatch
column 5: lowest Tm of primer 1 against example sequence records
column 6: Tm of primer 2, without mismatch
column 7: lowest Tm of primer 2 against example sequence records
column 8: number of C or G in primer 1
column 9: number of C or G in primer 2
column 10: GG (Good-Good) means that both primers are specific to the example dataset; GB or BG (Good-Bad or Bad-Good) means that only one of the two primers is specific to the example dataset
column 11: number of sequence records of the example dataset that are properly amplified according to the specified parameters
column 12: proportion of sequence records of the example dataset that are properly amplified according to the specified parameters
column 13: yule-like output
column 14: number of taxa of the example dataset that are properly amplified according to the specified parameters
column 15: number of taxa of the counterexample dataset that are properly amplified according to the specified parameters
column 16: proportion of taxa of the example dataset that are properly amplified according to the specified parameters (Bc index)
column 17: number of taxa of the example dataset that are properly identified
column 18: proportion of taxa of the example dataset that are properly identified (Bs index)
column 19: minimum length of the barcode in base pairs for the example sequence records (excluding primers)
column 20: maximum length of the barcode in base pairs for the example sequence records (excluding primers)
column 21: average length of the barcode in base pairs for the example sequence records (excluding primers)
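The column layout above can be read with a few lines of Python. The sketch below is a minimal reader, written under several assumptions that are not stated in this page: fields may be padded with spaces, lines starting with '#' are treated as comments, and a trailing separator is tolerated. The field names are hypothetical labels chosen for this example, and the sample line is made up to show the layout; it is not real ecoPrimers output.

```python
# Hypothetical field names, one per documented column (1 to 21).
FIELDS = [
    "serial", "primer1", "primer2",
    "tm1", "tm1_min", "tm2", "tm2_min",
    "gc1", "gc2", "specificity",
    "amplified_records", "amplified_records_ratio", "yule",
    "amplified_taxa", "counterexample_taxa", "bc",
    "identified_taxa", "bs",
    "min_length", "max_length", "mean_length",
]
INT_FIELDS = {
    "serial", "gc1", "gc2", "amplified_records", "amplified_taxa",
    "counterexample_taxa", "identified_taxa", "min_length", "max_length",
}
FLOAT_FIELDS = {
    "tm1", "tm1_min", "tm2", "tm2_min",
    "amplified_records_ratio", "bc", "bs", "mean_length",
}


def parse_line(line):
    """Turn one '|'-separated result line into a dictionary."""
    values = [value.strip() for value in line.rstrip("\n").split("|")]
    if len(values) < len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    record = dict(zip(FIELDS, values))  # extra trailing fields are ignored
    for name in INT_FIELDS:
        record[name] = int(record[name])
    for name in FLOAT_FIELDS:
        record[name] = float(record[name])
    return record  # "yule" and the text columns stay as strings


def parse_ecoprimers(lines):
    """Yield one dictionary per result line, skipping blanks and '#' lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):  # assumption: '#' = comment
            continue
        yield parse_line(line)


# Made-up sample, for illustration only.
sample = """# hypothetical header line
0 | ACACCGCCCGTCACTCTC | CTTCCGGTACACTTACCA | 60.1 | 48.2 | 55.3 | 41.0 | 12 | 9 | GG | 120 | 0.96 | 0.85 | 95 | 0 | 0.950 | 80 | 0.842 | 98 | 112 | 104.30
"""

records = list(parse_ecoprimers(sample.splitlines()))
# Keep primer pairs that are both specific, best Bs index first.
best = sorted((r for r in records if r["specificity"] == "GG"),
              key=lambda r: r["bs"], reverse=True)
for r in best:
    print(r["primer1"], r["primer2"], r["bc"], r["bs"])
```

For a real result file, the same generator can be fed an open file handle, e.g. `parse_ecoprimers(open("mybarcodes.ecoprimers"))`.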
Examples
Example 1:
> ecoPrimers -d mydatabase -e 3 -l 50 \
  -L 800 -r 2759 -3 2 > mybarcodes.ecoprimers

Launches a search for barcodes and corresponding primers on mydatabase (see obiconvert for a description of the database format), with a maximum of three mismatches for each primer. The minimum and maximum barcode lengths (excluding primers) are 50 bp and 800 bp, respectively. The search is restricted to the taxonomic group identified by its TAXID (2759 corresponds to the Eukaryota in the NCBI taxonomy). The last two nucleotides on the 3’ end of the primers must have a perfect match with their target sequences. The results are saved in the mybarcodes.ecoprimers file.
Example 2:
> ecoPrimers -d mydatabase -e 2 -l 30 -L 120 \
  -r 7742 -i 2 -E 9604 -3 2 > mybarcodes.ecoprimers

Launches a search for barcodes and corresponding primers on mydatabase (see obiconvert for a description of the database format), with a maximum of two mismatches for each primer. The minimum and maximum barcode lengths (excluding primers) are 30 bp and 120 bp, respectively. The search is restricted to the Vertebrates, excluding Bacteria and Hominidae (7742, 2, and 9604 correspond to the TAXID of Vertebrates, Bacteria, and Hominidae, respectively). The last two nucleotides on the 3’ end of the primers must have a perfect match with their target sequences. The results are saved in the mybarcodes.ecoprimers file.
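When ecoPrimers is driven from a script, the command lines of the two examples can be assembled programmatically. The following Python sketch uses only options documented on this page; the helper names (`ecoprimers_command`, `run_ecoprimers`) and their keyword arguments are hypothetical, and the sketch assumes that the `ecoPrimers` executable is on the PATH and writes its results to standard output, as the redirections in the examples suggest.

```python
import shlex
import subprocess


def ecoprimers_command(database, example_taxid, *, max_errors=None,
                       min_length=None, max_length=None,
                       counterexample_taxid=None, excluded_taxid=None,
                       strict_3prime=None, circular=False):
    """Build the argument list for an ecoPrimers run (hypothetical helper)."""
    argv = ["ecoPrimers", "-d", database]
    optional = [
        ("-e", max_errors),            # mismatches allowed per primer
        ("-l", min_length),            # minimum barcode length
        ("-L", max_length),            # maximum barcode length
        ("-r", example_taxid),         # example dataset
        ("-i", counterexample_taxid),  # counterexample dataset
        ("-E", excluded_taxid),        # counterexample group within the examples
        ("-3", strict_3prime),         # strictly matching 3' nucleotides
    ]
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    if circular:
        argv.append("-c")
    return argv


def run_ecoprimers(argv, output_path):
    """Run the command and save its standard output to `output_path`."""
    with open(output_path, "w") as handle:
        subprocess.run(argv, stdout=handle, check=True)


# Same settings as Example 2 above.
argv = ecoprimers_command("mydatabase", 7742, max_errors=2,
                          min_length=30, max_length=120,
                          counterexample_taxid=2, excluded_taxid=9604,
                          strict_3prime=2)
print(shlex.join(argv))
# run_ecoprimers(argv, "mybarcodes.ecoprimers")  # requires ecoPrimers to be installed
```

Passing the arguments as a list avoids shell quoting issues, and writing standard output to a file plays the role of the `>` redirection used in the examples.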