obicut: trims sequences
obicut is a command that trims sequence objects based on two integer 
values: the -b option gives the first position of the sequence to be kept, 
and the -e option gives the last position to be kept. Both values can be 
computed using a Python expression.
Example:
> obicut -b 50 -e seq_length seq1.fasta > seq2.fasta
Keeps only the part of the sequence from the fiftieth position to the end.
Example:
> obicut -b 50 -e seq_length-50 seq1.fasta > seq2.fasta
Trims the first and last 50 nucleotides of the sequence object.
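The position arithmetic behind these examples can be sketched in plain Python. This is only an illustration, not obicut's actual code: the helper `cut` is hypothetical, and it assumes that the -b and -e values are 1-based, inclusive positions, which is how the description above reads. The toy sequence and the small offsets are made up for the example.

```python
def cut(sequence: str, begin: int, end: int) -> str:
    """Keep positions begin..end (assumed 1-based and inclusive)."""
    # Python slices are 0-based and end-exclusive, so only the start shifts.
    return sequence[begin - 1:end]


seq = "ACGTACGTAC"        # toy 10-nucleotide sequence (hypothetical data)
seq_length = len(seq)     # mirrors the seq_length name used in the examples

# Analogous to: -b 3 -e seq_length
kept_tail = cut(seq, 3, seq_length)

# Analogous to: -b 3 -e seq_length-2
kept_core = cut(seq, 3, seq_length - 2)

print(kept_tail)
print(kept_core)
```

Under this assumption, the value given to -b is itself the first position kept, and the value given to -e is the last one.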
obicut specific options
- 
-b <INTEGER>, --begin=<INTEGER>
Integer value (possibly calculated using a Python expression) indicating the first position of the sequence to be kept.
- 
-e <INTEGER>, --end=<INTEGER>
Integer value (possibly calculated using a Python expression) indicating the last position of the sequence to be kept.
Sequence record selection options
- 
-s <REGULAR_PATTERN>, --sequence=<REGULAR_PATTERN>
Regular expression pattern to be tested against the sequence itself. The pattern is case-insensitive.
Examples:
> obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
Selects only the sequence records that contain an EcoRI restriction site.
> obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
Selects only the sequence records that contain a run of at least 10 consecutive A's.
> obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
Selects only the sequence records that do not contain ambiguous nucleotides.
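To see what the three patterns above accept, they can be tried with Python's standard `re` module. This is a sketch only: it assumes the patterns behave like Python regular expressions searched anywhere in the sequence, with `re.IGNORECASE` standing in for the stated case-insensitivity. The helper `matches` and the toy sequences are hypothetical.

```python
import re

patterns = {
    "EcoRI site": r"GAATTC",
    "poly-A run": r"A{10,}",
    "unambiguous only": r"^[ACGT]+$",
}


def matches(pattern: str, sequence: str) -> bool:
    # re.IGNORECASE mirrors the case-insensitive matching described for -s.
    return re.search(pattern, sequence, re.IGNORECASE) is not None


# Toy sequences (hypothetical data).
toy = {
    "s1": "ttgaattcgg",             # lower case, contains GAATTC
    "s2": "CC" + "A" * 10 + "GG",   # contains exactly ten consecutive A's
    "s3": "ACGTNACGT",              # contains the ambiguous symbol N
}

results = {
    name: {seq_id: matches(pattern, seq) for seq_id, seq in toy.items()}
    for name, pattern in patterns.items()
}

for name, per_sequence in results.items():
    print(name, per_sequence)
```

The `^` and `$` anchors in the last pattern are what force every symbol of the sequence to be one of A, C, G or T.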
 
- 
-D <REGULAR_PATTERN>, --definition=<REGULAR_PATTERN>
Regular expression pattern to be tested against the definition of the sequence record. The pattern is case sensitive.
Example:
> obigrep -D '[Cc]hloroplast' seq1.fasta > seq2.fasta
Selects only the sequence records whose definition contains chloroplast or Chloroplast.
- 
-I <REGULAR_PATTERN>, --identifier=<REGULAR_PATTERN>
Regular expression pattern to be tested against the identifier of the sequence record. The pattern is case sensitive.
Example:
> obigrep -I '^GH' seq1.fasta > seq2.fasta
Selects only the sequence records whose identifier begins with GH.
- 
--id-list=<FILENAME>
<FILENAME> points to a text file containing the list of sequence record identifiers to be selected. The file contains a single identifier per line.
Example:
> obigrep --id-list=my_id_list.txt seq1.fasta > seq2.fasta
Selects only the sequence records whose identifier is present in the my_id_list.txt file.
- 
-a <KEY>:<REGULAR_PATTERN>, --attribute=<KEY>:<REGULAR_PATTERN>
Regular expression pattern matched against an attribute of the sequence record. The option value has the form key:regular_pattern. The pattern is case sensitive. Several -a options can be used on the same command line; in that case, the selected sequence records match all constraints.
Example:
> obigrep -a 'family_name:Asteraceae' seq1.fasta > seq2.fasta
Selects the sequence records containing an attribute whose key is family_name and whose value is Asteraceae.
- 
-A <ATTRIBUTE_NAME>, --has-attribute=<KEY>
Selects sequence records that have an attribute whose key is <KEY>.
Example:
> obigrep -A taxid seq1.fasta > seq2.fasta
Selects only the sequence records having a taxid attribute defined.
 
- 
-p <PYTHON_EXPRESSION>, --predicat=<PYTHON_EXPRESSION>
Python boolean expression to be evaluated for each sequence record. The attribute keys defined for each sequence record can be used in the expression as variable names. An extra variable named ‘sequence’ refers to the sequence record itself. Several -p options can be used on the same command line; in that case, the selected sequence records match all constraints.
Example:
> obigrep -p '(forward_error<2) and (reverse_error<2)' seq1.fasta > seq2.fasta
Selects only the sequence records whose forward_error and reverse_error attributes have a value smaller than two.
- 
-L <##>, --lmax=<##>
Selects sequence records whose sequence length is less than or equal to lmax.
Example:
> obigrep -L 100 seq1.fasta > seq2.fasta
Selects only the sequence records whose sequence length is less than or equal to 100 bp.
 
- 
-l <##>, --lmin=<##>
Selects sequence records whose sequence length is greater than or equal to lmin.
Example:
> obigrep -l 100 seq1.fasta > seq2.fasta
Selects only the sequence records whose sequence length is greater than or equal to 100 bp.
 
- 
-v, --inverse-match
Inverts the sequence record selection.
Example:
> obigrep -v -l 100 seq1.fasta > seq2.fasta
Selects only the sequence records that have a sequence length shorter than 100 bp.
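How the predicate, length and inversion options combine can be sketched in Python. This is a minimal model of the behaviour described above, not the tools' implementation. The function `select` and the record layout (a dict with a `sequence` string and an `attributes` dict) are hypothetical, and `sequence` is simplified here to the raw string rather than the full record. It also assumes that inversion applies to the combined result of all other constraints.

```python
def select(record, predicates=(), lmin=None, lmax=None, inverse=False):
    """Return True if the record would be kept under the rules sketched here."""
    # Attribute keys become variable names; 'sequence' is the extra variable.
    env = dict(record["attributes"], sequence=record["sequence"])

    # Several predicates must all hold, like repeated -p options.
    keep = all(eval(expr, {}, env) for expr in predicates)

    length = len(record["sequence"])
    if lmin is not None:
        keep = keep and length >= lmin   # like -l: equal or longer
    if lmax is not None:
        keep = keep and length <= lmax   # like -L: equal or shorter

    # Like -v: invert the final decision (assumed to apply last).
    return (not keep) if inverse else keep


# Toy record of length 120 (hypothetical data).
rec = {
    "sequence": "ACGT" * 30,
    "attributes": {"forward_error": 1, "reverse_error": 0},
}

both_low = select(rec, predicates=["(forward_error<2) and (reverse_error<2)"])
two_predicates = select(rec, predicates=["forward_error<2", "reverse_error>0"])
long_enough = select(rec, lmin=100)
inverted = select(rec, lmin=100, inverse=True)

print(both_low, two_predicates, long_enough, inverted)
```

In this sketch a predicate that names an attribute missing from the record raises a `NameError`; how the real tools handle that case is not stated in this section.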