obisample: randomly resamples sequence records
obisample randomly resamples sequence records with or without replacement.
obisample specific options
- -s ###, --sample-size ###
  Specifies the size of the generated sample.
  - Without the -a option, the sample size is the exact number of sequence records to sample (default: the number of sequence records in the input file).
  - With the -a option, the sample size is the fraction of the input file's sequence records to sample, given as a number between 0 and 1.
  Example:
  > obisample -s 1000 seq1.fasta > seq2.fasta
  Randomly samples 1000 sequence records from the seq1.fasta file, with replacement, and saves them in the seq2.fasta file.
- -a, --approx-sampling
  Switches the resampling algorithm to an approximate one, useful for large files.
  The default algorithm selects exactly the number of sequence records specified with the -s option. When the -a option is set, each sequence record is selected with a probability that depends on the count attribute of the sequence record and on the -s fraction.
  Example:
  > obisample -s 0.5 -a seq1.fastq > seq2.fastq
  Randomly samples approximately half of the sequence records of the seq1.fastq file, without replacement, and saves them in the seq2.fastq file.
- -w, --without-replacement
  Asks for sampling without replacement.
  Example:
  > obisample -s 1000 -w seq1.fasta > seq2.fasta
  Randomly samples 1000 sequence records from the seq1.fasta file, without replacement (the input file must contain at least 1000 sequences), and saves them in the seq2.fasta file.
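
The difference between the sampling modes described above can be illustrated with a short Python sketch. This is not obisample's actual implementation: the function names `sample_exact` and `sample_approx` are hypothetical, records are plain list items, and the approximate variant is a simplification that keeps each record with a fixed probability and ignores the count attribute that obisample takes into account.

```python
import random


def sample_exact(records, size=None, replace=True, rng=random):
    """Draw exactly `size` records (hypothetical helper, not obisample code).

    With no `size`, as many records are drawn as the input holds,
    mirroring the documented default of the -s option.
    """
    records = list(records)
    if size is None:
        size = len(records)
    if replace:
        # With replacement: the same record may be drawn several times.
        return rng.choices(records, k=size)
    if size > len(records):
        # Without replacement the input must hold at least `size` records.
        raise ValueError("sample size exceeds the number of input records")
    # Without replacement: every drawn record is a distinct input record.
    return rng.sample(records, size)


def sample_approx(records, fraction, rng=random):
    """Keep each record independently with probability `fraction`.

    Simplifying assumption: the count attribute is ignored here.
    The input is streamed, so the whole file never sits in memory,
    but the output size is only approximately fraction * input size.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must lie between 0 and 1")
    for record in records:
        if rng.random() < fraction:
            yield record
```

The exact variant has to know the full set of records before drawing, whereas the approximate variant decides record by record, which is presumably why the approximate mode is recommended for large files.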
Options to specify input format
Restrict the analysis to a sub-part of the input file
- --skip <N>
  The first N sequence records of the file are discarded from the analysis and not reported to the output file.
- --only <N>
  Only the next N sequence records of the file are analyzed. The following sequences in the file are neither analyzed nor reported to the output file. This option can be used together with the --skip option.
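
How --skip and --only combine can be pictured as a slice over the stream of records. The sketch below is an illustration under that assumption, not the tool's own code; the helper name `restrict` and its parameters are hypothetical.

```python
from itertools import islice


def restrict(records, skip=0, only=None):
    """Drop the first `skip` records, then keep at most `only` of the rest.

    Hypothetical helper: `skip` plays the role of --skip and `only`
    the role of --only. Leaving `only` as None keeps every remaining record.
    """
    stop = None if only is None else skip + only
    # islice consumes the input lazily, so records past `stop` are never read.
    return islice(records, skip, stop)


# Skipping 2 records and keeping the next 3 out of 10 selects records 2, 3 and 4.
selected = list(restrict(range(10), skip=2, only=3))
```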