fastAQ package

Submodules

fastAQ.fastaInfo module

class fastAQ.fastaInfo.FastaParser(fasta_file)[source]

Parses a FASTA file to extract the sequences and header information, if any.

fasta_file : FASTA file to be parsed.

>>> import sys
>>> from fastAQ.fastaInfo import FastaParser
>>> input_file = sys.argv[1]
>>> out = FastaParser(input_file)
>>> seqDict = out.sequenceDict()
>>> print(len(seqDict))

maskAll(intervals, toLower=False, maskingChar='N')[source]

Masks the sequences in the FASTA file based on the given intervals.

intervals : A list of tuples containing the start and end positions for the masking.

toLower : If True, the bases in each interval are converted to lower case. Default is False.

maskingChar : Masking character. Default is 'N'.

Returns the masked sequences.

maskSeq(name, interval, toLower=False, maskingChar='N')[source]

Masks the sequence based on the given interval.

name : Name/header of the sequence.

interval : A tuple containing the start and end positions for the masking.

toLower : If True, the bases in the interval are converted to lower case. Default is False.

maskingChar : Masking character. Default is 'N'.

Returns the masked sequence.
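
The masking behaviour described above can be sketched in plain Python. This is a minimal illustration, not fastAQ's implementation: the function name mask_interval is hypothetical, the interval is assumed to be 0-based and half-open (start included, end excluded), and lower-casing is assumed to take precedence over maskingChar when toLower is True.

```python
def mask_interval(sequence, interval, to_lower=False, masking_char="N"):
    """Mask one region of a sequence (hypothetical helper, not part of fastAQ).

    Assumption: `interval` is a 0-based, half-open (start, end) pair.
    """
    start, end = interval
    region = sequence[start:end]
    if to_lower:
        # Soft masking: keep the bases but write them in lower case.
        masked = region.lower()
    else:
        # Hard masking: replace every base in the region with the masking character.
        masked = masking_char * len(region)
    return sequence[:start] + masked + sequence[end:]


hard = mask_interval("ACGTACGT", (2, 5))
soft = mask_interval("ACGTACGT", (2, 5), to_lower=True)
```

Here hard masking hides the region entirely, while soft masking keeps the bases recoverable. Check the package source for the coordinate convention that fastAQ itself uses.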

readFasta(fastaFile)[source]

Reads and parses the FASTA file.

fastaFile : A FASTA file.

Returns a generator object that yields the sequences.
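
For readers unfamiliar with generator-based parsing, the following sketch shows one way a FASTA reader can yield records lazily. It is an illustration only: read_fasta is a hypothetical name, it takes an open file handle rather than a path, and it yields (name, sequence) tuples, none of which is confirmed to match fastAQ's readFasta.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) tuples from an open FASTA handle (hypothetical helper)."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


example = io.StringIO(">seq1\nACGT\nTTAA\n>seq2\nGGCC\n")
records = dict(read_fasta(example))
```

Because the function is a generator, only one record is held in memory at a time until the caller collects the results, as dict() does here.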

reverseComplement(nameSeq)[source]

Compute the reverse complement of a given sequence.

nameSeq : Name of the sequence whose reverse complement is to be computed.

Returns the reverse complement of the input sequence.

reverseComplementAll()[source]

Compute the reverse complements of all the sequences in the given FASTA file.

This method takes no arguments; it operates on every sequence in the parsed FASTA file.

Prints the reverse complements.

seqFromName(name)[source]

Extract the sequence corresponding to the given name.

name : Name of the sequence to be retrieved.

Sequence corresponding to the input name.

seqNames()[source]

Names/Headers of all the sequences.

A list of names of all the sequences in the FASTA file.

sequenceDict()[source]

Creates a dictionary of sequences with their header.

A dictionary of sequences.

trimAll(intervals, quality=None)[source]

Trims all the sequences in the FASTA file from both sides based on the intervals.

intervals : A list of tuples, each containing the number of bases to be trimmed from the left and right sides respectively.

Returns the trimmed sequences.

trimSeq(name, interval, quality=None)[source]

Trims the sequence from both sides based on the interval.

name : Name/header of the sequence to be trimmed.

interval : The interval containing the number of bases to be trimmed from the left and right sides respectively.

Returns the trimmed sequence.
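
Interval-based trimming, as described above, amounts to slicing a fixed number of bases off each end. The sketch below assumes the interval is a (left, right) pair of non-negative base counts; trim_by_interval is a hypothetical name and the handling of over-long trims is an illustrative choice, not fastAQ's documented behaviour.

```python
def trim_by_interval(sequence, interval):
    """Remove `left` bases from the start and `right` bases from the end.

    Hypothetical helper; assumes `interval` is a (left, right) pair of counts.
    """
    left, right = interval
    if left < 0 or right < 0:
        raise ValueError("trim counts must be non-negative")
    end = len(sequence) - right
    # If the two trims overlap, return an empty sequence instead of a wrapped slice.
    return sequence[left:max(end, left)]


trimmed = trim_by_interval("ACGTACGT", (2, 3))
```

For a FASTQ record the same slice would also have to be applied to the quality string so that bases and qualities stay aligned.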

fastAQ.fastq2fasta module

fastAQ.fastqInfo module

class fastAQ.fastqInfo.FastqParser(fastq_file)[source]

Parses a FASTQ file to extract the sequences and the base qualities.

fastq_file : FASTQ file to be parsed.

>>> import sys
>>> from fastAQ.fastqInfo import FastqParser
>>> input_file = sys.argv[1]
>>> out = FastqParser(input_file)
>>> seqDict = out.sequenceDict()
>>> print(len(seqDict))

baseQualities()[source]

Creates a dictionary of base qualities of the sequences.

A dictionary of base qualities.
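
Base qualities in a FASTQ file are stored as one ASCII character per base. The sketch below shows how such a string is commonly decoded into integer Phred scores. It assumes the Sanger/Illumina 1.8+ encoding (ASCII offset 33); whether baseQualities() returns raw strings or decoded integers is not stated here, and decode_qualities is a hypothetical name.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    Hypothetical helper; assumes Phred+33 encoding unless `offset` is changed.
    """
    return [ord(char) - offset for char in quality_string]


def error_probability(phred_score):
    """Return the error probability implied by a Phred score: 10 ** (-Q / 10)."""
    return 10 ** (-phred_score / 10.0)


scores = decode_qualities("II5!")
```

A Phred score of 20 corresponds to an estimated error probability of 1 in 100, and a score of 30 to 1 in 1000.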

maskAll(intervals, toLower=False, maskingChar='N')[source]

Masks the sequences in the FASTQ file based on the given intervals.

intervals : A list of tuples containing the start and end positions for the masking.

toLower : If True, the bases in each interval are converted to lower case. Default is False.

maskingChar : Masking character. Default is 'N'.

Returns the masked sequences.

maskSeq(name, interval, toLower=False, maskingChar='N')[source]

Masks the sequence based on the given interval.

name : Name/header of the sequence.

interval : A tuple containing the start and end positions for the masking.

toLower : If True, the bases in the interval are converted to lower case. Default is False.

maskingChar : Masking character. Default is 'N'.

Returns the masked sequence.

readFastq(fastqFile)[source]

Reads and parses the FASTQ file.

fastqFile : A FASTQ file.

Returns a generator object that yields the sequences.
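
As an illustration of generator-based FASTQ parsing, the sketch below reads records four lines at a time. It assumes the common unwrapped layout (header, sequence, '+' separator, quality on one line each); read_fastq is a hypothetical name and the (name, sequence, quality) tuple it yields is not confirmed to match what fastAQ's readFastq produces.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from an open FASTQ handle.

    Hypothetical helper; assumes four-line records with no line wrapping.
    Iteration stops at end of file or at the first empty header line.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().strip()
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': %r" % header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ for %r" % header)
        yield header[1:], sequence, quality


example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n")
reads = list(read_fastq(example))
```

The length check catches truncated records early, which is usually cheaper than debugging a misaligned quality string later.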

reverseComplement(nameSeq)[source]

Compute the reverse complement of a given sequence.

nameSeq : Name of the sequence whose reverse complement is to be computed.

Returns the reverse complement of the input sequence.

reverseComplementAll()[source]

Compute the reverse complements of all the sequences in the given FASTQ file.

This method takes no arguments; it operates on every sequence in the parsed FASTQ file.

Prints the reverse complements.

seqNames()[source]

Names/Headers of all the sequences.

A list of names of all the sequences in the FASTQ file.

sequenceDict()[source]

Creates a dictionary of sequences with their header.

A dictionary of sequences.

trimAll(qualityCutOff=0, byInterval=False, intervals=None, mott=False, limitValue=None)[source]

Trims all the sequences in the FASTQ file from both sides based on the intervals.

intervals : A list of tuples, each containing the number of bases to be trimmed from the left and right sides respectively.

Returns the trimmed sequences.

trimSeq(name, qualityCutOff=0, byInterval=False, interval=None, mott=False, limitValue=None)[source]

Trims the sequence.

name : Name/header of the sequence to be trimmed.

qualityCutOff : Quality threshold used when trimming the sequence by removing low-quality bases.

byInterval : If True, the sequence is trimmed by removing bases according to the given interval.

interval : The interval containing the number of bases to be trimmed from the left and right sides respectively. Requires byInterval to be True.

mott : If True, the sequence is trimmed according to Mott’s algorithm.

limitValue : Numerical value of the limit to be used in Mott’s algorithm. Requires mott to be True.

Returns the trimmed sequence.

fastAQ.sequenceOperations module

class fastAQ.sequenceOperations.SequenceManipulation(input_sequence)[source]

Edits/Modifies a DNA sequence.

input_sequence : A nucleotide sequence.

maskSequence(interval, toLower=False, maskingChar='N')[source]

Masks the sequence based on the interval.

interval : A tuple containing the start and end positions for the masking.

toLower : If True, the bases in the interval are converted to lower case. Default is False.

maskingChar : Masking character. Default is 'N'.

Returns the masked sequence.

reverseComplement()[source]

Compute the reverse complement of a given sequence.

Returns the reverse complement of the input sequence.
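
A reverse complement is obtained by swapping each base for its complement (A with T, C with G) and reversing the result. The sketch below is a standalone illustration with a hypothetical function name. Leaving N and any unrecognised characters unchanged is an assumption, not something documented for fastAQ.

```python
# Translation table for the four bases plus N, in both upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (hypothetical helper).

    Characters not in the table, such as IUPAC ambiguity codes, pass through unchanged.
    """
    return sequence.translate(_COMPLEMENT)[::-1]


result = reverse_complement("AACGTN")
```

Applying the function twice returns the original sequence, which makes a convenient sanity check.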

fastAQ.trimming module

class fastAQ.trimming.Trimming(sequence, qualities)[source]

Trimming a FASTQ sequence.

sequence : The sequence to be trimmed.

qualities : Base qualities of the bases in the sequence.

lowQualTrim(qualCutOff)[source]

Trims a sequence by removing low quality bp’s.

qualCutOff : The quality threshold. Bases with quality below this threshold will be removed.

Returns the trimmed sequence.
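
One common reading of low-quality trimming is to strip bases from both ends until a base at or above the threshold is reached, leaving the interior untouched. The sketch below implements that reading. Whether lowQualTrim trims only the ends or removes every low-quality base is not specified here, so treat the end-trimming behaviour as an assumption, along with the hypothetical name low_quality_trim and the use of integer Phred scores as input.

```python
def low_quality_trim(sequence, qualities, cutoff):
    """Trim low-quality bases from both ends of a read (hypothetical helper).

    Assumes `qualities` is a list of integer scores, one per base.
    Returns the trimmed sequence together with its matching qualities.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    start, end = 0, len(sequence)
    # Advance from the left while the leading base is below the cutoff.
    while start < end and qualities[start] < cutoff:
        start += 1
    # Retreat from the right while the trailing base is below the cutoff.
    while end > start and qualities[end - 1] < cutoff:
        end -= 1
    return sequence[start:end], qualities[start:end]


kept_seq, kept_qual = low_quality_trim("ACGTAC", [5, 10, 30, 8, 30, 3], 20)
```

Note that the low-quality base in the middle of the read is kept, because this variant only trims from the ends.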

mott(limitValue)[source]

Trims a sequence by using Mott’s algorithm.

limitValue : Numerical value of the limit to be used in Mott’s algorithm.

Trimmed sequence.
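
A widely used formulation of Mott-style trimming scores each base as the limit minus its error probability, keeps a running sum that resets to zero whenever it drops to zero or below, and retains the segment with the highest running sum. The sketch below follows that formulation. It is not taken from fastAQ's source: the name mott_trim, the default limit of 0.05, and the use of integer Phred scores as input are all assumptions.

```python
def mott_trim(sequence, qualities, limit=0.05):
    """Keep the highest-scoring segment under a Mott-style running sum.

    Hypothetical helper; `qualities` is a list of Phred scores, one per base.
    """
    best_score = 0.0
    best_start = best_end = 0
    running = 0.0
    start = 0
    for index, quality in enumerate(qualities):
        error_probability = 10 ** (-quality / 10.0)
        running += limit - error_probability
        if running <= 0:
            # The segment so far is not worth keeping; restart after this base.
            running = 0.0
            start = index + 1
        elif running > best_score:
            # New best segment runs from `start` through this base.
            best_score = running
            best_start, best_end = start, index + 1
    return sequence[best_start:best_end], qualities[best_start:best_end]


mott_seq, mott_qual = mott_trim("ACGTACGT", [2, 2, 30, 30, 30, 30, 2, 2])
```

A larger limit tolerates more low-quality bases inside the retained segment, while a smaller limit trims more aggressively.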

trimSequence(interval)[source]

Trims the sequence from both sides based on the interval.

interval : The interval containing the number of bases to be trimmed from the left and right sides respectively.

Returns the trimmed sequence.

Module contents