Pyteomics documentation v3.4.2

mgf - read and write MS/MS data in Mascot Generic Format

Summary

MGF is a simple human-readable format for MS/MS data. It stores MS/MS peak lists and experimental parameters.

This module provides a minimal infrastructure for accessing data stored in MGF files. The most important function is read(), which reads spectra and related information and saves them into human-readable dicts. Common parameters can also be read from the MGF file header with read_header(). write() creates MGF files.

Functions

read() - iterate through spectra in an MGF file. Data from a single spectrum are converted to a human-readable dict.

chain() - read multiple files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

read_header() - get a dict with parameters common to all spectra from the beginning of an MGF file.

write() - write an MGF file.


pyteomics.mgf.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

Parameters:

files : iterable

Iterable of file names or file objects.

pyteomics.mgf.read(*args, **kwargs)

Read an MGF file and return entries iteratively.

Read the specified MGF file and yield spectra one by one. Each ‘spectrum’ is a dict with four keys: ‘m/z array’, ‘intensity array’, ‘charge array’ and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray objects of floats, ‘charge array’ is a masked array (numpy.ma.MaskedArray) of ints, and ‘params’ stores a dict of parameters (keys and values are str; keys are the MGF parameter names, lowercased).

Parameters:

source : str or file or None, optional

A file object (or file name) with data in MGF format. Default is None, which means reading from standard input.

use_header : bool, optional

Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is True.

convert_arrays : one of {0, 1, 2}, optional

If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.

read_charges : bool, optional

If True (default), fragment charges are reported. Disabling it improves performance.

dtype : type or str or dict, optional

dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’, ‘intensity array’ and/or ‘charge array’.

Returns:

out : FileReader
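
A minimal sketch of iterating over spectra, assuming Pyteomics is installed and that the FileReader returned by read() can be used as a context manager. The helpers base_peaks and base_peaks_from_file and any file path passed to them are hypothetical; only the dict keys and the convert_arrays and read_charges parameters come from the description above.

```python
def base_peaks(spectra):
    """Yield (title, m/z, intensity) of the most intense peak of each spectrum.

    `spectra` is any iterable of dicts with the keys 'm/z array',
    'intensity array' and 'params', as described for read().
    """
    for spectrum in spectra:
        mz = spectrum['m/z array']
        intensity = spectrum['intensity array']
        if len(mz) == 0:
            continue  # skip spectra without peaks
        # Index of the most intense peak; works for lists and numpy arrays.
        best = max(range(len(mz)), key=lambda i: intensity[i])
        # Parameter keys are lowercased, so the MGF TITLE is 'title'.
        yield spectrum['params'].get('title'), mz[best], intensity[best]


def base_peaks_from_file(path):
    """Hypothetical wrapper: read `path` with pyteomics.mgf.read()."""
    from pyteomics import mgf  # imported lazily; requires Pyteomics

    # convert_arrays=0 returns plain lists (no numpy needed);
    # read_charges=False skips fragment charges, which are not used here.
    with mgf.read(path, convert_arrays=0, read_charges=False) as reader:
        for item in base_peaks(reader):
            yield item
```

Because base_peaks only relies on the documented dict keys, it can be tried on hand-built dicts before pointing base_peaks_from_file at a real MGF file.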

pyteomics.mgf.read_header(*args, **kwargs)

Read the specified MGF file and return the search parameters specified in its header as a dict. The keys are the MGF parameter names, lowercased.

Parameters:

source : str or file

File name or file object representing a file in MGF format.

Returns:

header : dict
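
The following sketch illustrates the override rule described for the use_header parameter of read(): header values act as defaults, and spectrum-specific values win on conflict. It assumes Pyteomics is installed; merge_params and read_with_manual_header are hypothetical helpers, and the manual merge is only an approximation of what use_header=True does, not the library's own code.

```python
def merge_params(header, spectrum_params):
    """Combine header parameters with spectrum-specific ones.

    Spectrum-specific values override header values on conflict.
    """
    merged = dict(header)           # start from the common parameters
    merged.update(spectrum_params)  # spectrum-level values take precedence
    return merged


def read_with_manual_header(path):
    """Hypothetical helper: read the header once and merge it by hand."""
    from pyteomics import mgf  # imported lazily; requires Pyteomics

    header = mgf.read_header(path)
    # use_header=False keeps the header out of each spectrum's 'params'.
    with mgf.read(path, use_header=False) as reader:
        for spectrum in reader:
            spectrum['params'] = merge_params(header, spectrum['params'])
            yield spectrum
```

Reading the header separately can be useful when the common parameters are needed without iterating over the spectra.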

pyteomics.mgf.write(*args, **kwargs)

Create a file in MGF format.

Parameters:

spectra : iterable

A sequence of dictionaries with keys ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ should be sequences of int, float, or str. Strings will be written ‘as is’. The sequences should be of equal length; otherwise, excess values will be ignored.

‘params’ should be a dict with keys corresponding to MGF format. Keys must be strings; they will be uppercased and written as is, without any format consistency checks. Values can be of any type that has a string representation.

‘charge array’ can also be specified.

output : str or file or None, optional

Path or a file-like object open for writing. If an existing file is specified by file name, it will be opened for appending. In this case writing with a header can result in violation of format conventions. Default value is None, which means using standard output.

header : dict or (multiline) str or list of str, optional

In case of a single string or a list of strings, the header will be written ‘as is’. In case of dict, the keys (must be strings) will be uppercased.

fragment_format : str, optional

Format string for m/z, intensity and charge of a fragment. Useful to set the number of decimal places and/or suppress writing charges, e.g.: fragment_format='{:.4f} {:.0f}'. Default is '{} {} {}'.

Note

See the Python documentation on format string syntax for details on writing the format string. If some or all charges are missing, an empty string is substituted instead, so formatting the charge as float or int will raise an exception. Hence it is safer to just use {} for charges.

key_order : list, optional

A list of strings specifying the order in which params will be written in the spectrum header. Unlisted keys will be in arbitrary order. Default is _default_key_order.

Note

This does not affect the order of lines in the global header.

param_formatters : dict, optional

A dict mapping parameter names to functions. Each function must accept two arguments (key and value) and return a string. Default is _default_value_formatters.

file_mode : str, keyword only, optional

If output is a file name, defines the mode the file will be opened in. Otherwise it is ignored. Default is ‘a’.

Returns:

output : file
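
A rough sketch of how fragment_format behaves and how write() might be called. It assumes that each peak line is produced with fragment_format.format(mz, intensity, charge), which is consistent with the note on fragment_format but is not confirmed here. fragment_line and write_spectra are hypothetical helpers, the sample spectrum is made up, and passing pepmass as a (value, intensity) tuple is an assumption about what write() accepts.

```python
def fragment_line(fragment_format, mz, intensity, charge=''):
    """Stand-in for how one peak line is assumed to be formatted.

    A missing charge is represented by an empty string, as in the note above.
    """
    return fragment_format.format(mz, intensity, charge)


# str.format ignores unused positional arguments, so a two-field format
# string drops the charge column entirely.
print(fragment_line('{:.4f} {:.0f}', 175.119, 1200.0, 1))  # 175.1190 1200

# A numeric format for the charge fails when the charge is missing.
try:
    fragment_line('{:.4f} {:.0f} {:d}', 175.119, 1200.0)
except ValueError as error:
    print('cannot format a missing charge:', error)

# Made-up spectrum with the keys that write() expects.
example_spectra = [{
    'm/z array': [175.119, 262.151],
    'intensity array': [1200.0, 340.0],
    # pepmass as a (value, intensity) tuple is an assumption.
    'params': {'title': 'example spectrum', 'pepmass': (400.2, None)},
}]


def write_spectra(spectra, path):
    """Hypothetical helper: write `spectra` to `path`, replacing the file."""
    from pyteomics import mgf  # imported lazily; requires Pyteomics

    # file_mode='w' overwrites instead of appending (the default is 'a').
    return mgf.write(spectra, output=path,
                     fragment_format='{:.4f} {:.0f}', file_mode='w')
```

Overwriting with file_mode='w' avoids the situation described for output, where appending a header to an existing file can violate format conventions.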
