Pyteomics documentation v3.4.1

mzxml - reader for mass spectrometry data in mzXML format

Summary

mzXML is a formerly standard XML format for storing raw mass spectrometry data; it is intended to be replaced by mzML.

This module provides a minimalistic way to extract information from mzXML files. You can use the old functional interface (read()) or the new object-oriented interface (MzXML) to iterate over entries in <scan> elements. MzXML also supports direct indexing with scan IDs.

Data access

MzXML - a class representing a single mzXML file. Other data access functions use this class internally.

read() - iterate through spectra in mzXML file. Data from a single scan are converted to a human-readable dict. Spectra themselves are stored under ‘m/z array’ and ‘intensity array’ keys.

chain() - read multiple mzXML files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.
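As a minimal sketch of working with the dicts that read() yields, the helper below picks the most intense peak from the ‘m/z array’ and ‘intensity array’ keys. The names base_peak and base_peaks are hypothetical, and the example assumes the object returned by read() can be used in a with statement, as is usual for Pyteomics readers. Pyteomics is imported inside the function so that the pure-Python part runs without it.

```python
def base_peak(spectrum):
    """Return (m/z, intensity) of the most intense peak in a spectrum dict."""
    pairs = zip(spectrum['m/z array'], spectrum['intensity array'])
    return max(pairs, key=lambda pair: pair[1])


def base_peaks(path):
    """Yield the base peak of every scan in the mzXML file at `path`."""
    from pyteomics import mzxml  # imported lazily; needs lxml and numpy

    # Assumption: the reader supports the context manager protocol.
    with mzxml.read(path) as reader:
        for spectrum in reader:
            yield base_peak(spectrum)


# A hand-made dict standing in for one parsed scan (not real data).
example = {'m/z array': [100.0, 200.5, 300.2],
           'intensity array': [10.0, 250.0, 40.0]}
print(base_peak(example))  # (200.5, 250.0)
```

Note that base_peak() raises ValueError for a scan with no peaks, so empty scans may need to be skipped first.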

Deprecated functions

version_info() - get version information about the mzXML file. Instead, read the version_info attribute of an MzXML object.

iterfind() - iterate over elements in an mzXML file. Instead, call the iterfind() method of an MzXML object.

Dependencies

This module requires lxml and numpy.


pyteomics.mzxml.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

Parameters:

files : iterable

Iterable of file names or file objects.
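What chaining amounts to can be sketched in plain Python. The helper chain_spectra below is hypothetical and is not the library's implementation; it takes the reader function as an argument to make visible that each file gets its own read() call with the same keyword arguments. The stand-in fake_read exists only so the sketch runs without any data files.

```python
def chain_spectra(files, reader, **kwargs):
    """Yield items from reader(f, **kwargs) for each f in files, in order."""
    for f in files:
        for item in reader(f, **kwargs):
            yield item


def fake_read(name, **kwargs):
    """Stand-in for a real reader function; yields two made-up entries."""
    return iter([{'file': name, 'num': 1}, {'file': name, 'num': 2}])


# With Pyteomics installed, pyteomics.mzxml.read could be passed instead
# of fake_read, together with keyword arguments such as read_schema=False.
for entry in chain_spectra(['a.mzXML', 'b.mzXML'], fake_read):
    print(entry['file'], entry['num'])
```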

pyteomics.mzxml.version_info(source)

Provide version information about the XML file.

Note

This function is provided for backward compatibility only. It simply creates an MzXML instance and returns its version_info attribute.

Parameters:

source : str or file

File name or file-like object.

Returns:

out : tuple

A (version, schema URL) tuple, both elements are strings or None.
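The non-deprecated route described in the note can be sketched as follows: create an MzXML object and read its version_info attribute. The function names are hypothetical, and the example assumes that MzXML takes the file name as its first argument and works in a with statement. The tuple passed to describe_version() at the end is made up for illustration.

```python
def read_version_info(path):
    """Return the (version, schema URL) tuple of the mzXML file at `path`."""
    from pyteomics import mzxml  # imported lazily; needs lxml and numpy

    # Assumption: MzXML accepts the source as its first positional argument.
    with mzxml.MzXML(path) as reader:
        return reader.version_info


def describe_version(version_info):
    """Format a (version, schema URL) tuple; either element may be None."""
    version, schema_url = version_info
    return 'mzXML version {}, schema {}'.format(
        version if version is not None else 'unknown',
        schema_url if schema_url is not None else 'unknown')


# Made-up value, not taken from a real file.
print(describe_version(('3.2', None)))  # mzXML version 3.2, schema unknown
```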

pyteomics.mzxml.iterfind(source, path, **kwargs)

Parse source and yield info on elements with specified local name or by specified XPath.

Note

This function is provided for backward compatibility only. If you do multiple iterfind() calls on one file, you should create an MzXML object and use its iterfind() method.

Parameters:

source : str or file

File name or file-like object.

path : str

Element name or XPath-like expression. Only local names separated with slashes are accepted. An asterisk (*) means any element. You can specify a single condition in the end, such as: "/path/to/element[some_value>1.5]" Note: you can do much more powerful filtering using plain Python. The path can be absolute or “free”. Please don’t specify namespaces.

recursive : bool, optional

If False, subelements will not be processed when extracting info from elements. Default is True.

iterative : bool, optional

Specifies whether iterative XML parsing should be used. Iterative parsing significantly reduces memory usage and may be just a little slower. When retrieve_refs is True, however, it is highly recommended to disable iterative parsing if possible. Default value is True.

read_schema : bool, optional

If True, attempt to extract information from the XML schema mentioned in the mzXML header (default). Otherwise, use default parameters. Disable this to avoid waiting on slow network connections or to suppress the related warnings.

Returns:

out : iterator
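The remark about filtering in plain Python can be illustrated with a short sketch. It uses the iterfind() method of an MzXML object with the "free" path scan, then filters the yielded dicts with an ordinary generator. The helper names are hypothetical, and the ‘msLevel’ and ‘num’ keys are an assumption based on the attributes of the mzXML scan element; the dicts in the demonstration are made up.

```python
def ms_level_is(elements, level):
    """Keep only element dicts whose 'msLevel' equals `level`."""
    for element in elements:
        # int() tolerates values that were left as strings by the parser.
        if int(element.get('msLevel', 0)) == level:
            yield element


def iter_scans_at_level(path, level=2):
    """Yield info on <scan> elements of the given MS level."""
    from pyteomics import mzxml  # imported lazily; needs lxml and numpy

    with mzxml.MzXML(path) as reader:
        # 'scan' is a "free" path: a bare local name without namespaces.
        for scan in ms_level_is(reader.iterfind('scan'), level):
            yield scan


# Made-up dicts standing in for parsed <scan> elements.
fake_scans = [{'num': '1', 'msLevel': 1}, {'num': '2', 'msLevel': '2'}]
print([s['num'] for s in ms_level_is(fake_scans, 2)])  # ['2']
```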

class pyteomics.mzxml.MzXML(*args, **kwargs)

Bases: pyteomics.xml.ArrayConversionMixin, pyteomics.xml.IndexedXML

Parser class for mzXML files.

Methods

build_id_cache(*args, **kwargs) - Construct a cache for each element in the document, indexed by id.
build_tree(*args, **kwargs) - Build and store the ElementTree instance.
clear_id_cache() - Clear the element ID cache.
clear_tree() - Remove the saved ElementTree.
get_by_id(*args, **kwargs) - Retrieve the requested entity by its id.
iterfind(path, **kwargs)
next()
reset()
__init__(*args, **kwargs)
build_id_cache(*args, **kwargs)

Construct a cache for each element in the document, indexed by id attribute

build_tree(*args, **kwargs)

Build and store the ElementTree instance for the underlying file

clear_id_cache()

Clear the element ID cache

clear_tree()

Remove the saved ElementTree.

get_by_id(*args, **kwargs)

Retrieve the requested entity by its id. If the entity is a spectrum described in the offset index, it will be retrieved by immediately seeking to the starting position of the entry, otherwise falling back to parsing from the start of the file.

Parameters:

elem_id : str

The id value of the entity to retrieve.

Returns:

dict :
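A minimal sketch of retrieving individual scans by id follows. The helper names are hypothetical, and the example assumes that scan ids are the scan numbers written as strings (for example '2'); this should be checked against the actual file before relying on it.

```python
def as_scan_ids(numbers):
    """Convert scan numbers to the string ids that get_by_id() expects."""
    return [str(n) for n in numbers]


def fetch_scans(path, scan_numbers):
    """Return a list of dicts, one for each requested scan."""
    from pyteomics import mzxml  # imported lazily; needs lxml and numpy

    with mzxml.MzXML(path) as reader:
        # Assumption: ids are scan numbers as strings.
        return [reader.get_by_id(scan_id)
                for scan_id in as_scan_ids(scan_numbers)]


print(as_scan_ids([1, 20]))  # ['1', '20']
```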

pyteomics.mzxml.read(source, read_schema=True, iterative=True, use_index=False, dtype=None)

Parse source and iterate through spectra.

Parameters:

source : str or file

A path to a target mzXML file or the file object itself.

read_schema : bool, optional

If True, attempt to extract information from the XML schema mentioned in the mzXML header (default). Otherwise, use default parameters. Disable this to avoid waiting on slow network connections or to suppress the related warnings.

iterative : bool, optional

Defines whether iterative parsing should be used. It helps reduce memory usage at almost the same parsing speed. Default is True.

use_index : bool, optional

Defines whether an index of byte offsets needs to be created for spectrum elements. Default is False.

Returns:

out : iterator

An iterator over the dicts with spectrum properties.
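The keyword arguments above might be combined as in the following sketch, which counts all peaks in a file while skipping the schema lookup. The function names are hypothetical, and the example assumes the returned iterator can be used in a with statement.

```python
def total_peaks(spectra):
    """Sum the number of m/z values over an iterable of spectrum dicts."""
    return sum(len(spectrum['m/z array']) for spectrum in spectra)


def total_peaks_in_file(path):
    """Count peaks in the mzXML file at `path` without fetching the schema."""
    from pyteomics import mzxml  # imported lazily; needs lxml and numpy

    # read_schema=False avoids the network request for the XML schema;
    # iterative=True (the default) keeps memory usage low.
    with mzxml.read(path, read_schema=False, iterative=True) as reader:
        return total_peaks(reader)


# Made-up spectrum dicts standing in for parsed scans.
print(total_peaks([{'m/z array': [100.0, 200.0]}, {'m/z array': []}]))  # 2
```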
