Pyteomics documentation v3.4.2

trafoxml - reader for trafoXML files

Summary

trafoXML is a format specified in the OpenMS project. It defines a transformation, which is a result of retention time alignment.

This module provides a minimalistic way to extract information from trafoXML files. You can use the old functional interface (read()) or the new object-oriented interface (TrafoXML) to iterate over entries in <Pair> elements.
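As a rough illustration of the two interfaces, the sketch below collects all pairs from one file both ways. The helper names and the `path` argument are hypothetical. The import is deferred so the helpers can be defined even where Pyteomics is not installed, and `read_schema=False` is passed only to skip the schema lookup.

```python
def pairs_via_read(path):
    """Collect all pairs using the functional interface (hypothetical helper)."""
    # Deferred import: requires pyteomics and lxml to be installed.
    from pyteomics.openms import trafoxml
    return list(trafoxml.read(path, read_schema=False))


def pairs_via_class(path):
    """Collect all pairs using the object-oriented interface (hypothetical helper)."""
    from pyteomics.openms import trafoxml
    reader = trafoxml.TrafoXML(path, read_schema=False)
    # Iterating over the parser object is assumed to yield one dict per <Pair>.
    return list(reader)
```

Both helpers should return the same list for the same file; the exact keys and value types of each dict depend on the file and on the schema information in use.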

Data access

TrafoXML - a class representing a single trafoXML file. Other data access functions use this class internally.

read() - iterate through pairs in a trafoXML file. Data from a single pair are converted to a human-readable dict.

chain() - read multiple trafoXML files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

Dependencies

This module requires lxml.


pyteomics.openms.trafoxml.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

Parameters:

files : iterable

Iterable of file names or file objects.
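Conceptually, chaining amounts to reading each file in turn and yielding its pairs as one stream. The sketch below illustrates that idea with the standard library only. `read_pairs` is a hypothetical stand-in for read() built on xml.etree.ElementTree, not the Pyteomics implementation, and the two sample documents are made up.

```python
import io
import itertools
import xml.etree.ElementTree as ET


def read_pairs(source):
    """Hypothetical stand-in for read(): yield the attributes of each <Pair>."""
    root = ET.parse(source).getroot()
    for elem in root.iter("Pair"):
        yield dict(elem.attrib)


def chain_pairs(files):
    """Yield pairs from every source in order, as a single stream."""
    return itertools.chain.from_iterable(read_pairs(f) for f in files)


# Two made-up, minimal documents standing in for trafoXML files.
DOC_A = (
    b'<TrafoXML><Transformation name="linear"><Pairs count="2">'
    b'<Pair from="10.0" to="12.5"/><Pair from="20.0" to="22.0"/>'
    b'</Pairs></Transformation></TrafoXML>'
)
DOC_B = (
    b'<TrafoXML><Transformation name="linear"><Pairs count="1">'
    b'<Pair from="30.0" to="31.0"/>'
    b'</Pairs></Transformation></TrafoXML>'
)

all_pairs = list(chain_pairs([io.BytesIO(DOC_A), io.BytesIO(DOC_B)]))
```

Because the stream is lazy, each source is only parsed when iteration reaches it.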

class pyteomics.openms.trafoxml.TrafoXML(source, read_schema=True, iterative=True, build_id_cache=False, **kwargs)

Bases: pyteomics.xml.XML

Parser class for trafoXML files.

Methods

build_id_cache(*args, **kwargs) Construct a cache for each element in the document, indexed by id
build_tree(*args, **kwargs) Build and store the ElementTree instance
clear_id_cache() Clear the element ID cache
clear_tree() Remove the saved ElementTree.
get_by_id(*args, **kwargs) Parse the file and return the element with id attribute equal to elem_id.
iterfind(*args, **kwargs) Parse the XML and yield info on elements with specified local name or by specified “XPath”.
next()
reset()

__init__(source, read_schema=True, iterative=True, build_id_cache=False, **kwargs)

Create an XML parser object.

Parameters:

source : str or file

File name or file-like object corresponding to an XML file.

read_schema : bool, optional

Defines whether the schema file referenced in the file header should be used to extract information about value conversion. Default is True.

iterative : bool, optional

Defines whether iterative parsing should be used. If True, the file is parsed iteratively, which keeps memory usage low for large XML files. If False, an ElementTree object is constructed and stored on the instance. Default is True.

build_id_cache : bool, optional

Defines whether a dictionary mapping IDs to XML tree elements should be built and stored on the instance. It is used in XML.get_by_id(), e.g. when using pyteomics.mzid.MzIdentML with retrieve_refs=True.
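To make the trade-off behind the iterative option more concrete, here is a minimal standard-library sketch contrasting the two parsing styles. It uses xml.etree.ElementTree rather than lxml and does not reproduce the Pyteomics internals. The function names and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# A made-up, minimal document standing in for a trafoXML file.
SAMPLE = (
    b'<TrafoXML><Transformation name="linear"><Pairs count="2">'
    b'<Pair from="10.0" to="12.5"/><Pair from="20.0" to="22.0"/>'
    b'</Pairs></Transformation></TrafoXML>'
)


def iter_pairs(source):
    """Iterative style: handle each <Pair> as soon as it has been parsed."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Pair":
            yield dict(elem.attrib)  # copy the attributes before clearing
            elem.clear()  # release the element's contents once consumed


def load_pairs(source):
    """Tree style: build the whole tree first, then walk it."""
    tree = ET.parse(source)
    return [dict(e.attrib) for e in tree.iter("Pair")]


iterative_result = list(iter_pairs(io.BytesIO(SAMPLE)))
tree_result = load_pairs(io.BytesIO(SAMPLE))
```

Both approaches produce the same pairs; the iterative one avoids holding every element's data in memory at once, while the tree-based one keeps the whole document available for repeated lookups.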

build_id_cache(*args, **kwargs)

Construct a cache for each element in the document, indexed by id attribute

build_tree(*args, **kwargs)

Build and store the ElementTree instance for the underlying file

clear_id_cache()

Clear the element ID cache

clear_tree()

Remove the saved ElementTree.

get_by_id(*args, **kwargs)

Parse the file and return the element with id attribute equal to elem_id. Returns None if no such element is found.

Parameters:

elem_id : str

The value of the id attribute to match.

Returns:

out : dict or None

iterfind(*args, **kwargs)

Parse the XML and yield info on elements with specified local name or by specified “XPath”.

Parameters:

path : str

Element name or XPath-like expression. Only local names separated with slashes are accepted. An asterisk (*) means any element. You can specify a single condition at the end, such as "/path/to/element[some_value>1.5]". Note that much more powerful filtering is possible using plain Python. The path can be absolute or “free”. Do not specify namespaces.

**kwargs : passed to self._get_info_smart().

Returns:

out : iterator
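As a hedged sketch of the "filter in plain Python" advice, the example below separates the filtering from the parsing. `filter_pairs` and `pairs_above` are hypothetical helpers. It is assumed here that each dict yielded for a <Pair> element exposes its attributes under the keys 'from' and 'to'. The Pyteomics import is deferred so the pure-Python part runs without it.

```python
def filter_pairs(pairs, min_from):
    """Keep pairs whose 'from' value exceeds min_from (plain-Python filtering)."""
    return [p for p in pairs if float(p["from"]) > min_from]


def pairs_above(path, min_from):
    """Hypothetical helper: run iterfind('Pair') on a file and filter the result."""
    # Deferred import: requires pyteomics and lxml to be installed.
    from pyteomics.openms import trafoxml
    reader = trafoxml.TrafoXML(path, read_schema=False)
    return filter_pairs(reader.iterfind("Pair"), min_from)


# Made-up dicts shaped like the assumed iterfind() output.
sample = [{"from": 10.0, "to": 12.5}, {"from": 30.0, "to": 31.0}]
kept = filter_pairs(sample, 15.0)
```

Keeping the predicate in ordinary Python makes it easy to combine several conditions, which the single-condition path syntax does not allow.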

pyteomics.openms.trafoxml.read(source, read_schema=True, iterative=True)

Parse source and iterate through pairs.

Parameters:

source : str or file

A path to a target trafoXML file or the file object itself.

read_schema : bool, optional

If True, attempt to extract information from the XML schema mentioned in the file header (default). Otherwise, use default parameters. Disable this to avoid waiting on slow network connections or to suppress the related warnings.

iterative : bool, optional

Defines whether iterative parsing should be used. It helps reduce memory usage at almost the same parsing speed. Default is True.

Returns:

out : iterator

An iterator over dicts with pair properties.
