Pyteomics documentation v2.1.5

pepxml - pepXML file reader

Summary

pep.XML was the first widely accepted format for the output of proteomics search engines. Although it is to be replaced by the community standard mzIdentML, it is still commonly used.

This module provides a minimal infrastructure for accessing data stored in pep.XML files. The most important function is read(), which reads peptide-spectrum matches and related information and stores them in human-readable dicts. The rest of the data can be obtained via the get_node() function, which relies on the terminology of the underlying lxml library.

Data access

read() - iterate through peptide-spectrum matches in a pep.XML file. Data for a single spectrum are converted to an easy-to-use dict.

roc_curve() - get a receiver operating characteristic (ROC) curve (minimum PeptideProphet probability in a sample vs. false discovery rate) of a PeptideProphet analysis.

version_info() - get version information about the pepXML file.

iterfind() - iterate over elements in a pepXML file.


pyteomics.pepxml.version_info(source, *args, **kwargs)

Provide version information about the pepXML file.

pyteomics.pepxml.iterfind(source, *args, **kwargs)

Parse source and yield info on elements selected by a local name or by a simplified “XPath”. Only local names separated by slashes are accepted, and an asterisk (*) matches any element. A single condition can be specified at the end of the path, for example “/path/to/element[some_value>1.5]”. Note that much more powerful filtering can be done in plain Python. The path can be absolute or “free”. Do not specify namespaces.
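To illustrate the note about filtering in plain Python, the sketch below applies a numeric condition to element records after they have been yielded. It assumes that iterfind() yields dict-like records and that the path is passed as the second positional argument; neither is confirmed by the signature above. The helper name filter_by_value, the key "massdiff" and the sample records are hypothetical.

```python
def filter_by_value(elements, key, threshold):
    """Yield records whose numeric value under `key` is greater than `threshold`.

    `elements` is any iterable of dict-like records, for example (assumed)
    the output of pyteomics.pepxml.iterfind(source, "search_hit").
    """
    for element in elements:
        value = element.get(key)
        if value is None:
            continue  # the value is absent from this record
        if float(value) > threshold:
            yield element


# Hypothetical records standing in for parsed elements.
sample_elements = [
    {"peptide": "PEPTIDEK", "massdiff": "0.2"},
    {"peptide": "ELVISLIVESK", "massdiff": 2.1},
    {"peptide": "SAMPLER"},
]

selected = list(filter_by_value(sample_elements, "massdiff", 1.5))
print([element["peptide"] for element in selected])
```

With a real file, the first argument would be the iterator returned by iterfind(). Unlike a bracketed condition in the path, a Python predicate can combine conditions on several values.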

pyteomics.pepxml.read(*args, **kwargs)

Parse source and iterate through peptide-spectrum matches.

Parameters :

source : str or file

A path to a target pepXML file or the file object itself.

Returns :

out : iterator

An iterator over the dicts with PSM properties.
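A minimal sketch of how the PSM dicts might be consumed follows. It assumes that each dict has a "spectrum" key and a "search_hit" list whose items carry "hit_rank" and "peptide"; these key names are assumptions about the output of read() and should be checked against a real file. The helper name top_peptides and the sample records are hypothetical.

```python
def top_peptides(psms):
    """Map each spectrum name to the peptide of its best-ranked search hit.

    `psms` is any iterable of PSM dicts, for example (assumed) the iterator
    returned by pyteomics.pepxml.read("path/to/file.pep.xml").
    """
    result = {}
    for psm in psms:
        hits = psm.get("search_hit", [])
        if not hits:
            continue  # spectrum without any identification
        # The smallest hit_rank is the best hit; a missing rank counts as 1.
        best = min(hits, key=lambda hit: hit.get("hit_rank", 1))
        result[psm["spectrum"]] = best["peptide"]
    return result


# Hypothetical records shaped like the assumed output of read().
sample_psms = [
    {"spectrum": "run1.00001.00001.2",
     "search_hit": [{"hit_rank": 2, "peptide": "AAAK"},
                    {"hit_rank": 1, "peptide": "PEPTIDEK"}]},
    {"spectrum": "run1.00002.00002.2", "search_hit": []},
    {"spectrum": "run1.00003.00003.3",
     "search_hit": [{"hit_rank": 1, "peptide": "ELVISK"}]},
]

print(top_peptides(sample_psms))
```

Because read() returns an iterator, the helper processes one PSM at a time and does not need the whole file in memory.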

pyteomics.pepxml.roc_curve(source)

Parse source and return a ROC curve for peptideprophet analysis.

Parameters :

source : str or file

A path to a target pepXML file or the file object itself.

Returns :

out : list

A list of ROC points, sorted by ascending minimum probability.
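One possible use of the ROC points is to choose a probability cutoff for a target error rate. The sketch below assumes that each ROC point is a dict with numeric "min_prob" and "error" keys, mirroring PeptideProphet attribute names; this structure is not confirmed by the description above. The helper name min_prob_for_error and the sample points are hypothetical.

```python
def min_prob_for_error(roc_points, max_error):
    """Return the lowest min_prob whose error rate does not exceed `max_error`.

    `roc_points` is a list of dicts, for example (assumed) the list returned
    by pyteomics.pepxml.roc_curve(source). Returns None if no point qualifies.
    """
    eligible = [point["min_prob"] for point in roc_points
                if point["error"] <= max_error]
    return min(eligible) if eligible else None


# Hypothetical ROC points, sorted by ascending minimum probability.
sample_roc = [
    {"min_prob": 0.5, "error": 0.08},
    {"min_prob": 0.9, "error": 0.02},
    {"min_prob": 0.99, "error": 0.004},
]

print(min_prob_for_error(sample_roc, 0.05))
```

The returned value can then be used as a probability threshold when filtering the PSMs produced by read().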
