STEME started life as an approximation to the Expectation-Maximisation algorithm for the type of model used in motif finders such as MEME. STEME’s EM approximation runs an order of magnitude more quickly than the MEME implementation for typical parameter settings. STEME has now developed into a fully-fledged motif finder in its own right.
STEME’s source code can be found at its PyPI page. The latest version of STEME’s documentation is at its Python package page. An installation of STEME is available to run over the web.
STEME is based on the tried-and-tested MEME algorithm. MEME is one of the most mature and popular motif finders. It was one of the top performers in Tompa et al.’s benchmark comparison of motif finders.
STEME is designed to be used on the type of large data sets typically generated by modern biological experiments. STEME has been tested on input in the tens of megabases, but there is no reason why it should not be used on larger data sets.
STEME is fast. Typically, motif finders have a runtime that grows quickly with the size of the input. Because STEME uses suffix trees, it does not suffer from this problem. STEME also provides options to control the runtime, so the user decides how long they are prepared to wait for results.
Many motif finders (especially fast enumerative motif finders) use consensus sequences as models of binding sites. Consensus sequences are less flexible than the position weight matrices (PWMs) that STEME uses and cannot capture the same range of motifs.
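To make the difference concrete, here is a minimal sketch in plain Python. It is not STEME's code or API: the matrix values, the uniform background, and the helper names (`consensus`, `consensus_matches`, `log_odds`) are all hypothetical and chosen for illustration. It also uses the simplest form of consensus, one base per position with no IUPAC degeneracy codes.

```python
import math

# Hypothetical 4-position motif. Each column maps a base to its probability.
# Position 3 tolerates both A and G, which a single consensus base cannot express.
PWM = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.50, "C": 0.05, "G": 0.40, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

# Assumed uniform background frequency for every base.
BACKGROUND = 0.25


def consensus(pwm):
    """Collapse the PWM to its most probable base at each position."""
    return "".join(max(column, key=column.get) for column in pwm)


def consensus_matches(pwm, site):
    """Consensus model: a site either equals the consensus or it does not."""
    return site == consensus(pwm)


def log_odds(pwm, site):
    """PWM model: sum of per-position log2 odds against the background."""
    return sum(
        math.log2(column[base] / BACKGROUND)
        for column, base in zip(pwm, site)
    )


if __name__ == "__main__":
    for site in ("AGAT", "AGGT", "CCCC"):
        print(site, consensus_matches(PWM, site), round(log_odds(PWM, site), 2))
```

Under these assumed values, the site `AGGT` is rejected outright by the consensus model yet still receives a positive log-odds score from the PWM, only slightly below that of the consensus site itself. That graded scoring is what the extra flexibility of PWMs buys.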
STEME produces output in MEME’s well-established format, making it easy to use in downstream tools. STEME’s output has been tested with tools from MEME, BioPython and BioPerl.
STEME’s significance calculations are designed with large data sets in mind. Motif finders that have not been written for large data sets can often badly miscalculate the significance of the motifs they find. This is a particularly insidious problem and hard for the user to identify.
The EM approximation that is at the heart of STEME has been published in Nucleic Acids Research:
STEME: efficient EM to find motifs in large data sets
Nucl. Acids Res. (2011) 39(18)
Reid JE, Wernisch L
If you find STEME useful, please cite us.