Input/Output

Saving and loading

Indexes can be saved to disk and loaded back later. The save() method serialises the index to disk:

>>> import seqan
>>> seqs = seqan.StringDNASet(('ACGT', 'AAAA', 'GGGG', 'AC'))
>>> index = seqan.IndexStringDNASetESA(seqs)
>>> seqan.traverse.depthfirsttraversal(index, lambda it: True)
<...>
>>> index.save('my-index')

This creates several (typically many) files with names such as my-index.bwt, my-index.child, etc. Note that SeqAn index data structures are often initialised lazily, on first use, so it is worth traversing the entire index before saving. Otherwise the index may be saved in an uninitialised state. This is why the example above calls depthfirsttraversal() with a predicate that always returns True before calling save(). Indexes can be restored from disk with the load() method:

>>> index2 = seqan.IndexStringDNASetESA.load('my-index')
>>> it = index2.topdown()
>>> it.goDown('AC')
True
>>> print('Index has {0} occurrence(s) of representative "{1}"'.format(
...     it.numOccurrences, it.representative))
Index has 2 occurrence(s) of representative "AC"
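The lazy-initialisation pitfall can be illustrated without SeqAn at all. The sketch below is a toy in plain Python, not SeqAn's implementation or API: the hypothetical ToyIndex class builds a generalized suffix array only on first use, its save() writes out only what has been built so far (as JSON, an assumption made for simplicity), and its count() answers the same kind of question as numOccurrences by binary search over the sorted suffixes.

```python
import bisect
import json
import os
import tempfile


class ToyIndex:
    """Hypothetical, lazily built generalized suffix array (not SeqAn's API)."""

    def __init__(self, seqs):
        self.seqs = list(seqs)
        self._sa = None  # nothing is built until first use

    @property
    def sa(self):
        # Build on first access: (sequence index, offset) pairs sorted by the
        # text of the suffix that starts there.
        if self._sa is None:
            pairs = [(i, j) for i, s in enumerate(self.seqs) for j in range(len(s))]
            self._sa = sorted(pairs, key=lambda p: self.seqs[p[0]][p[1]:])
        return self._sa

    def save(self, path):
        # Serialises only what already exists, so _sa may be written as null.
        with open(path, "w") as f:
            json.dump({"seqs": self.seqs, "sa": self._sa}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            data = json.load(f)
        index = cls(data["seqs"])
        if data["sa"] is not None:
            index._sa = [tuple(p) for p in data["sa"]]
        return index

    def count(self, pattern):
        # All suffixes starting with `pattern` sort in [pattern, pattern + U+10FFFF).
        suffixes = [self.seqs[i][j:] for i, j in self.sa]
        lo = bisect.bisect_left(suffixes, pattern)
        hi = bisect.bisect_left(suffixes, pattern + "\U0010ffff")
        return hi - lo


seqs = ("ACGT", "AAAA", "GGGG", "AC")
path = os.path.join(tempfile.mkdtemp(), "toy-index.json")

lazy = ToyIndex(seqs)
lazy.save(path)                            # saved before first use
print(ToyIndex.load(path)._sa is None)     # True: the table was never built

lazy.count("AC")                           # first use builds the table
lazy.save(path)
restored = ToyIndex.load(path)
print(restored._sa is not None)            # True
print(restored.count("AC"))                # 2: in "ACGT" and in "AC"
```

In this toy, a loaded index whose table is missing simply rebuilds it on demand; a real on-disk index format need not be so forgiving.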

Graphical output

If you have the graph-tool package installed, you can use it to create graphs that represent suffix trees or suffix arrays. The graphs can be saved in various output formats or examined interactively. For example, suppose we have an index such as the one built in the previous section.

We can build a graph-tool graph from the index:

>>> import graph_tool.draw
>>> import seqan.io.graphtool
>>> builder = seqan.io.graphtool.Builder(index)

and save it as a figure using the scalable force-directed placement (SFDP) layout algorithm:

>>> pos = graph_tool.draw.graph_draw(
...     builder.graph,
...     pos=graph_tool.draw.sfdp_layout(builder.graph),
...     vertex_size=30,
...     vertex_fill_color="lightgrey",
...     vertex_text=builder.occurrences,
...     vertex_pen_width=seqan.io.graphtool.root_vertex_property(builder),
...     edge_text=seqan.io.graphtool.edge_labels_for_output(builder),
...     edge_color=seqan.io.graphtool.color_edges_by_first_symbol(builder),
...     edge_end_marker="none",
...     edge_pen_width=2,
...     output="index.png"
... )

Here we have set various edge and vertex properties such that:

  • The root vertex has a thicker border
  • The edges are colored by their first base
  • Each vertex is labelled with its number of occurrences
[Figure: index.png]
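To make concrete what such a graph contains, here is a plain-Python sketch that needs neither SeqAn nor graph-tool. It builds a compressed suffix tree (a suffix trie with unary chains merged into single edges) for a string set, records the occurrence count of each vertex, and then derives per-edge colors from the first symbol of each edge label and a thicker pen width for the root, mirroring the three properties listed above. The function names, the palette and the pen widths are hypothetical; this is not how Builder is implemented.

```python
"""Toy compressed suffix tree for a string set (not seqan.io.graphtool)."""


def build_suffix_trie(seqs):
    """Insert every suffix of every sequence into a character trie."""
    root = {"children": {}, "count": sum(len(s) for s in seqs), "ends": 0}
    for s in seqs:
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "count": 0, "ends": 0})
                node["count"] += 1  # one more suffix passes through this vertex
            node["ends"] += 1  # a suffix ends exactly here
    return root


def build_graph(seqs):
    """Return (vertices, edges) of the compressed suffix tree.

    vertices[i] is (representative, occurrences); vertex 0 is the root.
    edges is a list of (parent id, child id, edge label).
    """
    root = build_suffix_trie(seqs)
    vertices = [("", root["count"])]
    edges = []
    stack = [(root, 0, "")]
    while stack:
        node, vid, rep = stack.pop()
        for ch, child in sorted(node["children"].items()):
            label = ch
            # Merge unary chains in which no suffix ends into a single edge.
            while len(child["children"]) == 1 and child["ends"] == 0:
                (next_ch, next_node), = child["children"].items()
                label += next_ch
                child = next_node
            cid = len(vertices)
            vertices.append((rep + label, child["count"]))
            edges.append((vid, cid, label))
            stack.append((child, cid, rep + label))
    return vertices, edges


# Hypothetical palette: one color per nucleotide.
PALETTE = {"A": "green", "C": "blue", "G": "orange", "T": "red"}


def edge_colors(edges):
    """Color each edge by the first symbol of its label."""
    return [PALETTE.get(label[0], "black") for _, _, label in edges]


def vertex_pen_widths(vertices, root_width=3.0, other_width=1.0):
    """Give the root (vertex 0) a thicker border than every other vertex."""
    return [root_width if i == 0 else other_width for i in range(len(vertices))]


vertices, edges = build_graph(("ACGT", "AAAA", "GGGG", "AC"))
occurrences = dict(vertices)
print(occurrences["AC"])  # 2: "AC" occurs in "ACGT" and in "AC"
print(sorted(label for parent, _, label in edges if parent == 0))
```

Because "AC" is itself one of the sequences, a suffix ends at the "AC" vertex, so it stays a branching vertex; the unary stretch below it collapses into a single edge labelled "GT".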

We could also have used predicates to control which parts of the suffix tree the graph is built for. A depth predicate (depthpredicate) shows only those vertices within a given distance of the root vertex, here at most 2:

>>> builder = seqan.io.graphtool.Builder(index, predicate=seqan.traverse.depthpredicate(2))
>>> pos = graph_tool.draw.graph_draw(
...     builder.graph,
...     pos=graph_tool.draw.sfdp_layout(builder.graph),
...     vertex_size=30,
...     vertex_fill_color="lightgrey",
...     vertex_text=builder.occurrences,
...     vertex_pen_width=seqan.io.graphtool.root_vertex_property(builder),
...     edge_text=seqan.io.graphtool.edge_labels_for_output(builder),
...     edge_color=seqan.io.graphtool.color_edges_by_first_symbol(builder),
...     edge_end_marker="none",
...     edge_pen_width=2,
...     output="maxdepth-2.png"
... )
[Figure: maxdepth-2.png]
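The idea behind a depth predicate can be sketched in plain Python. The example below assumes a predicate is a callable consulted at each vertex to decide whether the traversal descends further, much like the lambda passed to depthfirsttraversal earlier. It uses an uncompressed suffix trie, where depth counted in edges equals string length; whether depthpredicate counts edges or characters in SeqAn's compressed tree is not stated here, and all names below are hypothetical.

```python
def suffix_trie(seqs):
    """Uncompressed suffix trie as nested dicts: {char: child}."""
    root = {}
    for s in seqs:
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
    return root


def max_depth_predicate(max_depth):
    """Hypothetical analogue of a depth predicate: keep descending only while
    the current vertex is fewer than max_depth edges below the root."""
    def predicate(depth, representative):
        return depth < max_depth
    return predicate


def depth_first(trie, predicate):
    """Yield representatives in depth-first (alphabetical) order, descending
    below a vertex only when predicate(depth, representative) is true."""
    stack = [(trie, 0, "")]
    while stack:
        node, depth, rep = stack.pop()
        yield rep
        if predicate(depth, rep):
            # Push in reverse so the smallest symbol is visited first.
            for ch in sorted(node, reverse=True):
                stack.append((node[ch], depth + 1, rep + ch))


trie = suffix_trie(("ACGT", "AAAA", "GGGG", "AC"))
print(list(depth_first(trie, max_depth_predicate(2))))
# ['', 'A', 'AA', 'AC', 'C', 'CG', 'G', 'GG', 'GT', 'T']
```

Vertices at the maximum depth are still visited (and would be drawn); only their descendants are pruned.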

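Alternatively, a suffix predicate (suffixpredicate) shows only those vertices and edges near a given suffix. Here dash_non_suffix_edges() is also passed as edge_dash_style, so that edges which are not part of the suffix are drawn dashed: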

>>> suffix = 'ACG'
>>> builder = seqan.io.graphtool.Builder(index, predicate=seqan.traverse.suffixpredicate(suffix))
>>> pos = graph_tool.draw.graph_draw(
...     builder.graph,
...     pos=graph_tool.draw.sfdp_layout(builder.graph),
...     vertex_size=30,
...     vertex_fill_color="lightgrey",
...     vertex_text=builder.occurrences,
...     vertex_pen_width=seqan.io.graphtool.root_vertex_property(builder),
...     edge_text=seqan.io.graphtool.edge_labels_for_output(builder),
...     edge_color=seqan.io.graphtool.color_edges_by_first_symbol(builder),
...     edge_end_marker="none",
...     edge_pen_width=2,
...     edge_dash_style=seqan.io.graphtool.dash_non_suffix_edges(builder, suffix),
...     output="suffix.png"
... )
[Figure: suffix.png]
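A suffix predicate can be sketched in the same spirit by expanding only the vertices whose representative is a prefix of the suffix. This rests on an assumption about what "near a given suffix" means: the path spelling the suffix is expanded, and the immediate children of path vertices are shown but not expanded further. The hypothetical dashed_flags() function marks the edges that leave that path, loosely imitating what dash_non_suffix_edges() is used for above. Plain Python, with no SeqAn or graph-tool calls.

```python
def suffix_trie(seqs):
    """Uncompressed suffix trie as nested dicts: {char: child}."""
    root = {}
    for s in seqs:
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
    return root


def on_suffix_path(suffix):
    """Hypothetical predicate: expand a vertex only if its representative is a
    prefix of `suffix`, i.e. the vertex lies on the path spelling the suffix."""
    def predicate(representative):
        return suffix.startswith(representative)
    return predicate


def selected_edges(trie, predicate):
    """Collect (parent, child) edges reachable by expanding only the vertices
    accepted by predicate; children of expanded vertices are always included."""
    edges = []
    stack = [(trie, "")]
    while stack:
        node, rep = stack.pop()
        if predicate(rep):
            for ch in sorted(node):
                edges.append((rep, rep + ch))
                stack.append((node[ch], rep + ch))
    return edges


def dashed_flags(edges, suffix):
    """True for edges that leave the suffix path (drawn dashed), else False."""
    return [not suffix.startswith(child) for _, child in edges]


suffix = "ACG"
edges = selected_edges(suffix_trie(("ACGT", "AAAA", "GGGG", "AC")), on_suffix_path(suffix))
for edge, dashed in zip(edges, dashed_flags(edges, suffix)):
    print(edge, "dashed" if dashed else "solid")
```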
