1.3. Writing Phylogenetic Data

1.3.1. Writing to Streams, Filepaths, or Strings

The Tree, TreeList, CharacterMatrix-derived, and DataSet classes all support the following instance methods for writing data:

write_to_stream(dest, schema, **kwargs)
Takes a file or file-like object opened for writing the data as the first argument, and a string specifying the schema as the second.
write_to_path(dest, schema, **kwargs)
Takes a string specifying the path to the file as the first argument, and a string specifying the schema as the second.
as_string(schema, **kwargs)
Takes a string specifying the schema as the first argument, and returns a string containing the formatted representation of the data.
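The relationship between the three methods can be sketched without DendroPy installed. The class TinyTree below is a hypothetical stand-in, not part of the library: it only mimics the three signatures listed above and understands a single schema, so that the calling pattern can be run as-is. With a real Tree, TreeList, CharacterMatrix, or DataSet object, the three calls at the bottom would take the same form.

```python
import io
import os
import tempfile


class TinyTree:
    """Hypothetical stand-in that mimics the three writing methods."""

    def __init__(self, newick):
        self.newick = newick

    def write_to_stream(self, dest, schema, **kwargs):
        # dest: a file or file-like object opened for writing
        if schema != "newick":
            raise ValueError("this stand-in only understands 'newick'")
        dest.write(self.newick + "\n")

    def write_to_path(self, dest, schema, **kwargs):
        # dest: a string giving the path of the file to write
        with open(dest, "w") as f:
            self.write_to_stream(f, schema, **kwargs)

    def as_string(self, schema, **kwargs):
        # Returns the formatted data instead of writing it anywhere
        buf = io.StringIO()
        self.write_to_stream(buf, schema, **kwargs)
        return buf.getvalue()


tree = TinyTree("((A,B),(C,D));")

# 1. To a file-like object
stream = io.StringIO()
tree.write_to_stream(stream, "newick")

# 2. To a file path
path = os.path.join(tempfile.mkdtemp(), "example.tre")
tree.write_to_path(path, "newick")

# 3. To a string
text = tree.as_string("newick")
```

In this sketch the path and string variants are thin wrappers around the stream variant; whether DendroPy is organized the same way internally is an assumption.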

1.3.2. Specifying the Data Writing Format

The schema specification string can be one of the following:

nexus
To write Tree, TreeList, CharacterMatrix, or DataSet objects in NEXUS format.
newick
To write Tree, TreeList, or DataSet objects in Newick format. With DataSet objects, only tree data will be written.
fasta
To write CharacterMatrix or DataSet objects in FASTA format. With DataSet objects, only character data will be written.
phylip
To write CharacterMatrix or DataSet objects in PHYLIP format. With DataSet objects, only character data will be written.
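
The schema descriptions above amount to a small lookup table of which kinds of data each schema string writes. The sketch below records that table; the names SCHEMA_WRITES and written_data are made up for illustration and are not part of DendroPy.

```python
# Which kinds of data each schema string writes, per the list above.
SCHEMA_WRITES = {
    "nexus": {"trees", "characters"},
    "newick": {"trees"},
    "fasta": {"characters"},
    "phylip": {"characters"},
}


def written_data(schema, available):
    """Return the subset of `available` data kinds that `schema` writes."""
    try:
        supported = SCHEMA_WRITES[schema]
    except KeyError:
        raise ValueError("unrecognized schema: %r" % schema)
    return supported & set(available)


# A DataSet holding both trees and character matrices:
dataset_contents = {"trees", "characters"}
newick_out = written_data("newick", dataset_contents)  # tree data only
fasta_out = written_data("fasta", dataset_contents)    # character data only
```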

1.3.3. Customizing the Data Writing Format

The writing of data can be controlled or fine-tuned using keyword arguments. As with reading, some of these arguments apply generally, while others are only available or make sense for a particular format.

1.3.3.1. All Formats

taxon_set
When writing a DataSet object, if passed a specific TaxonSet, then only TreeList and CharacterMatrix objects associated with this TaxonSet will be written. By default, this is None, meaning that all data in the DataSet object will be written.
exclude_trees
When writing a DataSet object, if True, then no tree data will be written (i.e., all TreeList objects in the DataSet will be skipped in the output). By default, this is False, meaning that all tree data will be written.
exclude_chars
When writing a DataSet object, if True, then no character data will be written (i.e., all CharacterMatrix objects in the DataSet will be skipped in the output). By default, this is False, meaning that all character data will be written.
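
The combined effect of these three keywords can be sketched as a simple selection step. The function select_for_writing below is hypothetical and only illustrates the rules as described; it assumes that each TreeList or CharacterMatrix carries a taxon_set attribute, and it uses bare stand-in objects rather than real DendroPy classes.

```python
from types import SimpleNamespace


def select_for_writing(tree_lists, char_matrices, taxon_set=None,
                       exclude_trees=False, exclude_chars=False):
    """Return the (tree_lists, char_matrices) that would be written."""
    def keep(obj):
        # taxon_set=None means "no filtering by taxon set"
        return taxon_set is None or obj.taxon_set is taxon_set

    trees = [] if exclude_trees else [t for t in tree_lists if keep(t)]
    chars = [] if exclude_chars else [c for c in char_matrices if keep(c)]
    return trees, chars


# Stand-in objects: each only carries a `taxon_set` attribute.
ts1, ts2 = object(), object()
tl_a = SimpleNamespace(name="trees A", taxon_set=ts1)
tl_b = SimpleNamespace(name="trees B", taxon_set=ts2)
cm_a = SimpleNamespace(name="matrix A", taxon_set=ts1)

everything = select_for_writing([tl_a, tl_b], [cm_a])
only_ts1 = select_for_writing([tl_a, tl_b], [cm_a], taxon_set=ts1)
no_trees = select_for_writing([tl_a, tl_b], [cm_a], exclude_trees=True)
```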

1.3.3.2. NEXUS and Newick

The following code fragment shows a typical invocation of a NEXUS-format write operation using all supported keywords with their defaults:

# d is a DataSet object; each keyword is shown with its default value
d.write_to_path(
        'data.nex',
        'nexus',
        taxon_set=None,
        exclude_trees=False,
        exclude_chars=False,
        simple=False,
        block_titles=True,
        suppress_taxa_block=False,
        preamble_blocks=[],
        supplemental_blocks=[],
        file_comments=None,
        suppress_leaf_taxon_labels=False,
        suppress_leaf_node_labels=True,
        suppress_internal_taxon_labels=False,
        suppress_internal_node_labels=False,
        suppress_rooting=False,
        suppress_edge_lengths=False,
        unquoted_underscores=False,
        preserve_spaces=False,
        store_tree_weights=False,
        suppress_annotations=False,
        annotations_as_nhx=False,
        suppress_item_comments=False,
        node_label_element_separator=' ',
        node_label_compose_func=None,
        edge_label_compose_func=None)

The following code fragment shows a typical invocation of a Newick-format write operation using all supported keyword arguments with their default values:

d.write_to_path(
        'data.tre',
        'newick',
        taxon_set=None,
        suppress_leaf_taxon_labels=False,
        suppress_leaf_node_labels=True,
        suppress_internal_taxon_labels=False,
        suppress_internal_node_labels=False,
        suppress_rooting=False,
        suppress_edge_lengths=False,
        unquoted_underscores=False,
        preserve_spaces=False,
        store_tree_weights=False,
        suppress_annotations=True,
        annotations_as_nhx=False,
        suppress_item_comments=True,
        node_label_element_separator=' ',
        node_label_compose_func=None,
        edge_label_compose_func=None)

The special keywords supported for writing NEXUS-formatted output include:

simple
When writing NEXUS-formatted data, if True, then character data will be represented as a single “DATA” block, instead of separate “TAXA” and “CHARACTERS” blocks. By default this is False.
block_titles
When writing NEXUS-formatted data, if False, then title statements will not be added to the various NEXUS blocks (i.e., “TAXA”, “CHARACTERS”, and “TREES”). By default, this is True, i.e., block titles will be written.
suppress_taxa_block
If True, do not write a “TAXA” block. Default is False.
exclude_trees
When writing NEXUS-formatted data, if True, then no tree data will be written (i.e., all TreeList objects in the DataSet will be skipped in the output). By default, this is False, meaning that all tree data will be written.
exclude_chars
When writing NEXUS-formatted data, if True, then no character data will be written (i.e., all CharacterMatrix objects in the DataSet will be skipped in the output). By default, this is False, meaning that all character data will be written.
preamble_blocks
When writing NEXUS-formatted data, a list of other blocks (or strings) to be written at the beginning of the file.
supplemental_blocks
When writing NEXUS-formatted data, a list of other blocks (or strings) to be written at the end of the file.
file_comments
When writing NEXUS-formatted data, the contents of this variable (a string or a list of strings) will be added as a NEXUS comment at the top of the file. By default, this is None.
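
To show where preamble_blocks, supplemental_blocks, and file_comments end up relative to the data, the sketch below assembles a NEXUS document from pre-formatted block strings. The function nexus_skeleton is hypothetical and is not how DendroPy builds its output; in particular, the exact ordering of the file comment relative to the "#NEXUS" header and the preamble blocks is an assumption.

```python
def nexus_skeleton(body_blocks, preamble_blocks=(), supplemental_blocks=(),
                   file_comments=None):
    """Assemble a NEXUS document from pre-formatted block strings."""
    parts = ["#NEXUS"]
    if file_comments is not None:
        # Accept a single string or a list of strings
        if isinstance(file_comments, str):
            file_comments = [file_comments]
        # NEXUS comments are enclosed in square brackets
        parts.extend("[%s]" % c for c in file_comments)
    parts.extend(str(b) for b in preamble_blocks)      # before the data
    parts.extend(body_blocks)                          # TAXA, CHARACTERS, TREES...
    parts.extend(str(b) for b in supplemental_blocks)  # after the data
    return "\n\n".join(parts) + "\n"


doc = nexus_skeleton(
    body_blocks=["BEGIN TREES;\n    TREE t1 = ((A,B),(C,D));\nEND;"],
    supplemental_blocks=["BEGIN PAUP;\n    set autoclose=yes;\nEND;"],
    file_comments="written by example script",
)
```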

The special keywords supported for writing both NEXUS- and Newick-formatted trees include:

suppress_leaf_taxon_labels
If True, then taxon labels will not be printed for leaves. Default is False.
suppress_leaf_node_labels
If False, then node labels (if available) will be printed for leaves. Defaults to True. Note that DendroPy distinguishes between taxon labels and node labels. In a typical Newick string, taxon labels are printed for leaf nodes, while leaf node labels are ignored (hence the default setting of True, which ignores leaf node labels).
suppress_internal_taxon_labels
If True, then taxon labels will not be printed for internal nodes. Default is False. NOTE: this replaces the internal_labels argument which has been deprecated.
suppress_internal_node_labels
If True, internal node labels will not be written. Default is False. NOTE: this replaces the internal_labels argument which has been deprecated.
suppress_rooting
If True, will not write rooting statement. Default is False. NOTE: this keyword argument replaces the write_rooting argument which has now been deprecated.
suppress_edge_lengths
If True, will not write edge lengths. Default is False. NOTE: this keyword argument replaces the edge_lengths argument which has now been deprecated.
unquoted_underscores
If True, labels with underscores will not be quoted, which will mean that they will be interpreted as spaces if read again (“soft” underscores). If False, then labels with underscores will be quoted, resulting in “hard” underscores. Default is False. NOTE: this keyword argument replaces the quote_underscores argument which has now been deprecated.
preserve_spaces
If True, spaces in labels will not be mapped to underscores. Default is False.
store_tree_weights
If True, tree weights are written. Default is False.
suppress_annotations
If True, will not write annotated attributes as comments. Defaults to False when writing in NEXUS format with simple set to False; defaults to True when writing in Newick format, or in NEXUS format with simple set to True.
annotations_as_nhx
If True and suppress_annotations is False, then annotations will be written in NHX format (‘[&&field=value:field=value]’), as opposed to a more generic format with only one leading ampersand (‘[&field=value,field=value,field={value,value}]’). Defaults to False.
suppress_item_comments
If True, will not write any additional comments associated with (tree) items. Defaults to False when writing in NEXUS format with simple set to False; defaults to True when writing in Newick format, or in NEXUS format with simple set to True.
node_label_element_separator
If both suppress_leaf_taxon_labels and suppress_leaf_node_labels are False, then this will be the string used to join them. Defaults to ‘ ’ (a single space).
node_label_compose_func
If not None, should be a function that takes a Node object as an argument and returns the string to be used to represent the node in the tree statement. The return value from this function is used unconditionally to print a node representation in a tree statement, bypassing the default labelling function (and thus ignoring suppress_leaf_taxon_labels, suppress_leaf_node_labels, suppress_internal_taxon_labels, suppress_internal_node_labels, etc.). Defaults to None.
edge_label_compose_func
If not None, should be a function that takes an Edge object as an argument, and returns the string to be used to represent the edge length in the tree statement. Defaults to None.
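
As a sketch of the last two keywords, the functions below follow the contract described: one takes a node and returns its label string, the other takes an edge and returns the string for its length. They assume that a node exposes taxon (with a label) and label attributes and that an edge exposes length; the SimpleNamespace objects are stand-ins so that the functions can be exercised without a tree, and the function names themselves are made up for illustration.

```python
from types import SimpleNamespace


def compose_node_label(node):
    """Use the taxon label if present, else the node label, else ''."""
    if node.taxon is not None and node.taxon.label:
        # Replace spaces so the label stays a single unquoted token
        return node.taxon.label.replace(" ", "_")
    return node.label or ""


def compose_edge_label(edge):
    """Format edge lengths with four decimal places; blank if unset."""
    if edge.length is None:
        return ""
    return "%.4f" % edge.length


# Stand-ins for a leaf node, an internal node, and an edge
leaf = SimpleNamespace(taxon=SimpleNamespace(label="Homo sapiens"), label=None)
internal = SimpleNamespace(taxon=None, label="clade1")
edge = SimpleNamespace(length=0.123456)

# With a real tree, the functions would be passed as keyword arguments, e.g.:
#   tree.write_to_path('out.tre', 'newick',
#                      node_label_compose_func=compose_node_label,
#                      edge_label_compose_func=compose_edge_label)
```

Because the node function's return value is used unconditionally, it has to handle every kind of node it may be given, including internal nodes without a taxon.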

1.3.3.3. FASTA

The following code fragment shows a typical invocation of a FASTA-format write operation using all supported keywords with their defaults:

d.write_to_path(
        'data.fas',
        'fasta',
        taxon_set=None,
        wrap=False,
        wrap_width=70)

The special keywords supported for writing FASTA-formatted data include:

wrap
If True, then sequences will be wrapped at wrap_width characters. Defaults to False. Wrapped output is easier to read, but writing is considerably slower.
wrap_width
If wrap is True, then sequences will be wrapped at this many characters. Defaults to 70.
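
What wrapping means for the output can be sketched in a few lines. The functions wrap_sequence and fasta_record are hypothetical illustrations, not DendroPy's writer; they only reproduce the behavior of the wrap and wrap_width keywords as described above.

```python
def wrap_sequence(seq, wrap=False, wrap_width=70):
    """Return `seq` as a list of output lines."""
    if not wrap:
        return [seq]
    # Slice the sequence into consecutive chunks of wrap_width characters
    return [seq[i:i + wrap_width] for i in range(0, len(seq), wrap_width)]


def fasta_record(label, seq, wrap=False, wrap_width=70):
    """Format one FASTA record: a '>' header line, then the sequence."""
    lines = [">" + label] + wrap_sequence(seq, wrap, wrap_width)
    return "\n".join(lines) + "\n"


seq = "ACGT" * 40  # 160 characters
unwrapped = fasta_record("taxon_1", seq)          # one long sequence line
wrapped = fasta_record("taxon_1", seq, wrap=True)  # lines of 70, 70, and 20
```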

1.3.3.4. PHYLIP

The following code fragment shows a typical invocation of a PHYLIP-format write operation using all supported keywords with their defaults:

d.write_to_path(
        'data.dat',
        'phylip',
        taxon_set=None,
        strict=False,
        spaces_to_underscores=False,
        force_unique_taxon_labels=False)

The special keywords supported for writing PHYLIP-formatted data include:

strict
If True, write in “strict” PHYLIP format, i.e., with taxon labels truncated to 10 characters and sequence characters beginning on column 11. Defaults to False, which writes in “relaxed” format (taxon labels not truncated, and separated from sequence characters by two or more consecutive spaces).
spaces_to_underscores
If True, replace all spaces in taxon labels with underscores; useful if writing in relaxed mode, where spaces are used to delimit the beginning of sequence characters. Defaults to False: labels not changed.
force_unique_taxon_labels
If True, then identical taxon labels (or labels that are identical due to truncation) will be disambiguated through the appending of indexes. Defaults to False.
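
The interaction of the three keywords can be sketched as follows. The functions disambiguate and phylip_rows are hypothetical and produce only the data lines (not the header line of a PHYLIP file); the "_1", "_2" suffix style and the choice to shorten strict labels so that the suffix still fits in 10 characters are assumptions, not a description of DendroPy's actual output.

```python
from collections import Counter


def disambiguate(labels, max_len=None):
    """Append an index to every label that occurs more than once."""
    counts = Counter(labels)
    seen = Counter()
    result = []
    for label in labels:
        if counts[label] > 1:
            seen[label] += 1
            suffix = "_%d" % seen[label]
            if max_len is not None:
                # Make room for the suffix within the length limit
                label = label[:max_len - len(suffix)]
            label += suffix
        result.append(label)
    return result


def phylip_rows(rows, strict=False, spaces_to_underscores=False,
                force_unique_taxon_labels=False):
    """Format (label, sequence) pairs as PHYLIP data lines."""
    labels = [label for label, _ in rows]
    if spaces_to_underscores:
        labels = [l.replace(" ", "_") for l in labels]
    if strict:
        labels = [l[:10] for l in labels]  # truncate to 10 characters
    if force_unique_taxon_labels:
        labels = disambiguate(labels, max_len=10 if strict else None)
    lines = []
    for label, (_, seq) in zip(labels, rows):
        if strict:
            lines.append(label.ljust(10) + seq)  # sequence starts at column 11
        else:
            lines.append(label + "  " + seq)     # two spaces as delimiter
    return lines


rows = [("Homo sapiens neanderthalensis", "ACGT"),
        ("Homo sapiens sapiens", "ACGA")]
strict_lines = phylip_rows(rows, strict=True, force_unique_taxon_labels=True)
relaxed_lines = phylip_rows(rows, spaces_to_underscores=True)
```

In the strict case both labels truncate to the same 10 characters, which is the situation force_unique_taxon_labels is meant to resolve.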
