- New infrastructure for metadata annotations: AnnotationSet and Annotation.
- [NeXML] Full support for NeXML 0.9 metadata parsing and writing.
- get_from_url() and read_from_url() methods now allow reading of phylogenetic data from URLs.
- Added GBIF interoperability module (“dendropy.interop.gbif”).
- Added GenBank interoperability module (“dendropy.interop.genbank”).
- Split hash calculations now correctly handle unrooted trees with incomplete leaf sets.
- Annotations/metadata API significantly changed. This affects not only parsing of NeXML-formatted data but NEXUS/NEWICK as well (metadata is no longer stored in dictionaries, but in AnnotationSet collections of Annotation objects).
- [NEXUS/NEWICK] suppress_internal_node_taxa keyword now actually instantiates Taxon objects from internal node labels when ‘False’, but defaults to True for consistency with legacy behavior.
- [NEXUS/NEWICK] Faster parsing of tree files.
- NCBI module (“dendropy.interop.ncbi”) has been deprecated. Use “dendropy.interop.genbank” instead.
New application script for concatenation of branch labels from across multiple input trees: sumlabels.py.
New interoperability class dendropy.interop.seqgen.SeqGen: wrapper for Seq-Gen integrated into library.
New interoperability function dendropy.interop.muscle.muscle_align(): wrapper for MUSCLE alignment.
New interoperability class dendropy.interop.raxml.RaxmlRunner: wrapper for RAxML.
prune_taxa() method added to CharacterMatrix.
Math modules moved to their own subpackage: dendropy.mathlib.
New module for matrix and vector computations: dendropy.mathlib.linearalg.
New module for statistical distance calculation: dendropy.mathlib.distance.
Family of Mahalanobis distance calculation functions in dendropy.mathlib.distance: squared_mahalanobis, squared_mahalanobis_1d, mahalanobis, mahalanobis_1d.
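As a reference point for the Mahalanobis functions listed above: the squared Mahalanobis distance of a point x from a distribution with mean mu and covariance matrix Sigma is (x - mu)^T Sigma^-1 (x - mu), and in one dimension it reduces to (x - mu)^2 / sigma^2. The pure-Python sketch below computes both by solving Sigma y = x - mu instead of explicitly inverting Sigma. It is an illustration only: the argument order and types are assumptions and are not the signatures of the dendropy.mathlib.distance functions.

```python
import math
from typing import List, Sequence


def _solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Hypothetical signatures for illustration; not those of dendropy.mathlib.distance.
def squared_mahalanobis(x, mean, cov) -> float:
    """(x - mean)^T cov^-1 (x - mean), computed via a linear solve."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    y = _solve(cov, d)  # y = cov^-1 d
    return sum(di * yi for di, yi in zip(d, y))


def mahalanobis(x, mean, cov) -> float:
    return math.sqrt(squared_mahalanobis(x, mean, cov))


def squared_mahalanobis_1d(x: float, mean: float, variance: float) -> float:
    # One-dimensional case: (x - mean)^2 / variance.
    return (x - mean) ** 2 / variance
```

With an identity covariance matrix the result is just the squared Euclidean distance, e.g. squared_mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]) gives 25.0.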
- Better documentation and examples of geodispersal analysis of Lieberman and Eldredge (see extras/geodispersal subdirectory).
- Split orientation regression bug introduced in 3.10.0 fixed (affected SumTrees and other split comparisons when dealing with unrooted trees).
- Support for writing preamble blocks in NEXUS format.
- Support for writing character subsets in NEXUS.
- Support for user customization of edge label formatting.
- find_missing_splits() method added to dendropy.Tree().
- concatenate(), concatenate_from_paths(), and concatenate_from_streams() methods added to CharacterMatrix and derived classes: these allow creation of a new CharacterMatrix (or subtype) from multiple character matrices or file sources. The component alignments are stored as character subsets in the concatenated matrix (and can be re-exported as individual matrices using export_character_subset()).
- Log probability of coalescent frames/trees now calculated correctly [PLEASE REVISIT PREVIOUS ANALYSIS IF YOU USED THIS FEATURE!].
- Strict PHYLIP format now correct.
- Phylogenetic independent contrasts (PIC) analysis can now be carried out using the dendropy.continuous.PhylogeneticIndependentContrasts class.
- Simplified contained coalescent (gene tree in species tree) simulation.
- Keyword arguments to as_string(), write_to_path(), write_to_file(), etc. methods have been tweaked to become more consistent for NEXUS and Newick formats. Previous keywords are still supported, but will be deprecated. The new set of supported keyword arguments is listed in the “NEXUS and Newick writing customization” section of the documentation.
- NEXUS and Newick formats now default to case-insensitive taxon labels; specify case_insensitive_taxon_labels=False for case-sensitivity.
- [NEXUS] Reading an interleaved character matrix no longer causes the following block to be skipped.
- OverflowError is now caught when calculating summary statistics.
- SumTrees now allows -e unweighted / --edges=unweighted to strip edge length information from output trees.
- NexusReader now processes “ASSUMPTIONS” and “CODONS” blocks in addition to “SETS” blocks for character set information.
- SumTrees now will not write edge lengths if none of the input trees have length information for that edge (previously, SumTrees would write “0.0” for the edge).
- NexusWriter (and hence SumTrees) no longer writes an extra semicolon after each tree statement (which confuses some applications, e.g. FigTree).
- SumTrees can now summarize edge lengths on target trees in different ways:
  - -e mean-length: sets the edge lengths of the target/consensus tree(s) to the mean of the lengths of the corresponding edges of the input trees.
  - -e median-length: sets the edge lengths of the target/consensus tree(s) to the median of the lengths of the corresponding edges of the input trees.
  - -e median-age: adjusts the edge lengths of the target/consensus tree(s) such that the node ages correspond to the median age of corresponding nodes of the input trees [requires rooted ultrametric trees].
  - -e mean-age: adjusts the edge lengths of the target/consensus tree(s) such that the node ages correspond to the mean age of corresponding nodes of the input trees [requires rooted ultrametric trees].
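The mean-length and median-length options boil down to grouping input-tree edges by the split (bipartition of taxa) they induce and summarizing each group. A minimal sketch of that step, assuming the lengths have already been collected per split into a dict keyed by frozensets of leaf labels (a hypothetical representation, not SumTrees' internals); the age-based options, which need rooted ultrametric trees, are not covered:

```python
from statistics import mean, median
from typing import Dict, FrozenSet, List, Optional

# A split is identified here by the frozenset of leaf labels on one side of the edge.
Split = FrozenSet[str]


def summarize_edge_lengths(
    lengths_by_split: Dict[Split, List[Optional[float]]],
    method: str = "mean-length",
) -> Dict[Split, Optional[float]]:
    """Summarize, per split, the lengths of the corresponding input-tree edges."""
    summarizers = {"mean-length": mean, "median-length": median}
    if method not in summarizers:
        raise ValueError(f"unsupported method: {method!r}")
    summarize = summarizers[method]
    result: Dict[Split, Optional[float]] = {}
    for split, lengths in lengths_by_split.items():
        # Ignore input trees that carry no length for this edge.
        observed = [x for x in lengths if x is not None]
        # No length in any input tree: leave it unset instead of writing 0.0.
        result[split] = summarize(observed) if observed else None
    return result
```

Splits with no length in any input tree map to None rather than 0.0, in line with the SumTrees behavior for edges without length information noted above.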
SumTrees will now annotate nodes with summaries (mean, median, standard deviation, range, 95% highest posterior density, 5% and 95% quantiles, etc.) of the distribution of ages across input trees if the trees are indicated to be ultrametric with the “--ultrametric” flag [requires rooted ultrametric trees].
SumTrees will now annotate edges with summaries (mean, median, standard deviation, range, 95% highest posterior density, 5% and 95% quantiles, etc.) of the distribution of edge lengths across input trees.
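A 95% highest posterior density interval is commonly estimated from samples as the shortest interval containing 95% of the sorted values. The sketch below implements that estimator; whether SumTrees uses exactly this procedure is an assumption, and hpd_interval is a hypothetical helper name:

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing at least `mass` of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    values = sorted(samples)
    n = len(values)
    # Number of consecutive sorted values the interval must cover; the tiny
    # epsilon keeps float noise (e.g. 0.95 * 100) from adding an extra value.
    k = max(1, math.ceil(mass * n - 1e-9))
    best = (values[0], values[k - 1])
    for i in range(1, n - k + 1):
        lo, hi = values[i], values[i + k - 1]
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best
```

Unlike the 5% to 95% quantile interval, this interval need not be centered, which matters for skewed age or length distributions.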
SumTrees will now interpret and handle tree weights if the --weighted-trees option is specified.
SumTrees will now report topology frequencies/probabilities, if given the --trprobs option.
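Topology frequencies (as reported with --trprobs, optionally weighted via --weighted-trees) amount to counting trees by a rotation-independent key. The sketch below assumes a hypothetical nested-tuple representation of rooted trees (a leaf is a label, an internal node is a tuple of children) and builds the key by sorting children recursively; unrooted trees would need a different canonical form, and this is not the implementation used by SumTrees or TopologyCounter:

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Union

# Hypothetical rooted-tree representation: a leaf is a label, an internal node a tuple of children.
Tree = Union[str, tuple]


def canonical(tree: Tree) -> str:
    """Rotation-independent key: children are sorted recursively."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


def topology_frequencies(
    trees: Sequence[Tree], weights: Optional[Sequence[float]] = None
) -> Dict[str, float]:
    """Fraction of total tree weight carried by each distinct topology."""
    if weights is None:
        weights = [1.0] * len(trees)  # unweighted: every tree counts once
    if len(weights) != len(trees):
        raise ValueError("need exactly one weight per tree")
    totals: Dict[str, float] = defaultdict(float)
    for tree, weight in zip(trees, weights):
        totals[canonical(tree)] += weight
    grand_total = sum(totals.values())
    return {topology: weight / grand_total for topology, weight in totals.items()}
```

For example, (("A", "B"), ("C", "D")) and (("D", "C"), ("B", "A")) share the key "((A,B),(C,D))" and are counted as the same topology.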
SumTrees can now handle arbitrarily-large numbers of files.
SumTrees creates a rooted consensus tree if input trees are specified to be rooted using the --rooted-trees flag (or if rooted trees are implied, as with the “--ultrametric” flag).
- Tree objects can now be rerooted at midpoint (see Tree.reroot_at_midpoint()).
- Annotations (i.e., attributes of Tree, Node or Edge objects that have had “annotate()” called on them) can now be written as metadata comments (“[&field=value]”) when writing NEXUS/NEWICK format if the keyword argument annotations_as_comments is used.
- When reading in NEXUS/NEWICK format trees, specifying extract_comment_metadata=True will result in metadata comments being extracted into a dictionary, with keys being field names and values being the field values.
- When reading NEXUS format data, SETS blocks will be processed, and character sets parsed into the relevant CharacterDataMatrix.
- Character sets (as, for example, parsed out of NEXUS SETS blocks: see above) can be exported as new CharacterDataMatrix objects, which can be saved, manipulated, etc. independently.
- When writing in NEXUS or NEWICK formats, the write_item_comments keyword argument (True or False) can control whether extended comments associated with nodes on trees will be written or not.
- TopologyCounter class added to dendropy.treesum: allows for tracking of topology frequencies.
- treesplits.tree_from_splits() allows construction of (topology-only) trees from a set of splits.
- Most functionality that used to be in ‘dendropy.treemanip’ has now been migrated to native methods of the dendropy.Tree class. ‘dendropy.treemanip’ will be deprecated.
- Trees can now be pruned based on a list of taxa labels to remove or keep (previously, the methods would only accept lists of Taxon objects).
- “1.0” threshold special-cased to correctly recover strict consensus tree.
- Implementation of the ‘General Sampling Approach’ (Hartmann et al. 2010: Sampling Trees from Evolutionary Models; Syst. Biol. 59, 465-476) method of simulating trees from the birth-death model.
- Correct/consistent names for some probability functions.
- Fixed bug in confirming overwrite of output file when using the SumTrees ‘-e’/‘--split-edges’ option.
- Ancient and grizzled semi-fossilized reference to ‘taxa_block’ corrected to ‘taxon_set’.
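The [&field=value] metadata comments mentioned above can be read with a small parser. This sketch is not DendroPy's parser: it assumes comma-separated key=value pairs, converts numeric values to float, and treats brace-enclosed values (as in BEAST-style fields such as height_95hpd={1.2,1.9}) as lists; parse_metadata_comment is a hypothetical name:

```python
from typing import Dict, List, Union

Scalar = Union[str, float]
Value = Union[Scalar, List[Scalar]]


def _convert(token: str) -> Scalar:
    # Numbers become floats; anything else stays a (quote-stripped) string.
    token = token.strip().strip("\"'")
    try:
        return float(token)
    except ValueError:
        return token


def parse_metadata_comment(comment: str) -> Dict[str, Value]:
    """Parse a '[&key=value,...]' comment into a dict (braces hold lists)."""
    body = comment.strip()
    if not (body.startswith("[&") and body.endswith("]")):
        raise ValueError("not a metadata comment")
    body = body[2:-1]
    # Split on commas that are not inside {...}.
    fields: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current))
    result: Dict[str, Value] = {}
    for field in fields:
        key, sep, raw = field.partition("=")
        if not sep:
            continue  # skip malformed entries without '='
        raw = raw.strip()
        if raw.startswith("{") and raw.endswith("}"):
            result[key.strip()] = [_convert(t) for t in raw[1:-1].split(",")]
        else:
            result[key.strip()] = _convert(raw)
    return result
```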
Migrated to BSD-style license.
- SumTrees now works (in serial mode) under older Python versions (i.e. < 2.6).
- Fixes for compatibility with Python 2.4.x.
SumTrees now can run in parallel mode, with each input source tree file handled in a separate process. The maximum number of parallel processes can be specified by the ‘-m’ or ‘--multiprocessing’ flag:

    $ sumtrees.py -m 4 mb.run1.t mb.run2.t mb.run3.t mb.run4.t > consensus.tre
    $ sumtrees.py --multiprocessing=4 mb.run1.t mb.run2.t mb.run3.t mb.run4.t > consensus.tre
Tree comments now parsed and stored with trees (NEXUS and NEWICK formats), allowing for processing of annotations such as tree weights and calculated likelihoods.
Added new module for interacting with NCBI databases: dendropy.interop.ncbi. Sequences can be downloaded individually or by specifying ranges. In addition, labels suitable for use in phylogenetic analyses can be automatically composed for each sequence. For example:

    >>> from dendropy.interop import ncbi
    >>> entrez = ncbi.Entrez(generate_labels=True, sort_taxa_by_label=True)
    >>> data1 = entrez.fetch_nucleotide_accessions(['EU105474', 'EU105476'])
    >>> data1.write_to_path('seqs1.nex', 'nexus')
    >>> data2 = entrez.fetch_nucleotide_accession_range(105474, 106045, prefix="EU")
    >>> data2.write_to_path('seqs2.nex', 'nexus')

Note that unlike Python’s built-in range() function, here the last element is included in the range (i.e., a range specified as “a, b” => [a, b]), and thus entrez.fetch_nucleotide_accession_range(105474, 106045, prefix="EU") will retrieve the full range of sequences from “EU105474” through “EU106045”.
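The inclusive-range rule above can be made concrete with a small helper that expands a numeric accession range into identifiers. accession_range is a hypothetical function, not part of dendropy.interop.ncbi, and it assumes the numeric part carries no leading zeros:

```python
from typing import List


def accession_range(start: int, stop: int, prefix: str = "") -> List[str]:
    """Accession IDs from start to stop, *including* stop (unlike range())."""
    if stop < start:
        raise ValueError("stop must not be smaller than start")
    # range() excludes its end point, so extend it by one.
    return [f"{prefix}{number}" for number in range(start, stop + 1)]
```

With the values from the example above, accession_range(105474, 106045, prefix="EU") yields 572 identifiers, the last one being "EU106045".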
Added “beast-summary-tree” schema specification to process BEAST annotated consensus trees:

    >>> import dendropy
    >>> tree = dendropy.Tree.get_from_path('pythonidae.beast.tre', 'beast-summary-tree')

Each node on the resulting tree will have the following attributes: “height”, “height_median”, “height_95hpd”, “height_range”, “length”, “length_median”, “length_95hpd”, “length_range”, “posterior”. Scalar values will be of float type, while ranges (e.g., “height_95hpd”, “height_range”, “length_95hpd”, “length_range”) will be two-element lists of float types.
Added ladderize() method to order nodes in ascending (default) or descending (ladderize(right=True)) order.
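Ladderizing reorders the children of every node by the size of their subtrees. The sketch below uses a hypothetical nested-tuple tree (leaf = label, internal node = tuple of children) and mirrors the right keyword only for illustration; it is not DendroPy's implementation of ladderize():

```python
from typing import Union

# Hypothetical representation: a leaf is a label, an internal node a tuple of children.
Tree = Union[str, tuple]


def leaf_count(tree: Tree) -> int:
    if isinstance(tree, str):
        return 1
    return sum(leaf_count(child) for child in tree)


def ladderize(tree: Tree, right: bool = False) -> Tree:
    """Order children by leaf count: smallest first, or largest first if right=True."""
    if isinstance(tree, str):
        return tree
    children = [ladderize(child, right) for child in tree]
    # list.sort is stable (also with reverse=True), so ties keep their input order.
    children.sort(key=leaf_count, reverse=right)
    return tuple(children)
```

For (("A", ("B", "C")), "D"), the ascending order gives ("D", ("A", ("B", "C"))) and right=True gives ((("B", "C"), "A"), "D").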
- age is now a permanent attribute of Node objects from initialization (previously, this attribute was optionally added when add_ages_to_nodes() was called).
- Tree.add_ages_to_nodes() replaced by Tree.calc_node_ages(): functioning is similar, but no longer offers the option for client code to select attribute name for the node age; this is now fixed to age (see above).
- NeXML namespace has been updated to use modern NeXML instance docs.
- New ContainingTree class to manage operations with embedded trees (e.g., a species tree with gene trees).
An exception is now thrown when branches with edge weights of “None” are encountered when calculating weighted Robinson-Foulds or Euclidean distances, EXCEPT if the edges are root edges (i.e., edges subtending the root node).
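The weighted Robinson-Foulds distance is commonly defined as the sum of absolute differences in edge length over all splits found in either tree, and the Euclidean (branch score) distance as the square root of the sum of squared differences, with a split absent from one tree contributing length 0. The sketch below follows those definitions and mimics the rule above by rejecting None lengths except on the root edge. Trees are represented as hypothetical dicts mapping frozensets of leaf labels to edge lengths, and identifying the root edge through a root_split argument is an assumption, not DendroPy's API:

```python
import math
from typing import Dict, FrozenSet, Iterator, Optional, Tuple

# Hypothetical representation: split (frozenset of leaf labels) -> edge length or None.
Split = FrozenSet[str]
EdgeLengths = Dict[Split, Optional[float]]


def _length_pairs(
    a: EdgeLengths, b: EdgeLengths, root_split: Optional[Split]
) -> Iterator[Tuple[float, float]]:
    """Yield (length in a, length in b) for every split in either tree."""
    for split in set(a) | set(b):
        pair = []
        for lengths in (a, b):
            length = lengths.get(split, 0.0)  # a split missing from a tree counts as 0.0
            if length is None:
                if split != root_split:
                    raise ValueError(f"edge {sorted(split)} has no length")
                length = 0.0  # a missing length is tolerated on the root edge only
            pair.append(length)
        yield pair[0], pair[1]


def weighted_robinson_foulds(
    a: EdgeLengths, b: EdgeLengths, root_split: Optional[Split] = None
) -> float:
    return sum(abs(x - y) for x, y in _length_pairs(a, b, root_split))


def euclidean_distance(
    a: EdgeLengths, b: EdgeLengths, root_split: Optional[Split] = None
) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in _length_pairs(a, b, root_split)))
```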
Deep-coalescence counting methods have now been moved to a distinct module: dendropy.reconcile.
- TaxonSetPartition iterator now iterates over partitions without crashing.
- When a Tree object is independently written, its TaxonSet no longer gets re-created.
- Corrected variable reference when inferring TaxonSet of a Tree object.
- Correct list comprehension when composing Node and Edge sets of a Tree object.
Taxon objects can now be accessed from a TaxonSet directly by their label:

    >>> import dendropy
    >>> taxonset = dendropy.TaxonSet(['a', 'b', 'c'])
    >>> assert taxonset[0] is taxonset['a'] # True
- Fixed taxon-pruning clean-up resulting in all leaves being removed (!).
- Fixed improper value returned for latest commit date when running under Windows.
- DendroPy is now backwards-compatible with Python 2.4.
- Native ASCII tree plotting is available as the “as_ascii_plot()” method of a Tree object.
- Some tree distance functions integrated as native methods of Tree: symmetric_difference, false_positives_and_negatives, robinson_foulds_distance, euclidean_distance.
- Tree() objects can now be cloned with a custom TaxonSet object by passing it to the constructor via the taxon_set keyword argument.
- Interoperability with the ETE library (http://ete.cgenomics.org/) via the dendropy.interop.ete module.
- New character matrix type added: NucleotideCharacterMatrix (corresponding to NEXUS ‘datatype=nucleotide’).
- Symbol ‘X’ added to DNA, RNA and Nucleotide alphabets.
- ‘--unrooted’ option added to SumTrees, forcing trees to be treated as unrooted (option ‘--rooted’ is also still available, forcing trees to be treated as rooted).
- SumTrees now handles the ‘--rooted’ option correctly; previously, unrooted trees were treated as rooted, leading to incorrect split supports when structurally-equivalent trees of different rotations were used.
FASTA format reading now accepts ‘fasta’ as a schema specification:

    >>> import dendropy
    >>> cytb = dendropy.DnaCharacterMatrix.get_from_path("data.fas", "fasta")

but requires a supplemental keyword argument, ‘data_type’, to specify the type of data (‘dna’, ‘rna’, ‘protein’, etc.) when using DataSet:

    >>> import dendropy
    >>> cytb_ds = dendropy.DataSet.get_from_path("data.fas", "fasta", data_type="dna")

Implemented full-featured PHYLIP reader and enhanced PHYLIP writer, for both strict and relaxed modes (the PHYLIP reader requires the ‘data_type’ keyword argument when using DataSet):

    >>> import dendropy
    >>> cytb = dendropy.DnaCharacterMatrix.get_from_path("data.dat", "phylip")
    >>> cytb.write_to_path("data2.dat", "phylip", strict=True)
    >>> cytb_ds = dendropy.DataSet.get_from_path("data2.dat", "phylip", data_type="dna", strict=True)
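The difference between the two PHYLIP modes lies mainly in taxon names: strict PHYLIP reserves exactly 10 columns for each name, while relaxed PHYLIP allows longer names separated from the data by whitespace. The sketch below writes sequential PHYLIP text under those rules; format_phylip is a hypothetical helper, it raises an error instead of truncating long strict-mode names, and relaxed mode assumes names contain no spaces:

```python
from typing import Dict, List


def format_phylip(sequences: Dict[str, str], strict: bool = False) -> str:
    """Render equal-length aligned sequences as sequential PHYLIP text."""
    if not sequences:
        raise ValueError("no sequences given")
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # Header line: number of taxa and number of characters.
    lines: List[str] = [f"{len(sequences)} {lengths.pop()}"]
    for label, seq in sequences.items():
        if strict:
            # Strict: the name fills exactly 10 columns, padded with spaces.
            if len(label) > 10:
                raise ValueError(f"name {label!r} is longer than 10 characters")
            lines.append(label.ljust(10) + seq)
        else:
            # Relaxed: the name can be longer and is followed by a single space.
            lines.append(f"{label} {seq}")
    return "\n".join(lines) + "\n"
```

Raising on over-long names in strict mode is a deliberate choice here: silent truncation can make two distinct names identical.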
Setup is now zipsafe by default.
- Fixed major bug in SumTrees when mapping support to target tree (support value did not get written to node label).
- Fixed bug in handling trailing whitespaces in interleaved NEXUS data.
- Fixed issue with encoding of splits when root has degree one.
- Initial release of DendroPy 3.