4.1. Character Matrices

4.1.1. Types of Character Matrices

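The CharacterMatrix object represents character data in DendroPy. In most cases, you will not deal with objects of the CharacterMatrix class directly, but rather with objects of one of the classes specialized to handle specific data types:

* DnaCharacterMatrix, for DNA nucleotide sequence data
* RnaCharacterMatrix, for RNA nucleotide sequence data
* ProteinCharacterMatrix, for amino acid sequence data
* ContinuousCharacterMatrix, for continuous-valued data
* StandardCharacterMatrix, for discrete-valued data (e.g., morphological characters)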
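The split between a general base class and data-type-specific subclasses can be pictured with a toy sketch in plain Python. The names ToyCharacterMatrix, ToyDnaMatrix, ToyProteinMatrix and the alphabet attribute are hypothetical, and the alphabets are simplified (no IUPAC ambiguity codes); this illustrates the idea only and is not DendroPy's implementation.

```python
# Hypothetical sketch: a generic base class plus subclasses that differ
# only in which state symbols they accept. NOT DendroPy's implementation.


class ToyCharacterMatrix:
    """Generic container mapping taxon labels to sequences of states."""

    # Subclasses override this; None means "accept any symbol".
    alphabet = None

    def __init__(self):
        self.sequences = {}

    def add_sequence(self, label, seq):
        """Store a sequence after checking each symbol against the alphabet."""
        seq = seq.upper()
        if self.alphabet is not None:
            bad = sorted(set(seq) - self.alphabet)
            if bad:
                raise ValueError(
                    f"{type(self).__name__} rejects symbols {bad} in {label!r}"
                )
        self.sequences[label] = seq


class ToyDnaMatrix(ToyCharacterMatrix):
    # Nucleotides plus gap and missing-data symbols (simplified).
    alphabet = set("ACGT-?")


class ToyProteinMatrix(ToyCharacterMatrix):
    # The 20 standard amino acids plus gap and missing-data symbols.
    alphabet = set("ACDEFGHIKLMNPQRSTVWY-?")


dna = ToyDnaMatrix()
dna.add_sequence("Python regius", "ACGT-A")  # placeholder sequence

aa = ToyProteinMatrix()
aa.add_sequence("Python regius", "MKV")  # placeholder sequence

try:
    dna.add_sequence("Bad taxon", "MKV")  # M, K, V are not nucleotides
except ValueError as err:
    print(err)
```

The base class holds all shared behavior; each subclass only declares what counts as a valid state, which mirrors why you normally work with a specialized class rather than the base class.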

4.1.2. CharacterMatrix Creating and Reading

As with most other phylogenetic data objects, objects of the CharacterMatrix-derived classes support the get_from_* factory class methods and the read_from_* instance methods for populating objects from a data source. These methods take a data source as the first argument, a schema specification string (e.g., “nexus”, “nexml”, “fasta”, or “phylip”) as the second, and optional keyword arguments that customize the reading behavior.
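The difference between a get_from_* factory method and a read_from_* instance method can be sketched with a hypothetical toy class. ToyMatrix, get_from_string, and read_from_string are illustrative names modeled on the naming convention described above; the parser understands only a minimal FASTA-like layout, and keyword arguments are accepted but ignored.

```python
# Hypothetical sketch: ToyMatrix is NOT a DendroPy class. It mimics the
# get_from_*/read_from_* pairing with a minimal FASTA-only parser.


class ToyMatrix:
    def __init__(self):
        self.sequences = {}  # taxon label -> sequence string

    @classmethod
    def get_from_string(cls, src, schema, **kwargs):
        """Factory: create a new object and populate it in one step."""
        obj = cls()
        obj.read_from_string(src, schema, **kwargs)
        return obj

    def read_from_string(self, src, schema, **kwargs):
        """Instance method: populate this existing object from src.

        Keyword arguments are accepted but ignored in this sketch.
        """
        if schema != "fasta":
            raise ValueError(f"unsupported schema: {schema!r}")
        self.sequences = {}  # discard anything loaded previously
        label = None
        for line in src.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                label = line[1:].strip()
                self.sequences[label] = ""
            elif label is not None:
                self.sequences[label] += line  # sequences may span lines


# Placeholder data, not real sequences.
fasta = ">Python regius\nACGT\nAC\n>Python sebae\nACGA\n"

m1 = ToyMatrix.get_from_string(fasta, "fasta")  # factory: new object

m2 = ToyMatrix()                                 # instance: existing object
m2.read_from_string(fasta, "fasta")

print(m1.sequences == m2.sequences)  # True
```

In this design the factory simply constructs an empty object and delegates to the instance method, so both routes share one parsing code path.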

4.1.2.1. Creating a New CharacterMatrix from a Data Source

The following examples simultaneously instantiate and populate CharacterMatrix objects of the appropriate type from various file data sources:

>>> import dendropy
>>> dna = dendropy.DnaCharacterMatrix.get_from_path('pythonidae_cytb.nex', 'nexus')
>>> rna = dendropy.RnaCharacterMatrix.get_from_path('hiv1_env.nex', 'nexus')
>>> aa = dendropy.ProteinCharacterMatrix.get_from_path('pythonidae_mos.nex', 'nexus')
>>> cv = dendropy.ContinuousCharacterMatrix.get_from_path('pythonidae_sizes.nex', 'nexus')
>>> sm = dendropy.StandardCharacterMatrix.get_from_path('pythonidae_skull.nex', 'nexus')

4.1.2.2. Repopulating a CharacterMatrix from a Data Source

The read_from_* instance methods repopulate the calling object with data from the data source, overwriting any data it already holds:

>>> import dendropy
>>> dna = dendropy.DnaCharacterMatrix()
>>> dna.read_from_path('pythonidae_cytb.nex', 'nexus')
>>> dna.read_from_path('pythonidae_rag1.nex', 'nexus')

The second call to read_from_path discards the data read from pythonidae_cytb.nex and repopulates dna with the data from pythonidae_rag1.nex.
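The distinction between replacing and merging is easy to overlook. The sketch below uses plain dictionaries as hypothetical stand-ins for a matrix's taxon-to-sequence mapping: read_replace models the overwriting behavior described above, while read_merge shows the contrasting behavior that read_from_* does not follow. The sequences are placeholders, not real data.

```python
# Hypothetical sketch: plain dicts stand in for a character matrix.
# The sequences are placeholders, not real data.


def read_replace(matrix, new_data):
    """Overwrite semantics: discard existing contents, then load new_data."""
    matrix.clear()
    matrix.update(new_data)


def read_merge(matrix, new_data):
    """Merge semantics (for contrast): keep existing entries, add new ones."""
    matrix.update(new_data)


cytb = {"Python regius": "ATGCCC"}
rag1 = {"Python sebae": "GGTACA"}

replaced = {}
read_replace(replaced, cytb)  # first read
read_replace(replaced, rag1)  # second read wipes the cytb data

merged = {}
read_merge(merged, cytb)
read_merge(merged, rag1)

print(sorted(replaced))  # ['Python sebae']
print(sorted(merged))    # ['Python regius', 'Python sebae']
```

If you do need data from several files in one place, the replace semantics mean you should keep separate matrix objects rather than reading repeatedly into the same one.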

4.1.3. CharacterMatrix Saving and Writing

4.1.3.1. Writing to Files

The write_to_stream and write_to_path instance methods write the data of a CharacterMatrix to a file-like object or a file path, respectively. Each takes the destination as its first argument (a file-like object for write_to_stream, or a string specifying a file path for write_to_path) and a schema specification string as its second argument.

The following example reads a FASTA-formatted file and writes it out to a NEXUS-formatted file:

>>> import dendropy
>>> dna = dendropy.DnaCharacterMatrix.get_from_path('pythonidae_cytb.fasta', 'dnafasta')
>>> dna.write_to_path('pythonidae_cytb.nexus', 'nexus')

Fine-grained control over the output format can be specified using keyword arguments.
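One natural way to offer both methods is to implement the stream version and have the path version open the file and delegate to it. The sketch below assumes that design; ToyMatrix is hypothetical, writes only a minimal FASTA layout, and is not a description of how DendroPy is structured internally.

```python
# Hypothetical sketch: a path-based writer built on a stream-based one.
# ToyMatrix is NOT a DendroPy class; it writes a minimal FASTA layout only.
import os
import tempfile


class ToyMatrix:
    def __init__(self, sequences):
        self.sequences = dict(sequences)  # taxon label -> sequence string

    def write_to_stream(self, dest, schema):
        """Write to any file-like object that has a .write() method."""
        if schema != "fasta":
            raise ValueError(f"unsupported schema: {schema!r}")
        for label, seq in self.sequences.items():
            dest.write(f">{label}\n{seq}\n")

    def write_to_path(self, dest, schema):
        """Open the file path for writing, then delegate to write_to_stream."""
        with open(dest, "w") as f:
            self.write_to_stream(f, schema)


# Placeholder sequences, not real data.
m = ToyMatrix({"Python regius": "ACGT", "Python sebae": "ACGA"})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.fasta")
    m.write_to_path(path, "fasta")
    with open(path) as f:
        text = f.read()

print(text)
```

Keeping the formatting logic in the stream method means any destination that behaves like a file (an open file, a socket wrapper, an in-memory buffer) can be targeted without extra code.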

4.1.3.2. Composing a String

If you do not want to actually write to a file, but instead simply need a string representing the data in a particular format, you can call the instance method as_string, passing a schema specification string as the first argument:

>>> import dendropy
>>> dna = dendropy.DnaCharacterMatrix.get_from_path('pythonidae_cytb.fasta', 'dnafasta')
>>> s = dna.as_string('nexus')
>>> print(s)

As above, fine-grained control over the output format can be specified using keyword arguments.
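Producing a string need not require a separate code path: a writer that accepts a file-like object can target an in-memory io.StringIO buffer instead of a file. The sketch below assumes that approach; write_fasta and as_string here are hypothetical helper functions, not DendroPy functions.

```python
import io


def write_fasta(sequences, dest):
    """Hypothetical writer: emit label/sequence pairs to a file-like object."""
    for label, seq in sequences.items():
        dest.write(f">{label}\n{seq}\n")


def as_string(sequences):
    """Capture the writer's output in memory instead of on disk."""
    buf = io.StringIO()
    write_fasta(sequences, buf)
    return buf.getvalue()


s = as_string({"Python regius": "ACGT"})  # placeholder sequence
print(s)
```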

4.1.4. Taxon Management with Character Matrices

Taxon management with CharacterMatrix-derived objects works very much the same as it does with Tree or TreeList objects: every time a CharacterMatrix-derived object is independently created or read, a new TaxonSet is created, unless an existing one is specified. Thus, again, if you are creating multiple character matrices that refer to the same set of taxa, you will want to make sure to pass each of them a common TaxonSet reference:

>>> import dendropy
>>> taxa = dendropy.TaxonSet()
>>> dna1 = dendropy.DnaCharacterMatrix.get_from_path("pythonidae_cytb.fasta", "dnafasta", taxon_set=taxa)
>>> std1 = dendropy.StandardCharacterMatrix.get_from_path("pythonidae_morph.nex", "nexus", taxon_set=taxa)
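Why a common taxon set matters comes down to object identity. In the toy sketch below, two independent reads produce different taxon objects for the same labels, while reads that share one taxon set reuse the very same objects. ToyTaxon, ToyTaxonSet, get_or_create, and read_taxa are hypothetical stand-ins written only for this illustration; they are not DendroPy's Taxon or TaxonSet.

```python
# Hypothetical sketch: ToyTaxon/ToyTaxonSet are NOT DendroPy's Taxon/TaxonSet.


class ToyTaxon:
    def __init__(self, label):
        self.label = label  # equality and hashing fall back to object identity


class ToyTaxonSet:
    def __init__(self):
        self._by_label = {}

    def get_or_create(self, label):
        """Return the taxon with this label, creating it on first request."""
        if label not in self._by_label:
            self._by_label[label] = ToyTaxon(label)
        return self._by_label[label]


def read_taxa(labels, taxon_set=None):
    """Mimic a reader: create a fresh taxon set unless one is supplied."""
    if taxon_set is None:
        taxon_set = ToyTaxonSet()
    return [taxon_set.get_or_create(label) for label in labels]


labels = ["Python regius", "Python sebae"]

# Independent reads: same labels, but distinct taxon objects.
a = read_taxa(labels)
b = read_taxa(labels)
print(len(set(a) & set(b)))  # 0

# Reads sharing one taxon set: the same objects are reused.
shared = ToyTaxonSet()
c = read_taxa(labels, taxon_set=shared)
d = read_taxa(labels, taxon_set=shared)
print(len(set(c) & set(d)))  # 2
```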

4.1.5. Accessing Data

Each sequence for a particular Taxon object is organized into a CharacterDataVector object, which, in turn, is a list of CharacterDataCell objects. You can retrieve the CharacterDataVector for a particular taxon by passing the corresponding Taxon object, its label, or its index to the CharacterMatrix object. Thus, to get the character sequence vector associated with the first taxon (“Python regius”) from the data source pythonidae_cytb.fasta:

>>> from dendropy import DnaCharacterMatrix
>>> cytb = DnaCharacterMatrix.get_from_path('pythonidae_cytb.fasta', 'dnafasta')
>>> v1 = cytb[0]
>>> v2 = cytb['Python regius']
>>> v3 = cytb[cytb.taxon_set[0]]
>>> v1 == v2 == v3
True
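Looking up a vector by index, label, or Taxon object amounts to normalizing the key to a taxon before fetching its data. The sketch below assumes that approach; ToyTaxon and ToyMatrix are hypothetical, the "vector" is just a Python list of single-character "cells", and a label lookup returns the first taxon whose label matches.

```python
# Hypothetical sketch: ToyTaxon/ToyMatrix are NOT DendroPy classes.


class ToyTaxon:
    def __init__(self, label):
        self.label = label


class ToyMatrix:
    def __init__(self):
        self.taxa = []       # ordered list of taxa, like a taxon set
        self._vectors = {}   # taxon -> list of cells

    def add(self, taxon, seq):
        self.taxa.append(taxon)
        self._vectors[taxon] = list(seq)  # one "cell" per character state

    def __getitem__(self, key):
        """Accept an integer index, a label string, or a taxon object."""
        if isinstance(key, int):
            taxon = self.taxa[key]
        elif isinstance(key, str):
            matches = [t for t in self.taxa if t.label == key]
            if not matches:
                raise KeyError(key)
            taxon = matches[0]
        else:
            taxon = key
        return self._vectors[taxon]


m = ToyMatrix()
m.add(ToyTaxon("Python regius"), "ACGT")  # placeholder sequences
m.add(ToyTaxon("Python sebae"), "ACGA")

v1 = m[0]
v2 = m["Python regius"]
v3 = m[m.taxa[0]]
print(v1 is v2 is v3)  # True
```

All three keys resolve to the same taxon, so they return the same underlying vector object.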
