Gromacs topology file (ITP) parser¶
New in version 0.2.5.
Basic reading, manipulating and writing works but can still be a bit
awkward. See documentation for ITP
.
Warning
The code is still experimental.
Limitations
- only one
[moleculetype]
at the moment (need to distinguish by molecule name) or maybe subsequent ones overwrite previous ones — not tested so better only use a single moleculetype (TODO) - merges multiple dihedral sections into one
- probably fails for FEP itps (TODO)
- does not reproduce positions of comment-only lines in sections (in fact, currently we do not even write them out again, only trailing line comments are kept and the header before the first section)
- only simple preprocessor directives such as
#ifdef POSRES
are processed by thePreprocessor
. Preprocessor directives are evaluated and stripped so thatITP
represents one functional itp file. The directives are not stored and cannot be written out or accessed.
Example¶
An itp file can be loaded and individual sections accessed (print shows a representation of what it would look like written to a file):
>>> itp = gromacs.fileformats.ITP("benzylhyd5.itp")
>>> print itp.header
>>> ...
>>> print itp.header.moleculetype
[ moleculetype ]
; Name nrexcl
5FH 3
Each section is an object that contains the parsed data in a
data
attribute:
>>> print itp.header.moleculetype.data
odict([('name', '5FH'), ('nrexcl', 3)])
>>> print itp.header.moleculetype.data['name']
5FH
The moleculetypes section contains the important subsections that make up the topology:
>>> print itp.header.moleculetype.sections.keys()
['atoms', 'bonds', 'angles', 'dihedrals', 'pairs']
They can be accessed in the same manner. For instance, counting all atom records:
>>> print len(itp.header.moleculetype.atoms)
24
For the atoms, the data
structure is a numpy.rec.recarray
.
Data can be modified and then finally a new itp file can be written with the
ITP.write()
method.
>>> itp.write("modified.itp")
User classes¶
The ITP
class always represents one itp file. It contains a tree of
parsed subsections that correspond to the subsections in the itp file. The way
to drill down is to access the attribute ITP.sections
, where the
subsections are stored. ITP.sections
acts like a dictionary. Each
subsection contains again a ITPsection.sections
dictionary containing
its own subsections. To find the [ atoms ]
data:
itp.sections['header'].sections['moleculetype'].sections['atoms'].data
For convenience, one can also use the keys in the sections
dictionary directly as attributes. In this way finding the [ atoms ]
data
is even simpler:
itp.header.moleculetype.atoms.data
For more details on the organization of the data structure see Classes corresponding to ITP sections.
-
class
gromacs.fileformats.itp.
ITP
(filename=None, **kwargs)¶ Class that represents a Gromacs topology itp file.
The file is processed by a
Preprocessor
according to directves in the file and variables defined in the constructor keyword arguments. Thus, the classITP
represents one particularly state of the itp file with any preprocessor constructs excluded. Only a limited set of directives is understood and aSyntaxError
is raised in cases where the preprocessor does not know how to handle the file.The file is parsed into a hierarchy of sections (and subsections), which are accessible in the attribute
ITP.sections
.Data of (sub)sections is stored in
data
, e.g. the[atoms]
sectionitp.sections['header'].sections['moleculetype'].sections['atoms'].data
contains the list of atoms for molecule
itp.sections['header'].sections['moleculetype'].data['name']
Data attributes are typically (ordered) dictionaries or NumPy recarrays.
The top-level section is called “header” and simply contains all comment lines before the first real parsed section.
Warning
Not all section types are implemented. There can only be a single type of section at each level of the hierarchy.
Only a single
[moleculetype]
is currently supported.Subsequent
[dihedrals]
sections are merged but if another section intersperses two[dihedral]
sections then the first one is lost.Comments are stripped except at the beginning of the file and at the end of data lines.
New in version 0.2.5.
Changed in version 0.3.1: Access to sections by attribute access is possible.
Initialize ITP structure.
Arguments: - filename
read from mdp file
- kwargs
#define
VAR variables that are used at the pre-processing stage of the itp file, e.g. POSRES= True
will activate a#ifdef POSRES
section in the itp file.
See also
preprocessor
describes details about the implementation of preprocessing of the itp file, including the (limited) syntax understood by the preprocesser.-
contains_preprocessor_constructs
()¶ Check if file makes use of any preprocessor constructs.
The test is done by running the file through the
Preprocessor
(while stripping all empty and lines starting with a comment character. This is compared to the original file, stripped in the same manner. If the two stripped files differ from each other then the preprocessor altered the file and preprocessor directives must have been involved and this function returnsTrue
.
-
parse
(stream)¶ Simple recursive descent parsing of ITP.
stream must have a
next()
and anunread()
method. A call must supply the next line of input or raiseEOFError
orStopIteration
.
-
read
(filename=None, preprocess=True, **defines)¶ Preprocess, read and parse itp file filename.
Any keywords in defines are use to modify the default preprocessor variables (see
gromacs.fileformats.preprocessor.Preprocessor.parse()
for details). Setting preprocess =False
skips the preprocessing step.
-
walk_sections
(func=None, flat=True)¶ Generator that applies func to each section.
Traverses all sections depth first. Applies
-
func
(name, section)¶ User supplied function that uses both the string name and the
ITPsection
instance and returns a result. Note that any lists and lists of lists will be flattened so it mostly makes sense to return a string or similar.
at each level and iterates over the individual results. With flat =
True
the resulting list is returned flat (default), otherwise the list represents the tree structure of the itp file.By default, func is
str(section)
. Thus, the whole itp file can be printed out withprint "\n".join(itp.walk_sections())
(See
write()
for writing out the ITP.)-
-
write
(filename)¶ Write ITP file to filename
Developer reference¶
Class ITP
makes use of the following classes to parse and
represent a topology as stored in an itp file.
Base classes and utility functions¶
-
class
gromacs.fileformats.itp.
ITPsection
(parent, **kwargs)¶ Class to parse and store ITP data.
data
contains the data of the sectionsections
holds the sub-parsers (including their owndata
; thisodict
mirrors the parse tree.Note that sections at the same level are merged.
parsers
contains a dict of the all the parser classes (keyed by section name) that are included in this section; typically overriden.
New in version 0.2.5.
Changed in version 0.3.1: Access to sections by attribute access is possible.
-
walk_sections
(func=None, flat=True)¶ Generator that applies func to each section and subsections.
Traverses all sections depth first. Applies
-
func
(name, section)¶ User supplied function that uses both the string name and the
ITPsection
instance and returns a result. Note that any lists and lists of lists will be flattened so it mostly makes sense to return a string or similar.
at each level and iterates over the individual results. With flat =
True
the resulting list is returned flat (default), otherwise the list represents the tree structure of the itp file.By default, func is
str(section)
.-
-
class
gromacs.fileformats.itp.
ITPdata
(*args, **kwargs)¶ Class to represent most ITP sections.
- parse line by line and store data and comments
- format data as recarray; uses up columns from left to right
- manipulate data in the recarray or write section with
section()
New in version 0.2.5.
-
data
¶ atom data as a
numpy.rec.array
The data inside the array can be changed but not appended.
-
section
()¶ Return a string of the section data in ITP format.
Data entries containing
None
are stripped from the output. In accordance with Gromacs ITP parsing rules, data columns can only be ommitted from the right.
-
set_data
(data)¶ data is atom data, stored as
numpy.rec.arry
-
class
gromacs.fileformats.itp.
OneLineBuffer
(getline)¶ Wrapper around anything with a
next()
method, to provide anunread()
.Provide the
next
method (or any other function that returns one line of input) as the argument getline.Implements a one-line buffer,
unread()
can only move the stream one line back (repeatedly callingunread()
does not do anything).New in version 0.2.5.
-
gromacs.fileformats.itp.
flatten
(l)¶ Generator for flattening a nested list l.
http://stackoverflow.com/questions/2158395/flatten-an-irregular-list-of-lists-in-python
New in version 0.2.5.
Classes corresponding to ITP sections¶
The ITP file is parsed with a simple recursive descending depth-first algorithm. Classes contain a dictionary of sub-section parsers.
The section hierarchy is
-
Moleculetype
(Note: currently only a single moleculetype allowed)
-
class
gromacs.fileformats.itp.
Header
(*args, **kwargs)¶ The customary comments section before the first section.
Example format:
; itp file for Benzene ; example "header"
New in version 0.2.5.
-
class
gromacs.fileformats.itp.
Atomtypes
(*args, **kwargs)¶ ITP
[atomtypes]
section.Example format:
[ atomtypes ] ;name bond_type mass charge ptype sigma epsilon c3 c3 0.0000 0.0000 A 3.39967e-01 4.57730e-01 hc hc 0.0000 0.0000 A 2.64953e-01 6.56888e-02
New in version 0.2.5.
-
class
gromacs.fileformats.itp.
Moleculetype
(*args, **kwargs)¶ ITP
[ moleculetype ]
sectionExample format:
[ moleculetype ] ; Name nrexcl 5FH 3
New in version 0.2.5.
-
set_moleculename
(molname)¶ Set the molecule name to molname.
- changes the name in
[moleculetype]
section - changes the resname in the whole
[atoms]
section
- changes the name in
-
-
class
gromacs.fileformats.itp.
Atoms
(*args, **kwargs)¶ ITP
[ atoms ]
sectionExample format:
[ atoms ] ; atomnr atomtype resnr resname atomname chargegrp charge mass 1 opls_145 1 5FH C7 7 -0.115 12.01100 ; CA # Benzene C - 12 site JACS,112,4768-90. Use #145B for biphenyl 2 opls_146 1 5FH H71 7 0.115 1.00800 ; HA # Benzene H - 12 site.
The data itself are stored in
data
, anumpy.rec.array
.data
is a managed attribute but the values inside can be changed. Fields in the array have a maximum size (in particular, comments are cut off after 128 characters).New in version 0.2.5.
-
dtypes
= [('atomnr', 'i4'), ('atomtype', 'S10'), ('resnr', 'i4'), ('resname', 'S4'), ('atomname', 'S4'), ('chargegrp', 'i4'), ('charge', 'f8'), ('mass', 'f8'), ('atomtypeB', 'S10'), ('chargeB', 'f8'), ('massB', 'f8')]¶ numpy.dtype
columns for the data
-
fmt
= ['%8d', '%9s', '%7d', '%8s', '%9s', '%10d', '%8.3f', '%10.5f', '%9s', '%8.3f', '%10.5f']¶ output format (space separated), same ordering as
columns
-
set_resname
(name)¶ Changes the resname to name for all atom entries
-
-
class
gromacs.fileformats.itp.
Bonds
(*args, **kwargs)¶ ITP
[ bonds ]
sectionExample format:
[ bonds ] ; ai aj funct r k 1 2 1 ; CA-HA # PHE, etc. 1 3 1 ; CA-CA # TRP,TYR,PHE
New in version 0.2.5.
-
dtypes
= [('ai', 'i4'), ('aj', 'i4'), ('func', 'i4'), ('b0', 'f8'), ('kb', 'f8')]¶ numpy.dtype
columns for the data
-
fmt
= ['%4d', '%4d', '%6d', '%10.5f', '%10.1f']¶ output format (space separated), same ordering as
columns
-
-
class
gromacs.fileformats.itp.
Angles
(*args, **kwargs)¶ ITP
[ angles ]
sectionExample format:
[ angles ] ; ai aj ak funct theta cth 2 1 3 1 ; HA-CA-CA # 2 1 11 1 ; HA-CA-CA # 3 1 11 1 ; CA-CA-CA # PHE(OL)
New in version 0.2.5.
-
dtypes
= [('ai', 'i4'), ('aj', 'i4'), ('ak', 'i4'), ('func', 'i4'), ('theta0', 'f8'), ('cth', 'f8')]¶ numpy.dtype
columns for the data
-
fmt
= ['%4d', '%4d', '%4d', '%6d', '%9.3f', '%10.3f']¶ output format (space separated), same ordering as
columns
-
-
class
gromacs.fileformats.itp.
Dihedrals
(*args, **kwargs)¶ ITP
[ dihedrals ]
sectionExample format:
[ dihedrals ] ; ai aj ak al funct C0 ... C5 2 1 3 4 3 ; HA-CA-CA-HA # aromatic ring (X-CA-CA-X generic proper dihedral) 2 1 3 5 3 ; HA-CA-CA-CA # aromatic ring (X-CA-CA-X generic proper dihedral)
Note
Multiple adjacent dihedral sections are currently merged into a single dihedral section. This means that if you separated proper and improper dihedrals into two different sections then they will now appear one after another under a single
[ dihedrals ]
section. If two dihedral sections are separated by another section then the last one overwrites the previous one.New in version 0.2.5.
-
dtypes
= [('ai', 'i4'), ('aj', 'i4'), ('ak', 'i4'), ('al', 'i4'), ('func', 'i4'), ('c0', <type 'object'>), ('c1', <type 'object'>), ('c2', <type 'object'>), ('c3', <type 'object'>), ('c4', <type 'object'>), ('c5', <type 'object'>)]¶ numpy.dtype
columns for the data
-
fmt
= ['%4d', '%4d', '%4d', '%4d', '%5d', '%9s', '%9s', '%9s', '%9s', '%9s', '%9s']¶ output format (space separated), same ordering as
columns
-
-
class
gromacs.fileformats.itp.
Pairs
(*args, **kwargs)¶ ITP [ pairs ] section
Example format:
[ pairs ] ; ai aj funct 1 6 1 1 7 1
New in version 0.2.5.
-
class
gromacs.fileformats.itp.
Dummy
(parent, **kwargs)¶ Not implemented, just ignores everything