2. Selection Commands¶
>>> kalp = universe.select_atoms("segid KALP")
select_atoms() of a
Universe returns a
AtomGroup, so you can use all the methods
defined for AtomGroups on them. Selections always return an
atoms sorted according to their index in the topology (this is to ensure that
there aren’t any duplicates, which can happen with complicated selections).
One can group subselections using parentheses:
>>> universe.select_atoms("segid DMPC and not (name H* or name O*)") <AtomGroup with 3420 atoms>
Almost all the basic CHARMM selections work.
It is also possible to export selections for external software packages with the help of Selection exporters.
2.1. Selection Keywords¶
The following describes all selection keywords currently understood by the selection parser. The following applies to all selections:
- Keywords are case sensitive.
- Atoms are automatically sequentially ordered in a resulting selection (see notes below on Ordered selections for how to circumvent this if necessary).
- Selections are parsed left to right and parentheses can be used for grouping.
- Currently, only “stemming” is implemented as a primitive form of pattern
matching: Using the
*character in a string such as
GL*selects all strings that start with “GL” such as “GLU”, “GLY”, “GLX29”, “GLN”.
2.1.1. Simple selections¶
- protein, backbone, nucleic, nucleicbackbone
- selects all atoms that belong to a standard set of residues; a protein is identfied by a hard-coded set of residue names so it may not work for esoteric residues.
- segid seg-name
- select by segid (as given in the topology), e.g.
- resid residue-number-range
- resid can take a single residue number or a range of numbers. A range consists of two numbers separated by a colon (inclusive) such as
resid 1:5. A residue number (“resid”) is taken directly from the topology.
- resnum resnum-number-range
- resnum is the canonical residue number; typically it is set to the residue id in the original PDB structure.
- resname residue-name
- select by residue name, e.g.
- name atom-name
- select by atom name (as given in the topology). Often, this is force field dependent. Example:
name CA(for Cα atoms) or
name OW(for SPC water oxygen)
- type atom-type
- select by atom type; this is either a string or a number and depends on the force field; it is read from the topology file (e.g. the CHARMM PSF file contains numeric atom types). It has non-sensical values when a PDB or GRO file is used as a topology.
- atom seg-name residue-number atom-name
- a selector for a single atom consisting of segid resid atomname, e.g.
DMPC 1 C2selects the C2 carbon of the first residue of the DMPC segment
- all atoms not in the selection, e.g.
not proteinselects all atoms that aren’t part of a protein
- and, or
- combine two selections according to the rules of boolean algebra, e.g.
protein and not (resname ALA or resname LYS)selects all atoms that belong to a protein, but are not in a lysine or alanine residue
- around distance selection
- selects all atoms a certain cutoff away from another selection, e.g.
around 3.5 proteinselects all atoms not belonging to protein that are within 3.5 Angstroms from the protein
- sphlayer innerRadius externalRadius selection
- selects all atoms within a spherical layer centered in the center of geometry (COG) of a given selection, e.g.
sphlayer 2.4 6.0 ( protein and ( resid 130 or resid 80 ) )selects the center of geometry of protein, resid 130, resid 80 and creates a spherical layer of inner radius 2.4 and external radius 6.0 around the COG.
- sphzone externalRadius selection
- selects all atoms within a spherical zone centered in the center of geometry (COG) of a given selection, e.g.
sphzone 6.0 ( protein and ( resid 130 or resid 80 ) )selects the center of geometry of protein, resid 130, resid 80 and creates a sphere of radius 6.0 around the COG.
- cylayer innerRadius externalRadius zMax zMin selection
- selects all atoms within a cylindric layer centered in the center of geometry (COG) of a given selection, e.g.
cylayer 5 10 10 -8 proteinselects the center of geometry of protein, and creates a cylindrical layer of inner radius 5, external radius 10 centered on the COG. In z, the cylinder extends from 10 above the COG to 8 below. Positive values for zMin, or negative ones for zMax, are allowed.
- cyzone externalRadius zMax zMin selection
- selects all atoms within a cylindric zone centered in the center of geometry (COG) of a given selection, e.g.
cyzone 15 4 -8 protein and resid 42selects the center of geometry of protein and resid 42, and creates a cylinder of external radius 15 centered on the COG. In z, the cylinder extends from 4 above the COG to 8 below. Positive values for zMin, or negative ones for zMax, are allowed.
Changed in version 0.10.0: keywords cyzone and cylayer now take zMax and zMin to be relative to the COG of selection, instead of absolute z-values in the box.
- point x y z distance
- selects all atoms within a cutoff of a point in space, make sure coordinate is separated by spaces, e.g.
point 5.0 5.0 5.0 3.5selects all atoms within 3.5 Angstroms of the coordinate (5.0, 5.0, 5.0)
- prop [abs] property operator value
- selects atoms based on position, using property x, y, or z coordinate. Supports the abs keyword (for absolute value) and the following operators: <, >, <=, >=, ==, !=. For example,
prop z >= 5.0selects all atoms with z coordinate greater than 5.0;
prop abs z <= 5.0selects all atoms within -5.0 <= z <= 5.0.
From version 0.6 onwards, some geometrics selections (around, sphlayer, sphzone, point)
can use a k-d tree based, fast search algorithm (about three times faster than the
previous version). However, it does not take periodicity into
account. The fast algorithm is the default for around. Periodicity
is only taken into account with the
distance_array() functions via a
minimum image convention (and this only works for rectangular
simulation cells). If periodic boundary conditions should be taken
into account then change the default behaviour of MDAnalysis by setting
these two flags:
MDAnalysis.core.flags['use_periodic_selections'] = True MDAnalysis.core.flags['use_KDTree_routines'] = False
2.1.4. Similarity and connectivity¶
- same subkeyword as selection
- selects all atoms that have the same subkeyword value as any atom in selection. Allowed subkeyword values are the atom properties:
name, type, resname, resid, segid, mass, charge, radius, bfactor, resnum, the groups an atom belong to:
residue, segment, fragment, and the atom coordinates
x, y, z.
- byres selection
- selects all atoms that are in the same segment and residue as selection, e.g. specify the subselection after the byres keyword.
byresis a shortcut to
same residue as
- bynum index-range
- selects all atoms within a range of (1-based) inclusive indices, e.g.
bynum 1selects the first atom in the universe;
bynum 5:10selects atoms 5 through 10 inclusive. All atoms in the
MDAnalysis.Universeare consecutively numbered, and the index runs from 1 up to the total number of atoms.
2.1.6. Preexisting selections and modifiers¶
- group group-name
- selects the atoms in the
AtomGrouppassed to the function as an argument named group-name. Only the atoms common to group-name and the instance
select_atoms()was called from will be considered. group-name will be included in the parsing just by comparison of atom indices. This means that it is up to the user to make sure they were defined in an appropriate
- fullgroup group-name
- just like the
groupkeyword with the difference that all the atoms of group-name are included. The resulting selection may therefore have atoms that were initially absent from the instance
select_atoms()was called from.
Deprecated since version 0.11: The use of
fullgrouphas been deprecated in favor of the equivalent
- global selection
- by default, when issuing
AtomGroupselections and subselections are returned intersected with the atoms of that instance. Prefixing a selection term with
globalcauses its selection to be returned in its entirety. As an exmaple, the
globalkeyword allows for
lipids.select_atoms("around 10 global protein")— where
lipidsis a group that does not contain any proteins. Were
globalit absent the result would be an empty selection since the
proteinsubselection would itself be empty. When issuing
2.2. Instant selectors¶
For interactive work it becomes rather tedious to type common selection strings
repeatedly. MDAnalysis automatically generates a number of instant selectors
as attributes of the
Universe and number of other levels
of the structural hierarchy, namely for
2.2.1. Segment selector¶
universe.s<segid>(if <segid> starts with a number)
>>> u.s4AKE <Segment '4AKE'>
2.2.2. Resid selector¶
seg.r<N>selects residue with number
>>> u.s4AKE.r100 <Residue 'GLY', 100>
2.2.3. Residue name selector¶
seg.<resname>selects residues with residue name
>>> u.s4AKE.MET <ResidueGroup [<Residue 'MET', 1>, <Residue 'MET', 21>, <Residue 'MET', 34>, <Residue 'MET', 53>, <Residue 'MET', 96>, <Residue 'MET', 174>]> >>> u.s4AKE.CYS <ResidueGroup [<Residue 'CYS', 77>]> >>> u.s4AKE.TRP NoDataError: No atoms defined for AtomGroup
The result is always a
ResidueGroup; if no residues can be found then a
2.2.4. Atom name selector¶
g.<atomname>selects a single atom or a group of atoms with name
Atomif only a single atom was found,
AtomGroupif more than one atom was found, or
- raises a
MDAnalysis.SelectionErrorif no atom was found.
works for any group derived from
AtomGroup(i.e. all the ones mentioned above)
>>> u.atoms.CG >>> <AtomGroup with 125 atoms> >>> u.s4AKE.CG <AtomGroup with 125 atoms> >>> u.s4AKE.r100.CA < Atom 1516: name 'CA' of type '23' of resname 'GLY', resid 100 and segid '4AKE'> >>> u.s4AKE.r100.CB SelectionError: No atom in residue GLY with name CB
2.3. Ordered selections¶
select_atoms() sorts the atoms in the
AtomGroup by atom index before returning them (this is to
eliminate possible duplicates in the selection). If the ordering of atoms is
crucial (for instance when describing angles or dihedrals) or if duplicate
atoms are required then one has to concatenate multiple AtomGroups, which does
not sort them.
The most straightforward way to concatentate two AtomGroups is by using the + operator:
>>> ordered = u.select_atoms("segid DMPC and resid 3 and name P") + u.select_atoms("segid DMPC and resid 2 and name P") >>> print list(ordered) [< Atom 570: name 'P' of type '180' of resid 'DMPC', 3 and 'DMPC'>, < Atom 452: name 'P' of type '180' of resid 'DMPC', 2 and 'DMPC'>]
A shortcut is to provide two or more selections to
select_atoms(), which then does the concatenation
>>> print list(universe.select_atoms("segid DMPC and resid 3 and name P", "segid DMPC and resid 2 and name P")) [< Atom 570: name 'P' of type '180' of resid 'DMPC', 3 and 'DMPC'>, < Atom 452: name 'P' of type '180' of resid 'DMPC', 2 and 'DMPC'>]
Just for comparison to show that a single selection string does not work as one might expect:
# WRONG! >>> print list(universe.select_atoms("segid DMPC and ( resid 3 or resid 2 ) and name P")) [< Atom 452: name 'P' of type '180' of resid 'DMPC', 2 and 'DMPC'>, < Atom 570: name 'P' of type '180' of resid 'DMPC', 3 and 'DMPC'>]