Managing jobs remotely
Warning
Experimental code, use at your own risk.
The Manager encapsulates the knowledge of how to run a job on a
remote system: the host name, the location of the scratch directories,
the queuing system that is running, and so on. It uses ssh to
communicate with the remote system, so ssh must be set up with
public key authentication for this to work smoothly.
- The manager can move files between the local file system and the remote scratch directories back and forth (using scp).
- It can remotely launch a job (by running qsub).
- It can also check on the progress by inspecting (in a primitive fashion) the log file of mdrun on the remote system.
The remote directory name is constructed in the following way:
- topdir is stripped from the local working directory to give WDIR
- scratchdir/WDIR is the directory on the remote system
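The two steps above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not the code used by the Manager; the helper name `remote_dir` and the example paths are hypothetical.

```python
import os.path


def remote_dir(local_dir, topdir, scratchdir):
    """Map a local working directory to a remote scratch directory.

    Hypothetical helper for illustration; not part of gromacs.manager.
    """
    local_dir = os.path.abspath(os.path.expanduser(local_dir))
    topdir = os.path.abspath(os.path.expanduser(topdir))
    # WDIR is the local working directory with the topdir prefix stripped
    wdir = os.path.relpath(local_dir, topdir)
    if wdir == os.pardir or wdir.startswith(os.pardir + os.sep):
        raise ValueError("%r is not below topdir %r" % (local_dir, topdir))
    # the remote side is assumed to use POSIX-style paths
    return scratchdir.rstrip("/") + "/" + wdir.replace(os.sep, "/")


# made-up example paths
print(remote_dir("/home/alice/lysozyme/MD", "/home/alice", "/scratch/alice/Projects"))
```

With topdir = ~ as in the configuration example, every directory below the home directory is mirrored under scratchdir.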
Configuration file
See Manager for how the values in the configuration file are used.
Example:
[DEFAULT]
name = leviathan
[local]
topdir = ~
[remote]
hostname = leviathan.petagrid.org
scratchdir = /scratch/username/Projects
[queuing_system]
name = PBS
qscript = leviathan.pbs
walltime = 24.0
start_cwd = True
All entries except walltime and start_cwd are required; walltime
can be omitted or set to None.
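As a rough check that a file follows this layout, the example can be parsed with the standard library. The sketch below uses Python 3's `configparser` directly rather than the package's own ManagerConfigParser; treating a missing start_cwd as False is an assumption made here, not documented behaviour.

```python
import configparser

# The example configuration from above, inlined so the sketch is self-contained
EXAMPLE_CFG = """\
[DEFAULT]
name = leviathan
[local]
topdir = ~
[remote]
hostname = leviathan.petagrid.org
scratchdir = /scratch/username/Projects
[queuing_system]
name = PBS
qscript = leviathan.pbs
walltime = 24.0
start_cwd = True
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CFG)

config_name = parser.defaults()["name"]
hostname = parser.get("remote", "hostname")
queue_name = parser.get("queuing_system", "name")

# walltime is optional and may be the literal string "None"
walltime_raw = parser.get("queuing_system", "walltime", fallback=None)
walltime = None if walltime_raw in (None, "None") else float(walltime_raw)

# start_cwd is optional; defaulting to False is an assumption of this sketch
start_cwd = parser.getboolean("queuing_system", "start_cwd", fallback=False)

print(config_name, hostname, queue_name, walltime, start_cwd)
```

Note that options in [DEFAULT] are visible in every section, which is why name in [queuing_system] has to be set explicitly to override the configuration name.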
DEFAULT section
name
identifier of the configuration; should also be the name of the configuration file, i.e. name.cfg
remote section
hostname
fully qualified domain name of the host; used for running ssh hostname or scp FILES hostname:DIR
scratchdir
top level directory on the remote host under which the working directories are constructed; see below for how this is done
queuing_system section
name
identifier for the queuing system (should be a valid python identifier)
qscript
default queuing system script template; store it in ~/.gromacswrapper/qscripts
walltime
maximum allowed run time on the system; job files are written in such a way that Gromacs stops the run at 0.99 of walltime. If omitted then the job runs until it is done (provided the queuing system policy allows that)
start_cwd
Setting this to True means that the queuing system requires the queuing system script to cd into the job directory; this seems to be a bug in some versions of PBS, which we can work around in Manager.qsub()
Queuing system Manager
The configuration files are stored in the ~/.gromacswrapper/manager directory. A file named “foo.cfg” corresponds to the manager named “foo”.
The Manager class must be customized for each system, such as
a cluster or a supercomputer, through a cfg file (see
Configuration file). It then allows submission and control of
jobs remotely (using ssh).
- class gromacs.manager.Manager(name, dirname=None, **kwargs)
Base class to launch simulations remotely on computers with queuing systems.
Basically, ssh into the machine and run the job.
The manager is configured through a cfg file “name.cfg”, whose format is described in Configuration file.
If a special job submission procedure is required then a class must be derived that implements a specialized Manager.qsub() method.
ssh must be set up (via ~/.ssh/config) to allow access via a command line such as
ssh <hostname> <command> ...
Typically you want something such as
host <hostname>
    hostname <hostname>.fqdn.org
    user <remote_user>
in ~/.ssh/config and also set up public-key authentication in order to avoid typing your password all the time.
Set up the manager.
Arguments:
- name
configuration name (corresponds to a stored cfg file)
- dirname
directory component under the remote scratch dir (should be different for different jobs); the default is to strip topdir (taken from the config file) from the full path of the current directory
- prefix
identifier for job names [MD]
- job_done()
alias for get_status()
- qstat()
alias for get_status()
- cat(dirname, prefix='md', cleanup=True)
Concatenate parts of a run in dirname.
Always uses gromacs.cbook.cat() with resolve_multi = ‘guess’.
Note
The default is to immediately delete the original files (cleanup = True).
Keywords:
- dirname
directory to work in
- prefix
prefix (deffnm) of the files [md]
- cleanup : boolean
if True, remove all used files [True]
- get(dirname, checkfile=None, targetdir='.')
scp -r dirname from host into targetdir
Arguments:
- dirname: dir to download
- checkfile: raise OSError/ENOENT if targetdir/dirname/checkfile was not found
- targetdir: put dirname into this directory
Returns: return code from scp
- get_dir(*args)
Directory on the remote machine.
- get_status(dirname, logfilename='md*.log', silent=False)
Check status of remote job by looking into the logfile.
Reports on the status of the job and extracts the performance in ns/d if available (which is saved in Manager.performance).
Arguments:
- dirname
- logfilename can be a shell glob pattern [md*.log]
- silent = True/False; True suppresses log.info messages
Returns:
- True if the job is done
- False if it is still running
- None if no log file was found to look at
Note
Also returns False if the connection failed.
Warning
This is an important but somewhat fragile method. It needs to be improved to be more robust.
- local_get(dirname, checkfile, cattrajectories=True, cleanup=False)
Find checkfile locally if possible.
If checkfile is not found in dirname then it is transferred from the remote host.
If needed, the trajectories are concatenated using Manager.cat().
Returns: local path of checkfile
- log_RE = <_sre.SRE_Pattern object>
Regular expression used by Manager.get_status() to parse the logfile from mdrun.
- ndependent(runtime, performance=None, walltime=None)
Calculate how many dependent (chained) jobs are required.
Uses performance in ns/d (gathered from get_status()) and job max walltime (in hours) from the class unless provided as keywords.
n = ceil(runtime/(performance*0.99*walltime))
Keywords:
- runtime
length of run in ns
- performance
ns/d with the given setup
- walltime
maximum run length of the script (using 99% of it), in h
Returns: n or 1 if walltime is unlimited
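A worked version of this calculation may help. The formula as printed shows no unit conversion, but performance is given in ns/d and walltime in hours, so the sketch below assumes that the walltime is converted to days (divided by 24) before multiplying; check the source before relying on this. The function name `chained_jobs` is hypothetical and this is not the library's implementation.

```python
import math


def chained_jobs(runtime_ns, performance_ns_per_day, walltime_h=None):
    """Estimate the number of chained jobs for a run of runtime_ns.

    Illustrative sketch only; the division by 24 (hours -> days) is an
    assumption made here to reconcile ns/d with a walltime in hours.
    """
    if walltime_h is None or math.isinf(walltime_h):
        # unlimited walltime: a single job is enough
        return 1
    # ns that one job can cover when it uses 99% of its walltime
    ns_per_job = performance_ns_per_day * 0.99 * walltime_h / 24.0
    return int(math.ceil(runtime_ns / ns_per_job))


# 100 ns at 10 ns/d with a 24 h limit: each job covers about 9.9 ns
print(chained_jobs(100, 10, 24))
print(chained_jobs(100, 10))
```

In the first call each job covers slightly less than 10 ns, so the 100 ns run needs one more job than the naive 100/10 estimate.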
- put(dirname)
scp dirname to host.
Arguments: dirname to be transferred
Returns: return code from scp
- putfile(filename, dirname)
scp filename to host in dirname.
Arguments: filename and dirname to be transferred to
Returns: return code from scp
- qsub(dirname, **kwargs)
Submit job remotely on host.
This is the most primitive implementation: it just runs the commands
cd remotedir && qsub qscript
on Manager._hostname. remotedir is dirname under Manager._scratchdir and qscript is the name of the queuing system script in remotedir.
Arguments:
- dirname
directory, relative to the current one, under which all the job files reside (typically, this is also where the queuing system script qscript lives)
- qscript
name of the queuing system script; defaults to the queuing system script that was produced from the template Manager._qscript; searched in the current directory (.) and under dirname
- remotedir
full path to the job directory on the remote system; in most cases it should be sufficient to let the programme choose the appropriate value based on dirname and the configuration of the manager
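To make the shape of the remote call concrete, the following sketch only builds the argument list for ssh and does not execute anything. The function name `qsub_command` is hypothetical, and quoting the paths with `shlex.quote` is a choice of this sketch rather than something the Manager is documented to do.

```python
import shlex


def qsub_command(hostname, remotedir, qscript):
    """Build an ssh argument list that runs 'cd remotedir && qsub qscript'.

    Hypothetical helper for illustration; not part of gromacs.manager.
    """
    remote_cmd = "cd {0} && qsub {1}".format(
        shlex.quote(remotedir), shlex.quote(qscript))
    # ssh passes the last argument to the remote shell as one command line
    return ["ssh", hostname, remote_cmd]


cmd = qsub_command(
    "leviathan.petagrid.org",
    "/scratch/username/Projects/lysozyme/MD",  # made-up job directory
    "leviathan.pbs",
)
print(cmd)
```

The resulting list could be handed to `subprocess.call(cmd)`, whose return value is the exit code of ssh.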
- remotepath(*args)
Directory on the remote machine.
- remoteuri(*args)
URI of the directory on the remote machine.
- setup_MD(jobnumber, struct='MD_POSRES/md.pdb', **kwargs)
Set up production and transfer to host.
Arguments:
- jobnumber: 1,2 ...
- struct is the starting structure (default from POSRES run but that is just a guess);
- kwargs are passed to gromacs.setup.MD()
- setup_posres(**kwargs)
Set up position restraints run and transfer to host.
kwargs are passed to gromacs.setup.MD_restrained()
- waitfor(dirname, **kwargs)
Wait until the job associated with dirname is done.
Super-primitive: uses a simple while ... sleep loop that waits for seconds between polls.
Arguments:
- dirname
look for log files under the remote dir corresponding to dirname
- seconds
delay in seconds during re-polling
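The polling pattern described for waitfor can be sketched generically. Here `check` stands in for a status call such as get_status() that returns True, False or None; the function name `wait_until_done`, the default delay and the injectable `sleep` argument are all assumptions of this sketch.

```python
import time


def wait_until_done(check, seconds=120, sleep=time.sleep):
    """Poll check() until it returns True; return the number of polls.

    Hypothetical helper for illustration. False and None (no log file
    yet) are both treated as "keep waiting".
    """
    polls = 0
    while True:
        polls += 1
        if check():
            return polls
        sleep(seconds)


# Demonstration with canned status values and a no-op sleep
statuses = iter([None, False, True])
n_polls = wait_until_done(lambda: next(statuses), seconds=0, sleep=lambda s: None)
print(n_polls)
```

A loop like this never gives up, so a job that fails without writing a complete log file would block forever; a production version would likely add a timeout.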
The actual config file contents can be retrieved with get_manager_config().
- gromacs.manager.get_manager_config(filename)
Load manager configuration file from filename.
Helper classes and functions
The following classes and functions are mainly documented for developers.
- gromacs.manager.find_manager_config(name)
Find a configuration file for manager name.
- class gromacs.manager.ManagerConfigParser(defaults=None, dict_type=<class 'collections.OrderedDict'>, allow_no_value=False)
- add_section(section)
Create a new section in the configuration.
Raise DuplicateSectionError if a section by the specified name already exists. Raise ValueError if name is DEFAULT or any of its case-insensitive variants.
- get(section, option, raw=False, vars=None)
Get an option value for a given section.
If 'vars' is provided, it must be a dictionary. The option is looked up in 'vars' (if provided), 'section', and in 'defaults' in that order.
All % interpolations are expanded in the return values, unless the optional argument 'raw' is true. Values for interpolation keys are looked up in the same manner as the option.
The section DEFAULT is special.
- getfloat(section, option, **kwargs)
Return as float() or None.
- getpath(section, option, **kwargs)
Return option as an expanded path.
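One way such getters could behave is sketched below on top of Python 3's `configparser`. This is not the ManagerConfigParser implementation: the class name `SketchConfigParser` is hypothetical, and it is an assumption that a missing option or the literal string "None" yields None and that "expanded" means `~` and environment variables are expanded.

```python
import configparser
import os.path


class SketchConfigParser(configparser.ConfigParser):
    """Hypothetical parser with a None-aware getfloat and a path getter."""

    def getfloat(self, section, option, **kwargs):
        # assumption: a missing option or the string "None" gives None
        kwargs.setdefault("fallback", None)
        value = self.get(section, option, **kwargs)
        if value is None or value.strip().lower() in ("", "none"):
            return None
        return float(value)

    def getpath(self, section, option, **kwargs):
        # assumption: expand both ~ and environment variables
        value = self.get(section, option, **kwargs)
        return os.path.expanduser(os.path.expandvars(value))


parser = SketchConfigParser()
parser.read_string("[local]\ntopdir = ~/work\n[queuing_system]\nwalltime = None\n")
print(parser.getfloat("queuing_system", "walltime"))
print(parser.getpath("local", "topdir"))
```

Returning None rather than raising lets the caller treat an absent walltime as "no limit", which matches how walltime is described in the configuration section.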
- has_option(section, option)
Check for the existence of a given option in a given section.
- has_section(section)
Indicate whether the named section is present in the configuration.
The DEFAULT section is not acknowledged.
- items(section, raw=False, vars=None)
Return a list of tuples with (name, value) for each option in the section.
All % interpolations are expanded in the return values, based on the defaults passed into the constructor, unless the optional argument 'raw' is true. Additional substitutions may be provided using the 'vars' argument, which must be a dictionary whose contents override any pre-existing defaults.
The section DEFAULT is special.
- options(section)
Return a list of option names for the given section name.
- read(filenames)
Read and parse a filename or a list of filenames.
Files that cannot be opened are silently ignored; this is designed so that you can specify a list of potential configuration file locations (e.g. current directory, user’s home directory, systemwide directory), and all existing configuration files in the list will be read. A single filename may also be given.
Return list of successfully read files.
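The "silently ignored" behaviour is inherited from the standard library parser and is easy to miss, so here is a small demonstration using plain `configparser` from Python 3. The file names and the host name are made up for the example.

```python
import configparser
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "foo.cfg")
    with open(existing, "w") as cfg:
        cfg.write("[remote]\nhostname = example.org\n")
    missing = os.path.join(tmp, "does_not_exist.cfg")

    parser = configparser.ConfigParser()
    # no exception is raised for the missing file; it is simply skipped
    read_ok = parser.read([missing, existing])
    hostname = parser.get("remote", "hostname")

print(read_ok)
print(hostname)
```

Because a missing file raises no error, comparing the returned list with the list of requested files is the only way to notice that a configuration file was not found.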
- readfp(fp, filename=None)
Like read() but the argument must be a file-like object.
The 'fp' argument must have a 'readline' method. Optional second argument is the 'filename', which if not given, is taken from fp.name. If fp has no 'name' attribute, '<???>' is used.
- remove_option(section, option)
Remove an option.
- remove_section(section)
Remove a file section.
- sections()
Return a list of section names, excluding [DEFAULT]
- set(section, option, value=None)
Set an option. Extend ConfigParser.set: check for string values.
- write(fp)
Write an .ini-format representation of the configuration state.
- class gromacs.manager.Job
Properties of a job.