Previous topic

Installation

Next topic

Managing a research project with Sumatra

This Page

Getting startedΒΆ

Let us assume that you already have a project based on numerical simulation, which you wish to start managing using Sumatra, and that the code for this project is under version control. Note that the following is equally valid if your project is based on data analysis rather than, or as well as, simulation: just mentally replace “simulation” with “analysis” in the following.

Change to the working directory for your project, and then create a new Sumatra project in this directory using the smt init command:

$ cd myproject
$ smt init MyProject

where MyProject is the project name. This creates a sub-directory named .smt.

Sumatra tracks data files created by your simulation by searching for newly created files within a given directory tree. By default, it assumes that your simulation will create files in a sub-directory Data of your working directory. (You can change this by providing the --datapath option to smt init or smt configure.)

Now let’s run a simulation. We will assume that your simulation code is written in Python, and that you run the simulation by executing a file called main.py, passing it the name of a parameter file on the command line, i.e., you would normally run a simulation using:

$ python main.py default.param

To run it using Sumatra, you would use:

$ smt run --executable=python --main=main.py default.param

Now we can see a list of the simulations we have run:

$ smt list
20140418-154800

This shows the label for each simulation we have run. Since we did not specify a label, one was automatically generated from the timestamp. To see more detail, use the --long option:

$ smt list --long
--------------------------------------------------------------------------------
Label            : 20140418-154800
Timestamp        : 2014-04-18 15:48:00.439809
Reason           :
Outcome          :
Duration         : 0.216287851334
Repository       : GitRepository at /path/to/myproject
Main_File        : main.py
Version          : a75cf131a69ba831e915bc6a09987e832e65e7bc
Script_Arguments : <parameters>
Executable       : Python (version: 2.6.8) at /usr/local/bin/python
Parameters       : seed = 65785 # seed for random number generator
                 : distr = "uniform" # statistical distribution to draw values from
                 : n = 100 # number of values to draw
Input_Data       : []
Launch_Mode      : serial
Output_Data      : [example2.dat(43a47cb379df2a7008fdeb38c6172278d000fdc4)]
User             : Arthur Dent <dent@example.com>
Tags             :
Repeats          : None

(most options also have a short form, -l in this case.)

It is a bit tedious to have to tell Sumatra which simulator and which file to run every time. Presumably, the name of the main file changes infrequently and the simulator almost never. Therefore, these can be set as defaults for a given project:

$ smt configure --executable=python --main=main.py

(you could also have given these options to smt init. init is used to create a project and configure to change its configuration later, but they mostly accept the same arguments).

Now you can run a simulation with a much shorter command line:

$ smt run default.param

To see the current configuration of your project, use the info command:

$ smt info
Project name        : MyProject
Default executable  : Python (version: 2.6.8) at /usr/local/bin/python
Default repository  : GitRepository at /path/to/myproject
Default main file   : main.py
Default launch mode : serial
Data store (output) : /path/to/myproject/Data
.          (input)  : /
Record store        : Django (/path/to/myproject/.smt/records)
Code change policy  : error
Append label to     : None
Label generator     : timestamp
Timestamp format    : %Y%m%d-%H%M%S
Sumatra version     : 0.7.0

Sumatra automatically records the identity and versions of the simulation files and the simulator executable, stores links to any files created by the simulation, records any error messages, the date and time at which the simulation was run, and its duration. You may also add your own annotations, in several different ways. On running the simulation, you can specify a unique label, and the reason for which you are running the simulation:

$ smt run --label=haggling --reason="determine whether the gourd is worth 3 or 4 shekels" romans.param

After the simulation is complete, you can add a description of the outcome:

$ smt comment "apparently, it is worth NaN shekels."

This adds the comment to the most recent simulation. You may also describe the outcome of an earlier simulation, by specifying its label:

$ smt comment 20140418-154800 "Eureka! Nobel prize here we come."

You can also tag a simulation record with one or more short keywords:

$ smt tag foobar
$ smt tag barfoo

and remove tags:

$ smt tag --remove barfoo

The parameter file may be in any format - it is your script which is responsible for reading it. However, if it is in one of the formats that Sumatra understands then it is possible to modify parameter values on the command line. Suppose default.param contains a parameter tau_m = 20.0, as well as a number of other parameters, then:

$ smt run --reason="test effect of a smaller time constant" default.param tau_m=10.0

will generate a new parameter file identical to default.param but with tau_m equal to 10.0, and then will pass this new parameter file to your script. This can be very convenient when you wish to study the effects of changing one or two parameters, without having to edit your parameter file each time.

One of the main aims of Sumatra is to ensure the reproducibility of simulation results. The repeat command re-runs a previous simulation, and checks that the output is identical to that of the original run:

$ smt repeat haggling
The new record exactly matches the original.

Although it is better not to delete simulation records (so as to preserve a full record of the project, false starts and all), it is possible:

$ smt delete 20140418-154800

It is also possible to delete all simulations with a given tag:

$ smt delete --tag foobar

Most of the commands described here have further options that we have not described. A full description of the options for each command is given in the command reference. The full list of commands is available by running smt by itself:

$ smt
Usage: smt <subcommand> [options] [args]

Simulation/analysis management tool version 0.7.0

Available subcommands:
  init
  configure
  info
  run
  list
  delete
  comment
  tag
  repeat
  diff
  help
  export
  upgrade
  sync
  migrate

and help on a given command is available by running the command with the --help option, e.g.:

$ smt comment --help
usage: smt comment [options] [LABEL] COMMENT

This command is used to describe the outcome of the simulation/analysis. If
LABEL is omitted, the comment will be added to the most recent experiment. If
the '-f/--file' option is set, COMMENT should be the name of a file containing
the comment, otherwise it should be a string of text. By default, comments
will be appended to any existing comments. To overwrite existing comments, use
the '-r/--replace flag.

positional arguments:
  LABEL          the record to which the comment will be added
  comment        a string of text, or the name of a file containing the
                 comment.

optional arguments:
  -h, --help     show this help message and exit
  -r, --replace  if this flag is set, any existing comment will be
                 overwritten, otherwise, the new comment will be appended to
                 the end, starting on a new line
  -f, --file     interpret COMMENT as the path to a file containing the
                 comment

or smt help CMD, where CMD is the name of the command.

This tutorial has covered using smt for serial simulations/analyses. A further tutorial covers using smt for parallel computations (using MPI).

Also see smtweb, which provides a more graphical interface to viewing lists of records than smt list.