Configuration File

Location

All commands in the GC3Apps and GC3Utils software read two configuration files at startup:

  • a system-wide one located at /etc/gc3/gc3pie.conf, and
  • a user-private one at ~/.gc3/gc3pie.conf.

Both files use the same format. The system-wide one is read first, so that users can override the system-level configuration in their private file. Configuration data from corresponding sections in the two configuration files is merged; the value in the user-private file overrides the one from the system-wide configuration.
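For instance (a hypothetical illustration; the resource name and values are made up), suppose the system-wide file contains:

[resource/localhost]
type = shellcmd
transport = local
max_cores = 2

and the user-private file contains:

[resource/localhost]
max_cores = 4

Then the effective configuration for resource localhost takes type = shellcmd and transport = local from the system-wide file, but max_cores = 4, because the user-private value overrides the system-wide one.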

If you try to start any GC3Utils command without having a configuration file, a sample one will be copied to the user-private location ~/.gc3/gc3pie.conf and an error message will be displayed, directing you to edit the sample file before retrying.

Configuration file format

The GC3Pie configuration file follows the format understood by Python ConfigParser, which is very close to the syntax used in MS-Windows .INI files. See http://docs.python.org/library/configparser.html for reference.

The GC3Libs configuration file consists of several configuration blocks. Each configuration block (section) starts with a keyword in square brackets and contains the configuration options for a specific part.

The following sections are used by the GC3Apps/GC3Utils programs:

  • [DEFAULT] – this is for global settings.
  • [auth/name] – these are for settings related to identity/authentication (identifying yourself to clusters & grids).
  • [resource/name] – these are for settings related to a specific computing resource (cluster, grid, etc.)

Sections with other names are allowed but will be ignored.

The DEFAULT section

The [DEFAULT] section is optional.

Values defined in the [DEFAULT] section can be used to insert values in other sections, using the %(name)s syntax. See documentation of the Python SafeConfigParser object at http://docs.python.org/library/configparser.html for an example.
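As a minimal sketch (the key name my_username and the section name ssh1 are made up for illustration), a value defined in [DEFAULT] can be re-used in other sections:

[DEFAULT]
my_username = murri

[auth/ssh1]
type = ssh
username = %(my_username)s

Here %(my_username)s expands to murri when the [auth/ssh1] section is read.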

auth sections

There can be more than one [auth] section.

Each authentication section must begin with a line of the form:

[auth/name]

where the name portion is any string composed only of letters, digits and the underscore character _.

You can have as many [auth/name] sections as you want.

This allows you to define different auth methods for different resources. Each [resource/name] section will reference one (and one only) authentication section.

Authentication types

Each auth section must specify a type setting.

type defines the authentication type that will be used to access a resource. There are four supported authentication types:

  • ssh: use this for resources that will be accessed by opening an SSH connection to the front-end node of a cluster.
  • voms-proxy: uses voms-proxy-init to generate a proxy; use for resources that require a VOMS-enabled Grid proxy.
  • grid-proxy: uses grid-proxy-init to generate a proxy; use for resources that require a Grid proxy (but no VOMS extensions).
  • ec2: use this for an EC2-compatible cloud resource.

For the ssh-type auth, the following keys must be provided:

  • type: must be ssh
  • username: must be the username to log in as on the remote machine

For the ec2-type auth, the following keys can be provided. If a key is missing, the value of the corresponding environment variable will be used instead; if that is also unset, an error is raised.

  • ec2_access_key: Your personal access key to authenticate against the specific cloud endpoint. If not found, the environment variable EC2_ACCESS_KEY will be used.
  • ec2_secret_key: Your personal secret key associated with the above ec2_access_key. If not found, the environment variable EC2_SECRET_KEY will be used.

Any other key/value pair will be ignored.

For the voms-proxy type auth, the following keys must be provided:

  • type: must be voms-proxy
  • vo: the VO to authenticate with (passed directly to voms-proxy-init as argument to the --vo command-line switch)
  • cert_renewal_method: see below.
  • remember_password: see below.

Any other key/value pair will be ignored.

For the grid-proxy type auth, the following keys must be provided:

  • type: must be grid-proxy
  • cert_renewal_method: see below.
  • remember_password: see below.

Any other key/value pair will be ignored.

For the voms-proxy and grid-proxy authentication types, the cert_renewal_method setting specifies whether GC3Libs should attempt to get a certificate if the current one is expired or otherwise invalid. Currently there are two supported cert_renewal_method types:

  • slcs: the user certificate is generated through an invocation of the slcs-init command.
  • manual: the user certificate is generated or renewed through an external process that the user must carry out outside the scope of GC3Pie. In this case, if the user certificate is expired, invalid or non-existent, GC3Pie will fail to authenticate.

For the slcs certificate renewal method, the following keys must be provided:

  • aai_username: passed directly to slcs-init as argument to the --user command-line switch.
  • idp: passed directly to slcs-init as argument to the --idp command-line switch.

For the manual certificate renewal method, no additional keys are required.

The remember_password entry is optional and must be set to a boolean value (the strings 1, yes, true and on are interpreted as boolean “true”; any other value counts as “false”; the default is “false”). If set to a true value, it instructs GC3Pie to keep the password used for this authentication in the program’s main memory, which implies that you will be asked for the password at most once per program invocation. Keeping passwords in memory is bad security practice; do not set this option to “true” unless you understand the implications.
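For instance, a grid-proxy type auth section that renews certificates manually and caches the password could look like the following sketch (the section name gridproxy is made up for illustration):

[auth/gridproxy]
type = grid-proxy
cert_renewal_method = manual
remember_password = yes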

Example 1. The following example auth section shows how to configure GC3Pie for using SWITCHaai SLCS services to generate a certificate and a VOMS proxy to access the Swiss National Distributed Computing Infrastructure SMSCG:

[auth/smscg]
type = voms-proxy
cert_renewal_method = slcs
aai_username = <aai_user_name> # SWITCHaai/Shibboleth user name
idp = uzh.ch
vo = smscg

Example 2. The following configuration sections are used to set up two different accounts that GC3Pie programs can use. Which account should be used on which computational resource is defined in the resource sections (see below).

[auth/ssh1]
type = ssh
username = murri # your username here

[auth/ssh2] # I use a different account name on some resources
type = ssh
username = rmurri

Example 3. The following configuration section is used to access an EC2 resource (access and secret keys are of course invalid :)):

[auth/hobbes]
type=ec2
ec2_access_key=1234567890qwertyuiopasdfghjklzxc
ec2_secret_key=cxzlkjhgfdsapoiuytrewq0987654321

resource sections

Each resource section must begin with a line of the form:

[resource/name]

You can have as many [resource/name] sections as you want; this allows you to define many different resources. Each [resource/name] section must reference one (and one only) [auth/name] section (by its auth key).

Resources currently come in several flavours, distinguished by the value of the type key:

  • If type is arc1, then the resource is accessed using the ARC grid middleware (version 1.1.x/1.0.x);
  • If type is arc0, then the resource is accessed using the ARC grid middleware (version 0.8.x);
  • If type is sge, then the resource is a Grid Engine batch system, to be accessed by an SSH connection to its front-end node.
  • If type is pbs, then the resource is a Torque/PBS batch system, to be accessed by an SSH connection to its front-end node.
  • If type is lsf, then the resource is a LSF batch system, to be accessed by an SSH connection to its front-end node.
  • If type is slurm, then the resource is a SLURM batch system, to be accessed by an SSH connection to its front-end node.
  • If type is shellcmd, then the resource is the computer where the GC3Pie script is running and applications are executed by just spawning a local UNIX process.
  • If type is ec2+shellcmd, then the resource is a cloud with EC2-compatible APIs, and applications are run on Virtual Machines spawned on the cloud.

All [resource/name] sections (except those of shellcmd type) must reference a valid [auth/***] section. Resources of sge, pbs, lsf and slurm type can only reference ssh-type auth sections; resources of type arc0 or arc1 can only reference [auth/***] sections whose type is voms-proxy or grid-proxy.

Some configuration keys are common to all resource types:

  • type: Resource type, see above.

  • auth: the name of a valid [auth/name] section; only the authentication section name (after the /) must be specified.

  • max_cores_per_job: Maximum number of CPU cores that a job can request; a resource will be dropped during the brokering process if a job requests more cores than this.

  • max_memory_per_core: Maximum amount of memory (expressed in GB) that a job can request.

  • max_walltime: Maximum job running time (in hours).

  • max_cores: Total number of cores provided by the resource.

  • architecture: Processor architecture. Should be one of the strings x86_64 (for 64-bit Intel/AMD/VIA processors), i686 (for 32-bit Intel/AMD/VIA x86 processors), or x86_64,i686 if both architectures are available on the resource.

  • time_cmd: Used only when type is shellcmd. The time program is used as a wrapper for the application in order to collect information about the execution when running without a real LRMS.

  • prologue: Used only when type is pbs, lsf, slurm or sge. The content of the prologue script is inserted into the submission script and executed before the real application. It is intended to run shell commands needed to set up the execution environment before the application starts (e.g., a module load ... command). The script must be a valid, plain /bin/sh script. (See the sketch after this list.)

  • epilogue: Used only when type is pbs, lsf, slurm or sge. The content of the epilogue script is inserted into the submission script and executed after the real application. The script must be a valid, plain /bin/sh script.

  • <application_name>_prologue: Same as prologue, but it is used only when <application_name> matches the name of the application. Valid application names are: zods, gamess, turbomole, codeml, rosetta, rosetta_docking, geotop.

  • <application_name>_epilogue: Same as epilogue, but it is used only when <application_name> matches the name of the application. Valid application names are: zods, gamess, turbomole, codeml, rosetta, rosetta_docking, geotop.
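A minimal sketch of how the prologue/epilogue keys could be used (the file and module names are made up; this assumes the keys take the path of a local shell script whose content is copied into the submission script):

prologue = load_modules.sh
gamess_prologue = load_gamess_modules.sh
epilogue = cleanup.sh

where, e.g., load_modules.sh is a plain /bin/sh script containing a line such as module load openmpi.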

arc0 resources

The arc_ldap key should be set to the LDAP URL of an ARC GIIS or GRIS. If the frontend key is also defined, then only queues belonging to the specified frontend will be considered for brokering.

When a job has just been submitted, the ARC information system does not immediately report about it: the job will appear at the next cache update. This creates a time window during which no information is reported about the job by ARC, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allow a “grace time”: job information lookups are allowed to fail for a certain time span after submission. The duration of this time span is set with the optional lost_job_timeout parameter, whose default is 4 times the ARC default cache time; this parameter should not be lower than twice the information system update frequency.

  • lost_job_timeout: Time (in seconds) a failure in job lookup in the information system will not be considered critical

arc1 resources

The arc_ldap key should be set to a valid ARC1 information system URL.

When a job has just been submitted, the ARC information system does not immediately report about it: the job will appear at the next cache update. This creates a time window during which no information is reported about the job by ARC, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allow a “grace time”: job information lookups are allowed to fail for a certain time span after submission. The duration of this time span is set with the optional lost_job_timeout parameter, whose default is 4 times the ARC default cache time; this parameter should not be lower than twice the information system update frequency.

  • lost_job_timeout: Time (in seconds) a failure in job lookup in the information system will not be considered critical

sge resources

The following configuration keys are required in a sge-type resource section:

  • frontend: should contain the FQDN of the SGE front-end node. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
  • transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute SGE commands. If local, the SGE commands are run directly on the machine where GC3Pie is installed.

To submit parallel jobs to SGE, a “parallel environment” name must be specified. You can specify the PE to be used with a specific application using a configuration parameter formed by the application name + _pe (e.g., gamess_pe, zods_pe); the default_pe parameter dictates the parallel environment to use if no application-specific one is defined. If neither the application-specific nor the default_pe parallel environment is defined, then it will not be possible to submit parallel jobs.
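For example (the parallel environment names smp and orte are site-specific and made up here), a sge-type resource stanza could contain:

default_pe = smp
gamess_pe = orte

so that GAMESS jobs are submitted to the orte parallel environment and all other parallel jobs to smp.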

When a job has finished, the SGE batch system does not (by default) immediately write its information into the accounting database. This creates a time window during which no information is reported about the job by SGE, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allow a “grace time”: qacct job information lookups are allowed to fail for a certain time span after the first time qstat failed. The duration of this time span is set with the sge_accounting_delay parameter, whose default is 15 seconds (matches the default in SGE, as of release 6.2):

  • sge_accounting_delay: Time (in seconds) a failure in qacct will not be considered critical.

GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the SGE backend:

  • qsub: submit a job.
  • qacct: get info on resources used by a job.
  • qdel: cancel a job.
  • qstat: get the status of a job or the status of available resources.
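For example (the paths and the extra option are made up for illustration), an sge-type resource stanza could override the submission and status commands like this:

qsub = /usr/local/sge/bin/qsub -q short.q
qstat = /usr/local/sge/bin/qstat

The same mechanism applies to the PBS, LSF and SLURM backends described below.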

pbs resources

The following configuration keys are required in a pbs-type resource section:

  • transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute Torque/PBS commands. If local, the Torque/PBS commands are run directly on the machine where GC3Pie is installed.
  • frontend: should contain the FQDN of the Torque/PBS front-end node. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info. This configuration item is only relevant if transport is ssh.

GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the PBS backend:

  • queue: the name of the queue to which jobs are submitted. If empty (the default), no queue will be specified during submission.
  • qsub: submit a job.
  • qdel: cancel a job.
  • qstat: get the status of a job or the status of available resources.
  • tracejob: get info on resources used by a job.
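Putting the keys above together, a minimal Torque/PBS resource stanza could look like the following sketch (all names and values are made up for illustration; ssh1 refers to an ssh-type auth section such as the one in Example 2 above):

[resource/mycluster]
type = pbs
auth = ssh1
transport = ssh
frontend = pbs.example.org
queue = short
max_cores_per_job = 16
max_memory_per_core = 2
max_walltime = 24
max_cores = 64
architecture = x86_64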

lsf resources

The following configuration keys are required in a lsf-type resource section:

  • transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute LSF commands. If local, the LSF commands are run directly on the machine where GC3Pie is installed.
  • frontend: should contain the FQDN of the LSF front-end node. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info. This configuration item is only relevant if transport is ssh.

GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the LSF backend:

  • bsub: submit a job.
  • bjobs: get the status and resource usage of a job.
  • bkill: cancel a job.
  • lshosts: get info on available resources.

slurm resources

The following configuration keys are required in a slurm-type resource section:

  • transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute SLURM commands. If local, the SLURM commands are run directly on the machine where GC3Pie is installed.
  • frontend: should contain the FQDN of the SLURM front-end node. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info. This configuration item is only relevant if transport is ssh.

GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the SLURM backend:

  • sbatch: submit a job.
  • scancel: cancel a job.
  • squeue: get the status of a job or of the available resources.
  • sacct: get info on resources used by a job.

shellcmd resources

The following optional configuration keys are available in a shellcmd-type resource section:

  • transport: As for any other resource, possible values are ssh or local. The default value is local.
  • frontend: If transport is ssh, then frontend is the FQDN of the remote machine where the jobs will be executed.
  • time_cmd: ShellcmdLrms needs the GNU implementation of the time command in order to collect resource usage of the submitted jobs. time_cmd must contain the path to the time binary if it differs from the standard location (/usr/bin/time).
  • override: ShellcmdLrms by default will try to gather information on the system the resource is running on, including the number of cores and the available memory. These values may be different from the values stored in the configuration file. If override is True, then the values automatically discovered will be used. If override is False, the values in the configuration file will be used regardless of the real values discovered by the resource.
  • spooldir: Path to a filesystem location where temporary working directories for processes executed through this backend will be created. The default value None means to use $TMPDIR or /tmp (see tempfile.mkstemp for details).
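Putting these keys together, a minimal stanza for running jobs on the local machine could look like this sketch (the values are illustrative only; shellcmd resources need no auth section):

[resource/localhost]
type = shellcmd
transport = local
max_cores = 4
max_cores_per_job = 4
max_memory_per_core = 2
max_walltime = 8
architecture = x86_64
# time_cmd = /usr/local/bin/time  (only needed if GNU time is not /usr/bin/time)
# override = True                 (let GC3Pie auto-detect cores and memory)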

ec2+shellcmd resources

The following configuration options are available for a resource of type ec2+shellcmd. If these options are omitted, the defaults of the boto Python library will be used, which at the time of writing means using the default region on Amazon.

  • ec2_url: The URL of the EC2 frontend. On a typical OpenStack installation this will look like https://cloud.gc3.uzh.ch:8773/services/Cloud, while for Amazon it is something like https://ec2.us-east-1.amazonaws.com (this is valid for the us-east-1 zone, of course). If no value is specified, the environment variable EC2_URL will be used; if that is not set either, an error is raised.

  • ec2_region: the region you want to access. Most OpenStack installations only have one region, called nova.

  • keypair_name: the name of the keypair to use when creating a new instance on the cloud. If it is not found, a new keypair with this name will be created from the key stored in public_key. Please note that if the keypair already exists on the cloud but the associated public key is different from the one stored in public_key, then an error is raised and the resource will not be used.

  • public_key: public key to use when creating the keypair. Please note that GC3Pie will assume that the corresponding private key is stored on a file with the same path but without the .pub extension. This private key is necessary in order to access the virtual machines created on the cloud. Amazon users: Please note that Amazon does not accept DSA keys; use RSA keys only for Amazon resources.

  • vm_auth: the name of a valid auth stanza used to connect to the virtual machine.

  • instance_type: the instance type (aka flavor, aka size) you want to use for your virtual machines by default.

  • <application>_instance_type: you can override the default instance type for a specific application by defining an entry in the configuration file for that application. For example:

    instance_type=m1.tiny
    gc_gps_instance_type=m1.large
    

    will use instance type m1.large for the gc_gps GC3Pie application, and m1.tiny for all the other applications.

  • image_id: the ami-id of the image you want to use. OpenStack users: please note that the ID you will find on the web interface is not the ami-id. To get the ami-id of an image you have to use the command euca-describe-images from the euca2ools package.

    For Hobbes users: all virtual machines distributed by the GC3 team are in this list of appliances with the corresponding ami-id.

  • <application>_image_id: you can override the default image id for a specific application by defining an entry in the configuration file for that specific application. For example:

    image_id=ami-00000048
    gc_gps_image_id=ami-0000002a

    will use the image ami-0000002a for gc_gps and image ami-00000048 for all other applications.

  • security_group_name: the name of the security group to use. If it is not found, it will be created using the rules listed in security_group_rules. If the security group is found but some of the rules in security_group_rules are missing, they will be added to the security group. Please note that rules already present in the security group but not listed in security_group_rules will not be removed.

  • security_group_rules: comma-separated list of security rules that the security group must have. Each rule has the form:

    PROTOCOL:PORT_RANGE_START:PORT_RANGE_END:IP_NETWORK

    where:

    • PROTOCOL can be one of tcp, udp, icmp
    • PORT_RANGE_START and PORT_RANGE_END are integers and define the range of ports to allow. If PROTOCOL is icmp, use -1 for both values, since there is no concept of port in ICMP.
    • IP_NETWORK is a range of IP addresses to allow, in the form A.B.C.D/N.

    For instance, to allow access to the virtual machine from any machine on the internet, you can use:

    tcp:22:22:0.0.0.0/0

    Please note that in order to access the virtual machines it creates, GC3Pie needs to be able to connect to them via SSH, so the above rule is probably necessary in any GC3Pie configuration (of course, you can restrict access to your own IP address or to your institution's network).

  • vm_pool_max_size: the maximum number of virtual machines GC3Pie will start on this cloud. If 0, there is no predefined limit to the number of virtual machines GC3Pie will spawn.

  • user_data: the content of a script that will run after the startup of the machine. For instance, to automatically upgrade a ubuntu machine after startup you can use:

    user_data=#!/bin/bash
      aptitude -y update
      aptitude -y safe-upgrade

    Please note that if you need to span over multiple lines you have to indent the lines after user_data, as any indented line in a configuration file is interpreted as a continuation of the previous line.

  • <application>_user_data: you can override the default userdata for a specific application by defining an entry in the configuration file for that specific application. For example:

    # user_data=
    warholize_user_data = #!/bin/bash
      aptitude update && aptitude -y install imagemagick

    will install imagemagick only for the warholize application.

Example resource sections

Example 1. This configuration stanza defines a resource smscg representing the whole SMSCG infrastructure, accessed through the ARC (version 0.8.x) middleware:

[resource/smscg]
# A whole ARC-based Grid
type = arc0
auth = <voms_auth_name>
arc_ldap = ldap://giis.smscg.ch:2135/o=grid/mds-vo-name=Switzerland
# These values are correct as of 2011-02-28; please
# ask on the SMSCG mailing list if unsure.
max_cores_per_job = 256
max_memory_per_core = 3
max_walltime = 9999
ncores = 1200
architecture = x86_64, i686

Example 2. This configuration stanza shows how to access a single cluster through the ARC middleware (version 1.x) using the name idgc3grid01 (which is also the internet host name of the cluster front-end):

[resource/idgc3grid01]
# A single cluster, accessed through the ARC middleware
type = arc1
auth = <auth_name> # pick a voms-proxy type auth
frontend = idgc3grid01.uzh.ch
name = gc3
arc_ldap = ldap://idgc3grid01.uzh.ch:2135/mds-vo-name=local,o=grid
max_cores_per_job = 32
max_memory_per_core = 2
max_walltime = 12
ncores = 80

Example 3. This configuration stanza defines a resource to submit jobs to the Grid Engine cluster whose front-end host is ocikbpra.uzh.ch:

[resource/ocikbpra]
# A single SGE cluster, accessed by SSH'ing to the front-end node
type = sge
auth = <auth_name> # pick an ssh type auth, e.g., "ssh1"
transport = ssh
frontend = ocikbpra.uzh.ch
gamess_location = /share/apps/gamess
max_cores_per_job = 80
max_memory_per_core = 2
max_walltime = 2
ncores = 80

Example 4. This configuration stanza defines a resource to submit jobs on virtual machines that will be automatically started by GC3Pie on Hobbes, the private OpenStack cloud of the University of Zurich:

[resource/hobbes]
enabled=yes
type=ec2+shellcmd
ec2_url=http://hobbes.gc3.uzh.ch:8773/services/Cloud
ec2_region=nova

auth=ec2hobbes
# These values may be overwritten by the remote resource
max_cores_per_job = 8
max_memory_per_core = 2
max_walltime = 8
max_cores = 32
architecture = x86_64

keypair_name=my_name
# If the keypair does not exist, a new one will be created starting from
# `public_key`. Note that if the desired keypair exists, a check is
# done on its fingerprint and a warning is issued if it does not match
# the one in `public_key`.
public_key=~/.ssh/id_dsa.pub
vm_auth=gc3user_ssh
instance_type=m1.tiny
warholize_instance_type = m1.small
image_id=ami-00000048
warholize_image_id=ami-00000035
security_group_name=gc3pie_ssh
security_group_rules=tcp:22:22:0.0.0.0/0, icmp:-1:-1:0.0.0.0/0
vm_pool_max_size = 8
user_data=
warholize_user_data = #!/bin/bash
    aptitude update && aptitude -y install imagemagick

Enabling/disabling selected resources

Any resource can be disabled by adding a line enabled = false to its configuration stanza. Conversely, a line enabled = true will undo the effect of an enabled = false line (possibly found in a different configuration file).

This way, resources can be temporarily disabled (e.g., the cluster is down for maintenance) without having to remove them from the configuration file.

You can selectively disable or enable resources that are defined in the system-wide configuration file. Two main use cases are supported: the system-wide configuration file /etc/gc3/gc3pie.conf lists and enables all available resources, and users can turn them off in their private configuration file ~/.gc3/gc3pie.conf; or the system-wide configuration can list all available resources but keep them disabled, and users can enable those they prefer in their private configuration file.
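For example (the resource name bigcluster is made up for illustration), if the system-wide file /etc/gc3/gc3pie.conf defines and enables a resource:

[resource/bigcluster]
enabled = true
type = sge
# ... other settings ...

then a user who does not want to use it only needs to add the following to ~/.gc3/gc3pie.conf:

[resource/bigcluster]
enabled = false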