GC3Apps provide a script drive execution of multiple gnfs-cmd jobs each of them with a different parameter set. Allotogehter they form a single crypto simulation of a large parameter space. It uses the generic gc3libs.cmdline.SessionBasedScript framework.
The purpose of gcrypto is to execute several concurrent runs of gnfs-cmd on a parameter set. These runs are performed in parallel using every available GC3Pie resource; you can of course control how many runs should be executed and select what output files you want from each one.
Like in a for-loop, the gcrypto driver script takes as input three mandatory arguments:
For example:
# gcrypto 800000000 1200000000 1000
will produce 400000 jobs; the first job will perform calculations on the range 800000000 to 800000000+1000, the 2nd one will do the range 800001000 to 800002000, and so on.
Inputfile archive location (e.g. lfc://lfc.smscg.ch/crypto/lacal/input.tgz) can be specified with the ‘-i’ option. Otherwise a default filename ‘input.tgz’ will be searched in current directory.
Job progress is monitored and, when a job is done, output is retrieved back to submitting host in folders named: RANGE_START + (SLICE * ACTUAL_STEP) Where ACTUAL_STEP correspond to the position of the job in the overall execution.
The gcrypto command keeps a record of jobs (submitted, executed and pending) in a session file (set name with the ‘-s’ option); at each invocation of the command, the status of all recorded jobs is updated, output from finished jobs is collected, and a summary table of all known jobs is printed. New jobs are added to the session if new input files are added to the command line.
Options can specify a maximum number of jobs that should be in ‘SUBMITTED’ or ‘RUNNING’ state; gcrypto will delay submission of newly-created jobs so that this limit is never exceeded.
The gcrypto execute several runs of gnfs-cmd on a parameter set, and collect the generated output. These runs are performed in parallel, up to a limit that can be configured with the -J command-line option. You can of course control how many runs should be executed and select what output files you want from each one.
In more detail, gcrypto does the following:
Reads the session (specified on the command line with the --session option) and loads all stored jobs into memory. If the session directory does not exist, one will be created with empty contents.
Divide the initial parameter range, given in the command-line, into chunks taking the -J value as a reference. So from a coomand line argument like the following:
$ gcrypto 800000000 1200000000 1000 -J 200
gcrypto will generate an initial chunks of 200 jobs starting from the initial range 800000000 incrementing of 1000. All jobs will run gnfs-cmd on a specific parameter set (e.g. 800000000, 800001000, 800002000, ...). gcrypto will keep constant the number of simulatenous jobs running retrieving those terminated and submitting new ones untill the whole parameter range has been computed.
Updates the state of all existing jobs, collects output from finished jobs, and submits new jobs generated in step 2.
Finally, a summary table of all known jobs is printed. (To control the amount of printed information, see the -l command-line option in the Introduction to session-based scripts section.)
If the -C command-line option was given (see below), waits the specified amount of seconds, and then goes back to step 3.
The program gcrypto exits when all jobs have run to completion, i.e., when the whole paramenter range has been computed.
Execution can be interrupted at any time by pressing Ctrl+C. If the execution has been interrupted, it can be resumed at a later stage by calling gcrypto with exactly the same command-line options.
gcrypto requires a number of default input files common to every submited job. This list of input files is automatically fetched by gcrypto from a default storage repository. Those files are:
gnfs-lasieve6
M1019
M1019.st
M1037
M1037.st
M1051
M1051.st
M1067
M1067.st
M1069
M1069.st
M1093
M1093.st
M1109
M1109.st
M1117
M1117.st
M1123
M1123.st
M1147
M1147.st
M1171
M1171.st
M8e_1200
M8e_1500
M8e_200
M8e_2000
M8e_2500
M8e_300
M8e_3000
M8e_400
M8e_4200
M8e_600
M8e_800
tdsievemt
When gcrypto has to be executed with a different set of input files, an additional command line argument --input-files could be used to specify the locatin of a tar.gz archive containing the input files that gnfs-cmd will expect. Similarly, when a different version of gnfs-cmd command needs to be used, the command line argument --gnfs-cmd could be used to specify the location of the gnfs-cmd to be used.
The gcrypto script is based on GC3Pie’s session-based script model; please read also the Introduction to session-based scripts section for an introduction to sessions and generic command-line options.
A gcrypto command-line is constructed as follows: Like a for-loop, the gcrypto driver script takes as input three mandatory arguments:
Example 1. The following command-line invocation uses gcrypto to run gnfs-cmd on the parameter set ranging from 800000000 to 1200000000 with an increment of 1000.
$ gcrypto 800000000 1200000000 1000
In this case gcrypto will use the default values for determine the chunks size from the default value of the -J option (default value is 50 simulatenous jobs).
Example 2.
$ gcrypto --session SAMPLE_SESSION -c 4 -w 4 -m 8 800000000 1200000000 1000
In this example, job information is stored into session SAMPLE_SESSION (see the documentation of the --session option in Introduction to session-based scripts). The command above creates the jobs, submits them, and finally prints the following status report:
Status of jobs in the 'SAMPLE_SESSION' session: (at 10:53:46, 02/28/12)
NEW 0/50 (0.0%)
RUNNING 0/50 (0.0%)
STOPPED 0/50 (0.0%)
SUBMITTED 50/50 (100.0%)
TERMINATED 0/50 (0.0%)
TERMINATING 0/50 (0.0%)
total 50/50 (100.0%)
Note that the status report counts the number of jobs in the session, not the total number of jobs that would correspond to the whole parameter range. (Feel free to report this as a bug.)
Calling gcrypto over and over again will result in the same jobs being monitored;
The -C option tells gcrypto to continue running until all jobs have finished running and the output files have been correctly retrieved. On successful completion, the command given in example 2. above, would print:
Status of jobs in the 'SAMPLE_SESSION' session: (at 11:05:50, 02/28/12)
NEW 0/400k (0.0%)
RUNNING 0/400k (0.0%)
STOPPED 0/400k (0.0%)
SUBMITTED 0/400k (0.0%)
TERMINATED 50/400k (100.0%)
TERMINATING 0/400k (0.0%)
ok 400k/400k (100.0%)
total 400k/400k (100.0%)
Each job will be named after the parameter range it has computed (e.g. 800001000, 800002000, ... ) (you could see this by passing the -l option to gcrypto); each of these jobs will create an output directory named after the job.
For each job, the set of output files is automatically retrieved and placed in the locations described below.
Upon successful completion, the output directory of each gcrypto job contains:
Note
The number of .tgz files may depend on whether the execution of the gnfs-cmd command has completed or not (e.g. jobs may be killed by the batch system when exausting requested resources)
This section contains commented example sessions with gcrypto.
In typical operation, one calls gcrypto with the -C option and lets it manage a set of jobs until completion.
So, to compute a whole parameter range from 800000000 to 1200000000 with an increment of 1000, submitting 200 jobs simultaneously each of them requesting 4 computing cores, 8GB of memory and 4 hours of wall-clock time, one can use the following command-line invocation:
$ gcrypto -s example -C 120 -J 200 -c 4 -w 4 -m 8 800000000 1200000000 1000
The -s example option tells gcrypto to store information about the computational jobs in the example.jobs directory.
The -C 120 option tells gcrypto to update job state every 120 seconds; output from finished jobs is retrieved and new jobs are submitted at the same interval.
The above command will start by printing a status report like the following:
Status of jobs in the 'example.csv' session:
SUBMITTED 1/1 (100.0%)
It will continue printing an updated status report every 120 seconds until the requested parameter range has been computed.
In GC3Pie terminology when a job is finished and its output has been successfully retrieved, the job is marked as TERMINATED:
Status of jobs in the 'example.csv' session:
TERMINATED 1/1 (100.0%)
GC3Pie comes with a set of generic utilities that could be used as a complemet to the gcrypto command to better manage a entire session execution.
To cancel a running job, you can use the command gkill. For instance, to cancel job.16, you would type the following command into the terminal:
gkill job.16
or:
gkill -s example job.16
gkill could also be used to cancel jobs in a given state
gkill -s example -l UNKNOWN
Warning
There’s no way to undo a cancel operation! Once you have issued a gkill command, the job is deleted and it cannot be resumed. (You can still re-submit it with gresub, though.)
It is sometimes necessary, for debugging purposes, to print out all the details about a job; the ginfo command does just that: prints all the details that GC3Utils know about a single job.
For instance, to print out detailed information about job.13 in session example, you would type
ginfo -s example job.13
For a job in RUNNING or SUBMITTED state, only little information is known: basically, where the job is running, and when it was started:
$ ginfo -s example job.13
job.13
cores: 2
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:05 2012
Submitted to 'wsl' at Tue May 15 09:52:05 2012
RUNNING at Tue May 15 10:07:39 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/116613370683251353308673
lrms_jobname: LACAL_800001000
original_exitcode: -1
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069259.18
stderr_filename: gcrypto.log
stdout_filename: gcrypto.log
timestamp:
RUNNING: 1337069259.18
SUBMITTED: 1337068325.26
unknown_iteration: 0
used_cputime: 1380
used_memory: 3382706
If you omit the job number, information about all jobs in the session will be printed.
Most of the output is only useful if you are familiar with GC3Utils inner working. Nonetheless, ginfo output is definitely something you should include in any report about a misbehaving job!
For a finished job, the information is more complete and can include error messages in case the job has failed:
$ ginfo -c -s example job.13
job.13
_arc0_state_last_checked: 1337069259.18
_exitcode: 0
_signal: None
_state: TERMINATED
cores: 2
download_dir: /data/crypto/results/example.out/8000001000
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:04 2012
Submitted to 'wsl' at Tue May 15 09:52:04 2012
TERMINATING at Tue May 15 10:07:39 2012
Final output downloaded to '/data/crypto/results/example.out/8000001000'
TERMINATED at Tue May 15 10:07:43 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/11441337068324584585032
lrms_jobname: LACAL_800001000
original_exitcode: 0
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069263.13
stderr_filename: gcrypto.log
stdout_filename: gcrypto.log
timestamp:
SUBMITTED: 1337068324.87
TERMINATED: 1337069263.13
TERMINATING: 1337069259.18
unknown_iteration: 0
used_cputime: 360
used_memory: 3366977
used_walltime: 300
With option -v, ginfo output is even more verbose and complete, and includes information about the application itself, the input and output files, plus some backend-specific information:
$ ginfo -c -s example job.13
job.13
arguments: 800000800, 100, 2, input.tgz
changed: False
environment:
executable: gnfs-cmd
executables: gnfs-cmd
execution:
_arc0_state_last_checked: 1337069259.18
_exitcode: 0
_signal: None
_state: TERMINATED
cores: 2
download_dir: /data/crypto/results/example.out/8000001000
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:04 2012
Submitted to 'wsl' at Tue May 15 09:52:04 2012
TERMINATING at Tue May 15 10:07:39 2012
Final output downloaded to '/data/crypto/results/example.out/8000001000'
TERMINATED at Tue May 15 10:07:43 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/11441337068324584585032
lrms_jobname: LACAL_800001000
original_exitcode: 0
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069263.13
stderr_filename: gcrypto.log
stdout_filename: gcrypto.log
timestamp:
SUBMITTED: 1337068324.87
TERMINATED: 1337069263.13
TERMINATING: 1337069259.18
unknown_iteration: 0
used_cputime: 360
used_memory: 3366977
used_walltime: 300
inputs:
srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/crypto/gnfs-cmd_20120406: gnfs-cmd
srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/crypto/lacal_input_files.tgz: input.tgz
jobname: LACAL_800000900
join: True
output_base_url: None
output_dir: /data/crypto/results/example.out/8000001000
outputs:
@output.list: file, , @output.list, None, None, None, None
gcrypto.log: file, , gcrypto.log, None, None, None, None
persistent_id: job.1698503
requested_architecture: x86_64
requested_cores: 2
requested_memory: 4
requested_walltime: 4
stderr: None
stdin: None
stdout: gcrypto.log
tags: APPS/CRYPTO/LACAL-1.0