Output Files

Outputs are specified in much the same way as options:

outputs=[
    [
        'test_output',
        'Main output file of this utility',
        {
            'validate': 'File/Output',
            'required': True,
            'default': 'ggo_out.complex',
            'data_format': 'text/tabular',
            'default_format': 'TSV_U',
        }
    ]
],

Two of these options, validate and required, are currently redundant. They must still be specified until a future release.

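The outputs variable must be a list of 3-member lists. The first element is the name of the command line flag, if it is used. The second element is the description of the file, which is used as its label in Galaxy and in the command line help. The third element is a dict of options such as default, data_format and default_format. The declaration above produces this help text: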

Output Files:
  --test_output [TEST_OUTPUT]
                        Main output file of this utility (Default:
                        ggo_out.complex) [Required]
  --test_output_format [TEST_OUTPUT_FORMAT]
                        Associated Format for test_output (Default: TSV_U)
                        [Required]

As noted above, an additional $ofname_format parameter (here --test_output_format) is generated for every output file. The format handler determines the file extension, so the extension need not be included in the value passed to --test_output.
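The following sketch shows how each declaration maps to the two flags in the help output above. `describe_output_flags` is a hypothetical helper and is not part of galaxygetopt. It reproduces only the visible naming pattern (`--<name>` plus `--<name>_format`) and assumes each declaration is a `[name, description, options]` list.

```python
def describe_output_flags(outputs):
    """Return (flag, help_text) pairs mirroring the help output above.

    Hypothetical helper: it reproduces the visible naming pattern only,
    not galaxygetopt's actual argument-parser setup.
    """
    flags = []
    for entry in outputs:
        if len(entry) != 3:
            raise ValueError("each output must be a [name, description, options] list")
        name, description, opts = entry
        required = " [Required]" if opts.get("required") else ""
        flags.append((
            "--" + name,
            "%s (Default: %s)%s" % (description, opts.get("default"), required),
        ))
        # One extra *_format flag is generated for every output file.
        flags.append((
            "--%s_format" % name,
            "Associated Format for %s (Default: %s)%s"
            % (name, opts.get("default_format"), required),
        ))
    return flags


outputs = [
    ['test_output', 'Main output file of this utility',
     {'validate': 'File/Output', 'required': True,
      'default': 'ggo_out.complex', 'data_format': 'text/tabular',
      'default_format': 'TSV_U'}],
]
for flag, text in describe_output_flags(outputs):
    print(flag, "->", text)
```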

These files can be written through the galaxygetopt.outputfiles module:

from galaxygetopt.outputfiles import OutputFiles

# c is the GalaxyGetOpt object in which test_output was declared
of = OutputFiles(name='test_output', GGO=c)
data = {
    'Sheet1': {
        'header': ['Key', 'Value'],
        'data': [
            ['A1', 'B1'],
            ['A2', 'B2'],
        ]
    }
}
of.CRR(data=data)

This example writes a tabular data structure. The OutputFiles class needs access to the GalaxyGetOpt object to know which outputs have been pre-declared. In the invocation

of = OutputFiles(name='test_output', GGO=c)

The name parameter refers to the first element of a declared output's list. This gives OutputFiles access to the default format and path information. It also gives access to two variables that are hidden from both user and developer, test_output_id and test_output_files_path, which are available in Galaxy.
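The lookup by name can be sketched as follows. `resolve_output` is hypothetical, and so are the keys of the dict it returns. The derived parameter names only follow the pattern described above (`<name>_format`, `<name>_id`, `<name>_files_path`). The real OutputFiles class may store this information differently.

```python
def resolve_output(outputs, name):
    """Find the declaration whose first element equals `name`.

    Hypothetical helper: the returned keys are illustrative, and the
    derived parameter names follow the pattern described in the text.
    """
    for declared_name, description, opts in outputs:
        if declared_name == name:
            return {
                "description": description,
                "default_path": opts.get("default"),
                "default_format": opts.get("default_format"),
                "format_param": name + "_format",
                "id_param": name + "_id",
                "files_path_param": name + "_files_path",
            }
    raise KeyError("output %r was not declared" % (name,))


outputs = [
    ['test_output', 'Main output file of this utility',
     {'validate': 'File/Output', 'required': True,
      'default': 'ggo_out.complex', 'data_format': 'text/tabular',
      'default_format': 'TSV_U'}],
]
print(resolve_output(outputs, 'test_output'))
```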

Output Formats

Format        Name          Available Handlers
Plain text    text/plain    TXT, CONF
Tabular Data  text/tabular  TSV, TSV_U, CSV, CSV_U
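The table above can be encoded as a mapping from data_format to handler names. This allows a declaration's default_format to be checked against its data_format. The sketch below is illustrative only: `AVAILABLE_HANDLERS` and `check_format` are hypothetical, and galaxygetopt may or may not perform an equivalent check itself.

```python
# Formats and handlers listed as currently available (copied from the table).
AVAILABLE_HANDLERS = {
    'text/plain': ['TXT', 'CONF'],
    'text/tabular': ['TSV', 'TSV_U', 'CSV', 'CSV_U'],
}


def check_format(data_format, default_format):
    """Raise ValueError if default_format is not a handler for data_format."""
    handlers = AVAILABLE_HANDLERS.get(data_format)
    if handlers is None:
        raise ValueError("unknown data_format %r" % (data_format,))
    if default_format not in handlers:
        raise ValueError("%r is not a handler for %r (choose from %s)"
                         % (default_format, data_format, ", ".join(handlers)))


check_format('text/tabular', 'TSV_U')  # valid combination, returns None
try:
    check_format('text/plain', 'CSV')
except ValueError as err:
    print(err)  # 'CSV' is not a handler for 'text/plain' (choose from TXT, CONF)
```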

Planned handlers/formats

Format        Name          Available Handlers
Tabular Data  text/tabular  XLS, XLSX, ODS, JSON, YAML
HTML Data     text/html     HTML (any via pandoc?)
Archives      archive       tar.gz, zip, tar
Genomic Data  TBD           BioPython-supported formats

Plain Text

Plain text data is simply a string. Put whatever you want in it.

data = """Hello,
World"""

Tabular Data

The top-level object is a dict whose keys are sheet names. Sheet names should match r'[A-Za-z0-9-]+', although this is not currently enforced. This structure represents NxMxO dimensional data: each NxM table is one sheet, and O is the number of sheets.

Each sheet is a dict with two keys: ‘header’ and ‘data’. The header should be a list of strings, and data should be a list of lists, with one inner list per row. Every value is coerced to a string.

data = {
    'Sheet1': {
        'header': ['Key', 'Value'],
        'data': [
            ['some_key', 42],
        ]
    }
}
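The structure above is easy to serialize. The sketch below renders each sheet as tab-separated text using Python's standard csv module, coerces every value to a string, and warns about sheet names outside the recommended pattern. `sheets_to_tsv` is hypothetical. The real TSV and TSV_U handlers may quote values, name files, or combine sheets differently.

```python
import csv
import io
import re

# Recommended (not enforced) sheet-name pattern from the text.
SHEET_NAME = re.compile(r'[A-Za-z0-9-]+')


def sheets_to_tsv(data):
    """Render each sheet of a tabular structure as TSV text (sketch only)."""
    rendered = {}
    for sheet_name, sheet in data.items():
        if not SHEET_NAME.fullmatch(sheet_name):
            # The pattern is only a recommendation, so warn instead of failing.
            print("warning: sheet name %r does not match [A-Za-z0-9-]+" % (sheet_name,))
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
        writer.writerow([str(h) for h in sheet['header']])
        for row in sheet['data']:
            # Every value is coerced to a string.
            writer.writerow([str(v) for v in row])
        rendered[sheet_name] = buf.getvalue()
    return rendered


data = {'Sheet1': {'header': ['Key', 'Value'], 'data': [['some_key', 42]]}}
print(sheets_to_tsv(data)['Sheet1'])
```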

Multiple Output Files

Sometimes you will not know how many files you need to produce until runtime. The following examples use this output declaration:

['test_output','Main output file of this utility',
    {
        'validate': 'File/Output',
        'required': True,
        'default': 'ggo_out.complex',
        'data_format': 'text/plain',
        'default_format': 'TXT',
    }
]

Separate History Items

This can be accomplished via varCRR:

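of = OutputFiles(name='test_output', GGO=c)
for i in range(10):
    data = "file %s" % (i,)
    # Each varCRR call produces a separate history item;
    # filename must be unique and must not contain underscores.
    of.varCRR(data=data, filename="file-" + str(i))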

It is the developer’s responsibility to ensure that filename is specified and unique. At the CLI, the loop above produces files named file-0.txt through file-9.txt. Under Galaxy, the files get names like $__new_file_path__/primary_10000_file-2_visible_txt. Do not use underscores in filename. The fields of these names are separated by underscores, so an extra underscore in filename corrupts the resulting name. This is not checked yet, though a check will likely be introduced in a future release.
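The filename becomes one underscore-delimited field of the Galaxy name, so underscores cause trouble. The sketch below uses a regular expression modeled on the example name above; it is not copied from Galaxy's source. It also includes a hypothetical `primary_name` helper that performs the underscore check the text says is not yet done.

```python
import re

# Modeled on the example name primary_<id>_<designation>_visible_<ext>.
# Fields are delimited by underscores, so no field may contain one.
PRIMARY_NAME = re.compile(
    r'^primary_(?P<id>\d+)_(?P<designation>[^_]+)_(?P<visible>[^_]+)_(?P<ext>[^_]+)$'
)


def primary_name(file_id, filename, ext):
    """Build a name in the style shown above, rejecting underscores (hypothetical)."""
    if '_' in filename:
        raise ValueError("filename %r contains an underscore" % (filename,))
    return "primary_%d_%s_visible_%s" % (file_id, filename, ext)


name = primary_name(10000, "file-2", "txt")
print(PRIMARY_NAME.match(name).group('designation'))  # file-2
# An underscore in the filename shifts the field boundaries, so it no longer parses:
print(PRIMARY_NAME.match("primary_10000_file_2_visible_txt"))  # None
```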

Single History Item

of = OutputFiles(name='test_output', GGO=c)
for i in range(10):
    data = "file %s" % (i,)
    # Each subCRR call adds a sub file to the same single history item.
    of.subCRR(data=data, filename="file-" + str(i))

Sub files are written into a single folder whose name comes from --test_output_files_path. By default this is sub.files_path. subCRR returns the file names as a list so that you can link to them correctly. Support for this will be improved in the future.
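The single-folder layout can be sketched with the standard library. `write_sub_files` is a hypothetical stand-in for the behavior described above: all sub files go into one folder, and their names come back as a list for building links. The `.txt` extension is an assumption borrowed from the varCRR CLI example. The sketch also assumes that links are resolved relative to the files_path folder.

```python
import os
import tempfile


def write_sub_files(files_path, items):
    """Write (filename, data) pairs into files_path and return the names.

    Hypothetical sketch: the ".txt" extension is assumed, not documented.
    """
    os.makedirs(files_path, exist_ok=True)
    names = []
    for filename, data in items:
        name = filename + ".txt"
        with open(os.path.join(files_path, name), "w") as handle:
            handle.write(data)
        names.append(name)
    return names


with tempfile.TemporaryDirectory() as tmp:
    files_path = os.path.join(tmp, "sub.files_path")
    names = write_sub_files(
        files_path, [("file-" + str(i), "file %s" % (i,)) for i in range(3)]
    )
    # Assumes the linking page resolves hrefs relative to files_path.
    links = ['<a href="%s">%s</a>' % (n, n) for n in names]
    print("\n".join(links))
```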
