Output Files
============

Outputs are specified in much the same way as options:

.. code:: python

    outputs=[
        [
            'test_output',
            'Main output file of this utility',
            {
                'validate': 'File/Output',
                'required': True,
                'default': 'ggo_out.complex',
                'data_format': 'text/tabular',
                'default_format': 'TSV_U',
            }
        ]
    ],

As you'll notice, two of the options shown, ``validate`` and
``required``, are entirely redundant. They must nevertheless be
specified until a future release removes them.

The ``outputs`` variable must be a list of 3-member lists. Each inner
list holds, in order, the name of the command line flag (if one is
used), the description of the file, and a dict of options. The
description is used as the label in Galaxy and in the command line help:

.. code:: bash

    Output Files:
      --test_output [TEST_OUTPUT]
                            Main output file of this utility (Default:
                            ggo_out.complex) [Required]
      --test_output_format [TEST_OUTPUT_FORMAT]
                            Associated Format for test_output (Default: TSV_U)
                            [Required]

As shown above, an additional ``$ofname_format`` parameter is generated
for every output file. The format handler determines the file
extension, so no extension needs to be specified in
``--test_output``.
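
The shape described above can be checked mechanically. The following is
a minimal sketch and not part of galaxygetopt: ``check_outputs`` and
``flag_names`` are hypothetical helpers, and the only behavior assumed
is what the help text shows, namely one flag per output plus a matching
``_format`` flag.

.. code:: python

    def check_outputs(outputs):
        """Raise ValueError unless ``outputs`` is a list of 3-member lists."""
        for entry in outputs:
            if len(entry) != 3:
                raise ValueError(
                    "expected [name, description, options]: %r" % (entry,))
            name, _description, options = entry
            if not isinstance(name, str) or not isinstance(options, dict):
                raise ValueError("bad output declaration: %r" % (entry,))


    def flag_names(outputs):
        """Return the flags implied by each declared output (hypothetical)."""
        check_outputs(outputs)
        flags = []
        for name, _description, _options in outputs:
            flags.append("--%s" % name)         # the output file itself
            flags.append("--%s_format" % name)  # the generated format flag
        return flags


    outputs = [
        ['test_output', 'Main output file of this utility',
         {'validate': 'File/Output', 'required': True,
          'default': 'ggo_out.complex', 'data_format': 'text/tabular',
          'default_format': 'TSV_U'}],
    ]
    print(flag_names(outputs))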

These outputs are used through the ``galaxygetopt.outputfiles``
module:

.. code:: python

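    from galaxygetopt.outputfiles import OutputFiles

    # c is the GalaxyGetOpt object that knows the declared outputs
    of = OutputFiles(name='test_output', GGO=c)
    # One sheet named 'Sheet1' with a header row and two data rows
    data = {
        'Sheet1': {
            'header': ['Key', 'Value'],
            'data': [
                ['A1', 'B1'],
                ['A2', 'B2'],
            ]
        }
    }
    of.CRR(data=data)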

Here a tabular data structure is used. The ``OutputFiles`` class
requires access to the GalaxyGetOpt object in order to know which
outputs have been pre-declared. Consider the invocation:

.. code:: python

    of = OutputFiles(name='test_output', GGO=c)

The ``name`` parameter references the first element of a declared
output's list. This gives ``OutputFiles`` access to the default format,
path information, and two variables hidden from both user and
developer: ``test_output_id`` and ``test_output_files_path``, both of
which are available in Galaxy.

Output Formats
--------------

+----------------+--------------------+----------------------------+
| Format         | Name               | Available Handlers         |
+================+====================+============================+
| Plain text     | ``text/plain``     | TXT, CONF                  |
+----------------+--------------------+----------------------------+
| Tabular Data   | ``text/tabular``   | TSV, TSV\_U, CSV, CSV\_U   |
+----------------+--------------------+----------------------------+

Planned handlers and formats:

+----------------+--------------------+-------------------------------+
| Format         | Name               | Planned Handlers              |
+================+====================+===============================+
| Tabular Data   | ``text/tabular``   | XLS, XLSX, ODS, JSON, YAML    |
+----------------+--------------------+-------------------------------+
| HTML Data      | ``text/html``      | HTML (any via pandoc?)        |
+----------------+--------------------+-------------------------------+
| Archives       | ``archive``        | tar.gz, zip, tar              |
+----------------+--------------------+-------------------------------+
| Genomic Data   | TBD                | BioPython supported formats   |
+----------------+--------------------+-------------------------------+

Plain Text
~~~~~~~~~~

Plain text output is simply a string. Put whatever you want in it:

.. code:: python

    data = """Hello,
    World"""

Tabular Data
~~~~~~~~~~~~

The top level object is a dict whose keys are sheet names. It is
recommended that each sheet name match ``r'[A-Za-z0-9-]+'``, though that
is not strictly enforced currently. This structure represents
three-dimensional (``NxMxO``) data, where each ``NxM`` slice is a single
table or sheet.

Each sheet is a dict containing two keys: ``header`` and ``data``.
``header`` should contain a list of strings, and ``data`` should contain
a list of lists. Every value will be coerced into a string.

.. code:: python

    data = {
        'Sheet1': {
            'header': ['Key', 'Value'],
            'data': [
                ['some_key', 42],
            ]
        }
    }
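
To make the structure concrete, here is a minimal sketch that renders
one sheet as tab-separated text with the standard library. It is not
galaxygetopt's own handler: ``sheet_to_tsv`` and ``SHEET_NAME`` are
hypothetical names, and the sheet is assumed to use the ``header`` and
``data`` keys as in the example above.

.. code:: python

    import csv
    import io
    import re

    # Recommended (not enforced) pattern for sheet names
    SHEET_NAME = re.compile(r'[A-Za-z0-9-]+')


    def sheet_to_tsv(sheet):
        """Render one sheet as tab-separated text, coercing values to str."""
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
        writer.writerow([str(h) for h in sheet['header']])
        for row in sheet['data']:
            writer.writerow([str(value) for value in row])
        return buf.getvalue()


    data = {
        'Sheet1': {
            'header': ['Key', 'Value'],
            'data': [
                ['some_key', 42],
            ]
        }
    }
    for name, sheet in data.items():
        if not SHEET_NAME.fullmatch(name):
            raise ValueError("discouraged sheet name: %r" % name)
        print(sheet_to_tsv(sheet), end='')

Note that the integer ``42`` is written out as the string ``42``,
mirroring the coercion described above.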

Multiple Output Files
---------------------

Sometimes you will not know how many files you need to produce until
runtime. The following examples use this output declaration:

.. code:: python

    ['test_output', 'Main output file of this utility',
        {
            'validate': 'File/Output',
            'required': True,
            'default': 'ggo_out.complex',
            'data_format': 'text/plain',
            'default_format': 'TXT',
        }
    ]

Separate History Items
~~~~~~~~~~~~~~~~~~~~~~

This can be accomplished via ``varCRR``:

.. code:: python

    of = OutputFiles(name='test_output', GGO=c)
    for i in range(10):
        data = "file %s" % (i,)
        of.varCRR(data=data, filename="file-" + str(i))

It is the developer's responsibility to ensure that ``filename`` is
specified and unique. At the CLI, this produces files named
``file-0.txt`` to ``file-9.txt``. Under Galaxy, the names look like
``$__new_file_path__/primary_10000_file-2_visible_txt``. Do not use
underscores in ``filename``: as the example shows, the Galaxy-side name
uses underscores to separate its fields, so an extra underscore corrupts
the resulting filename. This is not currently checked, though a check
will likely be introduced in a later release.
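
Until such a check exists, it can be done in the calling script. The
following is a minimal sketch under stated assumptions:
``check_filename`` and ``galaxy_style_name`` are hypothetical helpers,
and the naming pattern is inferred solely from the example name quoted
above, not taken from Galaxy's source.

.. code:: python

    def check_filename(filename, seen):
        """Reject names that are repeated or contain an underscore."""
        if '_' in filename:
            raise ValueError("underscore in filename: %r" % filename)
        if filename in seen:
            raise ValueError("duplicate filename: %r" % filename)
        seen.add(filename)
        return filename


    def galaxy_style_name(output_id, filename, ext):
        """Mimic the pattern of the example name (an assumption)."""
        return "primary_%s_%s_visible_%s" % (output_id, filename, ext)


    seen = set()
    names = [check_filename("file-" + str(i), seen) for i in range(10)]
    print(galaxy_style_name(10000, names[2], "txt"))
    # primary_10000_file-2_visible_txt

Each checked name can then be passed as the ``filename`` argument of
``varCRR``.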

Single History Item
~~~~~~~~~~~~~~~~~~~

This can be accomplished via ``subCRR``:

.. code:: python

    of = OutputFiles(name='test_output', GGO=c)
    for i in range(10):
        data = "file %s" % (i,)
        of.subCRR(data=data, filename="file-" + str(i))

Sub files are contained in a folder whose name comes from
``--test_output_files_path``. By default this is ``sub.files_path``. The
names are available as a list in the return value of ``subCRR``, to
facilitate proper linking. Support for this will be improved in the
future.
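
As an illustration of the linking mentioned above, the sketch below
builds a small HTML list pointing at each sub file. It makes several
assumptions: ``index_page`` is a hypothetical helper, the names are
assumed to be plain file names including their extension, and the links
are assumed to be relative to the folder name. Check what ``subCRR``
actually returns before relying on any of this.

.. code:: python

    import html


    def index_page(names, folder):
        """Build a minimal HTML list linking to each sub file."""
        items = []
        for name in names:
            href = html.escape("%s/%s" % (folder, name), quote=True)
            items.append(
                '<li><a href="%s">%s</a></li>' % (href, html.escape(name)))
        return "<ul>\n%s\n</ul>" % "\n".join(items)


    # Stand-in for the list of names; not real subCRR output
    names = ["file-%d.txt" % i for i in range(3)]
    print(index_page(names, "sub.files_path"))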