Extend Builtin Pipeline

We provide an example under ngcloud_src/examples/gatk/ that shows how to extend a builtin pipeline to display GATK pipeline results.

There are many reasons to extend an existing pipeline:

  • To use the well-designed report layout
  • To build a new pipeline by reusing existing parts
  • To extend a certain pipeline's functionality

In the following, we describe how this differs from Custom a Pipeline from Ground up and explore more of NGCloud's APIs in detail.

Reuse builtin templates

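When extending an existing pipeline, especially a builtin one, there are mainly two things to remember:

  • Add the builtin template path, get_shared_template_root(), to the stage's template_find_paths
  • Add the builtin static path, get_shared_static_root(), to the report's static_roots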

Using get_shared_template_root() and get_shared_static_root() makes sure you can always find the correct path to builtin files no matter how you install NGCloud.
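Helpers like these typically work by resolving data directories relative to the installed package's own files rather than the current working directory. The following is a minimal sketch of that idea only, not NGCloud's actual implementation; package_data_root() is a hypothetical name, and the templates/static subdirectory layout is an assumption.

```python
import json  # any installed module works for the demo below
from pathlib import Path


def package_data_root(module_file, subdir):
    """Hypothetical helper: locate a data directory shipped next to a module.

    The path is derived from the module's own file location, so it stays
    correct whether the package lives in site-packages, a virtualenv, or a
    development checkout.
    """
    return Path(module_file).parent / subdir


# Demo: a standard-library package stands in for an installed NGCloud.
templates = package_data_root(json.__file__, 'templates')
print(templates)
```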

Here is a simple example to show how to re-use builtin templates.

from pathlib import Path
from ngcloud.report import Stage, Report
from ngcloud.pipe import (
    get_shared_template_root, get_shared_static_root
)

class MyBaseStage(Stage):
    # Custom templates come first so they are searched before builtin ones
    template_find_paths = [
        Path('myreport', 'templates'),
        get_shared_template_root(),
    ]

class MySomeStage(MyBaseStage):
    template_entrances = 'some_stage.html'

class MyReport(Report):
    stage_classnames = [MySomeStage]
    # Custom static files plus the builtin shared static files
    static_roots = [
        Path('myreport', 'static'),
        get_shared_static_root(),
    ]

In the code above, the lines using get_shared_template_root() and get_shared_static_root() are the modifications relative to Custom a Pipeline from Ground up. In template_find_paths, put your own paths before the builtin ones so that your custom templates are always searched first.

Note

Notice the hierarchical order of template find paths.

Templates are looked up in the find paths in order. If a template is not found in the first path, the template engine looks it up in the next one, and so on. If the template is not found in any of the find paths, a jinja2.TemplateNotFound exception is raised.

If the rendered results are unexpected, the cause may be a filename conflict between builtin and custom templates. A common case is a custom template that unintentionally overrides a template needed by builtin report parts.
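The lookup rule in this note can be sketched with the standard library alone. This is an illustration of ordered search, not NGCloud's or Jinja2's code: find_template() and TemplateNotFoundSketch are hypothetical stand-ins (the latter for jinja2.TemplateNotFound), and the directory names are made up. For reference, Jinja2's own FileSystemLoader also accepts a list of directories and searches them in order.

```python
import tempfile
from pathlib import Path


class TemplateNotFoundSketch(Exception):
    """Stand-in for jinja2.TemplateNotFound in this stdlib-only sketch."""


def find_template(name, find_paths):
    """Return the first file called `name`, searching find_paths in order."""
    for root in find_paths:
        candidate = Path(root) / name
        if candidate.is_file():
            return candidate  # an earlier path shadows every later one
    raise TemplateNotFoundSketch(name)  # all find paths searched in vain


with tempfile.TemporaryDirectory() as tmp:
    custom = Path(tmp, 'myreport', 'templates')
    builtin = Path(tmp, 'builtin', 'templates')
    for directory in (custom, builtin):
        directory.mkdir(parents=True)
    # Both roots define _stage_pipe.html; only the builtin root has base.html.
    (custom / '_stage_pipe.html').write_text('custom pipe')
    (builtin / '_stage_pipe.html').write_text('builtin pipe')
    (builtin / 'base.html').write_text('builtin base')

    find_paths = [custom, builtin]  # custom templates first
    pipe_text = find_template('_stage_pipe.html', find_paths).read_text()
    base_text = find_template('base.html', find_paths).read_text()
    try:
        find_template('missing.html', find_paths)
        missing_msg = None
    except TemplateNotFoundSketch as exc:
        missing_msg = str(exc)

print(pipe_text, '|', base_text, '|', missing_msg)
# custom pipe | builtin base | missing.html
```

Swapping the order to [builtin, custom] would make the builtin _stage_pipe.html win. The same shadowing is what causes the conflicts described above when a custom file reuses a name that builtin report parts rely on.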

Reuse builtin stages

Reusing stages from other pipelines is easy. Take the quality control (QC) stage as an example: almost every pipeline requires QC, so reusing the builtin QC stage saves a lot of reinventing-the-wheel effort.

Currently, QCStage is defined in ngcloud.pipe.tuxedo. The stage also ships with its own logic for copying stage-specific static files, so you don't need to worry about collecting the figures shown in the report.
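To make "stage-specific static file copying" concrete, here is a minimal sketch of what such logic can look like. It is not QCStage's implementation: copy_stage_static(), the *.png pattern, and the figure file names are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_stage_static(result_dir, report_static_dir, patterns=('*.png',)):
    """Hypothetical sketch: copy a stage's figures into the report's static dir.

    Returns the destination paths, sorted so the result is deterministic.
    """
    report_static_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in result_dir.glob(pattern):
            dest = report_static_dir / src.name
            shutil.copy2(src, dest)  # copy file contents and metadata
            copied.append(dest)
    return sorted(copied)


with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp, 'qc_results')
    results.mkdir()
    for name in ('per_base_quality.png', 'gc_content.png', 'summary.txt'):
        (results / name).write_text('dummy')
    out = Path(tmp, 'report', 'static', 'qc')
    copied_names = [p.name for p in copy_stage_static(results, out)]

print(copied_names)  # ['gc_content.png', 'per_base_quality.png']
```

Keeping this logic inside the stage means any report that lists the stage gets its figures without extra setup.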

A first attempt at reusing the full QCStage functionality might be to include the stage directly:

from ngcloud.report import Report
from ngcloud.pipe import tuxedo

class MyReport(Report):
    # MySomeStage is defined in the previous example
    stage_classnames = [tuxedo.QCStage, MySomeStage]
    # ...

However, this causes two problems. First, the template find paths of tuxedo's QCStage are not adapted to our report, so the stage does not search our custom templates. Second, the stage pipes are displayed incorrectly in the report. Both problems prevent QCStage from working properly.

To display our custom stage pipes correctly (they are originally rendered by the _stage_pipe.html template), we can define a new _stage_pipe.html in our own template find path so that it overrides the builtin one.

The new code looks like this:

from ngcloud.pipe import tuxedo

class MyQCStage(tuxedo.QCStage):
    template_find_paths = (
        MyBaseStage.template_find_paths[:1] +   # include only custom template find path
        tuxedo.QCStage.template_find_paths
    )

We subclass QCStage and handle its template find paths carefully. Only the first path of MyBaseStage (our custom template directory) is included, so the builtin shared templates do not come before the tuxedo-specific templates.
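The effect of the slicing can be checked with plain classes. In this sketch QCStageSketch stands in for tuxedo.QCStage, and its path list (tuxedo templates first, shared templates last) is an assumption about the builtin layout, as are the directory names. The list(...) call only guards against the parent storing its paths as a tuple.

```python
from pathlib import Path

# Hypothetical directory names standing in for the builtin roots.
SHARED_TEMPLATE_ROOT = Path('ngcloud', 'pipe', 'report', 'templates')
TUXEDO_TEMPLATE_ROOT = Path('ngcloud', 'pipe', 'tuxedo', 'templates')


class MyBaseStage:
    template_find_paths = [Path('myreport', 'templates'), SHARED_TEMPLATE_ROOT]


class QCStageSketch:
    # Assumed layout: tuxedo-specific templates first, shared ones last.
    template_find_paths = [TUXEDO_TEMPLATE_ROOT, SHARED_TEMPLATE_ROOT]


class MyQCStage(QCStageSketch):
    # Keep only our custom path, then defer to the parent's own ordering.
    template_find_paths = (
        MyBaseStage.template_find_paths[:1]
        + list(QCStageSketch.template_find_paths)
    )


# Concatenating the full MyBaseStage list instead would put the shared root
# ahead of the tuxedo-specific templates.
naive = MyBaseStage.template_find_paths + QCStageSketch.template_find_paths

print([str(p) for p in MyQCStage.template_find_paths])
```

With the slice, the search order is custom, then tuxedo, then shared. With the naive concatenation, a shared template could shadow a tuxedo template of the same name.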

Override builtin template behaviour

Warning

Do this with care: builtin templates are not well documented.

You can always change a builtin stage's behavior by overriding its methods in a subclass. In general, though, you will want to preserve most of the original functionality.

To extend the Stage.parse behavior of QCStage, use the following Pythonic pattern:

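from ngcloud.pipe import tuxedo

class MyQCStage(tuxedo.QCStage):
    # template_find_paths = ...

    def parse(self):
        # Run the original QCStage.parse() first. super() is equivalent to
        # super(MyQCStage, self); super(tuxedo.QCStage, self) would skip
        # QCStage.parse() and call only the base Stage.parse().
        super().parse()
        # Write the custom logic here
        self.result_info.update({
            'my_desired_property': None,
        })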

When a report is generated, MyQCStage's parse() is called. It first runs the original QCStage's parse() through super() and then applies our custom logic.

This pattern ensures that all of the original logic is preserved.
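Which class you pass to super() matters, because the lookup starts after that class in the method resolution order. The mock classes below are not NGCloud's: Stage, QCStage, and the dict-valued result_info only mimic the names used above to show that super().parse() keeps QCStage's logic, while super(QCStage, self).parse() silently skips it.

```python
class Stage:
    def __init__(self):
        self.result_info = {}

    def parse(self):
        self.result_info['stage'] = 'parsed by Stage'


class QCStage(Stage):
    def parse(self):
        super().parse()
        self.result_info['qc'] = 'parsed by QCStage'


class MyQCStage(QCStage):
    def parse(self):
        super().parse()  # runs QCStage.parse, which in turn runs Stage.parse
        self.result_info['my_desired_property'] = None


class SkippingQCStage(QCStage):
    def parse(self):
        # Lookup starts *after* QCStage in the MRO, so only Stage.parse runs.
        super(QCStage, self).parse()
        self.result_info['my_desired_property'] = None


good = MyQCStage()
good.parse()
bad = SkippingQCStage()
bad.parse()
print(sorted(good.result_info))  # ['my_desired_property', 'qc', 'stage']
print(sorted(bad.result_info))   # ['my_desired_property', 'stage']
```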