Customize a Pipeline from the Ground Up

The whole process consists of three main steps:

  • Design Jinja2 templates to determine how your report looks in a web browser.
  • Translate each NGS tool whose output needs to be displayed into a new class that inherits ngcloud.report.Stage.
  • Create a new class that inherits ngcloud.report.Report to collect all the stages you need.

By the end of this section you will have created your first pipeline. Here we create a new pipeline with two stages. See ngcloud_src/examples/my_first_pipe for the final result.

Goal

In this pipeline, there will be two stages:

  • index: shows a hello message and the job ID
  • mystage: lists the samples in use and shows the overall alignment rate

The overall alignment rate is logged in an output file, summary.txt, which stands in for the NGS results we would like to extract and show in our reports.

General folder structure

We organize all the files needed in our recommended folder structure:

my_first_pipe/
├── report/
│   ├── static/
│   └── templates/
│       ├── base.html
│       ├── index.html
│       └── mystage.html
├── job_demo_result/
│   ├── my_stage/
│   │   └── summary.txt
│   └── job_info.yaml
└── mypipe.py

mypipe.py is where the Python code and report logic are stored. The filename stem mypipe must be a valid Python module name so that we can import it later.
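Whether a filename stem can be imported is easy to check in Python. The sketch below is only an illustration: the helper name is_importable_name is hypothetical and not part of NGCloud.

```python
import keyword


def is_importable_name(stem):
    """Return True if `stem` can be used in an `import` statement.

    Hypothetical helper for illustration only; not part of NGCloud.
    """
    # A module name must be a valid identifier and must not be a reserved keyword.
    return stem.isidentifier() and not keyword.iskeyword(stem)


print(is_importable_name("mypipe"))     # a valid module name
print(is_importable_name("my-pipe"))    # a hyphen is not allowed
print(is_importable_name("1st_pipe"))   # a name cannot start with a digit
```

A name such as my-pipe.py would still run as a script, but `import my-pipe` is a syntax error, so it could not be referenced as a module later.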

job_demo_result holds a fake NGS result, assumed to have been produced by the mypipe pipeline.

The report/ folder keeps all the template-related files.

Inside the report folder, all templates go under templates/. If you have some experience with web template engines, you will find it familiar. In fact, Jinja2 is an awesome template engine used by many Python-powered websites.

Other files, such as the CSS or JS that decorate the reports, go under static/. These files are used in the report but are unlikely to be modified, hence the name static files.

We will explain each part in the following sections.

Organize the NGS result

Since only one stage, mystage, holds output files, the result folder is rather simple. We set the root folder to job_demo_result. The folder looks like this:

my_first_pipe/job_demo_result/
├── my_stage/
│   └── summary.txt
└── job_info.yaml

summary.txt holds the NGS output of my_stage, and is where we extract the overall alignment rate from.

Mis-alignment rate: 6.09%
Overall alignment rate: 80.56%

job_info.yaml records how the result was produced. Currently it stores the job type, the job ID, and the sample info.

job_type: mypipe
job_id: My First Stage DEMO
sample_list:
    - ngs_A:
        pair_end: R1
    - ngs_A:
        pair_end: R2
    - ngs_B:
        pair_end: False

Two samples are used: ngs_A is paired-end while ngs_B is single-end. The job ID and job type are also stated.
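To make the structure concrete, the sketch below writes the sample list as the plain Python data a YAML loader would typically produce (a list of single-key mappings) and groups the entries by sample name. The helper group_samples is hypothetical; in practice NGCloud's own JobInfo and Sample classes handle this, and their behavior may differ.

```python
# Plain-Python equivalent of the sample_list above, as a YAML loader
# would typically return it: a list of one-key dicts.
sample_list = [
    {"ngs_A": {"pair_end": "R1"}},
    {"ngs_A": {"pair_end": "R2"}},
    {"ngs_B": {"pair_end": False}},
]


def group_samples(entries):
    """Map each sample name to its list of read ends (hypothetical helper).

    Paired-end samples collect their ends, e.g. ["R1", "R2"];
    single-end samples end up with an empty list.
    """
    grouped = {}
    for entry in entries:
        for name, info in entry.items():
            ends = grouped.setdefault(name, [])
            if info["pair_end"]:          # False means single-end
                ends.append(info["pair_end"])
    return grouped


for name, ends in group_samples(sample_list).items():
    kind = "paired-end" if ends else "single-end"
    print(name, kind, ends)
```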

Write the stage templates

First, let us look back at what we want for our first pipeline:

  • An index page (index.html) that shows a hello message
  • A stage page (mystage.html) that prints some values from our fake NGS results

Before we touch these two templates, we first create a base template to store the parts they share. You will soon see the benefit of maintaining such a template. We call it base.html.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <title>{% block title %}Base Template{% endblock %}</title>
  </head>
  <body>
    <p>| <a href="index.html">Index</a> | <a href="mystage.html">MyStage</a> |</p>
    {% block content %}{% endblock %}
  </body>
</html>

So base.html is just a minimal working HTML5 file. Because it declares the blocks title and content, base.html can later be inherited by stage templates, which override the content of those blocks.

A simple site map linking the two stages is added here so that it appears on every stage page. Report readers can use it to jump between stages.

index.html will be built upon base.html.

{% extends 'base.html' %}

{% block title %}Homepage of My Pipeline{% endblock %}

{% block content %}
  <h2>Report for job id {{ job_info.id }}</h2>
  <p>Hello NGCloud report!</p>
{% endblock %}

First we extend the base template and override the title block. In the content block, we show a hello message and leave a variable {{ job_info.id }} to represent the job ID. When a report is rendered, job_info, a JobInfo object, is passed to the template so we can use its attributes.

In mystage.html we will use this variable passing mechanism more extensively.

Since job_info contains the sample list, we would like to print it out. We also wish to display the overall alignment rate.

Here is how to get it done. First we make the sample list:

{% block content %}
  <h2>Sample being used</h2>
  <ul>
  {% for sample in job_info.sample_list %}
    <li>{{ sample.full_name }}</li>
  {% endfor %}
  </ul>
{% endblock %}

Using Jinja2’s for-loop control structure, we iterate over the sample list from job_info. Each item is a Sample object, so we use the sample’s full name to display the list.

As for the overall alignment rate, since it is not part of the default information, we need to create our own variable and pass it into the template explicitly.

By default, a dict-like variable result_info is always passed and all its keys are exposed for access. We can modify the key-value pairs it holds during ngcloud.report.Stage.parse(). We name the key representing the overall alignment rate mapped_rate. In the template we can then write:

{% block content %}
  <h2>Alignment Summary</h2>
  <p>Overall mapped rate: {{ mapped_rate }}</p>
{% endblock %}

Then we join these two pieces together in the same content block, and the template design for our first pipeline is finished.

See also

ngcloud.report.Stage.render() explains in more detail how job_info and the keys of result_info are passed.
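The variable passing can be pictured as building a single dictionary of template variables. The sketch below is an assumption based on the description in this section, not NGCloud's actual render() code; build_context and the stand-in values are hypothetical, and NGCloud may expose additional variables.

```python
def build_context(job_info, result_info):
    """Assemble template variables (hypothetical stand-in for rendering).

    job_info is passed as-is, and every key of result_info becomes a
    top-level template variable.
    """
    context = {"job_info": job_info}
    context.update(result_info)   # expose each key, e.g. mapped_rate
    return context


# Stand-in values for illustration; the real objects come from NGCloud.
context = build_context(
    job_info={"id": "My First Stage DEMO"},
    result_info={"mapped_rate": "80.56%"},
)
print(sorted(context))
```

Under this picture, writing {{ mapped_rate }} in a template works because mapped_rate is a top-level key of the context.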

Write the report logic

Finally, we connect all the parts by writing a Python module, mypipe.py, that builds on the NGCloud architecture. Stages and pipelines have corresponding classes in NGCloud: Stage and Report.

Warning

You should write the following Python code in a script file (named mypipe.py in this example). Copy-pasting it into an interactive Python interpreter will fail, because __file__ is not defined there:

>>> from pathlib import Path
>>> Path(__file__).parent
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '__file__' is not defined

Index stage

NGCloud handles all the logic needed for rendering templates, copying static files, and finding NGS result files. We take advantage of this by inheriting from NGCloud’s predefined classes.

First we deal with the index stage: define a new class IndexStage that inherits ngcloud.report.Stage. Then specify the path where templates are looked up and the entrance template filename.

from pathlib import Path
from ngcloud.report import Stage, Report

here = Path(__file__).parent
template_root = here / "report" / "templates"

class IndexStage(Stage):
    template_find_paths = template_root
    template_entrances = "index.html"

Since the final rendered web page of IndexStage is index.html, template_entrances is set to index.html.

Next, the folder that stores the related templates must be specified so that the template engine knows where to find them. This folder is the same for all stages in this pipeline, so we create a variable template_root to store the location and assign it to the attribute template_find_paths.

Note

Using __file__, which always points to the location of mypipe.py, helps set the path to the template folder correctly.
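The path arithmetic can be checked in isolation. In the sketch below, template_root_for is a hypothetical helper and the example path is made up; inside mypipe.py you would pass __file__ instead.

```python
from pathlib import Path


def template_root_for(module_file):
    """Return the templates folder next to the given module file.

    Hypothetical helper that mirrors
    `Path(__file__).parent / "report" / "templates"`.
    """
    here = Path(module_file).parent
    return here / "report" / "templates"


# Made-up location of mypipe.py, for illustration only.
root = template_root_for("/home/me/my_first_pipe/mypipe.py")
print(root)
```

If an absolute path is preferred regardless of how the script was started, Path(__file__).resolve().parent is a common variant.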

IndexStage only uses built-in template variables (that is, job_info) to render its template. So that’s done! No further settings are needed.

MyStage stage

With IndexStage we have already covered most of the configuration needed, so MyStage, which handles the second stage, is easy to set up.

class MyStage(Stage):
    template_find_paths = template_root
    template_entrances = "mystage.html"

The difference from IndexStage is that MyStage needs to pass a custom template variable, mapped_rate, to show the overall alignment rate.

First we achieve our goal by cheating with a hard-coded value:

class MyStage(Stage):
    template_find_paths = template_root
    template_entrances = "mystage.html"

    def parse(self):
        self.result_info['mapped_rate'] = "50%"

By adding a new key to result_info during MyStage’s parse(), the key is passed as a template variable when rendering.

See also

See ngcloud.report.Stage for all the functionalities it provides.

Reading result info

The previous approach was cheating. Here we actually parse summary.txt to find the overall alignment rate.

class MyStage(Stage):
    template_find_paths = template_root
    template_entrances = "mystage.html"

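    def parse(self):
        # Fallback shown in the report if no matching line is found
        mapped_rate = "(Unknown)"
        summary_txt = self.job_info.root_path / "my_stage" / "summary.txt"
        with summary_txt.open() as summary:
            for line in summary:
                if line.startswith("Overall"):
                    # "Overall alignment rate: 80.56%" -> "80.56%"
                    mapped_rate = line.strip().split(': ')[-1]

        # Expose the value to the template as {{ mapped_rate }}
        self.result_info['mapped_rate'] = mapped_rate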

The path to the root folder of the results is available as job_info.root_path, which is a Path object. From it we can locate summary.txt correctly.

After parsing the text file, we pass the alignment rate through result_info.
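The parsing step can also be tried outside NGCloud. The sketch below applies the same logic to the two example lines of summary.txt; extract_mapped_rate is a hypothetical helper name, not an NGCloud function.

```python
def extract_mapped_rate(lines, default="(Unknown)"):
    """Return the value after 'Overall alignment rate: ' (hypothetical helper).

    `lines` is any iterable of text lines, such as an open file.
    """
    mapped_rate = default
    for line in lines:
        if line.startswith("Overall"):
            # "Overall alignment rate: 80.56%" -> "80.56%"
            mapped_rate = line.strip().split(": ")[-1]
    return mapped_rate


# The content of summary.txt from this example, one string per line.
summary = [
    "Mis-alignment rate: 6.09%\n",
    "Overall alignment rate: 80.56%\n",
]
print(extract_mapped_rate(summary))
print(extract_mapped_rate([]))   # no matching line: the default is returned
```

Keeping the parsing in a plain function like this makes it easy to test without a full result folder.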

Pipeline MyReport

Finally, we combine all defined stages into our first pipeline, MyReport:

class MyReport(Report):
    stage_classnames = [IndexStage, MyStage]
    static_roots = here / "report" / "static"

The configuration is as simple as it was for the stages. stage_classnames specifies the classes of the stages to be used, and static_roots points to the static file folder.

Warning

Make sure stage_classnames lists the classes themselves, not instantiated objects. Stage objects are created automatically at runtime.
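The difference can be seen with plain Python classes. The sketch below uses stand-in classes rather than NGCloud's, and the instantiation loop is only an assumption about what a framework might do at runtime.

```python
import inspect


class FakeStage:
    """Stand-in for ngcloud.report.Stage (illustration only)."""


class IndexStage(FakeStage):
    pass


class MyStage(FakeStage):
    pass


stage_classnames = [IndexStage, MyStage]    # correct: the classes themselves
wrong = [IndexStage(), MyStage()]           # wrong: already-created objects

# A framework can create fresh objects from the classes when it runs.
stages = [stage_cls() for stage_cls in stage_classnames]

print(all(inspect.isclass(c) for c in stage_classnames))
print(any(inspect.isclass(obj) for obj in wrong))
```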

See also

See ngcloud.report.Report for all the functionalities it provides.

Generate the report

Congrats! This is now a fully working pipeline. Check your results against our example under examples/my_first_pipe. Generate the report by running the following command in the same folder as mypipe.py:

ngreport -p mypipe.MyReport job_demo_result

The rendered report will be under ./output/report_job_info.id.
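The -p argument is a dotted path of the form module.ClassName. One common way to resolve such a path in Python is sketched below; whether ngreport works exactly this way is an assumption, and resolve_dotted_path is a hypothetical helper. It also suggests why the command is run from the folder containing mypipe.py: the module has to be importable from there.

```python
import importlib


def resolve_dotted_path(dotted):
    """Import 'package.module.Name' and return the named attribute.

    Hypothetical helper; not taken from the ngreport source.
    """
    module_name, _, attr_name = dotted.rpartition(".")
    module = importlib.import_module(module_name)   # raises ImportError if not found
    return getattr(module, attr_name)


# Demonstrated with a standard-library class, since mypipe may not exist here.
path_cls = resolve_dotted_path("pathlib.Path")
print(path_cls)
```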

Further binding with NGCloud’s report

If you want to extend the pipelines currently provided by NGCloud, or to use the NGCloud template theme, go on to read Extend Builtin Pipeline to find out how.