Prepare Your NGS Result for NGCloud

Organize the result folder

A standard result folder looks like this:

job_tuxedo_minimal
├── 1_fastqc
│   ├── output
│   └── overall
├── 2_tophat_176
│   ├── output
│   └── run.log
├── 3_tophat_183
│   ├── output
│   └── run.log
│   (skip other stages) ...
└── job_info.yaml
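
The layout above can be checked before a result is handed to NGCloud. The following is a minimal sketch, not part of NGCloud itself: the function name inspect_result_folder is hypothetical, and the "<number>_<stage name>" folder pattern is an assumption inferred from the example tree.

```python
import re
from pathlib import Path

# Assumed pattern: stage folders are named "<number>_<stage name>",
# as in the example tree (1_fastqc, 2_tophat_176, ...).
STAGE_DIR = re.compile(r"^(\d+)_(.+)$")


def inspect_result_folder(root):
    """Return (has_job_info, stage_names) for a result folder.

    stage_names lists the sub-folders matching the assumed pattern,
    ordered by their numeric prefix.
    """
    root = Path(root)
    has_job_info = (root / "job_info.yaml").is_file()
    stages = []
    for child in root.iterdir():
        match = STAGE_DIR.match(child.name)
        if child.is_dir() and match:
            stages.append((int(match.group(1)), child.name))
    return has_job_info, [name for _, name in sorted(stages)]
```

Calling inspect_result_folder("job_tuxedo_minimal") on the example would report whether job_info.yaml is present and list the stage folders in pipeline order.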

Specify the job_info.yaml

The job_info.yaml file is written in YAML, a human-readable format for storing common data structures. It records how the NGS analysis pipeline was run and the sample information. The following is the YAML content of our minimal example:

job_type: tuxedo
job_id: 9527
sample_list:
# TODO: accept conditions, sample_group
    - SRR1027176_1:
        pair_end: R1
    - SRR1027176_2:
        pair_end: R2
    - SRR1027183_1:
        pair_end: R1
    - SRR1027183_2:
        pair_end: R2

It specifies:

  • job_type: the pipeline that produced the NGS result.
  • job_id: an auto-generated job ID. For a custom result, enter a short descriptive name for the job instead.
  • sample_list: list of samples used in this job and their properties.

Sample list

Warning

Mind the format and indentation used to specify sample_list.

sample_list:
    - <sample_name>:
        <property_A>: <value>
    - <sample_name>:
        <property_A>: <value>
        <property_C>: <value>

A general sample_list follows the structure shown above. Some properties, such as pair_end, may be shared across samples, while others, such as stranded, may apply only to certain samples. Only the properties that are specified will be defined.
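
To see why the indentation matters, it helps to look at the data a YAML loader would typically produce for this structure: a list of one-key mappings, each mapping a sample name to its properties. The sketch below walks such a list. The helper name iter_samples is hypothetical and is not NGCloud's own parser. The input is written as plain Python data so the example does not depend on any particular YAML library.

```python
def iter_samples(sample_list):
    """Yield (sample_name, properties) for each record in a parsed sample_list.

    Each record is expected to be a one-key mapping whose value is a
    mapping of properties, e.g. {"SRR1027176_1": {"pair_end": "R1"}}.
    """
    for record in sample_list:
        # If the property lines are indented to the same level as the
        # sample name, they become extra keys of the record itself.
        if not isinstance(record, dict) or len(record) != 1:
            raise ValueError(f"expected one sample name per record, got {record!r}")
        (name, properties), = record.items()
        if properties is None:
            # A sample name followed by no properties at all.
            properties = {}
        if not isinstance(properties, dict):
            raise ValueError(f"properties of sample {name!r} are not a mapping")
        yield name, properties


# Plain-Python equivalent of two well-formed records.
parsed = [
    {"SRR1027176_1": {"pair_end": "R1"}},
    {"SRR1027176_2": {"pair_end": "R2"}},
]
samples = list(iter_samples(parsed))
```

A record such as {"SRR1027176_1": None, "pair_end": "R1"}, which is what under-indented property lines usually turn into, is rejected with a ValueError instead of being silently misread.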

Paired-end

For now, it is highly recommended to specify pair_end, the paired-end type of a sample, for all samples. If a sample was sequenced single-end, set the value to False:

- single_end_sample:
    pair_end: False

For a paired-end sample, create a separate record for each end:

- a_pairend_sample:
    pair_end: R1
- a_pairend_sample:
    pair_end: R2

Of course, if only one end of a paired-end sample is used, only one record needs to be created.
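
The pair_end rules above can be turned into a small sanity check. This is a sketch under stated assumptions, not NGCloud's own validation: the function name check_pair_end is hypothetical, the records are assumed to be already parsed into (name, properties) pairs, and False, R1 and R2 are treated as the complete set of values only because they are the ones shown in this guide.

```python
# Values shown in this guide; treating them as the full set is an assumption.
KNOWN_ENDS = ("R1", "R2")


def check_pair_end(records):
    """Return a list of problems found in (sample_name, properties) records."""
    problems = []
    seen = set()
    for name, properties in records:
        if "pair_end" not in properties:
            problems.append(f"{name}: pair_end is not set")
            continue
        value = properties["pair_end"]
        # Only the boolean False marks a single-end sample; the quoted
        # string "False" or the number 0 is reported as unexpected.
        if value is not False and value not in KNOWN_ENDS:
            problems.append(f"{name}: unexpected pair_end value {value!r}")
        elif (name, value) in seen:
            problems.append(f"{name}: duplicate record for pair_end {value!r}")
        seen.add((name, value))
    return problems


records = [
    ("single_end_sample", {"pair_end": False}),
    ("a_pairend_sample", {"pair_end": "R1"}),
    ("a_pairend_sample", {"pair_end": "R2"}),
]
problems = check_pair_end(records)
```

A paired-end sample with only one of its ends recorded is deliberately not reported, since using a single end is allowed.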

To find out which properties can be set on a sample, see ngcloud.info.Sample.

Note

Currently, the sample list is the only important thing to specify in job_info.yaml. As report templates evolve and use more arguments and information, the format will gain more requirements.