Prepare Your NGS Result for NGCloud¶
Organize the result folder¶
A standard result folder looks like this:
job_tuxedo_minimal
├── 1_fastqc
│   ├── output
│   └── overall
├── 2_tophat_176
│   ├── output
│   └── run.log
├── 3_tophat_183
│   ├── output
│   └── run.log
│   (skip other stages) ...
└── job_info.yaml
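As a quick sanity check before handing a folder to NGCloud, the layout above can be inspected with a few lines of Python. This is only a minimal sketch: the helpers `list_stages` and `has_job_info` are hypothetical and not part of NGCloud, and the `<order>_<name>` naming of stage folders is an assumption inferred from the example tree.

```python
import re
from pathlib import Path

# Assumption: stage folders are named "<order>_<name>", e.g. "2_tophat_176".
STAGE_DIR = re.compile(r"^(\d+)_(\w+)$")


def list_stages(result_dir):
    """Return (order, name) pairs for the stage folders, sorted by order."""
    stages = []
    for child in Path(result_dir).iterdir():
        match = STAGE_DIR.match(child.name)
        if child.is_dir() and match:
            stages.append((int(match.group(1)), match.group(2)))
    return sorted(stages)


def has_job_info(result_dir):
    """Check that job_info.yaml sits at the top level of the result folder."""
    return (Path(result_dir) / "job_info.yaml").is_file()
```

Calling `list_stages("job_tuxedo_minimal")` on the example folder would list the stage folders in numeric order, so a tenth stage sorts after the ninth rather than after the first.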
Specify the job_info.yaml¶
The file job_info.yaml uses YAML syntax. It records how the NGS analysis pipeline was run, together with the sample information. YAML is a human-readable format for storing common data structures such as lists and mappings. The following is the YAML content of our minimal example:
job_type: tuxedo
job_id: 9527
sample_list:
    # TODO: accept conditions, sample_group
    - SRR1027176_1:
        pair_end: R1
    - SRR1027176_2:
        pair_end: R2
    - SRR1027183_1:
        pair_end: R1
    - SRR1027183_2:
        pair_end: R2
It specifies:
- job_type: the pipeline that produced the NGS result.
- job_id: an auto-generated job ID. For a custom result, enter a short descriptive name for the job instead.
- sample_list: list of samples used in this job and their properties.
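To make the three keys concrete, here is roughly what the minimal example looks like once a YAML parser has loaded it into Python. The structure is written out by hand so the sketch needs no YAML library. It assumes each sample's properties are indented under the sample name, and the checker `check_job_info` is a hypothetical helper, not an NGCloud function.

```python
# Hand-written equivalent of the parsed minimal job_info.yaml.
job_info = {
    "job_type": "tuxedo",
    "job_id": 9527,
    "sample_list": [
        {"SRR1027176_1": {"pair_end": "R1"}},
        {"SRR1027176_2": {"pair_end": "R2"}},
        {"SRR1027183_1": {"pair_end": "R1"}},
        {"SRR1027183_2": {"pair_end": "R2"}},
    ],
}

REQUIRED_KEYS = ("job_type", "job_id", "sample_list")


def check_job_info(info):
    """Return a list of problems; an empty list means the top-level shape is fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in info]
    if "sample_list" in info and not isinstance(info["sample_list"], list):
        problems.append("sample_list should be a list of sample records")
    return problems
```

Note that sample_list is a list of one-key mappings rather than a single mapping, which is why every sample record starts with a dash in the YAML file.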
Sample list¶
Warning
Mind the format and indentation when specifying a sample_list.
sample_list:
    - <sample_name>:
        <property_A>: <value>
    - <sample_name>:
        <property_A>: <value>
        <property_C>: <value>
The example above shows the general structure of a sample_list. Some properties, such as pair_end, may be shared across samples, while others, such as stranded, may apply only to certain samples. Only the properties you specify are defined.
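The indentation warning matters because a misplaced property changes the parsed structure. The sketch below walks a parsed sample_list (a list of one-key mappings, as a YAML parser would typically produce) and raises an error when a record does not have exactly one sample name, which is the usual symptom of properties sitting at the same level as the name. `iter_samples` is a hypothetical helper written for illustration, not part of NGCloud.

```python
def iter_samples(sample_list):
    """Yield (sample_name, properties) for each record of a parsed sample_list."""
    for record in sample_list:
        if not isinstance(record, dict) or len(record) != 1:
            # Typical cause: properties not indented under the sample name.
            raise ValueError(f"expected one '<sample_name>:' mapping, got {record!r}")
        name, props = next(iter(record.items()))
        # A sample with no properties parses to None; treat it as empty.
        yield name, dict(props or {})


# Correctly indented record: the properties nest under the sample name.
good = [{"SRR1027176_1": {"pair_end": "R1"}}]

# Mis-indented record: pair_end ends up as a sibling of the sample name.
bad = [{"SRR1027176_1": None, "pair_end": "R1"}]
```

Here `list(iter_samples(good))` yields one name with its properties, while `list(iter_samples(bad))` raises `ValueError`.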
Paired-end¶
For now, we highly recommend specifying pair_end, the paired-end type of a sample, for all samples. If a sample was sequenced single-end, set the value to False:
- single_end_sample:
    pair_end: False
For a paired-end sample, create a separate record for each end:
- a_pairend_sample:
    pair_end: R1
- a_pairend_sample:
    pair_end: R2
If only one end of a paired-end sample is used, only one record needs to be created.
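The three cases described here (single-end, both ends recorded, only one end recorded) can be told apart by collecting the pair_end values per sample name. The following is a minimal sketch under the assumption that records have already been reduced to `(sample_name, pair_end)` tuples; `group_ends` and `describe_ends` are hypothetical names, and how NGCloud itself handles these cases is not shown in this page.

```python
from collections import defaultdict


def group_ends(records):
    """Map each sample name to the set of pair_end values recorded for it."""
    ends = defaultdict(set)
    for name, pair_end in records:
        ends[name].add(pair_end)
    return dict(ends)


def describe_ends(ends):
    """Classify the set of pair_end values seen for one sample."""
    if ends == {False}:
        return "single-end"
    if ends == {"R1", "R2"}:
        return "paired-end, both ends"
    if ends in ({"R1"}, {"R2"}):
        return "paired-end, one end only"
    return "inconsistent"


records = [
    ("single_end_sample", False),
    ("a_pairend_sample", "R1"),
    ("a_pairend_sample", "R2"),
]
summary = {name: describe_ends(e) for name, e in group_ends(records).items()}
```

With the records above, `summary` labels `single_end_sample` as single-end and `a_pairend_sample` as paired-end with both ends present.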
To find out which properties can be set on a sample, see ngcloud.info.Sample.
Note
Currently, the sample list is the only important item to specify in job_info.yaml. As report templates evolve and use more arguments and information, there will be more requirements on the format.