Reference Databases Needed

To run the pipelines, you will need reference databases installed on your cluster. If you are using the AWS installation, these databases are provided for you. If you need to install your own references, please install the ones listed below. Omics Pipe is compatible with genome files from any species. The examples below are for hg19, but you can substitute the equivalent files from other species.

All Pipelines

Genome

Reference Annotation Files

You can use any reference annotations you would like, as long as they are GTF files.

Examples include:

  • gencode.v18.annotation.gtf
  • UCSC genes.gtf
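Because the only stated requirement is that annotations are GTF files, a quick sanity check before a run can catch a mislabeled file. The following is a minimal sketch and not part of Omics Pipe: the function name looks_like_gtf and the max_lines limit are hypothetical, and the check only confirms that data lines have the nine tab-separated GTF columns with integer start and end coordinates.

```python
def looks_like_gtf(path, max_lines=1000):
    """Return True if the first data lines of `path` have the GTF column layout.

    Hypothetical helper for illustration; it is not an Omics Pipe function.
    """
    checked = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Skip comment/header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # GTF records have exactly nine tab-separated columns.
            if len(fields) != 9:
                return False
            # Columns 4 and 5 are the start and end coordinates.
            if not (fields[3].isdigit() and fields[4].isdigit()):
                return False
            checked += 1
            if checked >= max_lines:
                break
    # A file with no data lines at all is not treated as a usable GTF.
    return checked > 0
```

Calling looks_like_gtf on a file such as gencode.v18.annotation.gtf should return True if the file is intact. A False result suggests the file is in another format (for example BED or GFF3 with a different attribute column count) or is empty.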

Reference Data for Cancer Reporting Scripts (RNA-seq cancer, TCGA pipelines)

For the cancer pipelines, please download the Reporting_data archive, extract it, and put the files in the directories described below.

In your Omics Pipe installation directory, place the files under omics_pipe/scripts/reporting/ref.

In your Omics Pipe installation directory, place the remaining files under omics_pipe/scripts/reporting/data:

  • brca_mol_class/*
  • DoG/*
  • geneLists/*
  • SPIA
  • deseq.tcga_brca.Rdata
  • loggeoameansBRCA.Rdata
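After extracting the files, it can help to confirm that everything listed above landed in the data directory. The following is a minimal sketch under stated assumptions: the function name missing_reporting_data and the DATA_ENTRIES constant are hypothetical, the entry names are copied from the list above, and the check only tests that each entry exists (it does not inspect the contents).

```python
from pathlib import Path

# Entries expected under omics_pipe/scripts/reporting/data, as listed above.
DATA_ENTRIES = [
    "brca_mol_class",
    "DoG",
    "geneLists",
    "SPIA",
    "deseq.tcga_brca.Rdata",
    "loggeoameansBRCA.Rdata",
]


def missing_reporting_data(install_dir):
    """Return the names from DATA_ENTRIES that are absent from the data directory.

    `install_dir` is the Omics Pipe installation directory (an assumption:
    the caller knows where the package was installed).
    """
    data_dir = Path(install_dir) / "omics_pipe" / "scripts" / "reporting" / "data"
    return [name for name in DATA_ENTRIES if not (data_dir / name).exists()]
```

An empty list means every entry was found. Any names returned are the files or directories that still need to be copied.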

References for Variants (RNA-seq cancer, RNA-seq cancer TCGA, WES and WGS pipelines)

For pipelines performing variant calling, please download the references below. You can put these files in any directory; you will point to their location in the parameters file.

Available within the GATK resource bundle v.2.5:

  • dbsnp_137.hg19.vcf
  • Mills_and_1000G_gold_standard.indels.hg19.vcf
  • 1000G_phase1.indels.hg19.vcf
  • hapmap_3.3.hg19.vcf
  • 1000G_omni2.5.hg19.vcf
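Since these files can live in any directory, a short check of that directory before editing the parameters file can catch missing or still-compressed downloads. The following is a minimal sketch, not an Omics Pipe utility: the function name check_vcfs and the EXPECTED_VCFS constant are hypothetical, the file names are copied from the list above, and the only content test is that the first line is the VCF fileformat header.

```python
from pathlib import Path

# File names as listed above for the GATK resource bundle.
EXPECTED_VCFS = [
    "dbsnp_137.hg19.vcf",
    "Mills_and_1000G_gold_standard.indels.hg19.vcf",
    "1000G_phase1.indels.hg19.vcf",
    "hapmap_3.3.hg19.vcf",
    "1000G_omni2.5.hg19.vcf",
]


def check_vcfs(ref_dir):
    """Map each expected file name to 'ok', 'missing', or 'not a VCF'."""
    results = {}
    for name in EXPECTED_VCFS:
        path = Path(ref_dir) / name
        if not path.is_file():
            results[name] = "missing"
            continue
        # An uncompressed VCF starts with a '##fileformat=VCF...' line;
        # a gzipped or truncated download will fail this test.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            first_line = handle.readline()
        if first_line.startswith("##fileformat=VCF"):
            results[name] = "ok"
        else:
            results[name] = "not a VCF"
    return results
```

Any entry reported as "not a VCF" is worth inspecting by hand; a common cause is a file that was downloaded compressed and renamed without being decompressed.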

From PharmGKB, place these files in your Omics Pipe installation directory under omics_pipe/scripts/reporting/ref:

  • pharmgkbAllele.tsv
  • pharmgkbRSID.csv

ChIP-seq Pipelines

SNPiR Pipelines (RNA-seq cancer and RNA-seq cancer TCGA pipelines)

  • BWA Index
  • RNA editing sites (Human_AG_all_hg19.bed)
  • RepeatMasker.bed
  • anno_combined_sorted
  • knowngene.bed