Changes

Version 1.3.3

  • Disabled spliced alignments by default

Version 1.3.2

  • Optimized the mapping and annotation steps

Version 1.3.1

  • The number of mismatches allowed in homopolymers is now a parameter
  • PolyN stretches are now removed as well (parameter)

Version 1.3.0

  • Added more methods to cluster UMIs
  • Optimized the UMI counting algorithm
  • Optimized the memory use

Version 1.2.6

  • Take into account soft-clipped bases when computing start/end positions
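A minimal sketch of what taking soft-clipped bases into account can look like. It assumes a 0-based leftmost aligned position and a SAM CIGAR string; the function name `clipped_span` is hypothetical and is not taken from the pipeline code.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases according to the SAM specification.
REF_CONSUMING = set("MDN=X")


def clipped_span(pos, cigar):
    """Return (start, end) on the reference, widened by soft-clipped bases.

    pos is the 0-based leftmost aligned position; end is exclusive.
    The start is not clamped, so it can be negative near a contig start.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips sit outside soft clips and carry no bases, so drop them.
    core = [(n, op) for n, op in ops if op != "H"]
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return pos - lead, pos + ref_len + trail
```

For example, a read aligned at position 100 with CIGAR `5S45M` spans 95 to 145 once the five clipped bases are included.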

Version 1.2.5

  • Changed the limit range of some parameters

Version 1.2.4

  • Fixed small bugs
  • Small improvements in st_qa.py and convertEnsemblToNames.py

Version 1.2.3

  • Bumped TaggD version
  • Added more stats to the dataset output
  • Added scripts to compute stats
  • Added new option for TaggD

Version 1.2.2

  • Fixed bugs in convertEnsemblToNames
  • Added some parameters for TaggD demultiplexing
  • Bumped version of TaggD

Version 1.2.1

  • Made homopolymers filters enabled by default
  • Added a test dataset to the docs

Version 1.2.0

  • Fixed a small bug in the deletion of the tmp folder

Version 1.1.7

  • Make sure to remove tmp files even if an error happens

Version 1.1.6

  • Fixed bug that would leave some files in /tmp
  • The number of mismatches allowed when removing adaptors is now 2

Version 1.1.5

  • Removed some unnecessary parameters

Version 1.1.1

  • Simplified the two pass mode

Version 1.1.0

  • Added flag to discard reads mapping to anti-sense strand
  • Added a separate parameter for the GC content filter instead of reusing the AT content filter value
  • Fixed a small bug in the logging of some parameters
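As an illustration of a content filter with separate AT and GC thresholds, here is a minimal sketch. The function name `passes_content_filter`, its parameter names, and the default thresholds are assumptions made for this example; they are not the pipeline's actual options or defaults.

```python
def passes_content_filter(seq, max_at=0.9, max_gc=0.9):
    """Return True if the read is within both content thresholds.

    max_at and max_gc are fractions in [0, 1] (hypothetical defaults).
    Empty reads are rejected.
    """
    seq = seq.upper()
    if not seq:
        return False
    at_fraction = (seq.count("A") + seq.count("T")) / len(seq)
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return at_fraction <= max_at and gc_fraction <= max_gc
```

Keeping the two thresholds independent lets a run tolerate AT-rich reads while still discarding GC-rich ones, or the other way around.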

Version 1.0.4

  • When removing adaptors (homopolymer stretches), allow up to 3 mismatches
  • Added GC content filter (same % as AT content)

Version 1.0.3

  • Fixed a minor bug in the counting of UMIs on the - strand

Version 1.0.2

  • If no temp folder is given a new unique one is created on top of the execution folder
  • Integrated createDataset.py into the code of the pipeline
  • Adjusted some parameters names and descriptions (no UMI is default)
  • Added sliding window when counting unique molecules
  • Added support for bzip

Version 1.0.1

  • Fixed small bug in the parsing of the umi quality parameter

Version 1.0.0

  • Added option to check for UMI quality
  • Optimized the UMI template check code
  • Optimized how the unique molecules are counted
  • Better stats for the quality filter step
  • Updated convertEnsemblToNames script
  • Updated docstrings

Version 0.9.9

  • Small bug fixes

Version 0.9.6

  • Fixed a bug with the non ambiguous option
  • Fix a bug in the saturation computation

Version 0.9.5

  • When an R2 read is trimmed, its corresponding R1 read is trimmed as well

Version 0.9.4

  • Fixed a stupid bug in the compute saturation option

Version 0.9.3

  • Changed the rRNA filter so the BAM output does not need to be sorted

Version 0.9.2

  • Fixed a bug in the parsing of parameters

Version 0.9.1

  • Fixed a small bug with the location of discarded files

Version 0.9.0

  • Replaced JSON with a data frame as the output format
  • Replaced Python gzip with a system call (faster)
  • Changed the logic of how the filenames are stored and handled

Version 0.8.9

  • Improved the error messages and error handling

Version 0.8.8

  • Removed barcodes IDs from the output file

Version 0.8.7

  • Updated comments, manual and license
  • Small improvements

Version 0.8.5

  • Fixed a bug in the computation of saturation curves

Version 0.8.4

  • Added a normal hash with INT keys to increase speed and reduce memory
  • Using the gene_id for annotation again

Version 0.8.3

  • Added parameter for strandness in annotation (yes by default)
  • Simplified a bit the quality trimming step (do not account for user input trimmed bases)

Version 0.8.2

  • Added stats for annotated reads
  • Replaced shelve dict with sqldict
  • Fixed some small bugs in the annotation

Version 0.8.1

  • Removed the pair mode keep option
  • Removed unnecessary pair mode and mapped checks after alignment

Version 0.8.0

  • Added option to do the STAR 2 pass mode
  • Removed option to run pipeline without IDs
  • Speed improvements
  • Perform demultiplex after mapping
  • The barcode is no longer attached to reverse reads
  • Removing some parameters
  • Some improvements in stDataPlotter
  • Option to use BAM format
  • Removed annotation filtering step
  • Removed forward trimming parameters
  • Output gene names even with ENSEMBL

Version 0.7.7

  • Small memory improvements
  • Updates in plotting script

Version 0.7.6

  • End coordinates now contain the whole read length
  • Make annotation strand aware (reverse)
  • Updated to STAR 2.5

Version 0.7.5

  • Fixed a small bug

Version 0.7.4

  • Added some memory improvements

Version 0.7.3

  • Added parameters for inverse trimming
  • Memory and speed optimizations in createDatasets
  • Added option for low_memory use

Version 0.7.2

  • Added unique genes to saturation points
  • Added option to keep non-annotated reads

Version 0.7.1

  • Fixed some small bugs

Version 0.7.0

  • Fixed a bug in the saturation points
  • Removed counttrie as option for clustering
  • Updated and improved CTTS scripts
  • Updated data plotter color list

Version 0.6.9

  • Fixed a bug in the saturation points

Version 0.6.8

  • Improved speed and memory in createDatasets
  • Changed saturation points to fixed values that grow exponentially
  • Improved speed in computation of saturation points
  • Small bug fixes
  • Upgraded json2Scatter with many improvements
  • Renamed json2Scatter to stDataPlotter
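Saturation points that grow exponentially can be generated along these lines. This is a sketch only: the function name `saturation_points`, the starting value and the growth factor are assumptions, not the values used by the pipeline.

```python
def saturation_points(total_reads, first=1000, factor=2):
    """Return subsampling sizes that grow geometrically up to total_reads.

    first and factor are hypothetical defaults. The total number of reads
    is always included as the last point.
    """
    points = []
    point = first
    while point < total_reads:
        points.append(point)
        point *= factor
    points.append(total_reads)
    return points
```

Geometric spacing gives dense sampling at low depth, where a saturation curve changes fastest, while keeping the number of subsampling rounds small for large datasets.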

Version 0.6.7

  • Fixed a bug in the hierarchical clustering
  • Added the input parameter to qa_stats
  • Append experiment name to output files
  • Added option to compute saturation points
  • Added tool to plot stdata and clusters with aligned image

Version 0.6.6

  • Fixed a bug in the hierarchical clustering
  • Fixed a bug in the printed stats

Version 0.6.5

  • Fixed a bug in retrieving the version of the software
  • Added time stamps in different steps
  • Added a UMI template quality filter

Version 0.6.4

  • Fixed a bug in counttrie clustering method
  • Improved sorting of molecular barcodes prior to clustering
  • Added hierarchical clustering option

Version 0.6.3

  • Removed reads.json
  • Added qa_stats.json to the output
  • Restored old versioning system
  • Removed hadoop related stuff
  • Added support for gzipped input files

Version 0.6.2

  • Improved the log a bit
  • Added parameters for max and min intron size and max gap size

Version 0.6.1

  • Fixed some bugs in the prefix tree

Version 0.5.9

  • Added an option to find molecular barcodes clusters using a prefix tree

Version 0.5.8

  • Fixed a bug in the function to retrieve the pipeline version

Version 0.5.7

  • Fixed a bug with --disable-multimap option

Version 0.5.6

  • Fixed a typo in a parameter
  • Fixed a bug that caused some parameters to not work

Version 0.5.5

  • Added some extra debugging info in createDatasets
  • Output the read name in the BED output file
  • Replaced --allowed-kimera with --allowed-kmer
  • Added version as parameter and log message

Version 0.5.4

  • Added parameter to disable soft clipping in mapping
  • Disable softclipping in rRNA filter
  • Make sure that discarded reads after rRNA filter are replaced by Ns
  • Improved stats info a bit

Version 0.5.3

  • Bumped Taggd to 0.2.2

Version 0.5.2

  • Fixed a bug in the rRNA filter that caused rRNA-mapped reads not to be discarded

Version 0.5.1

  • Added check when UMI is the same as barcode
  • Added more stats
  • Added percentile distribution stats for createDataset
  • Added support for BAM and SAM (not functional now)
  • Added option to disable multiple aligned reads
  • Fixed a bug in the bed file

Version 0.5.0

  • Added AT content filter in quality trimming
  • Added min mapped length filter after mapping
  • Make sure one of the multiple aligned reads is set as not multiple aligned so it can be annotated
  • Discard the other multiple aligned reads after mapping
  • Disable sorting
  • Restored back to use gene_id as column for annotation

Version 0.4.9

  • Changed naming convention
  • Added support for normal RNA analysis

Version 0.4.8

  • Improved STAR configuration
  • Added mapping post processing to filter out and adjust reversed reads
  • Changed to use gene_name for annotation
  • Fixed some bugs and some improvements
  • Fixed bugs in the trimming

Version 0.4.7

  • Improved stats
  • Fixed a bug that would remove original input files
  • Added a script to convert ENSEMBL ids to gene names

Version 0.4.6

  • Fixed a bug that would not compute the number of discarded reads when using molecular barcodes

Version 0.4.5

  • Fixed a bug in the barcodes JSON output

Version 0.4.4

  • Fixed a bug in the molecular barcodes algorithm
  • Fixed a bug that would keep the original fastq reads in the system
  • Update taggd version

Version 0.4.3

  • Small improvements with error checking and log in the mapping
  • Fixed a bug that would remove the file after filtering annotated reads
  • Sort by name instead of by position due to a bug in htseq-count

Version 0.4.2

  • Fixed a bug in the capture of parameters

Version 0.4.1

  • Improved the logs
  • Fixed a few bugs

Version 0.4.0

  • Added back taggd
  • Added BED file to output
  • Added STAR
  • Optimized workflow
  • Run the rRNA filter first
  • Optimized annotation
  • Optimized trimming
  • Output reads do not contain duplicates

Version 0.3.9

  • Allowing molecular barcodes to be before the barcodes

Version 0.3.8

  • Added back findIndexes

Version 0.3.7

  • Removed cutadapt dependency

Version 0.3.6

  • Fixed a bug in the installation

Version 0.3.5

  • Added options to remove PolyC
  • Fixed bugs in adaptor removal

Version 0.3.4

  • Added test for STAR and STAR binary to dependencies
  • Added TAGGD and removed findIndexes
  • Improved install script
  • Added options to remove adaptors (PolyA, PolyT and PolyG)
  • Replaced Bowtie with STAR as the primary mapper

Version 0.3.3

  • Added option to keep files with discarded reads/barcodes
  • Internal refactoring and optimization

Version 0.3.2

  • Outputted reads JSON now only has the portion of the read that was used to map
  • Cutadapt is integrated but only using the quality trimming for now
  • Internal refactoring and optimizations

Version 0.3.1

  • Added small unit-test for molecular barcodes
  • Added more molecular barcodes algorithms (using a naive one for now)
  • Fixed small issues in JSON parsing libraries

Version 0.3.0

  • Rewrite createDatasets.py
  • Clean up repository and deprecated files
  • Change the unit-test library and structure
  • Refactor the unit-test (use pipeline API instead of command line calls)
  • Ensure unit-test remove tmp files when failing
  • Add better error handling
  • Add unit-test for Molecular Barcodes
  • Add Molecular Barcodes functionality
  • General refactor and clean up
  • Add invoke options (clean, build, install)
  • Fix an important bug in createDatasets that caused incorrect computation of reads counts

Version 0.2.5

  • Improved installers
  • Small bug fixes
  • Added basic unit-test to do a run of the pipeline

Version 0.2.4

  • Some optimizations and bug fixes

Version 0.2.3

  • Fixed an error with the new version of HTSeq-count that will discard more reads

Version 0.2.2

  • Added extra parameters
  • Fixed some typos
  • Fixed a bug that removed some bases from the barcode ID in the reverse reads

Version 0.2.1

  • code refactored and modularized
  • add argparse for parameters parsing
  • add API for Amazon EMR and terminal version
  • better error handling
  • optimized code
  • new version of FindIndexes
  • remove dependencies
  • added proper installers and documentation