Changes
-------
**Version 1.3.3**

* Disabled spliced alignments by default

**Version 1.3.2**

* Optimized the mapping and annotation steps

**Version 1.3.1**

* Homopolymer miss-matches is a parameter now
* Removing now also PolyNs (parameter)

**Version 1.3.0**

* Added more methods to cluster UMIs
* Optimized the UMI counting algorithm
* Optimized the memory use

**Version 1.2.6**

* Take into account soft-clipped bases when computing start/end positions

**Version 1.2.5**
* Changed the limit range of some parameters

**Version 1.2.4**

* Fixed small bugs
* Small improvements in st_qa.py and convertEnsemblToNames.py

**Version 1.2.3**

* Bumped TaggD version
* Added more stats to the dataset output
* Added scripts to compute stats
* Added new option for TaggD

**Version 1.2.2**

* Fixed bugs in convertEnsemblToNames
* Added some parameters for TaggD demultiplexing
* Bumped version of TaggD

**Version 1.2.1**

* Made homopolymers filters enabled by default
* Added a test dataset to the docs


**Version 1.2.0**

* Fixed a small bug in the deletion of the tmp folder

**Version 1.1.7**

* Make sure to remove tmp files even if an error happens

**Version 1.1.6**

* Fixed bug that would leave some files in /tmp
* Allowed mis-matches when removing adaptors is now 2

**Version 1.1.5**

* Removed some un-necessary parameters

**Version 1.1.1**

* Simplified the two pass mode

**Version 1.1.0**

* Added flag to discard reads mapping to anti-sense strand
* Parameters for GC content filter instead of using the same value as AT content filter
* Fixed a small bug in the logging of some parameters

**Version 1.0.4**

* When removing adaptors (homopolymers streches) allow to up to 3 missmatches
* Added GC content filter (same % as AT content)

**Version 1.0.3**

* Fixed a minor bug in the counting of UMIs or - strand

**Version 1.0.2**

* If no temp folder is given a new unique one is created on top of the execution folder
* integrate createDataset.py into the code of the pipeline
* Adjusted some parameters names and descriptions (no UMI is default)
* Added sliding window when counting unique molecules
* Added support for bzip

**Version 1.0.1**

* Fixed small bug in the parsing of the umi quality parameter

**Version 1.0.0**

* Added option to check for UMI quality
* Optimized the UMI template check code
* Optimized how the unique molecules are counted
* Better stats for the quality filter step
* Updated convertEnsemblToNames script
* Updated stringdocs

**Version 0.9.9**

* Small bug fixes

**Version 0.9.6**

* Fixed a bug with the non ambiguous option
* Fix a bug in the saturation computation

**Version 0.9.5**

* When a R2 is trimmed its correspondant R1 is trimmed as well

**Version 0.9.4**

* Fixed a stupid bug in the compute saturation option

**Version 0.9.3**

* Changed the rRNA filter so the BAM output does not need to be sorted

**Version 0.9.2**

* Fixed a bug in the parsing of parameters

**Version 0.9.1**

* Fixed a small bug with the location of discarded files

**Version 0.9.0**

* Replaced JSON for data frame in the output format
* Replaced python gzip for system call (faster)
* Changed the logic of how the filenames are stored and handled

**Version 0.8.9**

* Improved the error messages and error handling

**Version 0.8.8**

* Removed barcodes IDs from the output file

**Version 0.8.7**

* Updated comments, manual and license
* Small improvements

**Version 0.8.5**

* Fixed a bug in the computation of saturation curves

**Version 0.8.4**

* Added a normal hash with INT keys to increase speed and reduce memory
* Using the gene_id for annotation again

**Version 0.8.3**

* Added parameter for strandness in annotation (yes by default)
* Simplified a bit the quality trimming step (do not account for user input trimmed bases)

**Version 0.8.2**

* Added stats for annotated reads
* Replaced shelve dict for sqldict
* Fixed some small bugs in the annotation

**Version 0.8.1**

* Removed the pair mode keep option
* Removed un-neccessary pair mode and mapped checks after alignment

**Version 0.8.0**

* Added option to do the STAR 2 pass mode
* Removed option to run pipeline without IDs
* Speed improvements
* Perform demultiplex after mapping
* No attaching the barcode to reverse reads
* Removing some parameters
* Some improvements in stDataPlotter
* Option to use BAM format
* Removed annotation filtering step
* Removed forward trimming parameters
* Output gene names even with ENSEMBL

**Version 0.7.7**

* Small memory improvements
* Updates in plotting script

**Version 0.7.6**

* End coordinates now contain the whole read length
* Make annotation strand aware (reverse)
* Updated to STAR 2.5

**Version 0.7.5**

* Fixed a small bug

**Version 0.7.4**

* Added some memory improvements

**Version 0.7.3**

* Added parameters for inverse trimming
* Memory and speed optimizations in createDatasets
* Added option for low_memory use

**Version 0.7.2**

* Added unique genes to saturation points
* Added option to keep non-annotated reads

**Version 0.7.1**

* Fixed some small bugs

**Version 0.7.0**

* Fixed a bug in the saturation points
* Removed counttrie as option for clustering
* Updated and improved CTTS scripts
* Updated datfa plotter color list

**Version 0.6.9**

* Fixed a bug in the saturation points

**Version 0.6.8**

* Improved speed and memory in createDatasets
* Changed saturation points to fixed values that grow exp
* Improved speed in computation of saturation points
* Small bug fixes
* Upgraded json2Scatter with many improvements
* Rename json2scatter to stDataPlotter

**Version 0.6.7**

* Fixed a bug in the hierarchical clustering
* Added the input parameter to qa_stats
* Append experiment name to output files
* Added option to compute saturation points
* Added tool to plot stdata and clusters with aligned image

**Version 0.6.6**

* Fixed a bug in the hierarchical clustering
* Fixed a bug in the printed stats

**Version 0.6.5**

* Fixed a bug in retrieving the version of the software
* Added time stamps in different steps
* Added a UMI template quality filter

**Version 0.6.4**

* Fixed a bug in counttrie clustering method
* Improved sorting of molecular barcodes prior clustering
* Added hiearachical clustering option

**Version 0.6.3**

* Removed reads.json
* Added qa_stats.json to the output
* Restored old versioning system
* Removed hadoop related stuff
* Added support for gziped input files

**Version 0.6.2**

* Improved the log a bit
* Added parameters for max,min intron size and max gap size

**Version 0.6.1**

* Fixed some bugs in the prefix tree

**Version 0.5.9**

* Added an option to find molecular barcodes clusters using a prefix tree

**Version 0.5.8**

* Fixed a bug in the function to retrieve the pipeline version

**Version 0.5.7**

* Fixed a bug with --disable-multimap option

**Version 0.5.6**

* Fixed a typo in a parameter
* Fixed a bug that caused some parameters to not work

**Version 0.5.5**

* Added some extra debugging info in createDatasets
* Output the read name in the BED output file
* Changed --allowed-kimera for --allowed-kmer
* Added version as parameter and log message

**Version 0.5.4**

* Added parameter to disable soft clipping in mapping
* Disable softclipping in rRNA filter
* Make sure that discarded reads after rRNA filter are replaced by Ns
* Improved stats info a bit

**Version 0.5.3**

* Bumped Taggd to 0.2.2

**Version 0.5.2**

* Fixed a bug in the rRNA filter that would cause to not discard rRNA mapped reads

**Version 0.5.1**

* Added check when UMI is the same as barcode
* Added more stats
* Added percentiles distributiosn stats for createDAtaset
* Added support for BAM and SAM (not functional now)
* Added option to disable multiple aligned reads
* Fixed a bug in the bed file

**Version 0.5.0**

* Added AT content filter in quality trimming
* Added min mapped length filter after mapping
* Make sure one of the multiple aligned reads is set as not multiple
aligned so it can be annotated
* Discard the other multiple aligned reads after mapping
* Disable sorting
* Restored back to use gene_id as column for annotation

**Version 0.4.9**

* Changed naming convention
* Added support for normal RNA analysis

**Version 0.4.8**

* Improved STAR configuration
* Added mapping post processing to filter out and adjust reversed reads
* Changed to use gene_name for annotation
* Fixed some bugs and some improvements
* Fixed bugs in the trimming

**Version 0.4.7**

* Improved stats
* Fixed a bug that would remove original input files
* Added a script to convert ENSEMBL ids to gene names

**Version 0.4.6**

* Fixed a bug that would not compute the number of discarded reads when using molecular barcodes

**Version 0.4.5**

* Fixed a bug in the barcodes JSON output

**Version 0.4.4**

* Fixed a bug in the molecular barcodes algorithm
* Fixed a bug that would keep the original fastq reads in the system
* Update taggd version

**Version 0.4.3**

* Small improvements with error checking and log in the mapping
* Fixed a bug that would remove the file after filtering annoted reads
* Make the sorting by name instead by position due to a bug in htseq-count

**Version 0.4.2**

* Fixed a bug in the capture of parameters

**Version 0.4.1**

* Improved the logs
* Fixed few bugs

**Version 0.4.0**

* Added back taggd
* Added BED file to output
* Added STAR
* Optimized workflow
* do rRNA filter first
* Optimized annotation
* Optimized trimming
* Output reads do not contain duplicates

**Version 0.3.9**

* Allowing molecular barcodes to be before the barcodes

**Version 0.3.8**

* Added back findIndexes

**Version 0.3.7**

* Removed cutadapt dependency

**Version 0.3.6**

* Fixed a bug in the installation

**Version 0.3.5**

* Added options to remove PolyC fix bugs in adaptors removal

**Version 0.3.4**

* Added test for STAR and STAR binary to dependencies
* Added TAGGD and removed findIndexes
* Improved install script
* Added options to remove adaptors (PolyA, PolyT and PolyG)
* Exchanged Bowtie as primary mapper with STAR.

**Version 0.3.3**

* Added option to keep files with discarded reads/barcodes
* Internal refactoring and optimization

**Version 0.3.2**

* Outputted reads JSON now only has the portion of the read that was used to map
* Cutadapt is integrated but only using the quality trimming for now
* Internal refactoring and optimizations

**Version 0.3.1**

* Added small unit-test for molecular barcodes
* Added more molecular barcodes algorithms (using a naive one for now)
* Fixed small issues in JSON parsing libraries

**Version 0.3.0**

* Rewrite createDatasets.py
* Clean up repository and deprecated files
* Change the unit-test library and structure
* Refactor the unit-test (use pipeline API instead of command line calls)
* Ensure unit-test remove tmp files when failing
* Add better error handling
* Add unit-test for Molecular Barcodes
* Add Molecular Barcodes functionality
* General refactor and clean up
* Add invoke options (clean, build, install)
* Fix an important bug in createDatasets that caused incorrect computation of reads counts

**Version 0.2.5**

* Improved installers
* Small bug fixes
* Added basic unit-test to do a run of the pipeline

**Version 0.2.4**

* Some optimizations and bug fixes

**Version 0.2.3**

* Fixed a error with new version of HTSeq-count that will discard more reads

**Version 0.2.2**

* Added extra parameters
* Fixed some typos
* Fixed a bug that caused to remove some bases from the barcode ID in the rv reads

**Version 0.2.1**

* code refactored and modularized
* add argparse for parameters parsing
* add API for Amazon EMR and terminal version
* better error handling
* optimized code
* new version of FindIndexes
* remove dependencies
* added proper installers and documentation