The continuing growth and development of nucleic acid sequencing technologies has led to many insights, but this progress also brings challenges. Each sequencing experiment produces a substantial amount of data, so efficient storage methods are needed. Because of the volume of this data, data transfer is another bottleneck in every bioinformatics pipeline. Starting from scratch, the Sequencing and Bioinformatics Consortium set up a bioinformatics data quality control pipeline on Cedar, one of Compute Canada’s HPC clusters, with the help of Advanced Research Computing. Because reproducibility is crucial, the pipeline combines a Conda environment, which controls software versions, with Snakemake for workflow management.
Bioinformatics Specialist, University of British Columbia