.. _`bio/fastqc`:

FASTQC
======

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/fastqc?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/fastqc

Generate FASTQ quality-control statistics using FastQC.

**URL**: https://github.com/s-andrews/FastQC

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule fastqc:
        input:
            "reads/{sample}.fastq"
        output:
            html="qc/fastqc/{sample}.html",
            # The suffix _fastqc.zip is necessary for multiqc to find the file.
            # If not using multiqc, you are free to choose an arbitrary filename.
            zip="qc/fastqc/{sample}_fastqc.zip"
        params:
            extra = "--quiet"
        log:
            "logs/fastqc/{sample}.log"
        threads: 1
        resources:
            mem_mb = 1024
        wrapper:
            "v3.0.1/bio/fastqc"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
---------------------

* ``fastqc=0.12.1``
* ``snakemake-wrapper-utils=0.6.2``

Input/Output
------------

**Input:**

* fastq file

**Output:**

* html file containing statistics
* zip file containing statistics

Authors
-------

* Julian de Ruiter

Code
----

.. code-block:: python

    """Snakemake wrapper for fastqc."""

    __author__ = "Julian de Ruiter"
    __copyright__ = "Copyright 2017, Julian de Ruiter"
    __email__ = "julianderuiter@gmail.com"
    __license__ = "MIT"


    from os import path
    import re
    from tempfile import TemporaryDirectory
    from snakemake.shell import shell
    from snakemake_wrapper_utils.snakemake import get_mem

    extra = snakemake.params.get("extra", "")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
    # Define memory per thread (https://github.com/s-andrews/FastQC/blob/master/fastqc#L201-L222)
    mem_mb = int(get_mem(snakemake, "MiB") / snakemake.threads)


    def basename_without_ext(file_path):
        """Returns basename of file path, without the file extension."""

        base = path.basename(file_path)
        # Remove file extension(s) (similar to the internal fastqc approach)
        base = re.sub("\\.gz$", "", base)
        base = re.sub("\\.bz2$", "", base)
        base = re.sub("\\.txt$", "", base)
        base = re.sub("\\.fastq$", "", base)
        base = re.sub("\\.fq$", "", base)
        base = re.sub("\\.sam$", "", base)
        base = re.sub("\\.bam$", "", base)

        return base


    # This wrapper handles exactly one input file. Silently taking only the
    # first of several inputs would give unexpected results, so fail instead.
    if len(snakemake.input) > 1:
        raise IOError("Got multiple input files, I don't know how to process them!")

    # Run fastqc in a temporary directory, since there can be race conditions
    # if multiple jobs use the same fastqc output dir.
    with TemporaryDirectory() as tempdir:
        shell(
            "fastqc"
            " --threads {snakemake.threads}"
            " --memory {mem_mb}"
            " {extra}"
            " --outdir {tempdir:q}"
            " {snakemake.input[0]:q}"
            " {log}"
        )

        # Move outputs into proper position.
        output_base = basename_without_ext(snakemake.input[0])
        html_path = path.join(tempdir, output_base + "_fastqc.html")
        zip_path = path.join(tempdir, output_base + "_fastqc.zip")

        if snakemake.output.html != html_path:
            shell("mv {html_path:q} {snakemake.output.html:q}")

        if snakemake.output.zip != zip_path:
            shell("mv {zip_path:q} {snakemake.output.zip:q}")


.. |nl| raw:: html
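
The wrapper code above locates FastQC's report files by stripping known extensions from the input file name and appending ``_fastqc.html`` and ``_fastqc.zip``. The following is a minimal sketch of that naming logic only, runnable without Snakemake or FastQC. ``expected_fastqc_outputs`` and ``_EXTENSIONS`` are hypothetical names introduced for illustration; they are not part of the wrapper, and the loop is assumed to be equivalent to the wrapper's sequence of ``re.sub`` calls.

```python
import re
from os import path

# Extensions are stripped once each, in this fixed order, mirroring the
# sequence of re.sub calls in the wrapper's basename_without_ext.
# (_EXTENSIONS is a hypothetical name used only in this sketch.)
_EXTENSIONS = ("gz", "bz2", "txt", "fastq", "fq", "sam", "bam")


def basename_without_ext(file_path):
    """Return the basename of file_path with known extensions removed."""
    base = path.basename(file_path)
    for ext in _EXTENSIONS:
        base = re.sub(r"\." + ext + "$", "", base)
    return base


def expected_fastqc_outputs(input_path, outdir):
    """Hypothetical helper: report paths the wrapper looks for in outdir."""
    base = basename_without_ext(input_path)
    return (
        path.join(outdir, base + "_fastqc.html"),
        path.join(outdir, base + "_fastqc.zip"),
    )


if __name__ == "__main__":
    for name in ("reads/a.fastq", "reads/b.fq.gz", "reads/c.bam"):
        print(name, "->", expected_fastqc_outputs(name, "tmp"))
```

Because each pattern is applied once and in a fixed order, ``a.fastq.gz`` reduces to ``a`` in this sketch, while an unusual name such as ``a.gz.fastq`` reduces only to ``a.gz``.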