.. _`bio/fastqc`:

FASTQC
======


.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/fastqc?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/fastqc

Generate fastq qc statistics using fastqc.


**URL**: https://github.com/s-andrews/FastQC

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule fastqc:
        input:
            "reads/{sample}.fastq"
        output:
            html="qc/fastqc/{sample}.html",
            zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
        params:
            extra = "--quiet"
        log:
            "logs/fastqc/{sample}.log"
        threads: 1
        resources:
            mem_mb = 1024
        wrapper:
            "v3.0.1/bio/fastqc"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.


Software dependencies
---------------------

* ``fastqc=0.12.1``
* ``snakemake-wrapper-utils=0.6.2``

Input/Output
------------
**Input:**

* fastq file

**Output:**

* html file containing statistics
* zip file containing statistics


Authors
-------

* Julian de Ruiter


Code
----

.. code-block:: python

    """Snakemake wrapper for fastqc."""

    __author__ = "Julian de Ruiter"
    __copyright__ = "Copyright 2017, Julian de Ruiter"
    __email__ = "julianderuiter@gmail.com"
    __license__ = "MIT"


    from os import path
    import re
    from tempfile import TemporaryDirectory
    from snakemake.shell import shell
    from snakemake_wrapper_utils.snakemake import get_mem

    extra = snakemake.params.get("extra", "")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
    # Define memory per thread (https://github.com/s-andrews/FastQC/blob/master/fastqc#L201-L222)
    mem_mb = int(get_mem(snakemake, "MiB") / snakemake.threads)


    def basename_without_ext(file_path):
        """Returns basename of file path, without the file extension."""

        base = path.basename(file_path)
        # Remove file extension(s) (similar to the internal fastqc approach)
        base = re.sub("\\.gz$", "", base)
        base = re.sub("\\.bz2$", "", base)
        base = re.sub("\\.txt$", "", base)
        base = re.sub("\\.fastq$", "", base)
        base = re.sub("\\.fq$", "", base)
        base = re.sub("\\.sam$", "", base)
        base = re.sub("\\.bam$", "", base)

        return base


    # If you have multiple input files fastqc doesn't know what to do. Taking silently only first gives unapreciated results

    if len(snakemake.input) > 1:
        raise IOError("Got multiple input files, I don't know how to process them!")

    # Run fastqc, since there can be race conditions if multiple jobs
    # use the same fastqc dir, we create a temp dir.
    with TemporaryDirectory() as tempdir:
        shell(
            "fastqc"
            " --threads {snakemake.threads}"
            " --memory {mem_mb}"
            " {extra}"
            " --outdir {tempdir:q}"
            " {snakemake.input[0]:q}"
            " {log}"
        )

        # Move outputs into proper position.
        output_base = basename_without_ext(snakemake.input[0])
        html_path = path.join(tempdir, output_base + "_fastqc.html")
        zip_path = path.join(tempdir, output_base + "_fastqc.zip")

        if snakemake.output.html != html_path:
            shell("mv {html_path:q} {snakemake.output.html:q}")

        if snakemake.output.zip != zip_path:
            shell("mv {zip_path:q} {snakemake.output.zip:q}")


.. |nl| raw:: html

   <br>