ART_PROFILER_ILLUMINA

Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.

Example

This wrapper can be used in the following way:

rule art_profiler_illumina:
    input:
        "data/{sample}.fq",
    output:
        "profiles/{sample}.txt"
    log:
        "logs/art_profiler_illumina/{sample}.log"
    params: ""
    threads: 2
    wrapper:
        "v0.75.0/bio/art/profiler_illumina"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • art==2016.06.05

Authors

  • David Laehnemann
  • Victoria Sack

Code

__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


from snakemake.shell import shell
import os
import tempfile
import re


# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina cli
with tempfile.TemporaryDirectory() as temp_input:
    # ensure that .fastq and .fastq.gz input files work, as well
    filename = os.path.basename(snakemake.input[0]).replace(".fastq", ".fq")

    # figure out the exact file extension after the above substitution
    ext = re.search("fq(\.gz)?$", filename)
    if ext:
        fq_extension = ext.group(0)
    else:
        raise IOError(
            "Incompatible extension: This art_profiler_illumina "
            "wrapper requires input files with one of the following "
            "extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
            "your input and the invocation of the wrapper accordingly."
        )

    os.symlink(
        # snakemake paths are relative, but the symlink needs to be absolute
        os.path.abspath(snakemake.input[0]),
        # the following awkward file name generation has reasons:
        # * the file name needs to be unique to the execution of the
        #   rule, as art will create and mv temporary files with its basename
        #   in the output directory, which causes utter confusion when
        #   executing instances of the rule in parallel
        # * temp file name cannot have any read infixes before the file
        #   extension, because otherwise art does read enumeration magic
        #   that messes up output file naming
        os.path.join(
            temp_input,
            filename.replace(
                "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
            ),
        ),
    )

    # include output folder name in the profile_name command line argument and
    # strip off the file extension, as art will add its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(snakemake.output[0]), filename.replace("." + fq_extension, "")
    )

    shell(
        "( art_profiler_illumina {snakemake.params} {profile_name}"
        " {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}"
    )