FASTQ_SCREEN

https://img.shields.io/badge/wrapper_version-v7.1.0-10785b https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/fastq_screen?label=version%20update%20pull%20requests&color=1cb481

fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

This wrapper allows the configuration to be passed as a filename or as a dictionary in the rule’s params.fastq_screen_config of the rule. So the following configuration file:

DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE2
DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE
DATABASE      hg19    /data/hg19/Bowtie2Index/genome  BOWTIE2
DATABASE      mm10    /data/mm10/Bowtie2Index/genome  BOWTIE2
BOWTIE        /path/to/bowtie
BOWTIE2       /path/to/bowtie2

becomes:

fastq_screen_config = {
 'database': {
   'ecoli': {
     'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
     'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
   'hg19': {
     'bowtie2': '/data/hg19/Bowtie2Index/genome'},
   'mm10': {
     'bowtie2': '/data/mm10/Bowtie2Index/genome'}
 },
 'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}

By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".

URL: https://github.com/StevenWingett/FastQ-Screen

Example

This wrapper can be used in the following way:

rule fastq_screen:
    input:
        "samples/{sample}.fastq",
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png",
    params:
        fastq_screen_config="fastq_screen.conf",
        subset=100000,
        aligner="bowtie2",
    log:
        "logs/fastq_screen/{sample}.log",
    threads: 8
    wrapper:
        "v7.1.0/bio/fastq_screen"


rule fastq_screen_with_config:
    input:
        "samples/{sample}.fastq",
    output:
        txt="qc/{sample}.fastq_screen_conf.txt",
        png="qc/{sample}.fastq_screen_conf.png",
        conf="qc/{sample}.config.txt",
    params:
        fastq_screen_config={
            "aligner_paths": {"bowtie2": "bowtie2"},
            "database": {"test": {"bowtie2": "genome"}},
        },
        subset=100000,
        aligner="bowtie2",
    log:
        "logs/fastq_screen/{sample}_conf.log",
    threads: 8
    wrapper:
        "v7.1.0/bio/fastq_screen"


rule fastq_screen_without_png:
    input:
        "samples/{sample}.fastq",
    output:
        txt="qc/{sample}.fastq_screen_nopng.txt",
    params:
        fastq_screen_config={
            "aligner_paths": {"bowtie2": "bowtie2"},
            "database": {"test": {"bowtie2": "genome"}},
        },
        subset=100000,
        aligner="bowtie2",
    log:
        "logs/fastq_screen/{sample}_nopng.log",
    threads: 8
    wrapper:
        "v7.1.0/bio/fastq_screen"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.

  • While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.

  • When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environmental variable is set (see the tempfile docs) for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the snakefile calling this wrapper.

Software dependencies

  • fastq-screen=0.16.0

  • perl-gdgraph-histogram=1.1

Input/Output

Input:

  • A FASTQ file, gzipped or not.

Output:

  • txt: Fraction of reads mapping to each provided index (optional)

  • png: Barplot of the contents of txt, saved as a PNG file (optional)

  • conf: Configuration used by fastq screen (optional)

Authors

  • Ryan Dale

  • Thibault Dayris

Code

import os
import re
from snakemake.shell import shell
import tempfile

__author__ = "Ryan Dale, Thibault Dayris"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov, thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

_config = snakemake.params["fastq_screen_config"]

subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)

# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(".fastq|.fq|.txt|.seq", os.path.basename(snakemake.input[0]))[0]


with tempfile.TemporaryDirectory() as tempdir:
    # snakemake.params.fastq_screen_config can be either a dict or a string. If
    # string, interpret as a filename pointing to the fastq_screen config file.
    # Otherwise, create a new tempfile out of the contents of the dict:
    if isinstance(_config, dict):
        config_file = f"{tempdir}/fastq_screen_config.txt"

        with open(config_file, "w") as fout:
            for label, indexes in _config["database"].items():
                for aligner, index in indexes.items():
                    fout.write(
                        "\t".join(["DATABASE", label, index, aligner.upper()]) + "\n"
                    )
            for aligner, path in _config["aligner_paths"].items():
                fout.write("\t".join([aligner.upper(), path]) + "\n")
    else:
        config_file = _config

    shell(
        "fastq_screen"
        " --threads {snakemake.threads}"
        " --aligner {aligner}"
        " --conf {config_file}"
        " --subset {subset}"
        " {extra}"
        " --force"
        " --outdir {tempdir}"
        " {snakemake.input[0]}"
        " {log}"
    )

    # Move output to the filenames specified by the rule if user expects them
    txt = snakemake.output.get("txt")
    if txt:
        shell("mv --verbose {tempdir}/{prefix}_screen.txt {txt} {log}")

    png = snakemake.output.get("png")
    if png:
        shell("mv --verbose {tempdir}/{prefix}_screen.png {png} {log}")

    conf = snakemake.output.get("conf")
    if conf:
        shell("mv --verbose {config_file} {conf} {log}")