FASTQ_SCREEN

https://img.shields.io/badge/blacklisted-wrapper%20does%20not%20generate%20expected%20png%20file-red https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/fastq_screen?label=version%20update%20pull%20requests

fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

This wrapper allows the configuration to be passed as a filename or as a dictionary in the rule’s params.fastq_screen_config of the rule. So the following configuration file:

DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE2
DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE
DATABASE      hg19    /data/hg19/Bowtie2Index/genome  BOWTIE2
DATABASE      mm10    /data/mm10/Bowtie2Index/genome  BOWTIE2
BOWTIE        /path/to/bowtie
BOWTIE2       /path/to/bowtie2

becomes:

fastq_screen_config = {
 'database': {
   'ecoli': {
     'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
     'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
   'hg19': {
     'bowtie2': '/data/hg19/Bowtie2Index/genome'},
   'mm10': {
     'bowtie2': '/data/mm10/Bowtie2Index/genome'}
 },
 'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}

By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".

Example

This wrapper can be used in the following way:

rule fastq_screen:
    input:
        "samples/{sample}.fastq"
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png"
    params:
        fastq_screen_config="fastq_screen.conf",
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "v4.6.0/bio/fastq_screen"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.

  • While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.

  • When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environmental variable is set (see the tempfile docs) for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the snakefile calling this wrapper.

Software dependencies

  • fastq-screen=0.15.3

  • bowtie2=2.5.4

  • bowtie=1.3.1

Input/Output

Input:

  • A FASTQ file, gzipped or not.

Output:

  • txt: a text file containing the fraction of reads mapping to each provided index

  • png: a bar plot of the contents of txt, saved as a PNG file

Authors

  • Ryan Dale

Code

import os
import re
from snakemake.shell import shell
import tempfile

__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"

_config = snakemake.params["fastq_screen_config"]

subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, "w") as fout:
        for label, indexes in _config["database"].items():
            for aligner, index in indexes.items():
                fout.write(
                    "\t".join(["DATABASE", label, index, aligner.upper()]) + "\n"
                )
        for aligner, path in _config["aligner_paths"].items():
            fout.write("\t".join([aligner.upper(), path]) + "\n")
    config_file = tmp
else:
    config_file = _config

# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(".fastq|.fq|.txt|.seq", os.path.basename(snakemake.input[0]))[0]

tempdir = tempfile.mkdtemp()

shell(
    "fastq_screen --outdir {tempdir} "
    "--force "
    "--aligner {aligner} "
    "--conf {config_file} "
    "--subset {subset} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{snakemake.input[0]} "
    "{log}"
)

# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")

# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
    shell("rm {tmp}")