FASTQ_SCREEN
fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
This wrapper allows the configuration to be passed as a filename or as a dictionary in the rule’s params.fastq_screen_config of the rule. So the following configuration file:
DATABASE ecoli /data/Escherichia_coli/Bowtie2Index/genome BOWTIE2
DATABASE ecoli /data/Escherichia_coli/Bowtie2Index/genome BOWTIE
DATABASE hg19 /data/hg19/Bowtie2Index/genome BOWTIE2
DATABASE mm10 /data/mm10/Bowtie2Index/genome BOWTIE2
BOWTIE /path/to/bowtie
BOWTIE2 /path/to/bowtie2
becomes:
fastq_screen_config = {
'database': {
'ecoli': {
'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
'hg19': {
'bowtie2': '/data/hg19/Bowtie2Index/genome'},
'mm10': {
'bowtie2': '/data/mm10/Bowtie2Index/genome'}
},
'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}
By default, the wrapper will use bowtie2 as the aligner and a subset of 100000
reads. These can be overridden using params.aligner and params.subset
respectively. Furthermore, params.extra can be used to pass additional
arguments verbatim to fastq_screen, for example extra="--illumina1_3" or
extra="--bowtie2 '--trim5=8'".
URL: https://github.com/StevenWingett/FastQ-Screen
Example
This wrapper can be used in the following way:
rule fastq_screen:
input:
"samples/{sample}.fastq",
output:
txt="qc/{sample}.fastq_screen.txt",
png="qc/{sample}.fastq_screen.png",
params:
fastq_screen_config="fastq_screen.conf",
subset=100000,
aligner="bowtie2",
log:
"logs/fastq_screen/{sample}.log",
threads: 8
wrapper:
"v7.1.0/bio/fastq_screen"
rule fastq_screen_with_config:
input:
"samples/{sample}.fastq",
output:
txt="qc/{sample}.fastq_screen_conf.txt",
png="qc/{sample}.fastq_screen_conf.png",
conf="qc/{sample}.config.txt",
params:
fastq_screen_config={
"aligner_paths": {"bowtie2": "bowtie2"},
"database": {"test": {"bowtie2": "genome"}},
},
subset=100000,
aligner="bowtie2",
log:
"logs/fastq_screen/{sample}_conf.log",
threads: 8
wrapper:
"v7.1.0/bio/fastq_screen"
rule fastq_screen_without_png:
input:
"samples/{sample}.fastq",
output:
txt="qc/{sample}.fastq_screen_nopng.txt",
params:
fastq_screen_config={
"aligner_paths": {"bowtie2": "bowtie2"},
"database": {"test": {"bowtie2": "genome"}},
},
subset=100000,
aligner="bowtie2",
log:
"logs/fastq_screen/{sample}_nopng.log",
threads: 8
wrapper:
"v7.1.0/bio/fastq_screen"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
fastq_screenhard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.While the dictionary form of
fastq_screen_configis convenient, the unordered nature of the dictionary may causesnakemake --list-params-changedto incorrectly report changed parameters even though the contents remain the same. If you plan on using--list-params-changedthen it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.When providing the dictionary form of
fastq_screen_config, the wrapper will write a temp file using Python’stempfilemodule. To control the temp file directory, make sure the $TMPDIR environmental variable is set (see the tempfile docs) for details). One way of doing this is by adding something likeshell.prefix("export TMPDIR=/scratch; ")to the snakefile calling this wrapper.
Software dependencies
fastq-screen=0.16.0perl-gdgraph-histogram=1.1
Input/Output
Input:
A FASTQ file, gzipped or not.
Output:
txt: Fraction of reads mapping to each provided index (optional)png: Barplot of the contents oftxt, saved as a PNG file (optional)conf: Configuration used by fastq screen (optional)
Code
import os
import re
from snakemake.shell import shell
import tempfile
__author__ = "Ryan Dale, Thibault Dayris"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov, thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
_config = snakemake.params["fastq_screen_config"]
subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(".fastq|.fq|.txt|.seq", os.path.basename(snakemake.input[0]))[0]
with tempfile.TemporaryDirectory() as tempdir:
# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
config_file = f"{tempdir}/fastq_screen_config.txt"
with open(config_file, "w") as fout:
for label, indexes in _config["database"].items():
for aligner, index in indexes.items():
fout.write(
"\t".join(["DATABASE", label, index, aligner.upper()]) + "\n"
)
for aligner, path in _config["aligner_paths"].items():
fout.write("\t".join([aligner.upper(), path]) + "\n")
else:
config_file = _config
shell(
"fastq_screen"
" --threads {snakemake.threads}"
" --aligner {aligner}"
" --conf {config_file}"
" --subset {subset}"
" {extra}"
" --force"
" --outdir {tempdir}"
" {snakemake.input[0]}"
" {log}"
)
# Move output to the filenames specified by the rule if user expects them
txt = snakemake.output.get("txt")
if txt:
shell("mv --verbose {tempdir}/{prefix}_screen.txt {txt} {log}")
png = snakemake.output.get("png")
if png:
shell("mv --verbose {tempdir}/{prefix}_screen.png {png} {log}")
conf = snakemake.output.get("conf")
if conf:
shell("mv --verbose {config_file} {conf} {log}")