ART_PROFILER_ILLUMINA
Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.
URL: https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm
Example
This wrapper can be used in the following way:
rule art_profiler_illumina:
    input:
        "data/{sample}.fq",
    output:
        "profiles/{sample}.txt"
    log:
        "logs/art_profiler_illumina/{sample}.log"
    params: ""
    threads: 2
    wrapper:
        "v1.31.1/bio/art/profiler_illumina"
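Note that the input and log file paths can be chosen freely. The output directory can be chosen freely as well, but the output file name must be the input file's base name with its fastq, fastq.gz, fq or fq.gz extension replaced by .txt (as in the example above): the wrapper takes only the directory from the output path and derives the profile name from the input file name.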
When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The input file must have one of the following extensions: fastq, fastq.gz, fq or fq.gz.
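The extension check can be sketched in plain Python. This is a minimal re-implementation of the wrapper's own logic for illustration, not part of the wrapper; the function name detect_fq_extension is hypothetical. A .fastq suffix is first rewritten to .fq, then the end of the file name is matched against fq or fq.gz, and anything else raises the same IOError the wrapper raises.

```python
import os
import re


def detect_fq_extension(path):
    """Hypothetical helper: return "fq" or "fq.gz" the way the wrapper would.

    Mirrors the wrapper: ".fastq" is rewritten to ".fq" first, then the
    file name must end in "fq" or "fq.gz".
    """
    filename = os.path.basename(path).replace(".fastq", ".fq")
    match = re.search(r"fq(\.gz)?$", filename)
    if match is None:
        # The wrapper raises IOError (an alias of OSError) in this case.
        raise IOError(f"Incompatible extension: {path}")
    return match.group(0)


print(detect_fq_extension("data/A.fastq.gz"))  # fq.gz
```

Because .fastq is rewritten before matching, A.fastq and A.fq end up with the same extension, fq.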
Software dependencies
art=2016.06.05
Input/Output
Input:
- Path to the fastq-formatted input file (first entry of the input file list)
Output:
- Path to the txt-formatted profile (first entry of the output file list)
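Only the directory of the output path is passed on to art; the profile's file name comes from the input. A minimal sketch of that derivation, assuming ordinary file names (the helper expected_profile_path is hypothetical and not part of the wrapper), can be used to check that a rule's declared output matches the file the wrapper will actually produce:

```python
import os


def expected_profile_path(input_path, output_path):
    """Hypothetical helper: path of the profile the wrapper ends up writing.

    Mirrors the wrapper for ordinary file names: the profile name is the
    output directory joined with the input base name (".fastq" rewritten
    to ".fq") minus its fq/fq.gz extension; art then appends ".txt".
    """
    filename = os.path.basename(input_path).replace(".fastq", ".fq")
    for ext in ("fq.gz", "fq"):
        if filename.endswith("." + ext):
            stem = filename[: -(len(ext) + 1)]
            break
    else:
        raise ValueError(f"unsupported extension: {input_path}")
    return os.path.join(os.path.dirname(output_path), stem) + ".txt"


print(expected_profile_path("data/A.fastq.gz", "profiles/A.txt"))  # profiles/A.txt on POSIX
```

If the declared output differs from this path, Snakemake will report the output as missing after the job finishes.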
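Params
- Extra parameters (a single positional params entry, not mapped to a keyword): inserted as-is into the art_profiler_illumina command line, in front of the profile name.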
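To show where the params string ends up, the following sketch assembles the same positional command the wrapper's shell() call formats: extra parameters, profile name, input directory, fastq extension and thread count. The helper build_command is hypothetical, and unlike the wrapper it quotes arguments with shlex.join (Python 3.8+).

```python
import shlex


def build_command(extra, profile_name, temp_input, fq_extension, threads):
    """Hypothetical helper: the art_profiler_illumina call, in the wrapper's order."""
    args = [
        "art_profiler_illumina",
        *shlex.split(extra),  # an empty params string contributes nothing
        profile_name,
        temp_input,
        fq_extension,
        str(threads),
    ]
    return shlex.join(args)


print(build_command("", "profiles/A", "/tmp/art_in", "fq", 2))
# art_profiler_illumina profiles/A /tmp/art_in fq 2
```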
Authors
- David Laehnemann
- Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"

import os
import re
import tempfile

from snakemake.shell import shell

# The `snakemake` object (input, output, params, threads, log) is injected
# into this script's namespace by Snakemake when the wrapper runs.

# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina CLI
with tempfile.TemporaryDirectory() as temp_input:
    # ensure that .fastq and .fastq.gz input files work, as well
    filename = os.path.basename(snakemake.input[0]).replace(".fastq", ".fq")
    # figure out the exact file extension after the above substitution
    # (raw string, so "\." reaches the regex engine without an escape warning)
    ext = re.search(r"fq(\.gz)?$", filename)
    if ext:
        fq_extension = ext.group(0)
    else:
        raise IOError(
            "Incompatible extension: This art_profiler_illumina "
            "wrapper requires input files with one of the following "
            "extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
            "your input and the invocation of the wrapper accordingly."
        )
    os.symlink(
        # snakemake paths are relative, but the symlink needs to be absolute
        os.path.abspath(snakemake.input[0]),
        # the following awkward file name generation has reasons:
        # * the file name needs to be unique to the execution of the
        #   rule, as art will create and mv temporary files with its basename
        #   in the output directory, which causes utter confusion when
        #   executing instances of the rule in parallel
        # * temp file name cannot have any read infixes before the file
        #   extension, because otherwise art does read enumeration magic
        #   that messes up output file naming
        os.path.join(
            temp_input,
            filename.replace(
                "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
            ),
        ),
    )
    # include output folder name in the profile_name command line argument and
    # strip off the file extension, as art will add its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(snakemake.output[0]), filename.replace("." + fq_extension, "")
    )
    # run art while the temporary directory (and thus the symlink) still exists
    shell(
        "( art_profiler_illumina {snakemake.params} {profile_name}"
        " {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}"
    )
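The temporary-directory-plus-symlink step can be tried on its own. This sketch isolates it (the helper link_with_spacer is hypothetical, not part of the wrapper): it creates the same spacer-suffixed link name the wrapper uses, pointing at the absolute input path, so the directory handed to art contains exactly one matching file.

```python
import os
import tempfile


def link_with_spacer(input_path, temp_dir, fq_extension):
    """Hypothetical helper: symlink input_path into temp_dir as the wrapper does."""
    filename = os.path.basename(input_path).replace(".fastq", ".fq")
    # The spacer keeps any read infix in the original name away from the
    # extension, which the wrapper's comments say would confuse art.
    link_name = filename.replace(
        "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
    )
    link_path = os.path.join(temp_dir, link_name)
    # Relative symlink targets resolve against the link's own directory,
    # so the target is made absolute.
    os.symlink(os.path.abspath(input_path), link_path)
    return link_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir, tempfile.TemporaryDirectory() as temp_input:
        fastq = os.path.join(data_dir, "A.fastq")
        open(fastq, "w").close()
        link_with_spacer(fastq, temp_input, "fq")
        print(os.listdir(temp_input))  # ['A_preventing_art_magic_spacer.fq']
```

Using one fresh temporary directory per job keeps parallel jobs from seeing each other's input files.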