SRA-TOOLS FASTERQ-DUMP
Download FASTQ files from SRA.
Example
This wrapper can be used in the following way:
rule get_fastq_pe:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/pe/{accession}_1.fastq",
        "data/pe/{accession}_2.fastq",
    log:
        "logs/pe/{accession}.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v3.9.0/bio/sra-tools/fasterq-dump"


rule get_fastq_pe_gz:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/pe/{accession}_1.fastq.gz",
        "data/pe/{accession}_2.fastq.gz",
    log:
        "logs/pe/{accession}.gz.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v3.9.0/bio/sra-tools/fasterq-dump"


rule get_fastq_pe_bz2:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/pe/{accession}_1.fastq.bz2",
        "data/pe/{accession}_2.fastq.bz2",
    log:
        "logs/pe/{accession}.bz2.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v3.9.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se:
    output:
        "data/se/{accession}.fastq"
    log:
        "logs/se/{accession}.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v3.9.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se_gz:
    output:
        "data/se/{accession}.fastq.gz"
    log:
        "logs/se/{accession}.gz.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v3.9.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se_bz2:
    output:
        "data/se/{accession}.fastq.bz2"
    log:
        "logs/se/{accession}.bz2.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v3.9.0/bio/sra-tools/fasterq-dump"
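Note that the output directory and the log file paths can be chosen freely. The output file names, however, must contain the accession wildcard, because the wrapper passes its value to fasterq-dump as the SRA accession to download.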
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
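As a rough illustration of the naming pattern used in the example rules above, the following sketch lists the files a rule would declare for one accession. The helper expected_outputs and its parameters are hypothetical and not part of the wrapper; the _1 and _2 suffixes are copied from the paired-end rules, and the accession string in the usage line is an arbitrary placeholder.

```python
def expected_outputs(accession, outdir, paired=True, compression=""):
    """Return the output paths a rule would declare for one accession.

    Hypothetical helper for illustration only; not part of the wrapper.
    compression must be "", "gz" or "bz2".
    """
    if compression not in ("", "gz", "bz2"):
        raise ValueError(f"unsupported compression: {compression!r}")
    suffix = f".fastq.{compression}" if compression else ".fastq"
    # The paired-end rules declare two mate files, the single-end rules one.
    stems = [f"{accession}_1", f"{accession}_2"] if paired else [accession]
    return [f"{outdir}/{stem}{suffix}" for stem in stems]


if __name__ == "__main__":
    # "ACC" is a placeholder, not a real SRA accession.
    print(expected_outputs("ACC", "data/pe", paired=True, compression="gz"))
    # prints ['data/pe/ACC_1.fastq.gz', 'data/pe/ACC_2.fastq.gz']
```

A list built this way could be used as the input of a target rule, so that Snakemake resolves the accession wildcard for each requested file.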
Notes
The output format is detected automatically from the output file extension and, if needed, the files are compressed with either gzip or bzip2.
Currently only supports PE samples
The extra param allows for additional program arguments.
For more information, see https://github.com/ncbi/sra-tools
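The automatic format detection mentioned in the notes can be pictured with a minimal sketch like the one below. The function compressor_for is a hypothetical name chosen for illustration; the mapping of .gz to pigz and .bz2 to pbzip2 follows the wrapper code and the listed software dependencies.

```python
import os


def compressor_for(path):
    """Return the compressor implied by the final extension of path.

    Hypothetical helper mirroring the extension check in the wrapper code.
    Returns None when the file is left uncompressed.
    """
    _, ext = os.path.splitext(path)
    return {".gz": "pigz", ".bz2": "pbzip2"}.get(ext)


if __name__ == "__main__":
    for name in ("sample_1.fastq", "sample_1.fastq.gz", "sample_1.fastq.bz2"):
        print(name, "->", compressor_for(name))
```

Only the last extension is inspected, so any other ending is treated as uncompressed output.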
Software dependencies
sra-tools=3.1.0
pigz=2.8
pbzip2=1.1.13
snakemake-wrapper-utils=0.6.2
Code
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"


import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")


# Parse memory
mem_mb = get_mem(snakemake, "MiB")


# Outdir
outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = f"--outdir {outdir}"


# Output compression
compress = ""
mem = f"-m{mem_mb}" if mem_mb else ""

for output in snakemake.output:
    out_name, out_ext = os.path.splitext(output)
    if out_ext == ".gz":
        compress += f"pigz -p {snakemake.threads} {out_name}; "
    elif out_ext == ".bz2":
        compress += f"pbzip2 -p{snakemake.threads} {mem} {out_name}; "


with tempfile.TemporaryDirectory() as tmpdir:
    mem = f"--mem {mem_mb}M" if mem_mb else ""

    shell(
        "(fasterq-dump --temp {tmpdir} --threads {snakemake.threads} {mem} "
        "{extra} {outdir} {snakemake.wildcards.accession}; "
        "{compress}"
        ") {log}"
    )
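To see which command line the code above assembles for a given rule, the same string logic can be reproduced without Snakemake. This is only a sketch: build_command, its parameters and its default values are hypothetical, the log redirection is left out, and nothing is executed.

```python
import os


def build_command(outputs, accession, threads=6, mem_mb=None, extra="",
                  tmpdir="/tmp/example"):
    """Assemble the shell command string the wrapper would run.

    Hypothetical stand-in for illustration: plain arguments replace the
    snakemake object, and the command is returned instead of executed.
    """
    outdir = os.path.dirname(outputs[0])
    outdir_arg = f"--outdir {outdir}" if outdir else ""

    # The same memory value is formatted differently for each tool.
    pbzip2_mem = f"-m{mem_mb}" if mem_mb else ""
    dump_mem = f"--mem {mem_mb}M" if mem_mb else ""

    # One compression step per output file that ends in .gz or .bz2.
    compress = ""
    for output in outputs:
        out_name, out_ext = os.path.splitext(output)
        if out_ext == ".gz":
            compress += f"pigz -p {threads} {out_name}; "
        elif out_ext == ".bz2":
            compress += f"pbzip2 -p{threads} {pbzip2_mem} {out_name}; "

    return (
        f"(fasterq-dump --temp {tmpdir} --threads {threads} {dump_mem} "
        f"{extra} {outdir_arg} {accession}; {compress})"
    )


if __name__ == "__main__":
    # "ACC" is a placeholder, not a real SRA accession.
    print(build_command(["data/se/ACC.fastq.gz"], "ACC",
                        extra="--skip-technical"))
```

Printing the string for a few output lists makes it easy to check that the compressor receives the uncompressed file name that fasterq-dump is expected to write.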