SRA-TOOLS FASTERQ-DUMP

Download FASTQ files from SRA.

URL:

Example

This wrapper can be used in the following way:

rule get_fastq_pe:
    output:
        # the wildcard must be named accession and hold an SRA accession
        "data/pe/{accession}_1.fastq",
        "data/pe/{accession}_2.fastq",
    log:
        "logs/pe/{accession}.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v1.2.0/bio/sra-tools/fasterq-dump"


rule get_fastq_pe_gz:
    output:
        # the wildcard must be named accession and hold an SRA accession
        "data/pe/{accession}_1.fastq.gz",
        "data/pe/{accession}_2.fastq.gz",
    log:
        "logs/pe/{accession}.gz.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v1.2.0/bio/sra-tools/fasterq-dump"


rule get_fastq_pe_bz2:
    output:
        # the wildcard must be named accession and hold an SRA accession
        "data/pe/{accession}_1.fastq.bz2",
        "data/pe/{accession}_2.fastq.bz2",
    log:
        "logs/pe/{accession}.bz2.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v1.2.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se:
    output:
        "data/se/{accession}.fastq"
    log:
        "logs/se/{accession}.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v1.2.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se_gz:
    output:
        "data/se/{accession}.fastq.gz"
    log:
        "logs/se/{accession}.gz.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v1.2.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se_bz2:
    output:
        "data/se/{accession}.fastq.bz2"
    log:
        "logs/se/{accession}.bz2.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v1.2.0/bio/sra-tools/fasterq-dump"

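Note that the output directory and the log file path can be chosen freely, but all outputs of a rule must share one directory and the file names must match what fasterq-dump writes ({accession}_1.fastq and {accession}_2.fastq for paired-end, {accession}.fastq for single-end), optionally followed by .gz or .bz2.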
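The wrapper passes only the directory of the first output to --outdir and compresses each output after stripping its .gz/.bz2 extension, so the output names have to line up with fasterq-dump's own file names. A minimal sketch of that check, using a hypothetical helper check_outputs and assuming the naming used in the example rules ({accession}.fastq for single-end, {accession}_1.fastq / {accession}_2.fastq for paired-end):

```python
import os
from typing import List


def check_outputs(outputs: List[str], accession: str) -> str:
    """Hypothetical helper: mirror the wrapper's path handling and validate names.

    Returns the directory that the wrapper would pass to --outdir.
    """
    outdir = os.path.dirname(outputs[0])
    if len(outputs) == 1:
        expected = [f"{accession}.fastq"]
    elif len(outputs) == 2:
        expected = [f"{accession}_1.fastq", f"{accession}_2.fastq"]
    else:
        raise ValueError("expected one (single-end) or two (paired-end) outputs")
    for path, name in zip(outputs, expected):
        # Like the wrapper, strip only a .gz or .bz2 suffix before comparing
        stem, ext = os.path.splitext(path)
        if ext not in (".gz", ".bz2"):
            stem = path
        if os.path.dirname(path) != outdir or os.path.basename(stem) != name:
            raise ValueError(f"{path!r} does not match {name!r} in {outdir!r}")
    return outdir
```

For example, check_outputs(["data/pe/SRR1_1.fastq.gz", "data/pe/SRR1_2.fastq.gz"], "SRR1") returns "data/pe", while a single-end output named data/se/reads.fastq is rejected.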

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
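To download several runs in one invocation, a target rule usually requests every expected file. A minimal sketch, using a hypothetical helper fastq_targets and assuming the data/pe and data/se layout from the example rules, that builds that list in plain Python (a Snakefile could pass the result to the input of a target rule such as rule all):

```python
from typing import List


def fastq_targets(accessions: List[str], paired: bool, compression: str = "") -> List[str]:
    """Hypothetical helper: list the output paths the example rules would produce.

    compression is "", ".gz" or ".bz2", matching the three rule variants.
    """
    if compression not in ("", ".gz", ".bz2"):
        raise ValueError(f"unsupported compression: {compression!r}")
    targets = []
    for acc in accessions:
        if paired:
            # Paired-end runs yield one file per mate: {accession}_1 and {accession}_2
            targets += [f"data/pe/{acc}_{mate}.fastq{compression}" for mate in (1, 2)]
        else:
            # Single-end runs yield a single {accession}.fastq
            targets.append(f"data/se/{acc}.fastq{compression}")
    return targets


# SRR1 is a placeholder accession
print(fastq_targets(["SRR1"], paired=True, compression=".gz"))
# ['data/pe/SRR1_1.fastq.gz', 'data/pe/SRR1_2.fastq.gz']
```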

Software dependencies

  • sra-tools>2.9.1
  • pigz>=2.6
  • pbzip2>=1.1
  • snakemake-wrapper-utils=0.3

Notes

  • The output compression is detected from the output file extension and, if needed, files are compressed with either gzip (pigz) or bzip2 (pbzip2).
  • Both paired-end (two outputs) and single-end (one output) samples are supported, as in the example rules above.
  • The extra param allows for additional program arguments.
  • More information: https://github.com/ncbi/sra-tools
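The extension-based compression described in the notes can be illustrated with a small pure function that produces the commands run after fasterq-dump finishes. A sketch under the assumption of a hypothetical name compress_commands; it uses only the pigz -p and pbzip2 -p / -m flags that appear in the wrapper code, and its whitespace is tidier than the wrapper's own string:

```python
import os
from typing import List, Optional


def compress_commands(outputs: List[str], threads: int, mem_mb: Optional[int] = None) -> str:
    """Hypothetical helper: map each output's extension to a compression command.

    The wrapper lets fasterq-dump write plain FASTQ, so for "x.fastq.gz" the
    uncompressed "x.fastq" is compressed in place afterwards.
    """
    cmds = []
    for output in outputs:
        stem, ext = os.path.splitext(output)
        if ext == ".gz":
            cmds.append(f"pigz -p {threads} {stem}")
        elif ext == ".bz2":
            # pbzip2 -m sets its maximum memory usage in MB
            mem = f" -m{mem_mb}" if mem_mb else ""
            cmds.append(f"pbzip2 -p{threads}{mem} {stem}")
        # any other extension is left uncompressed
    return "; ".join(cmds)


print(compress_commands(["data/pe/SRR1_1.fastq.gz", "data/pe/SRR1_2.fastq.gz"], threads=6))
# pigz -p 6 data/pe/SRR1_1.fastq; pigz -p 6 data/pe/SRR1_2.fastq
```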

Authors

  • Johannes Köster
  • Derek Croote
  • Filipe G. Vieira

Code

__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")


# Parse memory
mem_mb = get_mem(snakemake, "MiB")


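# Output directory: fasterq-dump writes its files into the directory of the
# first output (nothing is passed if the outputs are in the working directory)
outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = f"--outdir {outdir}"


# Output compression: fasterq-dump output is uncompressed, so every output
# ending in .gz or .bz2 is compressed in place after the download
compress = ""
mem_pbzip2 = f"-m{mem_mb}" if mem_mb else ""  # pbzip2 memory limit in MB

for output in snakemake.output:
    out_name, out_ext = os.path.splitext(output)
    if out_ext == ".gz":
        compress += f"pigz -p {snakemake.threads} {out_name}; "
    elif out_ext == ".bz2":
        compress += f"pbzip2 -p{snakemake.threads} {mem_pbzip2} {out_name}; "


with tempfile.TemporaryDirectory() as tmpdir:
    # fasterq-dump memory limit; tmpdir serves as its scratch space (--temp)
    mem_fqd = f"--mem {mem_mb}M" if mem_mb else ""

    shell(
        "(fasterq-dump --temp {tmpdir} --threads {snakemake.threads} {mem_fqd} "
        "{extra} {outdir} {snakemake.wildcards.accession}; "
        "{compress}"
        ") {log}"
    )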