SAMTOOLS FASTQ SEPARATE

Convert a BAM file with paired-end reads back to unaligned reads in two separate FASTQ files with samtools. Reads that are not properly paired are discarded (READ_OTHER and singleton reads in the samtools fastq documentation), as are secondary (0x100) and supplementary (0x800) reads.
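
The flag filter can be illustrated with a short sketch. The wrapper passes -F 0x900 to samtools fastq, which is the bitwise OR of the secondary (0x100) and supplementary (0x800) flag bits; a read is skipped if any bit of the mask is set in its FLAG field. The names below (SECONDARY, SUPPLEMENTARY, EXCLUDE_MASK, is_excluded) are hypothetical and exist only for this illustration; they are not part of samtools or the wrapper.

```python
# Illustrative only: mimics how an exclusion mask such as -F 0x900 is applied.
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment
EXCLUDE_MASK = SECONDARY | SUPPLEMENTARY  # equals 0x900


def is_excluded(flag: int, mask: int = EXCLUDE_MASK) -> bool:
    """Return True if any bit of the mask is set in the read's FLAG."""
    return (flag & mask) != 0


if __name__ == "__main__":
    # 99 = paired, properly paired, mate reverse strand, first in pair
    for flag in (99, 99 | SECONDARY, 99 | SUPPLEMENTARY):
        print(hex(flag), "excluded" if is_excluded(flag) else "kept")
```

Primary alignments have neither bit set, so they pass the filter and reach the -1/-2 outputs if they are properly paired.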

URL:

Example

This wrapper can be used in the following way:

rule samtools_fastq_separate:
    input:
        "mapped/{sample}.bam",
    output:
        "reads/{sample}.1.fq",
        "reads/{sample}.2.fq",
    log:
        "{sample}.separate.log",
    params:
        sort="-m 4G",
        fastq="-n",
    # Total number of threads for the rule. samtools sort and samtools fastq
    # each use one thread, so request at least 2 on cluster submission.
    # Any threads beyond those 2 (threads - 2) are passed to samtools sort via -@.
    threads: 3
    wrapper:
        "v0.86.0/bio/samtools/fastq/separate"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • samtools=1.14

Notes

Authors

  • David Laehnemann
  • Victoria Sack
  • Filipe G. Vieira

Code

__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


import os
import tempfile
from snakemake.shell import shell

params_sort = snakemake.params.get("sort", "")
params_fastq = snakemake.params.get("fastq", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

prefix = os.path.splitext(snakemake.output[0])[0]

# Samtools takes additional threads through its option -@
# One thread is used by Samtools sort
# One thread is used by Samtools fastq
# So snakemake.threads has to take them into account
# before allowing additional threads through samtools sort -@
threads = "" if snakemake.threads <= 2 else " -@ {} ".format(snakemake.threads - 2)

with tempfile.NamedTemporaryFile() as tmpfile:
    shell(
        "(samtools sort -n "
        " {threads} "
        " -T {tmpfile.name} "
        " {params_sort} "
        " {snakemake.input[0]} | "
        "samtools fastq "
        " {params_fastq} "
        " -1 {snakemake.output[0]} "
        " -2 {snakemake.output[1]} "
        " -0 /dev/null "
        " -s /dev/null "
        " -F 0x900 "
        " - "
        ") {log}"
    )
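
The thread arithmetic in the wrapper can be checked in isolation. The sketch below reproduces the expression used for the threads variable as a standalone function; the name sort_threads_arg is hypothetical and introduced only for this illustration. It assumes, as the wrapper comments state, that two threads are reserved for the samtools sort and samtools fastq processes themselves.

```python
def sort_threads_arg(total_threads: int) -> str:
    """Build the -@ argument for samtools sort from the rule's thread count.

    Two threads are reserved for the main sort and fastq processes, so only
    the remainder is passed on as additional threads.
    """
    if total_threads <= 2:
        return ""
    return " -@ {} ".format(total_threads - 2)


if __name__ == "__main__":
    for n in (1, 2, 3, 8):
        print(n, repr(sort_threads_arg(n)))
```

With threads: 3, as in the example rule, samtools sort receives one additional thread; with 2 or fewer, no -@ option is passed at all.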