SORTMERNA

SortMeRNA is a local sequence alignment tool for filtering, mapping and OTU clustering.

URL: https://sortmerna.readthedocs.io/

Example

This wrapper can be used in the following way:

rule sortmerna_pe:
    input:
        ref=["database1.fa", "database2.fa"],
        reads=["reads_1.fq.gz", "reads_2.fq.gz"],
    output:
        aligned=["aligned_1.fastq.gz", "aligned_2.fastq.gz"],
        other=["unpaired_1.fastq.gz", "unpaired_2.fastq.gz"],
        stats="sortmerna_pe_stats.log",
    params:
        extra="--idx-dir idx --paired_in --out2",
    threads: 16
    resources:
        mem_mb=3072,  # amount of memory for building the index
    log:
        "logs/sortmerna/reads_pe.log",
    wrapper:
        "v3.9.0/bio/sortmerna"


rule sortmerna_se:
    input:
        ref=["database1.fa", "database2.fa"],
        reads="reads.fq",
    output:
        aligned="aligned.fastq",
        other="unpaired.fastq",
        stats="sortmerna_se_stats.log",
    params:
        extra="--idx-dir idx",
    threads: 16
    resources:
        mem_mb=3072,  # amount of memory for building the index
    log:
        "logs/sortmerna/reads_se.log",
    wrapper:
        "v3.9.0/bio/sortmerna"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

The kvdb (key value database) will be created in a temporary directory. Then, you don’t have to prune it before each run. If you want to re-use the index, you can specify the idx-dir parameter.

Software dependencies

sortmerna=4.3.6

Input/Output

Input:

Reference FASTA files (one or more)
Query FASTA file (single or paired-end)

Output:

Aligned reads
Unaligned reads (other)

Params

extra: aditional program arguments

Authors

Curro Campuzano Jiménez

Code

__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"


import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

ref = snakemake.input.ref
reads = snakemake.input.reads
aligned = snakemake.output.get("aligned", None)
other = snakemake.output.get("other", None)
stats = snakemake.output.get("stats", None)
mem_mb = snakemake.resources.get("mem_mb", 3072)  # Default value

if isinstance(ref, list):
    ref = " --ref ".join(ref)

if aligned:
    if isinstance(aligned, list):
        assert (
            len(aligned) == 2
        ), "if paired input, aligned must be a list of two files, if any"
        assert isinstance(
            reads, list
        ), "if paired input, reads must be a list of two files"

if other:
    if isinstance(other, list):
        assert (
            len(other) == 2
        ), "if paired input, other must be a list of two files, if any"
        assert isinstance(
            reads, list
        ), "if paired input, reads must be a list of two files"
    extra = f"--fastx {extra}"

is_paired = False
if isinstance(reads, list):
    assert len(reads) == 2, "if paired input, reads must be a list of two files"
    reads = " --reads ".join(reads)
    is_paired = True

if stats:
    assert isinstance(stats, str), "stats must be a single file"


with tempfile.TemporaryDirectory() as temp_workdir:
    shell(
        " sortmerna --ref {ref}"
        " --reads {reads}"
        " --workdir {temp_workdir}"
        " --threads {snakemake.threads}"
        " -m {mem_mb}"
        " --aligned {temp_workdir}/aligned_reads"
        " --other {temp_workdir}/other_reads"
        " {extra}"
        " {log}"
    )

    if is_paired:
        if aligned:
            # Handle the case were no alignment
            shell("mv {temp_workdir}/aligned_reads_fwd.* {aligned[0]}")
            shell("mv {temp_workdir}/aligned_reads_rev.* {aligned[1]}")
        if other:
            shell("mv {temp_workdir}/other_reads_fwd.* {other[0]}")
            shell("mv {temp_workdir}/other_reads_rev.* {other[1]}")
    else:
        if aligned:
            shell("mv {temp_workdir}/aligned_reads.f* {aligned}")
        if other:
            shell("mv {temp_workdir}/other_reads.f* {other}")
    if stats:
        shell("mv {temp_workdir}/aligned_reads.log {stats}")