VSEARCH

Versatile open-source tool for microbiome analysis.

URL: https://github.com/torognes/vsearch

Example

This wrapper can be used in the following way:

rule vsearch_cluster_fast:
    input:
        cluster_fast="reads/{sample}.fasta",
    output:
        profile="out/cluster_fast/{sample}.profile",
    log:
        "logs/vsearch/cluster_fast/{sample}.log",
        vsearch="out/cluster_fast/{sample}.log",
    params:
        extra="--id 0.2 --sizeout --minseqlength 5",
    threads: 1
    wrapper:
        "v3.9.0/bio/vsearch"


rule vsearch_maskfasta:
    input:
        maskfasta="reads/{sample}.fasta",
    output:
        output="out/maskfasta/{sample}.fasta",
    log:
        "logs/vsearch/maskfasta/{sample}.log",
        vsearch="out/maskfasta/{sample}.log",
    params:
        extra="--hardmask",
    threads: 1
    wrapper:
        "v3.9.0/bio/vsearch"


rule vsearch_fastx_uniques:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq",
    log:
        "logs/vsearch/fastx_uniques/{sample}.log",
        vsearch="out/fastx_uniques/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 1
    wrapper:
        "v3.9.0/bio/vsearch"


rule vsearch_fastx_uniques_gzip:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq.gz",
    log:
        "logs/vsearch/fastx_uniques_gzip/{sample}.log",
        vsearch="out/fastx_uniques_gzip/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 2
    wrapper:
        "v3.9.0/bio/vsearch"


rule vsearch_fastx_uniques_bzip2:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq.bz2",
    log:
        "logs/vsearch/fastx_uniques_bzip2/{sample}.log",
        vsearch="out/fastx_uniques_bzip2/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 2
    wrapper:
        "v3.9.0/bio/vsearch"


rule vsearch_fastq_convert:
    input:
        fastq_convert="reads/{sample}.fastq",
    output:
        fastqout="out/fastq_convert/{sample}.fastq",
    log:
        "logs/vsearch/fastq_convert/{sample}.log",
        vsearch="out/fastq_convert/{sample}.log",
    params:
        extra="--fastq_ascii 33 --fastq_asciiout 64",
    threads: 2
    wrapper:
        "v3.9.0/bio/vsearch"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

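Keys for input and output files must match vsearch option names, because each entry is passed to vsearch as --<key> <file>. Typical input keys are commands such as uchime_denovo, cluster_fast, fastx_uniques, maskfasta, fastq_convert or fastq_mergepairs; typical output keys are chimeras, fastaout, fastqout, output or profile.
An extra log file named vsearch can be specified; it is passed to the vsearch option --log.
Outputs ending in .gz or .bz2 are compressed with pigz or pbzip2, respectively; at most one output can be compressed.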
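The key-to-option mapping can be sketched in plain Python, outside Snakemake. This is a minimal sketch: to_vsearch_args is a hypothetical helper, plain dicts stand in for snakemake.input and snakemake.output, and extra and compressed outputs are left out. Arguments are assembled in the wrapper's order (inputs, then --log, then outputs).

```python
def to_vsearch_args(inputs, outputs, vsearch_log=None):
    """Hypothetical helper: turn named files into vsearch arguments.

    Each key is used verbatim as a vsearch option name, so an input keyed
    "fastx_uniques" makes vsearch run its --fastx_uniques command.
    """
    args = [f"--{key} {path}" for key, path in inputs.items()]
    if vsearch_log:
        # The optional log named "vsearch" is forwarded to vsearch's --log option
        args.append(f"--log {vsearch_log}")
    args += [f"--{key} {path}" for key, path in outputs.items()]
    return " ".join(args)


print(
    to_vsearch_args(
        {"fastx_uniques": "reads/a.fastq"},
        {"fastqout": "out/fastx_uniques/a.fastq"},
        vsearch_log="out/fastx_uniques/a.log",
    )
)
# → --fastx_uniques reads/a.fastq --log out/fastx_uniques/a.log --fastqout out/fastx_uniques/a.fastq
```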

Software dependencies

vsearch=2.27.1
pigz
pbzip2

Input/Output

Input:

input file(s)

Output:

output file(s)

Params

extra: additional program arguments

Authors

Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
# Forward the optional log named "vsearch" to vsearch's own --log option
log = snakemake.log.get("vsearch", "")
if log:
    extra += f" --log {log}"


# Parse input files: each named input becomes "--<key> <file>"
input = " ".join([f"--{key} {value}" for key, value in snakemake.input.items()])

# Parse output files
out_list = list()
for key, value in snakemake.output.items():
    if value.endswith(".gz"):
        out_list.append(
            f"--{key} /dev/stdout | pigz --processes {snakemake.threads} --stdout > {value}"
        )
    elif value.endswith(".bz2"):
        out_list.append(
            f"--{key} /dev/stdout | pbzip2 -p{snakemake.threads} --compress --stdout > {value}"
        )
    else:
        out_list.append(f"--{key} {value}")

# Check which output files are to be compressed
out_gz = [out.endswith(".gz") for out in out_list]
out_bz2 = [out.endswith(".bz2") for out in out_list]
assert sum(out_gz + out_bz2) <= 1, "only one output can be compressed"

# Move the compressed output (if any) to the end, so its pipe to pigz/pbzip2
# comes after all other vsearch arguments
output = [
    out for _, out in sorted(zip([x | y for x, y in zip(out_gz, out_bz2)], out_list))
]


shell(
    "(vsearch --threads {snakemake.threads} {input} {extra} {output}) 2> {snakemake.log[0]}"
)
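The output handling in the wrapper (one argument per output, at most one compressed output, and the compressed output moved last so its pipe ends the command line) can be sketched outside Snakemake as follows. This is a minimal sketch assuming plain dicts in place of snakemake.output; build_output_args and COMPRESSORS are hypothetical names, the compressor commands are the ones the wrapper already uses, and it raises ValueError where the wrapper uses assert.

```python
# Compressor commands keyed by file suffix (the same ones the wrapper uses)
COMPRESSORS = {
    ".gz": "pigz --processes {threads} --stdout",
    ".bz2": "pbzip2 -p{threads} --compress --stdout",
}


def build_output_args(outputs, threads):
    """Hypothetical helper mirroring the wrapper's output handling.

    outputs: dict mapping a vsearch option name to an output path.
    Returns the output arguments as one string, compressed output last.
    """
    args = []
    compressed = []
    for key, path in outputs.items():
        suffix = next((s for s in COMPRESSORS if path.endswith(s)), None)
        if suffix is None:
            # Uncompressed output: vsearch writes the file directly
            args.append(f"--{key} {path}")
        else:
            # vsearch writes to stdout and the compressor writes the file
            cmd = COMPRESSORS[suffix].format(threads=threads)
            args.append(f"--{key} /dev/stdout | {cmd} > {path}")
        compressed.append(suffix is not None)

    # Only one stdout stream exists, so at most one output can be piped
    if sum(compressed) > 1:
        raise ValueError("only one output can be compressed")

    # Stable sort: uncompressed (False) before compressed (True), order otherwise kept
    order = sorted(range(len(args)), key=lambda i: compressed[i])
    return " ".join(args[i] for i in order)


print(build_output_args({"fastqout": "out/a.fastq.gz", "fastaout": "out/a.fasta"}, 2))
# → --fastaout out/a.fasta --fastqout /dev/stdout | pigz --processes 2 --stdout > out/a.fastq.gz
```

One deliberate difference: a stable sort keyed only on the compressed flag keeps uncompressed outputs in their declared order, whereas sorting (flag, argument) tuples, as the wrapper does, also orders them alphabetically. Either way the compressed output ends up last.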