VSEARCH
Versatile open-source tool for microbiome analysis.
URL: https://github.com/torognes/vsearch
Example
This wrapper can be used in the following way:
rule vsearch_cluster_fast:
    input:
        cluster_fast="reads/{sample}.fasta",
    output:
        profile="out/cluster_fast/{sample}.profile",
    log:
        "logs/vsearch/cluster_fast/{sample}.log",
        vsearch="out/cluster_fast/{sample}.log",
    params:
        extra="--id 0.2 --sizeout --minseqlength 5",
    threads: 1
    wrapper:
        "v5.8.0/bio/vsearch"
rule vsearch_maskfasta:
    input:
        maskfasta="reads/{sample}.fasta",
    output:
        output="out/maskfasta/{sample}.fasta",
    log:
        "logs/vsearch/maskfasta/{sample}.log",
        vsearch="out/maskfasta/{sample}.log",
    params:
        extra="--hardmask",
    threads: 1
    wrapper:
        "v5.8.0/bio/vsearch"
rule vsearch_fastx_uniques:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq",
    log:
        "logs/vsearch/fastx_uniques/{sample}.log",
        vsearch="out/fastx_uniques/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 1
    wrapper:
        "v5.8.0/bio/vsearch"
rule vsearch_fastx_uniques_gzip:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq.gz",
    log:
        "logs/vsearch/fastx_uniques/{sample}.log",
        vsearch="out/fastx_uniques/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 2
    wrapper:
        "v5.8.0/bio/vsearch"
rule vsearch_fastx_uniques_bzip2:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq.bz2",
    log:
        "logs/vsearch/fastx_uniques/{sample}.log",
        vsearch="out/fastx_uniques/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 2
    wrapper:
        "v5.8.0/bio/vsearch"
rule vsearch_fastq_convert:
    input:
        fastq_convert="reads/{sample}.fastq",
    output:
        fastqout="out/fastq_convert/{sample}.fastq",
    log:
        "logs/vsearch/fastq_convert/{sample}.log",
        vsearch="out/fastq_convert/{sample}.log",
    params:
        extra="--fastq_ascii 33 --fastq_asciiout 64",
    threads: 2
    wrapper:
        "v5.8.0/bio/vsearch"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
Keys for input and output files must match vsearch argument names. Input keys include, for example, uchime_denovo, cluster_fast, fastx_uniques, maskfasta, fastq_convert, and fastq_mergepairs; output keys include, for example, chimeras, fastaout, fastqout, and output.
An extra log file (under the key vsearch) can be specified; it is passed to the vsearch option --log.
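The key-to-flag mapping described in these notes can be illustrated with a minimal standalone sketch. The helper name `build_vsearch_args` and the plain dictionaries standing in for `snakemake.input` and `snakemake.output` are assumptions made for illustration; the sketch also ignores `params.extra` and compressed outputs, which the wrapper handles separately.

```python
def build_vsearch_args(inputs, outputs, vsearch_log=""):
    """Turn named files into vsearch flags: each key becomes `--key value`.

    Hypothetical helper, not part of the wrapper. `inputs` and `outputs`
    are plain dicts that mimic the named files of a Snakemake rule.
    """
    args = [f"--{key} {value}" for key, value in inputs.items()]
    # The optional log entry named "vsearch" is forwarded as --log.
    if vsearch_log:
        args.append(f"--log {vsearch_log}")
    args += [f"--{key} {value}" for key, value in outputs.items()]
    return " ".join(args)


print(
    build_vsearch_args(
        {"maskfasta": "reads/a.fasta"},
        {"output": "out/maskfasta/a.fasta"},
        vsearch_log="out/maskfasta/a.log",
    )
)
```

Because the key is copied into the command line unchanged, a key that is not a valid vsearch option is passed through as an unknown flag, which vsearch would be expected to reject.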
Software dependencies
vsearch=2.29.3
pigz
pbzip2
Input/Output
Input:
input file(s)
Output:
output file(s)
Params
extra: additional program arguments
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log.get("vsearch", "")
if log:
    extra += f" --log {log}"
# Parse input files
input = " ".join([f"--{key} {value}" for key, value in snakemake.input.items()])
# Parse output files
out_list = list()
for key, value in snakemake.output.items():
    if value.endswith(".gz"):
        out_list.append(
            f"--{key} /dev/stdout | pigz --processes {snakemake.threads} --stdout > {value}"
        )
    elif value.endswith(".bz2"):
        out_list.append(
            f"--{key} /dev/stdout | pbzip2 -p{snakemake.threads} --compress --stdout > {value}"
        )
    else:
        out_list.append(f"--{key} {value}")
# Check which output files are to be compressed
out_gz = [out.endswith(".gz") for out in out_list]
out_bz2 = [out.endswith(".bz2") for out in out_list]
assert sum(out_gz + out_bz2) <= 1, "only one output can be compressed"
# Move compressed file (if any) to last
output = [
    out for _, out in sorted(zip([x | y for x, y in zip(out_gz, out_bz2)], out_list))
]
shell(
    "(vsearch --threads {snakemake.threads} {input} {extra} {output}) 2> {snakemake.log[0]}"
)
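The output handling above can be tried outside Snakemake with the following sketch. It is not the wrapper's code: the function name `build_output_args` is hypothetical, and it collects plain and piped outputs in two lists instead of sorting on a boolean flag, which should yield the same "compressed output last" ordering. The compressed output has to come last because everything after the shell pipe belongs to the compressor's command line, so any vsearch flag placed after it would be handed to pigz or pbzip2 instead.

```python
def build_output_args(outputs, threads=1):
    """Build the output part of a vsearch command line.

    Hypothetical helper. `outputs` is a plain dict standing in for
    `snakemake.output`. At most one output may be compressed; it is
    written to /dev/stdout, piped into the compressor, and placed last.
    """
    plain, piped = [], []
    for key, value in outputs.items():
        if value.endswith(".gz"):
            piped.append(
                f"--{key} /dev/stdout | pigz --processes {threads} --stdout > {value}"
            )
        elif value.endswith(".bz2"):
            piped.append(
                f"--{key} /dev/stdout | pbzip2 -p{threads} --compress --stdout > {value}"
            )
        else:
            plain.append(f"--{key} {value}")
    # Only a single stream can go through stdout.
    if len(piped) > 1:
        raise ValueError("only one output can be compressed")
    return " ".join(plain + piped)


print(build_output_args({"fastqout": "a.fastq.gz", "uc": "a.uc"}, threads=2))
```

In this example the `uc` output is moved ahead of the piped `fastqout` output, even though it was declared second.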