VSEARCH
Versatile open-source tool for microbiome analysis.
URL: https://github.com/torognes/vsearch
Example
This wrapper can be used in the following way:
rule vsearch_cluster_fast:
    input:
        cluster_fast="reads/{sample}.fasta",
    output:
        profile="out/cluster_fast/{sample}.profile",
    log:
        "logs/vsearch/cluster_fast/{sample}.log",
    params:
        extra="--id 0.2 --sizeout --minseqlength 5",
    threads: 1
    wrapper:
        "v3.0.4-12-g561ccaf/bio/vsearch"

rule vsearch_maskfasta:
    input:
        maskfasta="reads/{sample}.fasta",
    output:
        output="out/maskfasta/{sample}.fasta",
    log:
        "logs/vsearch/maskfasta/{sample}.log",
    params:
        extra="--hardmask",
    threads: 1
    wrapper:
        "v3.0.4-12-g561ccaf/bio/vsearch"

rule vsearch_fastx_uniques:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq",
    log:
        "logs/vsearch/fastx_uniques/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 2
    wrapper:
        "v3.0.4-12-g561ccaf/bio/vsearch"

rule vsearch_fastx_uniques_gzip:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq.gz",
    log:
        "logs/vsearch/fastx_uniques/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 2
    wrapper:
        "v3.0.4-12-g561ccaf/bio/vsearch"

rule vsearch_fastx_uniques_bzip2:
    input:
        fastx_uniques="reads/{sample}.fastq",
    output:
        fastqout="out/fastx_uniques/{sample}.fastq.bz2",
    log:
        "logs/vsearch/fastx_uniques/{sample}.log",
    params:
        extra="--strand both --minseqlength 5",
    threads: 2
    wrapper:
        "v3.0.4-12-g561ccaf/bio/vsearch"

rule vsearch_fastq_convert:
    input:
        fastq_convert="reads/{sample}.fastq",
    output:
        fastqout="out/fastq_convert/{sample}.fastq",
    log:
        "logs/vsearch/fastq_convert/{sample}.log",
    params:
        extra="--fastq_ascii 33 --fastq_asciiout 64",
    threads: 2
    wrapper:
        "v3.0.4-12-g561ccaf/bio/vsearch"

Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
Keys for input and output files must match vsearch argument names. Input keys are, for example, uchime_denovo, cluster_fast, fastx_uniques, maskfasta, fastq_convert or fastq_mergepairs; output keys are, for example, chimeras, fastaout, fastqout or output.
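To illustrate why the keys must match, the following minimal sketch shows how named files could be turned into vsearch options. The function name `to_vsearch_options` and the sample paths are hypothetical and used only for illustration; the wrapper builds the same kind of string inline from the rule's named inputs.

```python
def to_vsearch_options(named_files):
    """Turn a mapping of vsearch argument names to paths into an option string."""
    return " ".join(f"--{key} {value}" for key, value in named_files.items())


# Hypothetical rule inputs and outputs, keyed by vsearch argument name.
inputs = {"fastx_uniques": "reads/a.fastq"}
outputs = {"fastqout": "out/fastx_uniques/a.fastq"}

print(to_vsearch_options(inputs))
print(to_vsearch_options(outputs))
```

Because each key is passed through unchanged as `--key`, a key that is not a valid vsearch argument would end up as an unknown option on the command line.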
Software dependencies
vsearch=2.26.1
gzip
bzip2
Input/Output
Input:
input file(s)
Output:
output file(s)
Params
extra: additional program arguments
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
# Pass a log file to vsearch only if the rule defines one
log = f"--log {snakemake.log}" if snakemake.log else ""

# Each named input becomes a vsearch option (--<key> <path>)
input = " ".join([f"--{key} {value}" for key, value in snakemake.input.items()])

# Each named output becomes a vsearch option; compressed outputs are
# written to stdout and piped through gzip or bzip2
out_list = list()
for key, value in snakemake.output.items():
    if value.endswith(".gz"):
        out_list.append(f"--{key} /dev/stdout | gzip > {value}")
    elif value.endswith(".bz2"):
        out_list.append(f"--{key} /dev/stdout | bzip2 > {value}")
    else:
        out_list.append(f"--{key} {value}")

# Check which output files are to be compressed
out_compressed = [out.endswith((".gz", ".bz2")) for out in out_list]
assert sum(out_compressed) <= 1, "only one output can be compressed"

# Move compressed file (if any) to last, since its pipe must end the command
output = [out for _, out in sorted(zip(out_compressed, out_list))]

shell("vsearch --threads {snakemake.threads} {input} {extra} {log} {output}")
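The output handling above can be tried outside Snakemake. The following is a minimal sketch under stated assumptions, not the wrapper's own code: `build_output_args` is a hypothetical helper, it takes a plain dict in place of `snakemake.output`, and the file paths are made up for illustration.

```python
def build_output_args(outputs):
    """Return vsearch output options, with any compressed output placed last.

    `outputs` is a plain dict standing in for snakemake.output (an assumption
    made so the sketch runs without Snakemake).
    """
    compressors = {".gz": "gzip", ".bz2": "bzip2"}
    plain, piped = [], []
    for key, path in outputs.items():
        for suffix, tool in compressors.items():
            if path.endswith(suffix):
                # Compressed output: stream through stdout into the compressor.
                piped.append(f"--{key} /dev/stdout | {tool} > {path}")
                break
        else:
            # No compression suffix matched: let vsearch write the file itself.
            plain.append(f"--{key} {path}")
    # Only one output can use stdout, so at most one may be compressed.
    if len(piped) > 1:
        raise ValueError("only one output can be compressed")
    return plain + piped


args = build_output_args({"fastqout": "out/a.fastq.bz2", "fastaout": "out/a.fasta"})
print(" ".join(args))
```

Keeping the piped option last matters because everything after the `|` is handed to the compressor rather than to vsearch.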