.. _`bio/vsearch`:

VSEARCH
=======

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/vsearch?label=version%20update%20pull%20requests
        :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/vsearch

Versatile open-source tool for microbiome analysis.

**URL**: https://github.com/torognes/vsearch

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule vsearch_cluster_fast:
        input:
            cluster_fast="reads/{sample}.fasta",
        output:
            profile="out/cluster_fast/{sample}.profile",
        log:
            "logs/vsearch/cluster_fast/{sample}.log",
        params:
            extra="--id 0.2 --sizeout --minseqlength 5",
        threads: 1
        wrapper:
            "v3.0.1/bio/vsearch"


    rule vsearch_maskfasta:
        input:
            maskfasta="reads/{sample}.fasta",
        output:
            output="out/maskfasta/{sample}.fasta",
        log:
            "logs/vsearch/maskfasta/{sample}.log",
        params:
            extra="--hardmask",
        threads: 1
        wrapper:
            "v3.0.1/bio/vsearch"


    rule vsearch_fastx_uniques:
        input:
            fastx_uniques="reads/{sample}.fastq",
        output:
            fastqout="out/fastx_uniques/{sample}.fastq",
        log:
            "logs/vsearch/fastx_uniques/{sample}.log",
        params:
            extra="--strand both --minseqlength 5",
        threads: 2
        wrapper:
            "v3.0.1/bio/vsearch"


    rule vsearch_fastx_uniques_gzip:
        input:
            fastx_uniques="reads/{sample}.fastq",
        output:
            fastqout="out/fastx_uniques/{sample}.fastq.gz",
        log:
            "logs/vsearch/fastx_uniques/{sample}.log",
        params:
            extra="--strand both --minseqlength 5",
        threads: 2
        wrapper:
            "v3.0.1/bio/vsearch"


    rule vsearch_fastx_uniques_bzip2:
        input:
            fastx_uniques="reads/{sample}.fastq",
        output:
            fastqout="out/fastx_uniques/{sample}.fastq.bz2",
        log:
            "logs/vsearch/fastx_uniques/{sample}.log",
        params:
            extra="--strand both --minseqlength 5",
        threads: 2
        wrapper:
            "v3.0.1/bio/vsearch"


    rule vsearch_fastq_convert:
        input:
            fastq_convert="reads/{sample}.fastq",
        output:
            fastqout="out/fastq_convert/{sample}.fastq",
        log:
            "logs/vsearch/fastq_convert/{sample}.log",
        params:
            extra="--fastq_ascii 33 --fastq_asciiout 64",
        threads: 2
        wrapper:
            "v3.0.1/bio/vsearch"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

* Keys for ``input`` and ``output`` files must match ``vsearch`` arguments, because each entry is passed to ``vsearch`` as ``--{key} {path}``. Valid input keys include ``uchime_denovo``, ``cluster_fast``, ``fastx_uniques``, ``maskfasta``, ``fastq_convert`` and ``fastq_mergepairs``; valid output keys include ``chimeras``, ``fastaout``, ``fastqout`` and ``output``.

Software dependencies
---------------------

* ``vsearch=2.25.0``
* ``gzip``
* ``bzip2``

Input/Output
------------

**Input:**

* input file(s), each passed as ``--{key} {path}``

**Output:**

* output file(s), each passed as ``--{key} {path}``; at most one may end in ``.gz`` or ``.bz2``, in which case it is written to ``/dev/stdout`` and compressed with ``gzip`` or ``bzip2``

Params
------

* ``extra``: additional program arguments

Authors
-------

* Filipe G. Vieira

Code
----

.. code-block:: python

    __author__ = "Filipe G. Vieira"
    __copyright__ = "Copyright 2021, Filipe G. Vieira"
    __license__ = "MIT"


    from snakemake.shell import shell


    extra = snakemake.params.get("extra", "")
    # Use vsearch's own --log option; empty string if no log file is declared
    log = f"--log {snakemake.log}" if snakemake.log else ""


    # Each input key becomes a vsearch option, e.g. fastx_uniques -> --fastx_uniques
    input = " ".join([f"--{key} {value}" for key, value in snakemake.input.items()])


    out_list = []
    for key, value in snakemake.output.items():
        if value.endswith(".gz"):
            out_list.append(f"--{key} /dev/stdout | gzip > {value}")
        elif value.endswith(".bz2"):
            out_list.append(f"--{key} /dev/stdout | bzip2 > {value}")
        else:
            out_list.append(f"--{key} {value}")

    # Check which output files are to be compressed
    out_gz = [out.endswith(".gz") for out in out_list]
    out_bz2 = [out.endswith(".bz2") for out in out_list]
    assert sum(out_gz + out_bz2) <= 1, "only one output can be compressed"

    # Move compressed file (if any) to last, so the pipe reads vsearch's stdout;
    # combine both flags element-wise (``out_gz or out_bz2`` would ignore .bz2)
    out_compressed = [gz or bz2 for gz, bz2 in zip(out_gz, out_bz2)]
    output = [out for _, out in sorted(zip(out_compressed, out_list))]


    shell("vsearch --threads {snakemake.threads} {input} {extra} {log} {output}")
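
The command-line assembly performed by the wrapper code above can be sketched as a pure function, which makes the key-to-option mapping and the placement of a compressed output easy to check without running Snakemake. This is a minimal sketch, not part of the wrapper: ``build_vsearch_command`` and its parameters are hypothetical, paths are assumed to be plain strings, and, as in the wrapper, no shell quoting is applied.

.. code-block:: python

    from typing import Dict, Optional

    # Hypothetical helper, not part of the wrapper: maps an output suffix
    # to the compressor the wrapper pipes that output through.
    COMPRESSORS = {".gz": "gzip", ".bz2": "bzip2"}


    def build_vsearch_command(
        inputs: Dict[str, str],
        outputs: Dict[str, str],
        threads: int = 1,
        extra: str = "",
        log: Optional[str] = None,
    ) -> str:
        # Each input key becomes a vsearch option, e.g. fastx_uniques -> --fastx_uniques
        input_args = [f"--{key} {path}" for key, path in inputs.items()]

        plain, compressed = [], []
        for key, path in outputs.items():
            suffix = next((s for s in COMPRESSORS if path.endswith(s)), None)
            if suffix is None:
                plain.append(f"--{key} {path}")
            else:
                # Write to stdout and pipe through the matching compressor
                compressed.append(f"--{key} /dev/stdout | {COMPRESSORS[suffix]} > {path}")

        # Only stdout can feed the pipe, so at most one output may be
        # compressed, and it has to be the last argument on the command line.
        if len(compressed) > 1:
            raise ValueError("only one output can be compressed")

        parts = ["vsearch", f"--threads {threads}", *input_args]
        if extra:
            parts.append(extra)
        if log:
            parts.append(f"--log {log}")
        parts += plain + compressed
        return " ".join(parts)


    cmd = build_vsearch_command(
        inputs={"fastx_uniques": "reads/a.fastq"},
        outputs={"fastqout": "out/fastx_uniques/a.fastq.gz"},
        threads=2,
        extra="--strand both --minseqlength 5",
        log="logs/vsearch/fastx_uniques/a.log",
    )
    print(cmd)

The sketch raises ``ValueError`` instead of using ``assert`` so the single-compressed-output check stays active under ``python -O``, which strips assertions. Argument order follows the wrapper's ``shell()`` call: threads, inputs, ``extra``, log, then outputs.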