.. _`bio/seqkit`:

SEQKIT GENERIC WRAPPER
======================

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/seqkit?label=version%20update%20pull%20requests
    :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/seqkit

Run SeqKit.

**URL**: https://bioinf.shenwei.me/seqkit/usage/

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule seqkit_seq:
        input:
            fasta="data/{sample}.fa",
        output:
            fasta="out/seq/{sample}.fa.gz",
        log:
            "logs/seq/{sample}.log",
        params:
            command="seq",
            extra="--min-len 10",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_subseq_bed:
        input:
            fasta="data/{sample}.fa",
            bed="data/{sample}.bed",
        output:
            fasta="out/subseq/bed/{sample}.fa.gz",
        log:
            "logs/subseq/bed/{sample}.log",
        params:
            command="subseq",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_subseq_gtf:
        input:
            fasta="data/{sample}.fa",
            gtf="data/{sample}.gtf",
        output:
            fasta="out/subseq/gtf/{sample}.fa.gz",
        log:
            "logs/subseq/gtf/{sample}.log",
        params:
            command="subseq",
            extra="--feature CDS",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_subseq_region:
        input:
            fasta="data/{sample}.fa",
        output:
            fasta="out/subseq/region/{sample}.fa.gz",
        log:
            "logs/subseq/region/{sample}.log",
        params:
            command="subseq",
            extra="--region 1:12",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_fx2tab:
        input:
            fastx="data/{sample}.fastq",
        output:
            tsv="out/fx2tab/{sample}.tsv",
        log:
            "logs/fx2tab/{sample}.log",
        params:
            command="fx2tab",
            extra="--name",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_grep_name:
        input:
            fastx="data/{sample}.fastq",
            pattern="data/name.txt",
        output:
            fastx="out/grep/name/{sample}.fastq.gz",
        log:
            "logs/grep/name/{sample}.log",
        params:
            command="grep",
            extra="--by-name",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_grep_seq:
        input:
            fastx="data/{sample}.fastq",
            pattern="data/seq.txt",
        output:
            fastx="out/grep/seq/{sample}.fastq.gz",
        log:
            "logs/grep/seq/{sample}.log",
        params:
            command="grep",
            extra="--by-seq",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_rmdup_name:
        input:
            fastx="data/{sample}.fastq",
        output:
            fastx="out/rmdup/name/{sample}.fastq.gz",
            dup_num="out/rmdup/name/{sample}.num.txt",
            dup_seqs="out/rmdup/name/{sample}.seq.txt",
        log:
            "logs/rmdup/name/{sample}.log",
        params:
            command="rmdup",
            extra="--by-name",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_rmdup_seq:
        input:
            fastx="data/{sample}.fastq",
        output:
            fastx="out/rmdup/seq/{sample}.fastq.gz",
            dup_num="out/rmdup/seq/{sample}.num.txt",
            dup_seqs="out/rmdup/seq/{sample}.seq.txt",
        log:
            "logs/rmdup/seq/{sample}.log",
        params:
            command="rmdup",
            extra="--by-seq",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"


    rule seqkit_stats:
        input:
            fastx="data/{sample}.fastq",
        output:
            stats="out/stats/{sample}.tsv",
        log:
            "logs/stats/{sample}.log",
        params:
            command="stats",
            extra="--all --tabular",
        threads: 2
        wrapper:
            "v3.0.1/bio/seqkit"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

* The first ``input`` file and the first ``output`` file are treated as the main ones: the first input is passed to SeqKit as the positional input file, and the first output is passed via ``--out-file``.
* Keys for additional ``input`` and ``output`` files must match the corresponding ``seqkit`` option names, written with underscores instead of hyphens and without the ``-file`` suffix (if present). For example, the output key ``dup_num`` is passed as ``--dup-num-file``.

Software dependencies
---------------------

* ``seqkit=2.6.1``

Input/Output
------------

**Input:**

* input file(s)

**Output:**

* output file(s)

Params
------

* ``command``: SeqKit command to use.
* ``extra``: Optional additional arguments passed to the SeqKit command (e.g. ``--min-len 10``).

Authors
-------

* Filipe G. Vieira

Code
----

.. code-block:: python

    __author__ = "Filipe G. Vieira"
    __copyright__ = "Copyright 2023, Filipe G. Vieira"
    __license__ = "MIT"


    from snakemake.shell import shell


    extra = snakemake.params.get("extra", "")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)


    # Additional inputs (all but the first): "bed" and "gtf" map to --bed/--gtf,
    # every other key maps to --<key>-file (underscores become hyphens).
    extra_input = " ".join(
        [
            f"--{key.replace('_','-')} {value}"
            if key in ["bed", "gtf"]
            else f"--{key.replace('_','-')}-file {value}"
            for key, value in snakemake.input.items()
        ][1:]
    )

    # Additional outputs (all but the first): "read1" and "read2" map to
    # --read1/--read2, every other key maps to --<key>-file.
    extra_output = " ".join(
        [
            f"--{key.replace('_','-')} {value}"
            if key in ["read1", "read2"]
            else f"--{key.replace('_','-')}-file {value}"
            for key, value in snakemake.output.items()
        ][1:]
    )


    shell(
        "seqkit {snakemake.params.command}"
        " --threads {snakemake.threads}"
        " {extra_input}"
        " {extra_output}"
        " {extra}"
        " --out-file {snakemake.output[0]}"
        " {snakemake.input[0]}"
        " {log}"
    )

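
To check which SeqKit options a rule will produce, the flag-building logic of the wrapper code can be reproduced outside Snakemake. The sketch below is a minimal, hypothetical helper (``build_extra_args`` is not part of the wrapper); it assumes ordered ``(key, path)`` pairs stand in for ``snakemake.input`` or ``snakemake.output`` and mirrors the wrapper's list comprehensions, including skipping the first (main) file.

.. code-block:: python

    from typing import Iterable, List, Sequence, Tuple


    def build_extra_args(
        files: Sequence[Tuple[str, str]], keys_without_suffix: Iterable[str]
    ) -> str:
        """Hypothetical helper mirroring the wrapper's flag construction.

        Keys in ``keys_without_suffix`` become ``--<key>``; all other keys
        become ``--<key>-file``. Underscores in keys are turned into hyphens,
        and the first pair (the main file) is skipped.
        """
        plain = set(keys_without_suffix)
        args: List[str] = []
        for key, value in list(files)[1:]:
            flag = key.replace("_", "-")
            if key in plain:
                args.append(f"--{flag} {value}")
            else:
                args.append(f"--{flag}-file {value}")
        return " ".join(args)


    # Output files of the ``seqkit_rmdup_name`` rule from the example.
    rmdup_outputs = [
        ("fastx", "out/rmdup/name/{sample}.fastq.gz"),
        ("dup_num", "out/rmdup/name/{sample}.num.txt"),
        ("dup_seqs", "out/rmdup/name/{sample}.seq.txt"),
    ]
    print(build_extra_args(rmdup_outputs, ["read1", "read2"]))
    # --dup-num-file out/rmdup/name/{sample}.num.txt --dup-seqs-file out/rmdup/name/{sample}.seq.txt

    # Input files of the ``seqkit_subseq_bed`` rule from the example.
    subseq_inputs = [
        ("fasta", "data/{sample}.fa"),
        ("bed", "data/{sample}.bed"),
    ]
    print(build_extra_args(subseq_inputs, ["bed", "gtf"]))
    # --bed data/{sample}.bed

The ``{sample}`` wildcard stays literal in this sketch; inside a real workflow, Snakemake substitutes it before the wrapper builds the command.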