SEQKIT GENERIC WRAPPER

Run SeqKit.

URL: https://bioinf.shenwei.me/seqkit/

Example

This wrapper can be used in the following way:

rule seqkit_seq:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/seq/{sample}.fa.gz",
    log:
        "logs/seq/{sample}.log",
    params:
        command="seq",
        extra="--min-len 10",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_subseq_bed:
    input:
        fasta="data/{sample}.fa",
        bed="data/{sample}.bed",
    output:
        fasta="out/subseq/bed/{sample}.fa.gz",
    log:
        "logs/subseq/bed/{sample}.log",
    params:
        command="subseq",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_subseq_gtf:
    input:
        fasta="data/{sample}.fa",
        gtf="data/{sample}.gtf",
    output:
        fasta="out/subseq/gtf/{sample}.fa.gz",
    log:
        "logs/subseq/gtf/{sample}.log",
    params:
        command="subseq",
        extra="--feature CDS",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_subseq_region:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/subseq/region/{sample}.fa.gz",
    log:
        "logs/subseq/region/{sample}.log",
    params:
        command="subseq",
        extra="--region 1:12",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_fx2tab:
    input:
        fastx="data/{sample}.fastq",
    output:
        tsv="out/fx2tab/{sample}.tsv",
    log:
        "logs/fx2tab/{sample}.log",
    params:
        command="fx2tab",
        extra="--name",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_grep_name:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/name.txt",
    output:
        fastx="out/grep/name/{sample}.fastq.gz",
    log:
        "logs/grep/name/{sample}.log",
    params:
        command="grep",
        extra="--by-name",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_grep_seq:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/seq.txt",
    output:
        fastx="out/grep/seq/{sample}.fastq.gz",
    log:
        "logs/grep/seq/{sample}.log",
    params:
        command="grep",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_rmdup_name:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/name/{sample}.fastq.gz",
        dup_num="out/rmdup/name/{sample}.num.txt",
        dup_seqs="out/rmdup/name/{sample}.seq.txt",
    log:
        "logs/rmdup/name/{sample}.log",
    params:
        command="rmdup",
        extra="--by-name",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_rmdup_seq:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/seq/{sample}.fastq.gz",
        dup_num="out/rmdup/seq/{sample}.num.txt",
        dup_seqs="out/rmdup/seq/{sample}.seq.txt",
    log:
        "logs/rmdup/seq/{sample}.log",
    params:
        command="rmdup",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"


rule seqkit_stats:
    input:
        fastx="data/{sample}.fastq",
    output:
        stats="out/stats/{sample}.tsv",
    log:
        "logs/stats/{sample}.log",
    params:
        command="stats",
        extra="--all --tabular",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

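The first input file and the first output file are treated as the main ones: they are passed to seqkit as the positional input and as --out-file, respectively.
Keys for any additional input and output files must match the corresponding seqkit long options, with hyphens written as underscores and without the -file suffix (if present). For example, the input key pattern is passed as --pattern-file and the output key dup_num as --dup-num-file, whereas the input keys bed and gtf are passed as --bed and --gtf.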
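
The key-to-option rule above can be sketched as a small stand-alone function. This is an illustration only: the helper name seqkit_flag and its bare_keys parameter are hypothetical and not part of the wrapper, although the mapping mirrors the wrapper code shown in the Code section.

```python
def seqkit_flag(key, value, bare_keys=("bed", "gtf")):
    """Hypothetical helper: turn a Snakemake input/output key into a seqkit option."""
    option = key.replace("_", "-")  # dup_num -> dup-num
    if key in bare_keys:
        # These keys map to an option that has no -file suffix.
        return f"--{option} {value}"
    # All other keys get the -file suffix appended.
    return f"--{option}-file {value}"


# Extra inputs: bed and gtf are passed without the suffix.
print(seqkit_flag("bed", "data/a.bed"))
print(seqkit_flag("pattern", "data/name.txt"))

# Extra outputs: the wrapper exempts read1 and read2 instead.
print(seqkit_flag("dup_num", "out/a.num.txt", bare_keys=("read1", "read2")))
```

The three calls print --bed data/a.bed, --pattern-file data/name.txt and --dup-num-file out/a.num.txt.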

Software dependencies

seqkit=2.8.1

Input/Output

Input:

input file(s)

Output:

output file(s)

Params

command: SeqKit subcommand to run (e.g. seq, subseq, grep, rmdup, stats).
extra: Optional additional command-line arguments passed to the subcommand.

Authors

Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell


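# Optional additional arguments; defaults to an empty string.
extra = snakemake.params.get("extra", "")
# Redirect both stdout and stderr to the rule's log file.
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Build options for every input except the first (main) one.
# "bed" and "gtf" map to --bed/--gtf; any other key gets a "-file" suffix.
extra_input = " ".join(
    [
        (
            f"--{key.replace('_','-')} {value}"
            if key in ["bed", "gtf"]
            else f"--{key.replace('_','-')}-file {value}"
        )
        for key, value in snakemake.input.items()
    ][1:]
)

# Build options for every output except the first (main) one.
# "read1" and "read2" are passed as is; any other key gets a "-file" suffix.
extra_output = " ".join(
    [
        (
            f"--{key.replace('_','-')} {value}"
            if key in ["read1", "read2"]
            else f"--{key.replace('_','-')}-file {value}"
        )
        for key, value in snakemake.output.items()
    ][1:]
)


# The main output goes to --out-file; the main input is the positional argument.
shell(
    "seqkit {snakemake.params.command}"
    " --threads {snakemake.threads}"
    " {extra_input}"
    " {extra_output}"
    " {extra}"
    " --out-file {snakemake.output[0]}"
    " {snakemake.input[0]}"
    " {log}"
)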
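
To make the argument order in the shell call easier to follow, here is a minimal sketch that assembles the same command as a plain string. The function assemble and its parameters are hypothetical and exist only for illustration. The log redirection is left out, and empty parts are dropped, so spacing may differ slightly from what the wrapper actually runs.

```python
def assemble(command, threads, main_input, main_output,
             extra_input="", extra_output="", extra=""):
    """Hypothetical helper: lay out the parts in the same order as the wrapper."""
    parts = [
        "seqkit",
        command,
        f"--threads {threads}",
        extra_input,    # options built from additional input files
        extra_output,   # options built from additional output files
        extra,          # free-form params.extra
        f"--out-file {main_output}",
        main_input,     # the main input is the positional argument
    ]
    # Skip empty parts to avoid double spaces (an illustration choice).
    return " ".join(part for part in parts if part)


# Roughly what the seqkit_stats rule would run for sample "a".
cmd = assemble("stats", 2, "data/a.fastq", "out/stats/a.tsv",
               extra="--all --tabular")
print(cmd)
```

For the values above this prints: seqkit stats --threads 2 --all --tabular --out-file out/stats/a.tsv data/a.fastq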