SEQKIT GENERIC WRAPPER

Run a SeqKit subcommand through a single generic wrapper.

URL: https://bioinf.shenwei.me/seqkit/

Example

This wrapper can be used in the following way:

rule seqkit_seq:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/seq/{sample}.fa.gz",
    log:
        "logs/seq/{sample}.log",
    params:
        command="seq",
        extra="--min-len 10",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_subseq_bed:
    input:
        fasta="data/{sample}.fa",
        bed="data/{sample}.bed",
    output:
        fasta="out/subseq/bed/{sample}.fa.gz",
    log:
        "logs/subseq/bed/{sample}.log",
    params:
        command="subseq",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_subseq_gtf:
    input:
        fasta="data/{sample}.fa",
        gtf="data/{sample}.gtf",
    output:
        fasta="out/subseq/gtf/{sample}.fa.gz",
    log:
        "logs/subseq/gtf/{sample}.log",
    params:
        command="subseq",
        extra="--feature CDS",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_subseq_region:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/subseq/region/{sample}.fa.gz",
    log:
        "logs/subseq/region/{sample}.log",
    params:
        command="subseq",
        extra="--region 1:12",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_fx2tab:
    input:
        fastx="data/{sample}.fastq",
    output:
        tsv="out/fx2tab/{sample}.tsv",
    log:
        "logs/fx2tab/{sample}.log",
    params:
        command="fx2tab",
        extra="--name",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_grep_name:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/name.txt",
    output:
        fastx="out/grep/name/{sample}.fastq.gz",
    log:
        "logs/grep/name/{sample}.log",
    params:
        command="grep",
        extra="--by-name",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_grep_seq:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/seq.txt",
    output:
        fastx="out/grep/seq/{sample}.fastq.gz",
    log:
        "logs/grep/seq/{sample}.log",
    params:
        command="grep",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_rmdup_name:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/name/{sample}.fastq.gz",
        dup_num="out/rmdup/name/{sample}.num.txt",
        dup_seqs="out/rmdup/name/{sample}.seq.txt",
    log:
        "logs/rmdup/name/{sample}.log",
    params:
        command="rmdup",
        extra="--by-name",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_rmdup_seq:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/seq/{sample}.fastq.gz",
        dup_num="out/rmdup/seq/{sample}.num.txt",
        dup_seqs="out/rmdup/seq/{sample}.seq.txt",
    log:
        "logs/rmdup/seq/{sample}.log",
    params:
        command="rmdup",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"


rule seqkit_stats:
    input:
        fastx="data/{sample}.fastq",
    output:
        stats="out/stats/{sample}.tsv",
    log:
        "logs/stats/{sample}.log",
    params:
        command="stats",
        extra="--all --tabular",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/seqkit"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • The first input file and the first output file are treated as the main ones.

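  • Keys for extra input and output files must match the corresponding seqkit arguments, written without the leading dashes, without the -file suffix (if present), and with underscores in place of hyphens; for example, pattern maps to --pattern-file and dup_num maps to --dup-num-file.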
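
As a minimal sketch of the naming rule above, the hypothetical helper key_to_flag below mirrors how the wrapper code turns a keyword into a SeqKit option: underscores become hyphens, and -file is appended unless the key is one of the exceptions hard-coded in the wrapper (bed and gtf for inputs, read1 and read2 for outputs). The helper name, its signature and the file paths are illustrative assumptions and are not part of the wrapper.

```python
# Hypothetical helper, not part of the wrapper: it mirrors the flag-naming
# logic found in the wrapper code, for illustration only.
INPUT_NO_SUFFIX = ("bed", "gtf")  # passed as --bed / --gtf
OUTPUT_NO_SUFFIX = ("read1", "read2")  # passed as --read1 / --read2


def key_to_flag(key, value, no_suffix):
    """Return the SeqKit option string for one extra input or output file."""
    flag = key.replace("_", "-")
    if key not in no_suffix:
        flag += "-file"
    return f"--{flag} {value}"


print(key_to_flag("pattern", "data/name.txt", INPUT_NO_SUFFIX))
# --pattern-file data/name.txt
print(key_to_flag("bed", "data/A.bed", INPUT_NO_SUFFIX))
# --bed data/A.bed
print(key_to_flag("dup_num", "out/A.num.txt", OUTPUT_NO_SUFFIX))
# --dup-num-file out/A.num.txt
```

A key that does not correspond to a real SeqKit option would still be turned into a flag by this logic, so a misspelled key is likely to surface as a SeqKit error at run time rather than as a Snakemake error.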

Software dependencies

  • seqkit=2.8.1

Input/Output

Input:

  • input file(s)

Output:

  • output file(s)

Params

  • command: SeqKit command to use.

  • extra: Optional additional command-line arguments, passed to SeqKit unchanged.

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


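# Turn every named input except the first (the main input, passed
# positionally) into a SeqKit option. "bed" and "gtf" map to --bed/--gtf;
# any other key gets the "-file" suffix, e.g. pattern -> --pattern-file.
extra_input = " ".join(
    [
        (
            f"--{key.replace('_','-')} {value}"
            if key in ["bed", "gtf"]
            else f"--{key.replace('_','-')}-file {value}"
        )
        for key, value in snakemake.input.items()
    ][1:]
)

# Same for outputs: the first one is written via --out-file below, the rest
# become options, e.g. dup_num -> --dup-num-file.
extra_output = " ".join(
    [
        (
            f"--{key.replace('_','-')} {value}"
            if key in ["read1", "read2"]
            else f"--{key.replace('_','-')}-file {value}"
        )
        for key, value in snakemake.output.items()
    ][1:]
)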


shell(
    "seqkit {snakemake.params.command}"
    " --threads {snakemake.threads}"
    " {extra_input}"
    " {extra_output}"
    " {extra}"
    " --out-file {snakemake.output[0]}"
    " {snakemake.input[0]}"
    " {log}"
)
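
To make the argument order concrete, the sketch below rebuilds the command string for the seqkit_rmdup_name example, with plain dictionaries standing in for the snakemake object. build_command is a hypothetical name, the sample name A is assumed, and the log redirection is left out because its exact form comes from snakemake.log_fmt_shell. Unlike the wrapper, the sketch drops empty pieces so that the printed string has no double spaces.

```python
def build_command(command, inputs, outputs, threads, extra=""):
    """Assemble a SeqKit call in the same order as the wrapper (sketch only)."""
    in_items = list(inputs.items())
    out_items = list(outputs.items())

    def flag(key, value, no_suffix):
        # Underscores become hyphens; "-file" is added unless the key is exempt.
        name = key.replace("_", "-")
        if key in no_suffix:
            return f"--{name} {value}"
        return f"--{name}-file {value}"

    # The first input and output are the main files; the rest become options.
    extra_input = " ".join(flag(k, v, ("bed", "gtf")) for k, v in in_items[1:])
    extra_output = " ".join(
        flag(k, v, ("read1", "read2")) for k, v in out_items[1:]
    )

    parts = [
        f"seqkit {command}",
        f"--threads {threads}",
        extra_input,
        extra_output,
        extra,
        f"--out-file {out_items[0][1]}",
        in_items[0][1],
    ]
    # Drop empty pieces (an assumption of this sketch, not wrapper behavior).
    return " ".join(p for p in parts if p)


cmd = build_command(
    "rmdup",
    inputs={"fastx": "data/A.fastq"},
    outputs={
        "fastx": "out/rmdup/name/A.fastq.gz",
        "dup_num": "out/rmdup/name/A.num.txt",
        "dup_seqs": "out/rmdup/name/A.seq.txt",
    },
    threads=2,
    extra="--by-name",
)
print(cmd)
```

For this rule the sketch prints a single line: seqkit rmdup --threads 2 --dup-num-file out/rmdup/name/A.num.txt --dup-seqs-file out/rmdup/name/A.seq.txt --by-name --out-file out/rmdup/name/A.fastq.gz data/A.fastq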