SEQTK


Toolkit for processing sequences in FASTA/Q formats

URL: https://github.com/lh3/seqtk

Example

This wrapper can be used in the following way:

rule seqtk_seq_fq2fas:
    input:
        "reads/{prefix}.fastq",
    output:
        "results/fq2fas/{prefix}.fasta",
    log:
        "logs/fq2fas/{prefix}.log",
    params:
        command="seq",
        extra="-A",
    wrapper:
        "v3.4.1/bio/seqtk"


rule seqtk_seq_convBQ:
    input:
        "reads/{prefix}.fastq",
    output:
        "results/convBQ/{prefix}.fasta",
    log:
        "logs/convBQ/{prefix}.log",
    params:
        command="seq",
        extra="-aQ 64 -q 20 -n N",
    wrapper:
        "v3.4.1/bio/seqtk"


rule seqtk_subseq_list:
    input:
        "reads/{prefix}.fastq",
        "reads/id.list",
    output:
        "results/subseq_list/{prefix}.fq.gz",
    log:
        "logs/subseq_list/{prefix}.log",
    params:
        command="subseq",
        extra="",
    wrapper:
        "v3.4.1/bio/seqtk"


rule seqtk_mergepe:
    input:
        r1="reads/{sample}.1.fastq.gz",
        r2="reads/{sample}.2.fastq.gz",
    output:
        merged="results/mergepe/{sample}.fastq.gz",
    log:
        "logs/mergepe/{sample}.log",
    params:
        command="mergepe",
        compress_lvl=9,
    threads: 2
    wrapper:
        "v3.4.1/bio/seqtk"


rule seqtk_sample_se:
    input:
        "reads/{sample}.fastq.gz",
    output:
        "results/sample_se/{sample}.fastq.gz",
    log:
        "logs/sample_se/{sample}.log",
    params:
        command="sample",
        n=3,
        extra="-s 12345",
    threads: 1
    wrapper:
        "v3.4.1/bio/seqtk"


rule seqtk_sample_pe:
    input:
        f1="reads/{sample}.1.fastq.gz",
        f2="reads/{sample}.2.fastq.gz",
    output:
        f1="results/sample_pe/{sample}.1.fastq.gz",
        f2="results/sample_pe/{sample}.2.fastq.gz",
    log:
        "logs/sample_pe/{sample}.log",
    params:
        command="sample",
        n=3,
        extra="-s 12345",
    threads: 1
    wrapper:
        "v3.4.1/bio/seqtk"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • When the first output path ends in .gz, the output is compressed with pigz, which uses the number of threads set in the rule.
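The compression step can be sketched in isolation. The helper below is hypothetical (the name `compression_pipe` and its signature are not part of the wrapper); it only mirrors the `pipe_comp` expression from the wrapper code so the resulting shell suffix can be previewed for a given output path, thread count and compression level.

```python
def compression_pipe(output_path: str, threads: int = 1, compress_lvl: int = 6) -> str:
    """Return the shell suffix that compresses seqtk's stdout, or "" for plain output.

    Hypothetical helper mirroring the wrapper's `pipe_comp` expression.
    """
    if output_path.endswith(".gz"):
        # pigz reads stdin, compresses with `threads` processes, writes to stdout
        return f"| pigz --processes {threads} -{compress_lvl} --stdout"
    # Uncompressed output: seqtk's stdout is redirected to the file as is
    return ""


if __name__ == "__main__":
    # Values taken from the seqtk_mergepe example rule (threads: 2, compress_lvl=9)
    print(compression_pipe("results/mergepe/A.fastq.gz", threads=2, compress_lvl=9))
    # A plain FASTA output gets no compression step
    print(repr(compression_pipe("results/fq2fas/A.fasta")))
```

Since the decision depends only on the file name, requesting more threads has no effect for outputs that do not end in .gz.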

Software dependencies

  • seqtk=1.4

  • pigz

Input/Output

Input:

  • fastx file(s) (can be gzip compressed)

Output:

  • fastx file(s) (can be gzip compressed)

Params

  • n: number of reads to keep after subsampling; seqtk also accepts a fraction of reads here (for sample)

  • extra: additional program options (e.g. -s for sample or -b/-e for trimfq)

  • compress_lvl: compression level for outputs ending in .gz (default: 6; see gzip manual for details)
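How these parameters end up on the command line for `sample` can be previewed with a small sketch. The function `sample_commands` is hypothetical and only imitates the wrapper's `sample` branch: one seqtk call per input/output pair, with `extra` placed before the input file and `n` after it. Unlike the wrapper, it drops empty fields rather than leaving extra spaces, and it omits the log redirection.

```python
def sample_commands(inputs, outputs, n, extra="", pipe_comp=""):
    """Build one `seqtk sample` command string per input/output pair.

    Hypothetical helper; mirrors the argument order used by the wrapper.
    """
    if len(inputs) != len(outputs):
        raise ValueError(
            "Command 'sample' requires same number of input and output files."
        )
    commands = []
    for in_fx, out_fx in zip(inputs, outputs):
        # Order: seqtk sample [extra] <input> <n> [compression pipe] > <output>
        parts = ["seqtk", "sample", extra, in_fx, str(n), pipe_comp]
        commands.append(" ".join(p for p in parts if p) + f" > {out_fx}")
    return commands


if __name__ == "__main__":
    # Paths and parameters follow the seqtk_sample_pe example rule
    for cmd in sample_commands(
        inputs=["reads/a.1.fastq.gz", "reads/a.2.fastq.gz"],
        outputs=["results/sample_pe/a.1.fastq.gz", "results/sample_pe/a.2.fastq.gz"],
        n=3,
        extra="-s 12345",
        pipe_comp="| pigz --processes 1 -6 --stdout",
    ):
        print(cmd)
```

Both files of a pair receive the same `extra` string, so the seed given with -s is shared between the two runs; seqtk's documentation recommends using the same seed for paired files to keep the pairing.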

Authors

  • Filipe G. Vieira

Code

"""Snakemake wrapper for SeqTk."""

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell

# Only stderr goes to the log; stdout carries the sequence data.
log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=False)
extra = snakemake.params.get("extra", "")
compress_lvl = snakemake.params.get("compress_lvl", "6")

# Compress with pigz only when the first output file ends in ".gz".
pipe_comp = (
    f"| pigz --processes {snakemake.threads} -{compress_lvl} --stdout"
    if snakemake.output[0].endswith(".gz")
    else ""
)

if snakemake.params.command == "sample":
    # "sample" runs once per input file, pairing inputs and outputs by position.
    n_reads = snakemake.params.get("n", "")
    assert len(snakemake.input) == len(
        snakemake.output
    ), "Command 'sample' requires same number of input and output files."
    for in_fx, out_fx in zip(snakemake.input, snakemake.output):
        shell(
            "(seqtk {snakemake.params.command} {extra} {in_fx} {n_reads} {pipe_comp} > {out_fx}) {log}"
        )
else:
    # All other commands run once, with every input file passed to seqtk in order.
    shell(
        "(seqtk {snakemake.params.command} {extra} {snakemake.input} {pipe_comp} > {snakemake.output}) {log}"
    )