SEQTK
Toolkit for processing sequences in FASTA/Q formats
URL: https://github.com/lh3/seqtk
Example
This wrapper can be used in the following way:
rule seqtk_seq_fq2fas:
    input:
        "reads/{prefix}.fastq",
    output:
        "results/fq2fas/{prefix}.fasta",
    log:
        "logs/fq2fas/{prefix}.log",
    params:
        command="seq",
        extra="-A",
    wrapper:
        "v5.5.2-17-g33d5b76/bio/seqtk"


rule seqtk_seq_convBQ:
    input:
        "reads/{prefix}.fastq",
    output:
        "results/convBQ/{prefix}.fasta",
    log:
        "logs/convBQ/{prefix}.log",
    params:
        command="seq",
        extra="-aQ 64 -q 20 -n N",
    wrapper:
        "v5.5.2-17-g33d5b76/bio/seqtk"


rule seqtk_subseq_list:
    input:
        "reads/{prefix}.fastq",
        "reads/id.list",
    output:
        "results/subseq_list/{prefix}.fq.gz",
    log:
        "logs/subseq_list/{prefix}.log",
    params:
        command="subseq",
        extra="",
    wrapper:
        "v5.5.2-17-g33d5b76/bio/seqtk"


rule seqtk_mergepe:
    input:
        r1="reads/{sample}.1.fastq.gz",
        r2="reads/{sample}.2.fastq.gz",
    output:
        merged="results/mergepe/{sample}.fastq.gz",
    log:
        "logs/mergepe/{sample}.log",
    params:
        command="mergepe",
        compress_lvl=9,
    threads: 2
    wrapper:
        "v5.5.2-17-g33d5b76/bio/seqtk"


rule seqtk_sample_se:
    input:
        "reads/{sample}.fastq.gz",
    output:
        "results/sample_se/{sample}.fastq.gz",
    log:
        "logs/sample_se/{sample}.log",
    params:
        command="sample",
        n=3,
        extra="-s 12345",
    threads: 1
    wrapper:
        "v5.5.2-17-g33d5b76/bio/seqtk"


rule seqtk_sample_pe:
    input:
        f1="reads/{sample}.1.fastq.gz",
        f2="reads/{sample}.2.fastq.gz",
    output:
        f1="results/sample_pe/{sample}.1.fastq.gz",
        f2="results/sample_pe/{sample}.2.fastq.gz",
    log:
        "logs/sample_pe/{sample}.log",
    params:
        command="sample",
        n=3,
        extra="-s 12345",
    threads: 1
    wrapper:
        "v5.5.2-17-g33d5b76/bio/seqtk"
Note that input, output and log file paths can be chosen freely.
When running with

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
Multiple threads can be used during compression of the output file.
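To illustrate the note above, the following minimal sketch mirrors how the wrapper code decides whether to add a pigz stage: compression is applied only when the first output path ends in ".gz", and the rule's thread count is handed to pigz. The function name compression_pipe is hypothetical and is not part of the wrapper.

```python
def compression_pipe(first_output: str, threads: int = 1, compress_lvl: int = 6) -> str:
    """Return the shell fragment placed between the seqtk call and the output redirect.

    Hypothetical helper for illustration; it imitates the wrapper's pipe_comp logic.
    """
    if first_output.endswith(".gz"):
        # pigz reads from stdin and writes the compressed stream to stdout
        return f"| pigz --processes {threads} -{compress_lvl} --stdout"
    # Uncompressed output: seqtk writes straight to the output file
    return ""


# Settings taken from the seqtk_mergepe rule (threads: 2, compress_lvl=9)
print(compression_pipe("results/mergepe/a.fastq.gz", threads=2, compress_lvl=9))
# An uncompressed output gets no pigz stage
print(repr(compression_pipe("results/fq2fas/a.fasta")))
```

With an uncompressed output, the threads setting has no effect on this stage, since no pigz process is started.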
Software dependencies
seqtk=1.4
pigz
Input/Output
Input:
fastx file(s) (can be gzip compressed)
Output:
fastx file(s) (can be gzip compressed)
Params
n
: number of reads after subsampling (for sample)
extra
: additional program options (e.g. -s for sample or -b/-e for trimfq)
compress_lvl
: compression level (see gzip manual for details)
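As a sketch of how these parameters combine for the sample command, the snippet below builds the command lines that the wrapper code would run for a paired-end rule such as seqtk_sample_pe. The helper sample_commands is hypothetical, and the log redirection the wrapper appends is left out. Each input file is paired with the output at the same position, and because extra is reused for every file, both mates receive the same -s seed, which the seqtk documentation recommends to keep read pairing intact.

```python
def sample_commands(inputs, outputs, n, extra="", threads=1, compress_lvl=6):
    """Build one `seqtk sample` command line per input/output pair.

    Hypothetical helper for illustration; it imitates the wrapper's sample branch.
    """
    if len(inputs) != len(outputs):
        raise ValueError("Command 'sample' requires same number of input and output files.")
    # As in the wrapper, only the first output decides whether pigz is used
    pipe = (
        f"| pigz --processes {threads} -{compress_lvl} --stdout"
        if outputs[0].endswith(".gz")
        else ""
    )
    commands = []
    for in_fx, out_fx in zip(inputs, outputs):
        parts = ["seqtk", "sample", extra, in_fx, str(n), pipe, ">", out_fx]
        # Drop empty fragments so the command has no doubled spaces
        commands.append(" ".join(p for p in parts if p))
    return commands


cmds = sample_commands(
    inputs=["reads/a.1.fastq.gz", "reads/a.2.fastq.gz"],
    outputs=["results/sample_pe/a.1.fastq.gz", "results/sample_pe/a.2.fastq.gz"],
    n=3,
    extra="-s 12345",
)
for cmd in cmds:
    print(cmd)
```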
Code
"""Snakemake wrapper for SeqTk."""

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=False)
extra = snakemake.params.get("extra", "")
compress_lvl = snakemake.params.get("compress_lvl", "6")

# Compress with pigz only when the first output file ends in ".gz".
pipe_comp = (
    f"| pigz --processes {snakemake.threads} -{compress_lvl} --stdout"
    if snakemake.output[0].endswith(".gz")
    else ""
)

if snakemake.params.command == "sample":
    n_reads = snakemake.params.get("n", "")

    assert len(snakemake.input) == len(
        snakemake.output
    ), "Command 'sample' requires same number of input and output files."

    # Subsample each input file into the output at the same position.
    for in_fx, out_fx in zip(snakemake.input, snakemake.output):
        shell(
            "(seqtk {snakemake.params.command} {extra} {in_fx} {n_reads} {pipe_comp} > {out_fx}) {log}"
        )
else:
    # All other commands receive every input file in a single call.
    shell(
        "(seqtk {snakemake.params.command} {extra} {snakemake.input} {pipe_comp} > {snakemake.output}) {log}"
    )
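For commands other than sample, the wrapper passes all input files to a single seqtk call, in the order they are declared in the rule. The sketch below previews the resulting command line for the seqtk_subseq_list rule. The helper generic_command is hypothetical and omits the log redirection; it assumes, as Snakemake does when formatting a list of files into a shell command, that the inputs are joined with spaces.

```python
def generic_command(command, inputs, output, extra="", threads=1, compress_lvl=6):
    """Build the single seqtk command line used for non-sample commands.

    Hypothetical helper for illustration; it imitates the wrapper's else branch.
    """
    pipe = (
        f"| pigz --processes {threads} -{compress_lvl} --stdout"
        if output.endswith(".gz")
        else ""
    )
    # Inputs keep their declared order, e.g. reads first and the ID list second
    parts = ["seqtk", command, extra, " ".join(inputs), pipe, ">", output]
    return " ".join(p for p in parts if p)


cmd = generic_command(
    "subseq",
    inputs=["reads/a.fastq", "reads/id.list"],
    output="results/subseq_list/a.fq.gz",
)
print(cmd)
```

Because the inputs are positional, swapping the two entries in the rule would also swap the arguments that seqtk subseq receives.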