SEQKIT GENERIC WRAPPER
Run SeqKit.
URL: https://bioinf.shenwei.me/seqkit/
Example
This wrapper can be used in the following way:
rule seqkit_seq:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/seq/{sample}.fa.gz",
    log:
        "logs/seq/{sample}.log",
    params:
        command="seq",
        extra="--min-len 10",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_subseq_bed:
    input:
        fasta="data/{sample}.fa",
        bed="data/{sample}.bed",
    output:
        fasta="out/subseq/bed/{sample}.fa.gz",
    log:
        "logs/subseq/bed/{sample}.log",
    params:
        command="subseq",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_subseq_gtf:
    input:
        fasta="data/{sample}.fa",
        gtf="data/{sample}.gtf",
    output:
        fasta="out/subseq/gtf/{sample}.fa.gz",
    log:
        "logs/subseq/gtf/{sample}.log",
    params:
        command="subseq",
        extra="--feature CDS",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_subseq_region:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/subseq/region/{sample}.fa.gz",
    log:
        "logs/subseq/region/{sample}.log",
    params:
        command="subseq",
        extra="--region 1:12",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_fx2tab:
    input:
        fastx="data/{sample}.fastq",
    output:
        tsv="out/fx2tab/{sample}.tsv",
    log:
        "logs/fx2tab/{sample}.log",
    params:
        command="fx2tab",
        extra="--name",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_grep_name:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/name.txt",
    output:
        fastx="out/grep/name/{sample}.fastq.gz",
    log:
        "logs/grep/name/{sample}.log",
    params:
        command="grep",
        extra="--by-name",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_grep_seq:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/seq.txt",
    output:
        fastx="out/grep/seq/{sample}.fastq.gz",
    log:
        "logs/grep/seq/{sample}.log",
    params:
        command="grep",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_rmdup_name:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/name/{sample}.fastq.gz",
        dup_num="out/rmdup/name/{sample}.num.txt",
        dup_seqs="out/rmdup/name/{sample}.seq.txt",
    log:
        "logs/rmdup/name/{sample}.log",
    params:
        command="rmdup",
        extra="--by-name",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_rmdup_seq:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/seq/{sample}.fastq.gz",
        dup_num="out/rmdup/seq/{sample}.num.txt",
        dup_seqs="out/rmdup/seq/{sample}.seq.txt",
    log:
        "logs/rmdup/seq/{sample}.log",
    params:
        command="rmdup",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

rule seqkit_stats:
    input:
        fastx="data/{sample}.fastq",
    output:
        stats="out/stats/{sample}.tsv",
    log:
        "logs/stats/{sample}.log",
    params:
        command="stats",
        extra="--all --tabular",
    threads: 2
    wrapper:
        "v3.8.0/bio/seqkit"

Note that input, output and log file paths can be chosen freely.
When running with

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
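The first input file and the first output file are treated as the main ones.
Keys for additional input and output files must match the corresponding seqkit argument names without the -file suffix (if present), with underscores in place of hyphens; for example, the key dup_num is passed to seqkit as --dup-num-file.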
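The key-to-argument rule can be illustrated with a small standalone sketch. It mirrors the mapping in the wrapper code but uses plain dictionaries instead of the snakemake object; the function name extra_file_args and the sample name "a" are hypothetical and only serve the illustration.

```python
def extra_file_args(files, bare_keys):
    """Turn named extra files into seqkit-style options.

    `files` is an ordered mapping of key -> path. The first entry is the
    main file and is skipped. Keys listed in `bare_keys` become `--key`,
    every other key becomes `--key-file`.
    """
    args = []
    for key, path in list(files.items())[1:]:
        flag = "--" + key.replace("_", "-")
        if key not in bare_keys:
            flag += "-file"
        args.append(f"{flag} {path}")
    return " ".join(args)


# Outputs of the seqkit_rmdup_name rule, with the wildcard filled in as "a".
outputs = {
    "fastx": "out/rmdup/name/a.fastq.gz",
    "dup_num": "out/rmdup/name/a.num.txt",
    "dup_seqs": "out/rmdup/name/a.seq.txt",
}
print(extra_file_args(outputs, bare_keys=("read1", "read2")))
# --dup-num-file out/rmdup/name/a.num.txt --dup-seqs-file out/rmdup/name/a.seq.txt

# Inputs of the seqkit_subseq_bed rule: "bed" is passed without a suffix.
inputs = {"fasta": "data/a.fa", "bed": "data/a.bed"}
print(extra_file_args(inputs, bare_keys=("bed", "gtf")))
# --bed data/a.bed
```

A key that does not correspond to a seqkit option would still be turned into a flag, so seqkit itself would reject it at run time.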
Software dependencies
seqkit=2.8.1
Input/Output
Input:
input file(s)
Output:
output file(s)
Params
command
: SeqKit command to use.
extra
: Optional parameters.
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Additional named inputs (all but the first) become seqkit options:
# "bed" and "gtf" map to --bed/--gtf, any other key to --<key>-file.
extra_input = " ".join(
    [
        (
            f"--{key.replace('_','-')} {value}"
            if key in ["bed", "gtf"]
            else f"--{key.replace('_','-')}-file {value}"
        )
        for key, value in snakemake.input.items()
    ][1:]
)

# Additional named outputs (all but the first) are handled the same way:
# "read1" and "read2" map to --read1/--read2, any other key to --<key>-file.
extra_output = " ".join(
    [
        (
            f"--{key.replace('_','-')} {value}"
            if key in ["read1", "read2"]
            else f"--{key.replace('_','-')}-file {value}"
        )
        for key, value in snakemake.output.items()
    ][1:]
)

# The first output is written via --out-file; the first input is positional.
shell(
    "seqkit {snakemake.params.command}"
    " --threads {snakemake.threads}"
    " {extra_input}"
    " {extra_output}"
    " {extra}"
    " --out-file {snakemake.output[0]}"
    " {snakemake.input[0]}"
    " {log}"
)
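
To see what command line the wrapper would assemble for a given rule, the following standalone sketch rebuilds the string in the same option order without needing Snakemake. The helper build_command and the sample name "a" are hypothetical; log redirection is left out, and empty parts are dropped rather than left as extra spaces, so the result is an approximation of the real shell call.

```python
def build_command(command, inputs, outputs, threads, extra=""):
    """Assemble a seqkit command line in the same option order as the wrapper."""

    def named(files, bare):
        # Skip the first (main) file; map the remaining keys to options.
        return [
            f"--{k.replace('_', '-')}{'' if k in bare else '-file'} {v}"
            for k, v in list(files.items())[1:]
        ]

    main_in = next(iter(inputs.values()))
    main_out = next(iter(outputs.values()))
    parts = [
        f"seqkit {command}",
        f"--threads {threads}",
        *named(inputs, ("bed", "gtf")),
        *named(outputs, ("read1", "read2")),
        extra,
        f"--out-file {main_out}",
        main_in,
    ]
    # Drop empty parts (for example when no extra parameters are given).
    return " ".join(p for p in parts if p)


# The seqkit_grep_name rule with the wildcard filled in as "a".
cmd = build_command(
    "grep",
    inputs={"fastx": "data/a.fastq", "pattern": "data/name.txt"},
    outputs={"fastx": "out/grep/name/a.fastq.gz"},
    threads=2,
    extra="--by-name",
)
print(cmd)
# seqkit grep --threads 2 --pattern-file data/name.txt --by-name --out-file out/grep/name/a.fastq.gz data/a.fastq
```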