SEQKIT GENERIC WRAPPER
Run a SeqKit subcommand, selected with the command parameter.
URL: https://bioinf.shenwei.me/seqkit/usage/
Example
This wrapper can be used in the following way:
rule seqkit_seq:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/seq/{sample}.fa.gz",
    log:
        "logs/seq/{sample}.log",
    params:
        command="seq",
        extra="--min-len 10",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_subseq_bed:
    input:
        fasta="data/{sample}.fa",
        bed="data/{sample}.bed",
    output:
        fasta="out/subseq/bed/{sample}.fa.gz",
    log:
        "logs/subseq/bed/{sample}.log",
    params:
        command="subseq",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_subseq_gtf:
    input:
        fasta="data/{sample}.fa",
        gtf="data/{sample}.gtf",
    output:
        fasta="out/subseq/gtf/{sample}.fa.gz",
    log:
        "logs/subseq/gtf/{sample}.log",
    params:
        command="subseq",
        extra="--feature CDS",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_subseq_region:
    input:
        fasta="data/{sample}.fa",
    output:
        fasta="out/subseq/region/{sample}.fa.gz",
    log:
        "logs/subseq/region/{sample}.log",
    params:
        command="subseq",
        extra="--region 1:12",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_fx2tab:
    input:
        fastx="data/{sample}.fastq",
    output:
        tsv="out/fx2tab/{sample}.tsv",
    log:
        "logs/fx2tab/{sample}.log",
    params:
        command="fx2tab",
        extra="--name",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_grep_name:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/name.txt",
    output:
        fastx="out/grep/name/{sample}.fastq.gz",
    log:
        "logs/grep/name/{sample}.log",
    params:
        command="grep",
        extra="--by-name",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_grep_seq:
    input:
        fastx="data/{sample}.fastq",
        pattern="data/seq.txt",
    output:
        fastx="out/grep/seq/{sample}.fastq.gz",
    log:
        "logs/grep/seq/{sample}.log",
    params:
        command="grep",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_rmdup_name:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/name/{sample}.fastq.gz",
        dup_num="out/rmdup/name/{sample}.num.txt",
        dup_seqs="out/rmdup/name/{sample}.seq.txt",
    log:
        "logs/rmdup/name/{sample}.log",
    params:
        command="rmdup",
        extra="--by-name",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_rmdup_seq:
    input:
        fastx="data/{sample}.fastq",
    output:
        fastx="out/rmdup/seq/{sample}.fastq.gz",
        dup_num="out/rmdup/seq/{sample}.num.txt",
        dup_seqs="out/rmdup/seq/{sample}.seq.txt",
    log:
        "logs/rmdup/seq/{sample}.log",
    params:
        command="rmdup",
        extra="--by-seq",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

rule seqkit_stats:
    input:
        fastx="data/{sample}.fastq",
    output:
        stats="out/stats/{sample}.tsv",
    log:
        "logs/stats/{sample}.log",
    params:
        command="stats",
        extra="--all --tabular",
    threads: 2
    wrapper:
        "v2.6.0/bio/seqkit"

Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
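- The first input file and the first output file are treated as the main files: the first input is passed to SeqKit as the positional argument and the first output is passed via --out-file.
- Keys for additional input and output files must match the corresponding SeqKit option names without the -file suffix (if present), with underscores in place of hyphens. For example, the input key pattern becomes --pattern-file and the output key dup_num becomes --dup-num-file, while the input keys bed and gtf are passed as --bed and --gtf.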
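To preview how a rule is translated, the following is a minimal sketch that approximates the command line assembled by the wrapper code shown in the Code section. It is not part of the wrapper: the function name seqkit_command and the use of plain dicts in place of the snakemake object are assumptions made for illustration, and log redirection is left out. Unlike the wrapper, the sketch drops empty parts, so the spacing of the resulting string may differ slightly.

```python
def seqkit_command(command, inputs, outputs, threads=1, extra=""):
    """Approximate the command line the wrapper builds (hypothetical helper).

    `inputs` and `outputs` are plain dicts standing in for the named files
    of a Snakemake rule. Insertion order matters: the first entry of each
    dict is treated as the main file.
    """

    def flags(files, bare_keys):
        parts = []
        # Skip the first (main) file; it is handled separately below.
        for key, value in list(files.items())[1:]:
            name = key.replace("_", "-")
            # Keys listed in bare_keys map to the option as-is;
            # every other key gets the "-file" suffix appended.
            suffix = "" if key in bare_keys else "-file"
            parts.append(f"--{name}{suffix} {value}")
        return parts

    main_input = next(iter(inputs.values()))
    main_output = next(iter(outputs.values()))
    parts = [
        f"seqkit {command}",
        f"--threads {threads}",
        *flags(inputs, {"bed", "gtf"}),
        *flags(outputs, {"read1", "read2"}),
        extra,
        f"--out-file {main_output}",
        main_input,
    ]
    # Drop empty parts (for example an unset `extra`) before joining.
    return " ".join(p for p in parts if p)


# Mirrors the seqkit_rmdup_name rule with the wildcard filled in as "a".
print(
    seqkit_command(
        "rmdup",
        inputs={"fastx": "data/a.fastq"},
        outputs={
            "fastx": "out/rmdup/name/a.fastq.gz",
            "dup_num": "out/rmdup/name/a.num.txt",
            "dup_seqs": "out/rmdup/name/a.seq.txt",
        },
        threads=2,
        extra="--by-name",
    )
)
```

For this rule, the additional outputs dup_num and dup_seqs turn into --dup-num-file and --dup-seqs-file, while the main output and input end up as --out-file and the trailing positional argument.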
Software dependencies
seqkit=2.5.0
Params
command
: SeqKit command to use.
extra
: Optional parameters.
Authors
- Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Turn every named input except the first (main) one into a SeqKit option.
# "bed" and "gtf" are passed as-is; other keys get a "-file" suffix.
extra_input = " ".join(
    [
        f"--{key.replace('_','-')} {value}"
        if key in ["bed", "gtf"]
        else f"--{key.replace('_','-')}-file {value}"
        for key, value in snakemake.input.items()
    ][1:]
)

# Same for named outputs: "read1" and "read2" are passed as-is,
# other keys get a "-file" suffix; the first (main) output is skipped.
extra_output = " ".join(
    [
        f"--{key.replace('_','-')} {value}"
        if key in ["read1", "read2"]
        else f"--{key.replace('_','-')}-file {value}"
        for key, value in snakemake.output.items()
    ][1:]
)

# The main output goes to --out-file; the main input is the positional argument.
shell(
    "seqkit {snakemake.params.command}"
    " --threads {snakemake.threads}"
    " {extra_input}"
    " {extra_output}"
    " {extra}"
    " --out-file {snakemake.output[0]}"
    " {snakemake.input[0]}"
    " {log}"
)