SEQTK-SUBSAMPLE-PE
Subsample reads from paired FASTQ files
Example
This wrapper can be used in the following way:
    rule seqtk_subsample_pe:
        input:
            f1="{sample}.1.fastq.gz",
            f2="{sample}.2.fastq.gz"
        output:
            f1="{sample}.1.subsampled.fastq.gz",
            f2="{sample}.2.subsampled.fastq.gz"
        params:
            n=3,
            seed=12345
        log:
            "logs/seqtk_subsample/{sample}.log"
        threads:
            1
        wrapper:
            "v1.21.1/bio/seqtk/subsample/pe"
Note that the input, output and log file paths can be chosen freely.
When running with
    snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
seqtk==1.3
pigz=2.3
Input/Output
Input:
- paired FASTQ files (can be gzip compressed)
Output:
- subsampled paired FASTQ files (gzip compressed)
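One way to check the outputs is to count the records in each subsampled file. The helper below is a minimal sketch and not part of the wrapper: the function name `count_fastq_reads` is hypothetical, and it assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines).

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    # Pick the opener by file extension; plain-text FASTQ is read as-is.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

With the example rule above (`n=3`), both subsampled files would be expected to report the same count, provided each input holds at least that many reads.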
Params
n
: number of reads after subsampling
seed
: seed to initialize a pseudorandom number generator
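The wrapper passes the same `seed` to both `seqtk sample` calls, which is what keeps the two mate files in sync. The toy model below illustrates the idea with seeded reservoir sampling over two equally long lists. It is a sketch under stated assumptions, not seqtk's actual implementation: the function `subsample` and the read names are made up for illustration.

```python
import random


def subsample(records, n, seed):
    """Reservoir-sample n records with a seeded generator (toy model)."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(rec)
        else:
            # Replace a slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = rec
    return reservoir


# Hypothetical paired reads: mate 1 and mate 2 share a name prefix.
mates_1 = [f"read{i}/1" for i in range(100)]
mates_2 = [f"read{i}/2" for i in range(100)]

sample_1 = subsample(mates_1, 3, seed=12345)
sample_2 = subsample(mates_2, 3, seed=12345)

# The same seed selects the same positions, so the mates still match.
pairs_match = [r[:-2] for r in sample_1] == [r[:-2] for r in sample_2]
print(pairs_match)
```

Because both runs draw the same sequence of pseudorandom numbers, the same record positions are kept in each list. Using different seeds for the two files would generally break this pairing.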
Authors
- Fabian Kilpert
Code
    """Snakemake wrapper for subsampling reads from paired FASTQ files using seqtk."""

    __author__ = "Fabian Kilpert"
    __copyright__ = "Copyright 2020, Fabian Kilpert"
    __email__ = "fkilpert@gmail.com"
    __license__ = "MIT"

    from snakemake.shell import shell

    # Build the shell redirection that sends command output to the rule's log file.
    log = snakemake.log_fmt_shell()

    # Sample each mate file with the same seed so that read pairs stay in sync,
    # then compress each result with pigz using the rule's thread count.
    shell(
        "( "
        "seqtk sample "
        "-s {snakemake.params.seed} "
        "{snakemake.input.f1} "
        "{snakemake.params.n} "
        "| pigz -9 -p {snakemake.threads} "
        "> {snakemake.output.f1} "
        "&& "
        "seqtk sample "
        "-s {snakemake.params.seed} "
        "{snakemake.input.f2} "
        "{snakemake.params.n} "
        "| pigz -9 -p {snakemake.threads} "
        "> {snakemake.output.f2} "
        ") {log} "
    )