SEQTK-SUBSAMPLE-PE

Subsample reads from paired FASTQ files

URL:

Example

This wrapper can be used in the following way:

rule seqtk_subsample_pe:
    input:
        f1="{sample}.1.fastq.gz",
        f2="{sample}.2.fastq.gz"
    output:
        f1="{sample}.1.subsampled.fastq.gz",
        f2="{sample}.2.subsampled.fastq.gz"
    params:
        n=3,
        seed=12345
    log:
        "logs/seqtk_subsample/{sample}.log"
    threads:
        1
    wrapper:
        "v1.2.1/bio/seqtk/subsample/pe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • seqtk==1.3
  • pigz=2.3

Input/Output

Input:

  • paired FASTQ files (optionally gzip-compressed)

Output:

  • subsampled paired FASTQ files (gzip-compressed with pigz)
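Mates are matched by their position in the two files. A simple sanity check on the output is therefore to confirm that both files list the same read names in the same order. The following is a minimal sketch. It assumes four-line FASTQ records and mate names that differ at most by a trailing `/1` or `/2`. The helper names `open_fastq`, `read_names` and `check_pairing` are hypothetical and are not part of the wrapper.

```python
import gzip
from itertools import islice, zip_longest
from typing import Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_names(handle: TextIO) -> Iterator[str]:
    """Yield the read name of each four-line FASTQ record."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return  # clean end of file
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        # Keep only the ID (drop any comment) and strip a /1 or /2 mate suffix.
        name = record[0][1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def check_pairing(path1: str, path2: str) -> int:
    """Return the number of read pairs; raise ValueError if mates disagree."""
    count = 0
    with open_fastq(path1) as f1, open_fastq(path2) as f2:
        # zip_longest pads the shorter file with None, so unequal lengths fail too.
        for name1, name2 in zip_longest(read_names(f1), read_names(f2)):
            if name1 != name2:
                raise ValueError(
                    f"mates out of sync at pair {count + 1}: {name1!r} vs {name2!r}"
                )
            count += 1
    return count
```

For the example rule, `check_pairing("A.1.subsampled.fastq.gz", "A.2.subsampled.fastq.gz")` would return the number of pairs, which should be `n` whenever the input holds at least `n` pairs. The sample name `A` is hypothetical.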

Params

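  • n: number of reads to keep from each file (passed to seqtk sample, which interprets a value below 1 as the fraction of reads to keep)
  • seed: seed for the pseudorandom number generator; the same seed is used for both files so that the selected reads remain paired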
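Reusing the seed is what keeps mates together. seqtk processes each file independently, so the two runs select the same record positions only if both of these hold:

  • they start from the same random state
  • they see the same number of records in the same order

seqtk's own documentation likewise recommends using the same seed for both files of a pair. The sketch below illustrates the principle with reservoir sampling (Algorithm R) and Python's `random.Random`. It is not seqtk's implementation, and its selections will differ from seqtk's output. The function name `reservoir_sample` and the read names are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Sequence[T], n: int, seed: int) -> List[T]:
    """Return n records chosen uniformly at random (Algorithm R), in input order.

    Illustrative only: this is not seqtk's RNG, so the chosen reads will differ
    from seqtk's. The choice of positions depends only on `seed` and
    `len(records)`, never on the record contents.
    """
    rng = random.Random(seed)  # private generator, re-seeded for each file
    reservoir: List[int] = []  # stores record positions, not the records
    for i in range(len(records)):
        if i < n:
            reservoir.append(i)  # fill the reservoir with the first n positions
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < n:
                reservoir[j] = i  # position i replaces a random earlier pick
    return [records[i] for i in sorted(reservoir)]


# Hypothetical mate files: record k of file 1 pairs with record k of file 2.
mate1 = [f"read{k}/1" for k in range(100)]
mate2 = [f"read{k}/2" for k in range(100)]

# Same seed and same record count, so both calls pick the same positions.
sub1 = reservoir_sample(mate1, n=3, seed=12345)
sub2 = reservoir_sample(mate2, n=3, seed=12345)
print(list(zip(sub1, sub2)))  # each tuple holds the two mates of one read
```

The selection never looks at record contents. Calling it twice with the same seed on files of equal length yields matching positions. With different seeds, or with files of different lengths, the two outputs would generally no longer pair up.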

Authors

  • Fabian Kilpert

Code

"""Snakemake wrapper for subsampling reads from paired FASTQ files using seqtk."""

__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


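# `snakemake` is injected by Snakemake when the wrapper runs. log_fmt_shell()
# returns a shell redirection to the rule's log file; because stdout is already
# redirected to the output files below, effectively only stderr reaches the log.
log = snakemake.log_fmt_shell()


# Subsample each mate file separately with the same seed so that, given the same
# number of records in the same order, the same positions are selected in both
# files and the reads stay paired.
shell(
    "( "
    # Mate 1: sample n reads, recompress with pigz (level 9, rule's threads).
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f1} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f1} "
    # Mate 2 runs only if the mate-1 pipeline exits successfully.
    "&& "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f2} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f2} "
    ") {log} "
)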
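When debugging a rule, it can help to see the concrete command the wrapper executes. The sketch below expands the wrapper's template for the values in the example rule. It makes two simplifications:

  • `str.format` with a `types.SimpleNamespace` stand-in approximates how Snakemake's `shell()` fills the `{...}` fields.
  • The log redirection is a placeholder, because the real string comes from `snakemake.log_fmt_shell()`.

The names `TEMPLATE`, `expand_command` and `job`, and the sample name `A`, are hypothetical.

```python
from types import SimpleNamespace

# The wrapper's shell template, copied verbatim. str.format is used here as an
# approximation of the substitution Snakemake's shell() performs.
TEMPLATE = (
    "( "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f1} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f1} "
    "&& "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f2} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f2} "
    ") {log} "
)


def expand_command(snakemake: SimpleNamespace, log: str) -> str:
    """Return the shell command the wrapper would run for these values."""
    return TEMPLATE.format(snakemake=snakemake, log=log)


# Hypothetical stand-in for the `snakemake` object, using the example rule's
# values with {sample} = "A".
job = SimpleNamespace(
    input=SimpleNamespace(f1="A.1.fastq.gz", f2="A.2.fastq.gz"),
    output=SimpleNamespace(
        f1="A.1.subsampled.fastq.gz", f2="A.2.subsampled.fastq.gz"
    ),
    params=SimpleNamespace(n=3, seed=12345),
    threads=1,
)

# Placeholder for the redirection returned by snakemake.log_fmt_shell().
print(expand_command(job, log="<log redirection>"))
```

The expanded command shows the seed `12345` appearing in both `seqtk sample` calls. The two pipelines are chained with `&&` inside a single subshell, so one log redirection covers both.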