SEQTK-SUBSAMPLE-SE

Subsample reads from a FASTQ file using seqtk.

Example

This wrapper can be used in the following way:

rule seqtk_subsample_se:
    input:
        "{sample}.fastq.gz"
    output:
        "{sample}.subsampled.fastq.gz"
    params:
        n=3,
        seed=12345
    log:
        "logs/seqtk_subsample/{sample}.log"
    threads:
        1
    wrapper:
        "v1.9.0/bio/seqtk/subsample/se"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • seqtk==1.3
  • pigz=2.3

Input/Output

Input:

  • FASTQ file (can be gzip compressed)

Output:

  • subsampled FASTQ file (gzip compressed)
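
To check that the output contains the expected number of reads, the records in the subsampled file can be counted. The following is a minimal sketch, not part of the wrapper; the helper name count_fastq_reads is hypothetical, and it assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines).

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    # Pick a gzip-aware opener for .gz files, plain open() otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    # A well-formed four-line FASTQ has a line count divisible by 4.
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

With the example rule above, calling this on the output file should return 3, provided the input holds at least three reads.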

Params

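  • n: number of reads to keep after subsampling (seqtk also accepts a fraction of reads here instead of an absolute number)
  • seed: seed for the pseudorandom number generator; reusing the same seed on the same input reproduces the same subsample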
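
To illustrate how n and seed interact, here is a minimal sketch of seeded subsampling to a fixed number of records using reservoir sampling. It is an illustration only: it is not seqtk's code, the function name reservoir_sample is hypothetical, and Python's random module will not select the same reads as seqtk for a given seed.

```python
import random


def reservoir_sample(records, n, seed):
    """Return up to n records chosen uniformly at random from an iterable."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the result
    reservoir = []
    for i, record in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(record)
        else:
            # Replace a kept record with probability n / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < n:
                reservoir[j] = record
    return reservoir


reads = [f"read_{i}" for i in range(100)]
subset = reservoir_sample(reads, n=3, seed=12345)
```

Running the function twice with the same seed on the same input yields the same three reads, which is the property the seed parameter exists to provide.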

Authors

  • Fabian Kilpert

Code

"""Snakemake wrapper for subsampling reads from FASTQ file using seqtk."""

__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


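# Shell redirection that sends the command's output to the rule's log file.
log = snakemake.log_fmt_shell()


shell(
    "( "
    # Subsample the input to params.n reads, seeded for reproducibility.
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input} "
    "{snakemake.params.n} "
    # Compress at the highest level using the threads given to the rule.
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output} "
    # Whatever is not written to the output file (stderr) goes to the log.
    ") {log} "
)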
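
To make the pipeline easier to read, the sketch below shows roughly what the shell call resolves to for one sample. It uses plain string formatting instead of Snakemake; the function build_command is hypothetical, and the redirection suffix assumes a single log file with both stdout and stderr captured.

```python
def build_command(fastq, n, seed, threads, output, log):
    """Assemble the seqtk | pigz pipeline as one shell command string."""
    return (
        f"( seqtk sample -s {seed} {fastq} {n} "
        f"| pigz -9 -p {threads} > {output} ) > {log} 2>&1"
    )


# Values taken from the example rule, with the wildcard sample set to "a".
print(
    build_command(
        fastq="a.fastq.gz",
        n=3,
        seed=12345,
        threads=1,
        output="a.subsampled.fastq.gz",
        log="logs/seqtk_subsample/a.log",
    )
)
```

Because the compressed reads are redirected to the output file inside the subshell, only diagnostic messages from seqtk and pigz end up in the log.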