RASUSA

Randomly subsample sequencing reads to a specified coverage using rasusa.
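To make "a specified coverage" concrete: the number of bases to keep is the genome size multiplied by the desired coverage, and reads are drawn at random until that total is reached. The sketch below only illustrates that idea and is not rasusa's actual implementation. The helper names (`parse_genome_size`, `subsample_to_coverage`) are hypothetical, and the decimal interpretation of suffixes such as `mb` is an assumption.

```python
import random

# Assumed decimal multipliers for size suffixes such as "3mb" (hypothetical table).
_SUFFIXES = {
    "": 1, "b": 1,
    "k": 10**3, "kb": 10**3,
    "m": 10**6, "mb": 10**6,
    "g": 10**9, "gb": 10**9,
}


def parse_genome_size(text):
    """Convert a size such as '3mb' or 9000 into a number of bases."""
    s = str(text).strip().lower()
    # Split the trailing alphabetic suffix from the leading number.
    i = len(s)
    while i > 0 and s[i - 1].isalpha():
        i -= 1
    number, suffix = s[:i], s[i:]
    if not number or suffix not in _SUFFIXES:
        raise ValueError(f"cannot parse genome size: {text!r}")
    return int(round(float(number) * _SUFFIXES[suffix]))


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=None):
    """Pick reads in random order until genome_size * coverage bases are kept.

    Returns the sorted indices of the kept reads and the number of bases kept.
    """
    target_bases = genome_size * coverage
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    order = list(range(len(read_lengths)))
    rng.shuffle(order)

    kept, total = [], 0
    for idx in order:
        if total >= target_bases:
            break
        kept.append(idx)
        total += read_lengths[idx]
    return sorted(kept), total


# 1000 reads of 100 bases each, a 1 kb genome, 20x coverage -> 20,000 bases.
kept, total = subsample_to_coverage([100] * 1000, parse_genome_size("1kb"), 20, seed=15)
```

With these inputs the target is 20,000 bases, so 200 of the 1,000 reads are kept. Passing the same seed again selects the same reads, which is the purpose of `--seed` in the example rule.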

Example

This wrapper can be used in the following way:

rule subsample:
    input:
        r1="{sample}.r1.fq",
        r2="{sample}.r2.fq",
    output:
        r1="{sample}.subsampled.r1.fq",
        r2="{sample}.subsampled.r2.fq",
    params:
        options="--seed 15",
        genome_size="3mb", # required
        coverage=20, # required
    log:
        "logs/subsample/{sample}.log",
    wrapper:
        "0.75.0/bio/rasusa"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • rasusa==0.3.0

Authors

  • Michael Hall

Code

__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell


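# Extra rasusa command-line flags are optional; default to an empty string.
options = snakemake.params.get("options", "")


# Multiple input and output files expand to space-separated paths.
# coverage and genome_size are required params; stderr is written to the log file.
shell(
    "rasusa {options} -i {snakemake.input} -o {snakemake.output} "
    "-c {snakemake.params.coverage} -g {snakemake.params.genome_size} "
    "2> {snakemake.log}"
)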
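As a rough illustration of what the wrapper runs, the sketch below rebuilds the command string in plain Python for the example rule with the sample wildcard set to `A`. The function `build_rasusa_command` is a hypothetical stand-in for Snakemake's own formatting, which joins multiple input or output files with spaces; it does not import or call Snakemake.

```python
def build_rasusa_command(inputs, outputs, coverage, genome_size, log, options=""):
    """Mirror the wrapper's shell template using ordinary string formatting."""
    return (
        f"rasusa {options} -i {' '.join(inputs)} -o {' '.join(outputs)} "
        f"-c {coverage} -g {genome_size} "
        f"2> {log}"
    )


# Values taken from the example rule, with {sample} replaced by "A".
cmd = build_rasusa_command(
    inputs=["A.r1.fq", "A.r2.fq"],
    outputs=["A.subsampled.r1.fq", "A.subsampled.r2.fq"],
    coverage=20,
    genome_size="3mb",
    log="logs/subsample/A.log",
    options="--seed 15",
)
print(cmd)
```

Under these assumptions the printed command is `rasusa --seed 15 -i A.r1.fq A.r2.fq -o A.subsampled.r1.fq A.subsampled.r2.fq -c 20 -g 3mb 2> logs/subsample/A.log`.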