RASUSA
Randomly subsample sequencing reads to a specified coverage.
URL: https://github.com/mbhall88/rasusa
Example
This wrapper can be used in the following way:
rule subsample:
    input:
        r1="{sample}.r1.fq",
        r2="{sample}.r2.fq",
    output:
        r1="{sample}.subsampled.r1.fq",
        r2="{sample}.subsampled.r2.fq",
    params:
        options="--seed 15",
        genome_size="3mb",  # required, unless `bases` is given
        coverage=20,  # required, unless `bases` is given
        # bases="2gb"
    log:
        "logs/subsample/{sample}.log",
    wrapper:
        "v3.0.2-2-g0dea6a1/bio/rasusa"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
rasusa=0.7.1
Input/Output
Input:
Reads to subsample in FASTA/Q format. Input files can be named or unnamed.
Output:
File paths to write subsampled reads to. If using paired-end data, make sure there are two output files in the same order as the input.
Params
bases
: Explicitly set the number of bases required, e.g., 4.3kb, 7Tb, 9000, 4.1MB.
If this option is given, coverage and genome_size are ignored.
coverage
: The desired coverage to subsample the reads to.
If bases is not provided, this option and genome_size are required.
genome_size
: Genome size to calculate coverage with respect to, e.g., 4.3kb, 7Tb, 9000, 4.1MB.
Alternatively, a FASTA/Q index file can be provided, and the genome size will be set to the sum of all reference sequences.
If bases is not provided, this option and coverage are required.
options
: Any other options as listed in the docs.
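The relationship between genome_size, coverage, and bases can be illustrated with a small calculation: the number of bases to keep is the genome size multiplied by the desired coverage. The sketch below is not taken from rasusa. The helper names `parse_size` and `target_bases` are hypothetical, and it assumes that size suffixes are case-insensitive metric prefixes (k, m, g, t as powers of 1000) with an optional trailing "b". It does not handle the case where genome_size is an index file.

```python
import re

# Assumed metric multipliers (powers of 1000). This is an illustration,
# not code copied from rasusa.
_UNITS = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}


def parse_size(text):
    """Convert a size such as '4.3kb', '7Tb', '9000' or '4.1MB' to a base count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgt]?)b?\s*", str(text).lower())
    if match is None:
        raise ValueError(f"Cannot parse size: {text!r}")
    number, prefix = match.groups()
    return round(float(number) * _UNITS[prefix])


def target_bases(genome_size, coverage):
    """Number of bases needed to reach `coverage` over a genome of `genome_size`."""
    return round(parse_size(genome_size) * float(coverage))


if __name__ == "__main__":
    # Values from the example rule: a 3 Mb genome at 20x coverage.
    print(target_bases("3mb", 20))
```

Under these assumptions, the example rule's genome_size="3mb" and coverage=20 correspond to 60,000,000 bases, which is the kind of value one would otherwise pass through bases.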
Code
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"

from snakemake.shell import shell

# Start from any extra options supplied by the user.
options = snakemake.params.get("options", "")

bases = snakemake.params.get("bases")
if bases is not None:
    # An explicit base count takes precedence over coverage and genome size.
    options += " -b {}".format(bases)
else:
    covg = snakemake.params.get("coverage")
    gsize = snakemake.params.get("genome_size")
    if covg is None or gsize is None:
        raise ValueError(
            "If `bases` is not given, then `coverage` and `genome_size` must be given"
        )
    options += " -g {gsize} -c {covg}".format(gsize=gsize, covg=covg)

# Redirect stderr to the rule's log file.
shell("rasusa {options} -i {snakemake.input} -o {snakemake.output} 2> {snakemake.log}")
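The option-building logic above can be exercised without Snakemake by lifting it into a plain function. The following is a minimal sketch of that idea. `build_rasusa_options` is a hypothetical helper name, not part of the wrapper, and it takes an ordinary dict in place of `snakemake.params`.

```python
def build_rasusa_options(params):
    """Return the rasusa option string for a dict of wrapper params.

    Mirrors the wrapper's precedence: `bases` wins; otherwise both
    `coverage` and `genome_size` must be present.
    """
    options = params.get("options", "")
    bases = params.get("bases")
    if bases is not None:
        # coverage and genome_size are ignored when bases is given
        return "{} -b {}".format(options, bases)

    covg = params.get("coverage")
    gsize = params.get("genome_size")
    if covg is None or gsize is None:
        raise ValueError(
            "If `bases` is not given, then `coverage` and `genome_size` must be given"
        )
    return "{} -g {} -c {}".format(options, gsize, covg)


if __name__ == "__main__":
    example = {"options": "--seed 15", "genome_size": "3mb", "coverage": 20}
    print(build_rasusa_options(example))
```

Keeping this logic in a function makes the two modes (explicit bases versus coverage with genome size) easy to check in isolation before running the rule.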