.. _`bio/rasusa`:

RASUSA
======

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/rasusa?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/rasusa

Randomly subsample sequencing reads to a specified coverage.

**URL**: https://github.com/mbhall88/rasusa

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule subsample:
        input:
            r1="{sample}.r1.fq",
            r2="{sample}.r2.fq",
        output:
            r1="{sample}.subsampled.r1.fq",
            r2="{sample}.subsampled.r2.fq",
        params:
            options="--seed 15",
            genome_size="3mb",  # required, unless `bases` is given
            coverage=20,  # required, unless `bases` is given
            #bases="2gb"
        log:
            "logs/subsample/{sample}.log",
        wrapper:
            "v3.0.1/bio/rasusa"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
---------------------

* ``rasusa=0.7.1``

Input/Output
------------

**Input:**

* Reads to subsample in FASTA/Q format. Input files can be named or unnamed.

**Output:**

* File paths to write subsampled reads to. If using paired-end data, make sure there are two output files in the same order as the input.

Params
------

* ``bases``: Explicitly set the number of bases required, e.g., 4.3kb, 7Tb, 9000, 4.1MB.
  |nl| If this option is given, ``coverage`` and ``genome_size`` are ignored.

* ``coverage``: The desired coverage to subsample the reads to.
  |nl| If ``bases`` is not provided, this option and ``genome_size`` are required.

* ``genome_size``: Genome size to calculate coverage with respect to, e.g., 4.3kb, 7Tb, 9000, 4.1MB.
  |nl| Alternatively, a FASTA/Q index file can be provided and the genome size will be set to the sum of all reference sequences.
  |nl| If ``bases`` is not provided, this option and ``coverage`` are required.

* ``options``: Any other options as listed in `the docs <https://github.com/mbhall88/rasusa>`_.

Authors
-------

* Michael Hall

Code
----

.. code-block:: python

    __author__ = "Michael Hall"
    __copyright__ = "Copyright 2020, Michael Hall"
    __email__ = "michael@mbh.sh"
    __license__ = "MIT"


    from snakemake.shell import shell

    # Free-form extra command-line options, empty by default.
    options = snakemake.params.get("options", "")

    bases = snakemake.params.get("bases")
    if bases is not None:
        # An explicit base count takes precedence over coverage and genome size.
        options += " -b {}".format(bases)
    else:
        covg = snakemake.params.get("coverage")
        gsize = snakemake.params.get("genome_size")
        if covg is None or gsize is None:
            raise ValueError(
                "If `bases` is not given, then `coverage` and `genome_size` must be given"
            )
        options += " -g {gsize} -c {covg}".format(gsize=gsize, covg=covg)

    shell("rasusa {options} -i {snakemake.input} -o {snakemake.output} 2> {snakemake.log}")

.. |nl| raw:: html

   <br>
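
Because the wrapper code above adds ``-g`` and ``-c`` only when ``bases`` is absent, a rule can set ``bases`` on its own and leave out ``coverage`` and ``genome_size``. The following is a minimal sketch of such a rule, not a tested configuration: the rule name ``subsample_single`` and all file paths are hypothetical, and it assumes single-end reads passed as one unnamed input and one unnamed output.

.. code-block:: python

    rule subsample_single:  # hypothetical rule name
        input:
            "reads/{sample}.fq",  # hypothetical path to single-end reads
        output:
            "subsampled/{sample}.fq",  # hypothetical output path
        params:
            options="--seed 15",
            bases="2gb",  # with `bases` set, `coverage` and `genome_size` are not needed
        log:
            "logs/subsample_single/{sample}.log",
        wrapper:
            "v3.0.1/bio/rasusa"

Fixing the seed through ``options``, as in the paired-end example, should keep the subsample reproducible between runs.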