.. _`bio/bedtools/split`:

BEDTOOLS SPLIT
==============

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/bedtools/split?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/bedtools/split

Splits a BED file into several subfiles, balancing them not only by number of lines but also by the total number of base pairs in each subfile.

**URL**: https://bedtools.readthedocs.io/

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    scattergather:
        n_bed=2,


    rule bedtools_split:
        input:
            bed="a.bed",
        output:
            scatter.n_bed("results/a.{scatteritem}.bed"),
        log:
            "logs/a.split.log",
        params:
            ## Optional extra arguments, e.g. the splitting algorithm
            extra="--algorithm size",
        wrapper:
            "v3.0.1/bio/bedtools/split"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

* This program/wrapper does not handle multi-threading.
* ``bedtools split`` is currently undocumented, even though it was added to ``bedtools`` in version ``2.23.0`` (https://bedtools.readthedocs.io/en/latest/content/history.html#version-2-23-0-22-feb-2015).

Software dependencies
---------------------

* ``bedtools=2.31.1``

Input/Output
------------

**Input:**

* ``bed``: Path to BED file

**Output:**

* Several BED files (the number of output paths sets the number of subfiles)

Params
------

* ``extra``: additional program arguments (except for ``-i``, ``-n``, or ``-p``)

Authors
-------

* Filipe G. Vieira

Code
----

.. code-block:: python

    __author__ = "Filipe G. Vieira"
    __copyright__ = "Copyright 2023, Filipe G. Vieira"
    __license__ = "MIT"


    import tempfile
    from snakemake.shell import shell

    extra = snakemake.params.get("extra", "")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)


    n_subfiles = len(snakemake.output)
    with tempfile.TemporaryDirectory() as tmpdir:
        shell(
            "bedtools split"
            " --input {snakemake.input.bed}"
            " --number {n_subfiles}"
            " {extra}"
            " --prefix {tmpdir}/out"
            " {log}"
        )

        # bedtools names its outputs <prefix>.00001.bed, <prefix>.00002.bed, ...;
        # copy each one to the matching Snakemake output before tmpdir is removed
        for i in range(n_subfiles):
            out_tmp = f"{tmpdir}/out.{i+1:05d}.bed"
            out = snakemake.output[i]
            shell("cat {out_tmp} > {out}")

.. |nl| raw:: html

   <br>
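The description above says that the split balances subfiles by total base pairs as well as by line count. As a rough illustration of that idea only (this is not the ``bedtools`` implementation, whose exact heuristic is not documented here), the Python sketch below assigns BED intervals greedily, longest first, to whichever subfile currently holds the fewest base pairs. The helpers ``split_by_size`` and ``total_bp`` are hypothetical names used for this example.

.. code-block:: python

    import heapq
    from typing import List, Tuple

    # (chrom, start, end) with BED-style 0-based, half-open coordinates,
    # so an interval's length in base pairs is end - start.
    Interval = Tuple[str, int, int]


    def split_by_size(intervals: List[Interval], n_bins: int) -> List[List[Interval]]:
        """Greedily spread intervals over n_bins so base-pair totals stay similar.

        Illustrative sketch: the longest-first greedy heuristic is an assumption,
        not taken from the bedtools source.
        """
        if n_bins < 1:
            raise ValueError("n_bins must be >= 1")
        bins: List[List[Interval]] = [[] for _ in range(n_bins)]
        # Min-heap of (total_bp, bin_index): the lightest bin is always on top.
        heap = [(0, i) for i in range(n_bins)]
        heapq.heapify(heap)
        # Place the longest intervals first so shorter ones can even out the totals.
        for iv in sorted(intervals, key=lambda x: x[2] - x[1], reverse=True):
            total, idx = heapq.heappop(heap)
            bins[idx].append(iv)
            heapq.heappush(heap, (total + iv[2] - iv[1], idx))
        return bins


    def total_bp(intervals: List[Interval]) -> int:
        """Sum of interval lengths in base pairs."""
        return sum(end - start for _, start, end in intervals)


    if __name__ == "__main__":
        records = [
            ("chr1", 0, 1000),
            ("chr1", 1000, 1900),
            ("chr2", 0, 100),
            ("chr2", 200, 300),
        ]
        parts = split_by_size(records, 2)
        print([total_bp(p) for p in parts])  # [1100, 1000]

For these four records, a plain split into two subfiles of two records each would give 1900 bp and 200 bp, whereas the greedy size-based assignment gives 1100 bp and 1000 bp. Sorting longest-first is a common choice for this kind of bin balancing, because the short intervals placed last can fill the remaining gaps.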