BEDTOOLS SPLIT


Splits a BED file into a fixed number of subfiles, balancing them not just by number of lines but also by the total number of base pairs in each subfile.

URL: https://bedtools.readthedocs.io/
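
To illustrate what balancing by base pairs rather than by lines means, here is a minimal Python sketch. It uses a simple greedy strategy (longest interval first, always into the currently smallest bin) and is an assumption made for illustration only; it is not the heuristic that bedtools itself implements. The function name balance_by_bases and the example intervals are hypothetical.

```python
import heapq


def balance_by_bases(intervals, n_subfiles):
    """Greedily assign (chrom, start, end) intervals to n_subfiles bins.

    Illustrative only: NOT the algorithm used by bedtools split.
    """
    # Each heap entry is (total_bases, bin_index); the smallest bin pops first.
    heap = [(0, i) for i in range(n_subfiles)]
    heapq.heapify(heap)
    bins = [[] for _ in range(n_subfiles)]

    # Place the longest intervals first so the short ones can even things out.
    by_length = sorted(intervals, key=lambda iv: iv[2] - iv[1], reverse=True)
    for chrom, start, end in by_length:
        total, idx = heapq.heappop(heap)
        bins[idx].append((chrom, start, end))
        heapq.heappush(heap, (total + (end - start), idx))
    return bins


# Hypothetical example: one long interval and three short ones.
intervals = [
    ("chr1", 0, 1000),
    ("chr1", 2000, 2100),
    ("chr2", 0, 500),
    ("chr2", 600, 1000),
]
bins = balance_by_bases(intervals, 2)
totals = [sum(end - start for _, start, end in b) for b in bins]
counts = [len(b) for b in bins]
print(totals, counts)  # [1000, 1000] [1, 3]
```

In this example both bins cover 1000 bases although one holds a single line and the other three, which is the kind of imbalance in line counts that a size-based split accepts in exchange for even base-pair totals.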

Example

This wrapper can be used in the following way:

scattergather:
    n_bed=2,


rule bedtools_split:
    input:
        bed="a.bed",
    output:
        scatter.n_bed("results/a.{scatteritem}.bed"),
    log:
        "logs/a.split.log",
    params:
        # Optional extra arguments, e.g. the splitting algorithm
        extra="--algorithm size",
    wrapper:
        "v3.9.0/bio/bedtools/split"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Software dependencies

  • bedtools=2.31.1

Input/Output

Input:

  • bed: Path to BED file

Output:

  • Several BED files, one per declared output (the number of outputs sets the number of subfiles)

Params

  • extra: additional program arguments (except for -i/--input, -n/--number, or -p/--prefix, which the wrapper sets itself)
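
The restriction on extra can be pictured with a small Python sketch of how the command line is put together. The helper build_split_command and its check for reserved flags are hypothetical and only illustrate the idea; the wrapper itself passes extra through unchanged and performs no such validation.

```python
import shlex

# Flags the wrapper sets itself (short and long forms).
RESERVED = {"-i", "--input", "-n", "--number", "-p", "--prefix"}


def build_split_command(bed, n_subfiles, prefix, extra=""):
    """Assemble a bedtools split command as a list of arguments.

    Hypothetical helper: the reserved-flag check is an illustration,
    not something the wrapper does.
    """
    extra_args = shlex.split(extra)
    clashes = [arg for arg in extra_args if arg.split("=")[0] in RESERVED]
    if clashes:
        raise ValueError(f"extra must not set {', '.join(clashes)}")
    return [
        "bedtools", "split",
        "--input", bed,
        "--number", str(n_subfiles),
        *extra_args,
        "--prefix", prefix,
    ]


cmd = build_split_command("a.bed", 2, "tmp/out", "--algorithm size")
print(shlex.join(cmd))
# bedtools split --input a.bed --number 2 --algorithm size --prefix tmp/out
```

Passing something like "-n 3" in extra would raise a ValueError in this sketch; with the real wrapper it would instead produce a command with conflicting options.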

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# The number of declared output files determines how many subfiles are requested.
n_subfiles = len(snakemake.output)

with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "bedtools split"
        " --input {snakemake.input.bed}"
        " --number {n_subfiles}"
        " {extra}"
        " --prefix {tmpdir}/out"
        " {log}"
    )

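    # The subfiles are expected at <prefix>.00001.bed, <prefix>.00002.bed, ...
    # Copy each one to the matching declared output before tmpdir is removed.
    for i in range(n_subfiles):
        out_tmp = f"{tmpdir}/out.{i+1:05d}.bed"
        out = snakemake.output[i]
        shell("cat {out_tmp} > {out}")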
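
The final loop relies on the subfile naming pattern used in the wrapper code above (prefix, a dot, a five-digit one-based index, then .bed). A minimal Python sketch of that mapping follows; the helper name split_output_names and the example paths are hypothetical.

```python
def split_output_names(prefix, n_subfiles):
    """Return the temporary file names the wrapper expects, in order."""
    return [f"{prefix}.{i + 1:05d}.bed" for i in range(n_subfiles)]


# Hypothetical declared outputs, as a scatter over two items might produce.
outputs = ["results/a.1-of-2.bed", "results/a.2-of-2.bed"]
tmp_names = split_output_names("tmp/out", len(outputs))

# Pair each temporary subfile with the declared output at the same position.
mapping = dict(zip(tmp_names, outputs))
for src, dst in mapping.items():
    print(f"{src} -> {dst}")
```

The pairing is purely positional, so the first declared output always receives the first subfile regardless of what the output path is called.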