BEDTOOLS SPLIT
Splits a BED file into a given number of subfiles, balancing them not only by number of lines but also by the total number of base pairs in each subfile.
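The following minimal Python sketch makes "balancing by base pairs" concrete. It shows one possible heuristic and is not necessarily the algorithm bedtools implements. Intervals are sorted by length, largest first, and each one goes to the subfile that currently holds the fewest base pairs. The function name balance_by_size and the (chrom, start, end) tuple representation are hypothetical choices for illustration.

```python
import heapq
from typing import List, Tuple

# (chrom, start, end) with BED-style half-open coordinates
Interval = Tuple[str, int, int]


def balance_by_size(intervals: List[Interval], n: int) -> List[List[Interval]]:
    """Hypothetical greedy splitter: assign each interval, largest first,
    to the bin that currently holds the fewest base pairs."""
    bins: List[List[Interval]] = [[] for _ in range(n)]
    # Min-heap of (total_bp_in_bin, bin_index); ties go to the lower index
    heap = [(0, i) for i in range(n)]
    heapq.heapify(heap)
    for iv in sorted(intervals, key=lambda x: x[2] - x[1], reverse=True):
        total, idx = heapq.heappop(heap)
        bins[idx].append(iv)
        heapq.heappush(heap, (total + iv[2] - iv[1], idx))
    return bins


if __name__ == "__main__":
    records = [("chr1", 0, 1000), ("chr1", 2000, 2100), ("chr2", 0, 900), ("chr2", 1000, 1100)]
    for i, b in enumerate(balance_by_size(records, 2), start=1):
        # Print subfile number, total base pairs, and its records
        print(i, sum(e - s for _, s, e in b), b)
```

Placing the largest intervals first keeps a single long interval from unbalancing the bins if it were placed late. This sketch balances base pairs only and makes no attempt to equalise the number of lines per subfile.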
URL: https://bedtools.readthedocs.io/
Example
This wrapper can be used in the following way:
scattergather:
    n_bed=2,


rule bedtools_split:
    input:
        bed="a.bed",
    output:
        scatter.n_bed("results/a.{scatteritem}.bed"),
    log:
        "logs/a.split.log",
    params:
        # Optional extra parameters, e.g. the splitting algorithm
        extra="--algorithm size",
    wrapper:
        "v3.0.2-2-g0dea6a1/bio/bedtools/split"
Note that input, output and log file paths can be chosen freely.
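The wrapper sets the number of subfiles to the number of output paths it receives (len(snakemake.output)), so with n_bed=2 the scatter.n_bed(...) pattern yields two files. The sketch below mimics that expansion in plain Python. It assumes Snakemake's default "{i}-of-{n}" naming for {scatteritem}. The helper expand_scatter is hypothetical and is not part of the Snakemake API.

```python
from typing import List


def expand_scatter(pattern: str, n: int) -> List[str]:
    """Hypothetical helper mimicking how a scattergather pattern expands.
    Assumes Snakemake's default "{i}-of-{n}" scatter-item naming."""
    return [pattern.format(scatteritem=f"{i}-of-{n}") for i in range(1, n + 1)]


outputs = expand_scatter("results/a.{scatteritem}.bed", 2)
print(outputs)  # ['results/a.1-of-2.bed', 'results/a.2-of-2.bed']
# The wrapper would pass len(outputs) == 2 to bedtools split as --number.
```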
When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
This program/wrapper does not handle multi-threading.
‘bedtools split’ is currently undocumented, even though it was added to ‘bedtools’ in version 2.23.0 (https://bedtools.readthedocs.io/en/latest/content/history.html#version-2-23-0-22-feb-2015).
Software dependencies
bedtools=2.31.1
Input/Output
Input:
bed: path to the BED file to split
Output:
several BED files, one per declared output path (the number of output paths sets the number of subfiles)
Params
extra: additional program arguments (except for -i/--input, -n/--number and -p/--prefix, which the wrapper sets itself)
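The wrapper passes params.extra through unchanged and does not check it. A hypothetical guard like the one below could reject the options the wrapper already sets. The name check_extra is illustrative and is not part of the wrapper. It tokenises the string with shlex.split and also catches the --option=value form. Attached short forms such as -n4 are not caught by this sketch.

```python
import shlex

# Options the wrapper already sets itself (short and long forms)
RESERVED = {"-i", "--input", "-n", "--number", "-p", "--prefix"}


def check_extra(extra: str) -> str:
    """Hypothetical pre-flight check: raise if `extra` contains an option
    that the wrapper sets on its own; otherwise return it unchanged."""
    for token in shlex.split(extra):
        # Compare only the option name for the "--option=value" form
        name = token.split("=", 1)[0]
        if name in RESERVED:
            raise ValueError(f"{name} is set by the wrapper; remove it from params.extra")
    return extra


print(check_extra("--algorithm size"))  # --algorithm size
```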
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# One subfile per declared output path (e.g. one per scatter item)
n_subfiles = len(snakemake.output)

with tempfile.TemporaryDirectory() as tmpdir:
    # Split into {tmpdir}/out.00001.bed, {tmpdir}/out.00002.bed, ...
    shell(
        "bedtools split"
        " --input {snakemake.input.bed}"
        " --number {n_subfiles}"
        " {extra}"
        " --prefix {tmpdir}/out"
        " {log}"
    )

    # Copy each split file to its final output path, in order
    for i in range(n_subfiles):
        out_tmp = f"{tmpdir}/out.{i+1:05d}.bed"
        out = snakemake.output[i]
        shell("cat {out_tmp} > {out}")
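The final loop copies each temporary file with cat through a shell call. A minimal alternative stays in Python and moves the files with shutil.move. The function name collect_splits is hypothetical. The sketch assumes bedtools split names its files {prefix}.00001.bed, {prefix}.00002.bed, ..., which is the naming the wrapper code expects.

```python
import shutil
from pathlib import Path
from typing import Sequence


def collect_splits(tmpdir: str, outputs: Sequence[str], prefix: str = "out") -> None:
    """Hypothetical helper: move {prefix}.00001.bed, {prefix}.00002.bed, ...
    from `tmpdir` to the requested output paths, in order."""
    for i, out in enumerate(outputs, start=1):
        src = Path(tmpdir) / f"{prefix}.{i:05d}.bed"
        if not src.exists():
            # Fail with a clear message instead of a shell error
            raise FileNotFoundError(f"expected split file {src} was not created")
        # Snakemake normally creates output directories; this is for standalone use
        Path(out).parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), out)
```

Moving avoids one subprocess per file. When the temporary directory is on the same filesystem, it also avoids writing a second copy of the data. If the directory is on a different filesystem, shutil.move falls back to copying and then deleting the source.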