MERGEBED
Merge entries in one or multiple BED/BAM/VCF/GFF files with bedtools.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/merge.html
Example
This wrapper can be used in the following way:
rule bedtools_merge:
    input:
        # Multiple bed-files can be added as list
        "A.bed"
    output:
        "A.merged.bed"
    params:
        ## Add optional parameters
        extra="-c 1 -o count" ## In this example, we want to count how many input lines we merged per output line
    log:
        "logs/merge/A.log"
    wrapper:
        "v5.7.0/bio/bedtools/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
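To make the effect of `-c 1 -o count` in the example rule concrete, here is a minimal pure-Python sketch of the merge-and-count behavior. It is an illustration only, not how bedtools is implemented. The function name `merge_with_count` and the sample records are hypothetical, and the sketch assumes the bedtools default of merging overlapping and book-ended intervals.

```python
def merge_with_count(intervals):
    """Merge overlapping or book-ended intervals and count the inputs per merged interval.

    `intervals` is an iterable of (chrom, start, end) tuples in BED coordinates.
    Returns a list of (chrom, start, end, count) tuples.
    """
    merged = []
    # bedtools merge expects sorted input; sorting here keeps the sketch self-contained.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and touching/overlapping: extend the last interval.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + 1)
        else:
            # New chromosome or a gap: start a new merged interval.
            merged.append((chrom, start, end, 1))
    return merged


# Hypothetical sample records, not taken from the wrapper's test data.
records = [
    ("chr1", 100, 200),
    ("chr1", 180, 250),
    ("chr1", 250, 300),
    ("chr1", 501, 1000),
    ("chr2", 10, 20),
]

for row in merge_with_count(records):
    print(*row, sep="\t")
```

The fourth column corresponds to what `-c 1 -o count` appends to each output line: the number of input lines that were merged into it.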
Notes
Warning: If multiple files are provided in input, this wrapper requires exactly three threads. Otherwise, it requires exactly one thread.
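The thread requirement can be written down as a small helper. This is a sketch under the assumption that the count follows only from the number of input files, as the warning states; presumably the three threads correspond to the three piped processes in the multi-file case. The name `required_threads` is hypothetical and not part of the wrapper.

```python
def required_threads(input_paths):
    """Return the thread count the warning above asks for.

    Assumption: more than one input file means three threads,
    a single input file means one thread.
    """
    return 3 if len(input_paths) > 1 else 1


print(required_threads(["A.bed"]))
print(required_threads(["A.bed", "B.bed"]))
```

The returned value is what the `threads` directive of the calling rule would need to be set to.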
Software dependencies
bedtools=2.31.1
Input/Output
Input:
Path or list of paths to interval file(s) (BED/GFF/VCF/BAM)
Output:
Path to the merged interval file.
Params
extra: additional program arguments (except for -i, which the wrapper sets itself)
Code
__author__ = "Jan Forster, Felix Mölder"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de, felix.moelder@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

if len(snakemake.input) > 1:
    # Multiple inputs: concatenate, sort by chromosome and start, then merge from stdin
    if all(f.endswith(".gz") for f in snakemake.input):
        cat = "zcat"
    elif all(not f.endswith(".gz") for f in snakemake.input):
        cat = "cat"
    else:
        raise ValueError("Input files must be all compressed or uncompressed.")
    shell(
        "({cat} {snakemake.input} | "
        "sort -k1,1 -k2,2n | "
        "bedtools merge {extra} "
        "-i stdin > {snakemake.output}) "
        " {log}"
    )
else:
    # Single input: pass the file directly to bedtools merge
    shell(
        "( bedtools merge"
        " {extra}"
        " -i {snakemake.input}"
        " > {snakemake.output})"
        " {log}"
    )
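The branching in the wrapper can also be examined outside Snakemake. The following sketch rebuilds the command string as plain Python so that the `cat`/`zcat` choice and the two code paths can be inspected without running bedtools. The helper names `choose_cat` and `build_merge_command` are hypothetical, the log redirection is left out, and paths are not shell-quoted, so this is illustrative only.

```python
def choose_cat(paths):
    """Pick the concatenation tool, mirroring the wrapper's all-or-nothing .gz check."""
    if all(p.endswith(".gz") for p in paths):
        return "zcat"
    if all(not p.endswith(".gz") for p in paths):
        return "cat"
    raise ValueError("Input files must be all compressed or uncompressed.")


def build_merge_command(paths, output, extra=""):
    """Return the shell command for the given inputs (without log redirection)."""
    if len(paths) > 1:
        # Several files: concatenate, sort by chromosome and start, merge from stdin.
        cat = choose_cat(paths)
        return (
            f"{cat} {' '.join(paths)} | sort -k1,1 -k2,2n | "
            f"bedtools merge {extra} -i stdin > {output}"
        )
    # One file: hand it to bedtools merge directly.
    return f"bedtools merge {extra} -i {paths[0]} > {output}"


print(build_merge_command(["A.bed"], "A.merged.bed", "-c 1 -o count"))
print(build_merge_command(["A.bed.gz", "B.bed.gz"], "AB.merged.bed", "-c 1 -o count"))
```

Note that only the multi-file path sorts its input; with a single file, the input is passed to bedtools merge as it is.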