SORTBED
Sorts BED, VCF or GFF files by chromosome and other criteria.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/sort.html
Example
This wrapper can be used in the following way:
rule bedtools_sort:
    input:
        in_file="a.bed"
    output:
        "results/bed-sorted/a.sorted.bed"
    params:
        ## Add optional parameters for sorting order
        extra="-sizeA"
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "v3.0.0-4-ga3709f0/bio/bedtools/sort"

rule bedtools_sort_bed:
    input:
        in_file="a.bed",
        # an optional sort file can be set as genomefile by the variable genome or
        # as fasta index file by the variable faidx
        genome="dummy.genome"
    output:
        "results/bed-sorted/a.sorted_by_file.bed"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "v3.0.0-4-ga3709f0/bio/bedtools/sort"

rule bedtools_sort_vcf:
    input:
        in_file="a.vcf",
        # an optional sort file can be set either as genomefile by the variable genome or
        # as fasta index file by the variable faidx
        faidx="genome.fasta.fai"
    output:
        "results/vcf-sorted/a.sorted_by_file.vcf"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.vcf.log"
    wrapper:
        "v3.0.0-4-ga3709f0/bio/bedtools/sort"

Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
This program/wrapper does not handle multi-threading.
Software dependencies
bedtools=2.31.1
Input/Output
Input:
in_file
: Path to interval file (BED/GFF/VCF formatted)
genome
: optional, a tab-separated file that determines the sorting order and contains the chromosome names in the first column
faidx
: optional, a FASTA index file
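
The two optional inputs are closely related: the first two columns of a FASTA index (.fai) hold the sequence name and its length, which is the layout a genome file uses. The following is a minimal sketch of deriving one from the other, assuming a hypothetical helper named `fai_to_genome` and made-up index content; it is not part of the wrapper.

```python
import io


def fai_to_genome(fai_lines):
    """Yield 'name<TAB>length' lines from FASTA index (.fai) lines.

    Hypothetical helper for illustration only. The order of the index
    is preserved, so it defines the chromosome order of the result.
    """
    for line in fai_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        # .fai columns: name, length, offset, linebases, linewidth
        yield f"{fields[0]}\t{fields[1]}"


# Made-up index content for two sequences (values are illustrative)
example_fai = io.StringIO("chr2\t5000\t6\t60\t61\nchr1\t8000\t5096\t60\t61\n")
genome_lines = list(fai_to_genome(example_fai))
print("\n".join(genome_lines))
```

In practice it is usually simpler to pass the index directly through the faidx input, as in the bedtools_sort_vcf rule.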
Output:
Path to the sorted interval file (BED/GFF/VCF formatted)
Params
extra
: additional program arguments (except for -i, -g, or -faidx)
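
Because the wrapper sets -i, -g and -faidx itself, it can help to check an extra string for those flags before passing it on. A minimal sketch, assuming a hypothetical helper `reserved_flags_in`; the wrapper itself performs no such check.

```python
import shlex

# Flags the wrapper sets itself, taken from the Params note above
RESERVED_FLAGS = {"-i", "-g", "-faidx"}


def reserved_flags_in(extra):
    """Return the reserved flags found in an `extra` string, sorted.

    Hypothetical helper for illustration; it only inspects whole
    shell tokens and does not validate other bedtools options.
    """
    tokens = shlex.split(extra)
    return sorted(RESERVED_FLAGS.intersection(tokens))


print(reserved_flags_in("-sizeA"))
print(reserved_flags_in("-sizeD -g my.genome"))
```

An empty result means the string does not collide with the arguments the wrapper adds.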
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
genome = snakemake.input.get("genome", "")
faidx = snakemake.input.get("faidx", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# A genome file takes precedence over a FASTA index if both are given
if genome:
    extra += " -g {}".format(genome)
elif faidx:
    extra += " -faidx {}".format(faidx)

shell(
    "(bedtools sort"
    " {extra}"
    " -i {snakemake.input.in_file}"
    " > {snakemake.output[0]})"
    " {log}"
)
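
The argument handling above can be tried outside Snakemake with a small stand-alone function. This is a sketch only: `build_sort_command` is a hypothetical name, the log redirection is left out, and nothing is executed; it just returns the command string that the same branching would produce.

```python
def build_sort_command(in_file, out_file, extra="", genome="", faidx=""):
    """Mirror the wrapper's branching and return the command string.

    Hypothetical helper for illustration; log handling is omitted.
    """
    if genome:
        # genome is checked first, so it wins when both are supplied
        extra += " -g {}".format(genome)
    elif faidx:
        extra += " -faidx {}".format(faidx)
    return "bedtools sort {} -i {} > {}".format(extra, in_file, out_file)


# Both sort files given: only the genome file ends up in the command
cmd = build_sort_command(
    "a.bed", "a.sorted.bed", genome="dummy.genome", faidx="genome.fasta.fai"
)
print(cmd)
```

Supplying only faidx yields the -faidx form instead, which matches the bedtools_sort_vcf rule.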