SORTBED

Sorts BED, VCF or GFF files by chromosome and other criteria.

URL: https://bedtools.readthedocs.io/en/latest/content/tools/sort.html

Example

This wrapper can be used in the following way:

rule bedtools_sort:
    input:
        in_file="a.bed"
    output:
        "results/bed-sorted/a.sorted.bed"
    params:
        ## Add optional parameters for sorting order
        extra="-sizeA"
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "v3.9.0-1-gc294552/bio/bedtools/sort"

rule bedtools_sort_bed:
    input:
        in_file="a.bed",
        # optional: the sort order can be defined either by a genome file (input
        # name "genome") or by a FASTA index file (input name "faidx")
        genome="dummy.genome"
    output:
        "results/bed-sorted/a.sorted_by_file.bed"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted_by_file.bed.log"
    wrapper:
        "v3.9.0-1-gc294552/bio/bedtools/sort"

rule bedtools_sort_vcf:
    input:
        in_file="a.vcf",
        # optional: the sort order can be defined either by a genome file (input
        # name "genome") or by a FASTA index file (input name "faidx")
        faidx="genome.fasta.fai"
    output:
        "results/vcf-sorted/a.sorted_by_file.vcf"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.vcf.log"
    wrapper:
        "v3.9.0-1-gc294552/bio/bedtools/sort"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • This program/wrapper does not handle multi-threading.

Software dependencies

  • bedtools=2.31.1

Input/Output

Input:

  • in_file: Path to interval file (BED/GFF/VCF formatted)

  • genome: optional; a tab-separated file that determines the sorting order and contains the chromosome names in the first column

  • faidx: optional; a FASTA index file that determines the sorting order

Output:

  • Path to the sorted interval file (BED/GFF/VCF formatted)
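
The genome input described above can be derived from a FASTA index when only the latter is at hand. The following is a minimal sketch, not part of the wrapper: the helper name fai_to_genome is hypothetical, and it assumes the usual two-column genome layout (chromosome name, then length), which corresponds to the first two columns of a .fai file. The line order of the index is preserved, since that order is what defines the sort order.

```python
def fai_to_genome(fai_text):
    """Reduce FASTA index (.fai) text to a two-column genome file.

    Hypothetical helper: keeps only the name and length columns and
    preserves the original line order.
    """
    rows = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length = line.split("\t")[:2]
        rows.append("{}\t{}".format(name, length))
    return "\n".join(rows) + "\n"


# Illustrative index content (made-up contigs and offsets)
example_fai = "chr2\t200\t6\t60\t61\nchr1\t100\t216\t60\t61\n"
print(fai_to_genome(example_fai), end="")
```

Because the wrapper accepts the index directly through faidx, a conversion like this is only needed when a genome file is wanted for other tools as well.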

Params

  • extra: additional program arguments (except for -i, -g, or -faidx)
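
Since the wrapper sets -i, -g and -faidx itself, it can help to check an extra string before handing it to the rule. The snippet below is a small sketch under that assumption; the helper reserved_flags_in is hypothetical and not part of the wrapper, which performs no such check.

```python
import shlex
from typing import List

# Flags the wrapper fills in on its own; they should not appear in `extra`
RESERVED_FLAGS = {"-i", "-g", "-faidx"}


def reserved_flags_in(extra: str) -> List[str]:
    """Return every reserved flag found in an `extra` argument string."""
    return [token for token in shlex.split(extra) if token in RESERVED_FLAGS]


print(reserved_flags_in("-sizeA"))               # []
print(reserved_flags_in("-sizeD -g my.genome"))  # ['-g']
```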

Authors

  • Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

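# Optional extra arguments and optional sort-order files (empty string if unset)
extra = snakemake.params.get("extra", "")
genome = snakemake.input.get("genome", "")
faidx = snakemake.input.get("faidx", "")

# Build the redirection of stdout and stderr to the rule's log file
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# A genome file takes precedence; the FASTA index is only used if no genome file is given
if genome:
    extra += " -g {}".format(genome)
elif faidx:
    extra += " -faidx {}".format(faidx)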

shell(
    "(bedtools sort"
    " {extra}"
    " -i {snakemake.input.in_file}"
    " > {snakemake.output[0]})"
    " {log}"
)
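
To see which command line results from a given combination of inputs, the wrapper's logic can be mirrored in a plain function that runs without Snakemake. This is only an illustrative sketch: build_sort_command is a hypothetical name, the log redirection is left out, and whitespace is normalized for readability.

```python
def build_sort_command(in_file, out_file, extra="", genome="", faidx=""):
    """Mirror the wrapper's argument handling and return the command string."""
    # Same precedence as in the wrapper: genome wins over faidx
    if genome:
        extra += " -g {}".format(genome)
    elif faidx:
        extra += " -faidx {}".format(faidx)
    parts = ["bedtools sort", extra.strip(), "-i", in_file, ">", out_file]
    # Drop the empty element that appears when no extra arguments are set
    return " ".join(part for part in parts if part)


# Both sort files given: only the genome file ends up in the command
print(build_sort_command("a.bed", "out.bed",
                         genome="dummy.genome", faidx="genome.fasta.fai"))
# Only a FASTA index given
print(build_sort_command("a.vcf", "out.vcf", faidx="genome.fasta.fai"))
```

The first call shows that a faidx input is silently ignored whenever a genome input is also present, so a rule should define only one of the two.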