BEDTOOLS SORT

Sorts bed, vcf or gff files by chromosome and other criteria, for more information please see bedtools sort documentation.

URL:

Example

This wrapper can be used in the following way:

rule bedtools_sort:
    input:
        in_file="a.bed"
    output:
        "results/bed-sorted/a.sorted.bed"
    params:
        ## Add optional parameters for sorting order
        extra="-sizeA"
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "v1.1.0/bio/bedtools/sort"

rule bedtools_sort_bed:
    input:
        in_file="a.bed",
        # an optional sort file can be set as genomefile by the variable genome or
        # as fasta index file by the variable faidx
        genome="dummy.genome"
    output:
        "results/bed-sorted/a.sorted_by_file.bed"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "v1.1.0/bio/bedtools/sort"

rule bedtools_sort_vcf:
    input:
        in_file="a.vcf",
        # an optional sort file can be set either as genomefile by the variable genome or
        # as fasta index file by the variable faidx
        faidx="genome.fasta.fai"
    output:
        "results/vcf-sorted/a.sorted_by_file.vcf"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.vcf.log"
    wrapper:
        "v1.1.0/bio/bedtools/sort"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • bedtools=2.29

Input/Output

Input:

  • BED/GFF/VCF files
  • optional a tab separating file that determines the sorting order and contains the chromosome names in the first column
  • optional a fasta index file

Output:

  • complemented BED/GFF/VCF file

Authors

  • Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
genome = snakemake.input.get("genome", "")
faidx = snakemake.input.get("faidx", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

if genome:
    extra += " -g {}".format(genome)
elif faidx:
    extra += " -faidx {}".format(faidx)

shell(
    "(bedtools sort"
    " {extra}"
    " -i {snakemake.input.in_file}"
    " > {snakemake.output[0]})"
    " {log}"
)