BEDTOOLS SORT

Sorts BED, VCF, or GFF files by chromosome and other criteria. For more information, see the bedtools sort documentation.

Example

This wrapper can be used in the following way:

rule bedtools_sort:
    input:
        in_file="a.bed"
    output:
        "results/bed-sorted/a.sorted.bed"
    params:
        ## Add optional parameters for sorting order
        extra="-sizeA"
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "0.76.0/bio/bedtools/sort"

rule bedtools_sort_bed:
    input:
        in_file="a.bed",
        # optional: a file defining the chromosome order, passed either as a
        # genome file (variable genome) or as a FASTA index file (variable faidx)
        genome="dummy.genome"
    output:
        "results/bed-sorted/a.sorted_by_file.bed"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted_by_file.bed.log"
    wrapper:
        "0.76.0/bio/bedtools/sort"

rule bedtools_sort_vcf:
    input:
        in_file="a.vcf",
        # optional: a file defining the chromosome order, passed either as a
        # genome file (variable genome) or as a FASTA index file (variable faidx)
        faidx="genome.fasta.fai"
    output:
        "results/vcf-sorted/a.sorted_by_file.vcf"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.vcf.log"
    wrapper:
        "0.76.0/bio/bedtools/sort"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • bedtools=2.29

Input/Output

Input:

  • BED/GFF/VCF files
  • optional: a tab-separated file that determines the sorting order and contains the chromosome names in the first column
  • optional: a FASTA index file
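
The tab-separated sort file only needs the chromosome names in its first column, in the desired order. As a minimal sketch, assuming a samtools-style .fai index whose first two columns are sequence name and length, such a file could be derived as follows. The helper name fai_to_genome is hypothetical and not part of the wrapper or of bedtools:

```python
def fai_to_genome(fai_text: str) -> str:
    """Return genome-file text (name<TAB>length) built from .fai index text."""
    lines = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Assumed .fai layout: column 1 is the name, column 2 the length.
        name, length = line.split("\t")[:2]
        lines.append(f"{name}\t{length}\n")
    return "".join(lines)


# Made-up example index: the order of the lines defines the sort order.
example_fai = "chr2\t2000\t6\t60\t61\nchr1\t1000\t2045\t60\t61\n"
print(fai_to_genome(example_fai), end="")
```

Because the wrapper also accepts the index directly through faidx, this conversion is only needed when a separate genome file is preferred.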

Output:

  • sorted BED/GFF/VCF file

Authors

  • Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
genome = snakemake.input.get("genome", "")
faidx = snakemake.input.get("faidx", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

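# Pass the sort-order file to bedtools; a genome file takes precedence
# over a FASTA index if both are provided.
if genome:
    extra += " -g {}".format(genome)
elif faidx:
    extra += " -faidx {}".format(faidx)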

shell(
    "(bedtools sort"
    " {extra}"
    " -i {snakemake.input.in_file}"
    " > {snakemake.output[0]})"
    " {log}"
)
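
For illustration, the option handling above can be sketched as a plain function that runs outside Snakemake. This is a sketch only: build_sort_command is a hypothetical helper, not part of the wrapper, and it only assembles the command string without running bedtools or adding the log redirection.

```python
def build_sort_command(in_file, out_file, extra="", genome="", faidx=""):
    """Assemble a bedtools sort command string, mirroring the wrapper's logic."""
    # As in the wrapper, a genome file takes precedence over a FASTA index.
    if genome:
        extra += " -g {}".format(genome)
    elif faidx:
        extra += " -faidx {}".format(faidx)
    return "bedtools sort {} -i {} > {}".format(extra, in_file, out_file)


# File names are taken from the example rules above.
print(build_sort_command("a.bed", "a.sorted_by_file.bed", genome="dummy.genome"))
print(build_sort_command("a.vcf", "a.sorted_by_file.vcf", faidx="genome.fasta.fai"))
```

If both genome and faidx are supplied, only -g ends up in the command, which matches the if/elif branch in the wrapper.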