GRIDSS PREPROCESS

GRIDSS is a module software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. preprocess pre-processes input BAM files. Can be run per input file. Documentation at: https://github.com/PapenfussLab/gridss

Example

This wrapper can be used in the following way:

WORKING_DIR="working_dir"

rule gridss_preprocess:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        refindex=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
    output:
        multiext("{WORKING_DIR}/{sample}.bam.gridss.working/{sample}.bam", ".cigar_metrics", ".coverage.blacklist.bed", ".idsv_metrics", ".insert_size_histogram.pdf", ".insert_size_metrics", ".mapq_metrics", ".sv.bam", ".sv.bam.bai", ".sv_metrics", ".tag_metrics")
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/preprocess/{WORKING_DIR}/{sample}.preprocess.log"
    threads:
        8
    wrapper:
        "v1.9.0/bio/gridss/preprocess"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • gridss==2.9.4

Authors

  • Christopher Schröder

Code

"""Snakemake wrapper for gridss preprocess"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
    raise ValueError("Please set params.workingdir to provide a working directory.")

if not snakemake.input.reference:
    raise ValueError("Please set input.reference to provide reference genome.")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(gridss -s preprocess "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "
    "--workingdir {snakemake.params.workingdir} "
    "{snakemake.input.bam} "
    "{extra}) {log}"
)