GRIDSS PREPROCESS
GRIDSS is a modular software suite containing tools for the detection of genomic rearrangements. It includes a genome-wide break-end assembler and a structural variation caller for Illumina sequencing data. The preprocess step pre-processes input BAM files and can be run once per input file. Documentation: https://github.com/PapenfussLab/gridss
Example
This wrapper can be used in the following way:
WORKING_DIR="working_dir"

rule gridss_preprocess:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        refindex=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
    output:
        multiext("{WORKING_DIR}/{sample}.bam.gridss.working/{sample}.bam", ".cigar_metrics", ".coverage.blacklist.bed", ".idsv_metrics", ".insert_size_histogram.pdf", ".insert_size_metrics", ".mapq_metrics", ".sv.bam", ".sv.bam.bai", ".sv_metrics", ".tag_metrics")
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/preprocess/{WORKING_DIR}/{sample}.preprocess.log"
    threads:
        8
    wrapper:
        "v1.19.2/bio/gridss/preprocess"
Note that input, output and log file paths can be chosen freely.
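The output declaration in the rule is compact, so it can help to see the file list it stands for. The following is a minimal sketch in plain Python, not part of the wrapper: `multiext_sketch` is a hypothetical stand-in that mimics what Snakemake's `multiext` does (append each extension to a common prefix), and the sample name `A` is an assumption chosen for illustration.

```python
# Extensions taken from the output declaration of the example rule.
GRIDSS_PREPROCESS_EXTENSIONS = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".idsv_metrics",
    ".insert_size_histogram.pdf",
    ".insert_size_metrics",
    ".mapq_metrics",
    ".sv.bam",
    ".sv.bam.bai",
    ".sv_metrics",
    ".tag_metrics",
)


def multiext_sketch(prefix, *extensions):
    """Hypothetical stand-in for Snakemake's multiext: prefix + each extension."""
    return [prefix + extension for extension in extensions]


def preprocess_outputs(workingdir, sample):
    """List the files the example rule declares for one sample."""
    prefix = f"{workingdir}/{sample}.bam.gridss.working/{sample}.bam"
    return multiext_sketch(prefix, *GRIDSS_PREPROCESS_EXTENSIONS)


if __name__ == "__main__":
    # "A" is an illustrative sample name, not one used by the wrapper.
    for output_path in preprocess_outputs("working_dir", "A"):
        print(output_path)
```

Requesting any one of these paths as a Snakemake target should be enough to trigger the rule for that sample, since all of them are declared as outputs of the same job.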
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
gridss==2.9.4
Authors
- Christopher Schröder
Code
"""Snakemake wrapper for gridss preprocess"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.get("workingdir"):
    raise ValueError("Please set params.workingdir to provide a working directory.")
if not reference:
    raise ValueError("Please set input.reference to provide reference genome.")
# The bwa index files must sit next to the reference FASTA.
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )
# The dictionary is expected next to the reference, with ".dict" replacing the FASTA extension.
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )
shell(
    "(gridss -s preprocess "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "
    "--workingdir {snakemake.params.workingdir} "
    "{snakemake.input.bam} "
    "{extra}) {log}"
)