.. _`bio/gridss/assemble`:

GRIDSS ASSEMBLE
===============

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gridss/assemble?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gridss/assemble

GRIDSS is a modular software suite containing tools for detecting genomic rearrangements. It includes a genome-wide breakend assembler and a structural variation caller for Illumina sequencing data. ``assemble`` performs GRIDSS breakend assembly.

Documentation: https://github.com/PapenfussLab/gridss

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    WORKING_DIR = "working_dir"

    samples = ["A", "B"]

    preprocess_endings = (
        ".cigar_metrics",
        ".coverage.blacklist.bed",
        ".idsv_metrics",
        ".insert_size_histogram.pdf",
        ".insert_size_metrics",
        ".mapq_metrics",
        ".sv.bam",
        ".sv.bam.bai",
        ".sv_metrics",
        ".tag_metrics",
    )

    assembly_endings = (
        ".cigar_metrics",
        ".coverage.blacklist.bed",
        ".downsampled_0.bed",
        ".excluded_0.bed",
        ".idsv_metrics",
        ".mapq_metrics",
        ".quality_distribution.pdf",
        ".quality_distribution_metrics",
        ".subsetCalled_0.bed",
        ".sv.bam",
        ".sv.bam.bai",
        ".tag_metrics",
    )

    reference_index_endings = (".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")

    rule gridss_assemble:
        input:
            bams=expand("mapped/{sample}.bam", sample=samples),
            bais=expand("mapped/{sample}.bam.bai", sample=samples),
            reference="reference/genome.fasta",
            dictionary="reference/genome.dict",
            indices=multiext("reference/genome.fasta", *reference_index_endings),
            preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings)
        output:
            assembly="assembly/group.bam",
            assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
        params:
            extra="--jvmheap 1g",
            workingdir=WORKING_DIR
        log:
            "log/gridss/assemble/group.log"
        threads:
            100
        wrapper:
            "v3.0.1/bio/gridss/assemble"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
---------------------

* ``gridss=2.13.2``

Authors
-------

* Christopher Schröder

Code
----

.. code-block:: python

    """Snakemake wrapper for gridss assemble"""

    __author__ = "Christopher Schröder"
    __copyright__ = "Copyright 2020, Christopher Schröder"
    __email__ = "christopher.schroede@tu-dortmund.de"
    __license__ = "MIT"

    from snakemake.shell import shell
    from os import path

    # Creating log
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)

    # Placeholder for optional parameters
    extra = snakemake.params.get("extra", "")

    # Check inputs/arguments.
    # .get() returns None for a missing entry, so the checks below can raise
    # the intended ValueError instead of failing on attribute access.
    reference = snakemake.input.get("reference")

    if not snakemake.params.get("workingdir"):
        raise ValueError("Please set params.workingdir to provide a working directory.")

    if not reference:
        raise ValueError("Please set input.reference to provide reference genome.")

    for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
        if not path.exists("{}{}".format(reference, ending)):
            raise ValueError(
                "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                    reference=reference, ending=ending
                )
            )

    # The dictionary is expected next to the reference, with the extension replaced by ".dict".
    dictionary = path.splitext(reference)[0] + ".dict"
    if not path.exists(dictionary):
        raise ValueError(
            "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
                dictionary=dictionary
            )
        )

    shell(
        "(gridss -s assemble "  # Tool
        "--reference {reference} "  # Reference
        "--threads {snakemake.threads} "  # Threads
        "--workingdir {snakemake.params.workingdir} "  # Working directory
        "--assembly {snakemake.output.assembly} "  # Assembly output
        "{snakemake.input.bams} "
        "{extra}) {log}"
    )

.. |nl| raw:: html
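
The ``preprocess`` input and the ``assembly_others`` output of the example rule both follow the pattern ``<working_dir>/<file>.gridss.working/<file><ending>``. As a rough illustration of that pattern only, the sketch below rebuilds such paths in plain Python, without Snakemake. The helper name ``working_files`` is hypothetical and is not part of the wrapper or of GRIDSS.

.. code-block:: python

    from itertools import product
    from pathlib import PurePosixPath

    WORKING_DIR = "working_dir"


    def working_files(working_dir, filenames, endings):
        """Build <working_dir>/<name>.gridss.working/<name><ending> paths.

        Hypothetical helper that mirrors the path pattern used in the
        example rule; it does not call Snakemake or GRIDSS.
        """
        return [
            str(PurePosixPath(working_dir) / f"{name}.gridss.working" / f"{name}{ending}")
            for name, ending in product(filenames, endings)
        ]


    if __name__ == "__main__":
        # Two samples and two of the preprocess endings from the example rule.
        for p in working_files(WORKING_DIR, ["A.bam", "B.bam"], [".sv.bam", ".sv.bam.bai"]):
            print(p)

Listing the expected paths this way can help when checking that the preprocessing step wrote its files under the same ``workingdir`` that is passed to ``assemble``.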