.. _`bio/gridss/preprocess`:

GRIDSS PREPROCESS
=================


.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gridss/preprocess?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gridss/preprocess

GRIDSS is a modular software suite containing tools for detecting genomic rearrangements. It includes a genome-wide break-end assembler and a structural variation caller for Illumina sequencing data. The ``preprocess`` step pre-processes input BAM files and can be run separately for each file.


**URL**: https://github.com/PapenfussLab/gridss

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    WORKING_DIR="working_dir"

    rule gridss_preprocess:
        input:
            bam="mapped/{sample}.bam",
            bai="mapped/{sample}.bam.bai",
            reference="reference/genome.fasta",
            dictionary="reference/genome.dict",
            refindex=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa")
        output:
            multiext("{WORKING_DIR}/{sample}.bam.gridss.working/{sample}.bam", ".cigar_metrics", ".computesamtags.changes.tsv", ".coverage.blacklist.bed", ".idsv_metrics", ".insert_size_histogram.pdf", ".insert_size_metrics", ".mapq_metrics", ".sv.bam", ".sv.bam.csi", ".tag_metrics")
        params:
            extra="--jvmheap 1g",
            workingdir=WORKING_DIR
        log:
            "log/gridss/preprocess/{WORKING_DIR}/{sample}.preprocess.log"
        threads:
            8
        wrapper:
            "v3.0.1/bio/gridss/preprocess"

Note that input, output and log file paths can be chosen freely.
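
The output names in the example rule follow the pattern ``<workingdir>/<bam name>.gridss.working/<bam name><suffix>``. The following is a minimal sketch in plain Python of how such a list could be generated, assuming that layout as it appears in the example rule (it is not taken from the GRIDSS documentation). The helper name ``gridss_preprocess_outputs`` is hypothetical and not part of the wrapper.

.. code-block:: python

    import posixpath

    # Suffixes copied from the output section of the example rule.
    PREPROCESS_SUFFIXES = (
        ".cigar_metrics",
        ".computesamtags.changes.tsv",
        ".coverage.blacklist.bed",
        ".idsv_metrics",
        ".insert_size_histogram.pdf",
        ".insert_size_metrics",
        ".mapq_metrics",
        ".sv.bam",
        ".sv.bam.csi",
        ".tag_metrics",
    )


    def gridss_preprocess_outputs(bam, workingdir):
        """Hypothetical helper: list the expected output paths for one BAM file."""
        name = posixpath.basename(bam)
        # Assumed layout: <workingdir>/<name>.gridss.working/<name><suffix>
        prefix = posixpath.join(workingdir, name + ".gridss.working", name)
        return [prefix + suffix for suffix in PREPROCESS_SUFFIXES]


    outputs = gridss_preprocess_outputs("mapped/a.bam", "working_dir")

For ``mapped/a.bam`` and the working directory ``working_dir``, every returned path starts with ``working_dir/a.bam.gridss.working/a.bam``.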

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.


Software dependencies
---------------------

* ``gridss=2.13.2``


Authors
-------

* Christopher Schröder


Code
----

.. code-block:: python

    """Snakemake wrapper for gridss preprocess"""

    __author__ = "Christopher Schröder"
    __copyright__ = "Copyright 2020, Christopher Schröder"
    __email__ = "christopher.schroede@tu-dortmund.de"
    __license__ = "MIT"

    from snakemake.shell import shell
    from os import path

    # Creating log
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)

    # Placeholder for optional parameters
    extra = snakemake.params.get("extra", "")

    # Check inputs/arguments.
    reference = snakemake.input.get("reference")
    # Use .get() so that a missing parameter raises the ValueError below
    # instead of an AttributeError.
    if not snakemake.params.get("workingdir"):
        raise ValueError("Please set params.workingdir to provide a working directory.")

    if not reference:
        raise ValueError("Please set input.reference to provide reference genome.")

    for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
        if not path.exists("{}{}".format(reference, ending)):
            raise ValueError(
                "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                    reference=reference, ending=ending
                )
            )

    # The dictionary is expected next to the reference, with the FASTA
    # extension replaced by ".dict".
    dictionary = path.splitext(reference)[0] + ".dict"
    if not path.exists(dictionary):
        raise ValueError(
            "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard.".format(
                dictionary=dictionary
            )
        )

    shell(
        "(gridss -s preprocess "  # Tool
        "--reference {reference} "  # Reference
        "--threads {snakemake.threads} "
        "--workingdir {snakemake.params.workingdir} "
        "{snakemake.input.bam} "
        "{extra}) {log}"
    )
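
The wrapper above checks that the bwa index files and the sequence dictionary exist next to the reference before calling ``gridss``. As a minimal sketch, the same checks could be run ahead of a workflow to report all missing files at once. The function name ``missing_reference_files`` is hypothetical and not part of the wrapper; the file endings are the ones the wrapper tests for.

.. code-block:: python

    from os import path

    # Endings the wrapper expects from a bwa-indexed reference.
    BWA_INDEX_ENDINGS = (".amb", ".ann", ".bwt", ".pac", ".sa")


    def missing_reference_files(reference):
        """Hypothetical helper: return expected companion files that do not exist."""
        expected = [reference + ending for ending in BWA_INDEX_ENDINGS]
        # As in the wrapper, the dictionary path is the reference path with
        # its extension replaced by ".dict".
        expected.append(path.splitext(reference)[0] + ".dict")
        return [f for f in expected if not path.exists(f)]

An empty list means that every file the wrapper checks for is present, for example ``missing_reference_files("reference/genome.fasta")`` for the reference used in the example rule.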


.. |nl| raw:: html

   <br>