.. _`bio/gatk/splitintervals`:

GATK SPLITINTERVALS
===================

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gatk/splitintervals?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gatk/splitintervals

This tool takes in intervals via the standard arguments of IntervalArgumentCollection and splits them into interval files for scattering. The resulting files contain an equal number of bases. Standard GATK engine arguments include -L and -XL, interval padding, and the interval set rule. For example, for the -L argument, the tool accepts GATK-style intervals (.list or .intervals), BED files and VCF files. See the --subdivision-mode parameter for more options.

**URL**: https://gatk.broadinstitute.org/hc/en-us/articles/9570513631387-SplitIntervals

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule gatk_split_interval_list:
        input:
            intervals="genome.interval_list",
            ref="genome.fasta",
        output:
            bed=multiext("out/genome", ".00.bed", ".01.bed", ".02.bed"),
        log:
            "logs/genome.log",
        params:
            extra="--subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW",
            java_opts="",  # optional
        resources:
            mem_mb=1024,
        wrapper:
            "v3.0.1/bio/gatk/splitintervals"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

* The `java_opts` param allows for additional arguments to be passed to the Java virtual machine, e.g. "-XX:ParallelGCThreads=10" (not for `-Xmx` or `-Djava.io.tmpdir`, since they are handled automatically).
* The `extra` param allows for additional program arguments, but not `--scatter-count`, `--output`, `--interval-file-prefix`, `--interval-file-num-digits`, or `--extension` (these are automatically inferred from the output files).

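
The inference of those arguments from the output file names can be illustrated with a minimal sketch. It mirrors the approach of the wrapper code on this page (common prefix, common suffix, numeric chunk label in between), but ``infer_scatter_args`` is a hypothetical helper name used only for illustration and is not part of the wrapper. It assumes the outputs are plain path strings and Python 3.9 or later.

.. code-block:: python

    import os
    from pathlib import Path


    def infer_scatter_args(outputs):
        """Derive SplitIntervals naming arguments from a list of output paths.

        Hypothetical helper for illustration only.
        """
        if len(outputs) < 2:
            raise ValueError("at least 2 output files are needed")

        # Longest shared leading string (compared character-wise, not path-wise).
        prefix = os.path.commonprefix(outputs)
        # Longest shared trailing string, found by reversing every path.
        suffix = os.path.commonprefix([out[::-1] for out in outputs])[::-1]
        # What remains between prefix and suffix is the per-file chunk label.
        labels = [out.removeprefix(prefix).removesuffix(suffix) for out in outputs]

        if not all(label.isnumeric() for label in labels):
            raise ValueError("all chunk labels have to be numeric")
        widths = {len(label) for label in labels}
        if len(widths) != 1:
            raise ValueError("all chunk labels must have the same length")

        prefix_path = Path(prefix)
        return {
            "scatter_count": len(outputs),
            "output_dir": str(prefix_path.parent),
            "file_prefix": prefix_path.name,
            "num_digits": widths.pop(),
            "extension": suffix,
        }


    outputs = ["out/genome.00.bed", "out/genome.01.bed", "out/genome.02.bed"]
    args = infer_scatter_args(outputs)
    print(args)

For the three outputs of the example rule, the shared leading ``0`` of the labels is absorbed into the prefix, so the sketch yields the file prefix ``genome.0``, a label width of 1 and the extension ``.bed``.
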

Software dependencies
---------------------

* ``gatk4=4.4.0.0``
* ``snakemake-wrapper-utils=0.6.2``

Input/Output
------------

**Input:**

* Intervals/BED file

**Output:**

* Several Intervals/BED files

Authors
-------

* Filipe G. Vieira

Code
----

.. code-block:: python

    __author__ = "Filipe G. Vieira"
    __copyright__ = "Copyright 2022, Filipe G. Vieira"
    __license__ = "MIT"

    import os
    import tempfile
    from pathlib import Path
    from snakemake.shell import shell
    from snakemake_wrapper_utils.java import get_java_opts

    extra = snakemake.params.get("extra", "")
    java_opts = get_java_opts(snakemake)
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)

    # The number of output files determines the scatter count.
    n_out_files = len(snakemake.output)
    assert n_out_files > 1, "you need to specify at least 2 output files!"

    # Split each output path into common prefix, numeric chunk label and common suffix.
    prefix = Path(os.path.commonprefix(snakemake.output))
    suffix = os.path.commonprefix([file[::-1] for file in snakemake.output])[::-1]
    chunk_labels = [
        out.removeprefix(str(prefix)).removesuffix(suffix) for out in snakemake.output
    ]
    assert all(
        [chunk_label.isnumeric() for chunk_label in chunk_labels]
    ), "all chunk labels have to be numeric!"
    len_chunk_labels = set([len(chunk_label) for chunk_label in chunk_labels])
    assert len(len_chunk_labels) == 1, "all chunk labels must have the same length!"
    # Unpack the single label width so that an integer, not a set, is passed to GATK.
    len_chunk_labels = len_chunk_labels.pop()

    with tempfile.TemporaryDirectory() as tmpdir:
        shell(
            "gatk --java-options '{java_opts}' SplitIntervals"
            " --intervals {snakemake.input.intervals}"
            " --reference {snakemake.input.ref}"
            " --scatter-count {n_out_files}"
            " {extra}"
            " --tmp-dir {tmpdir}"
            " --output {prefix.parent}"
            " --interval-file-prefix {prefix.name:q}"
            " --interval-file-num-digits {len_chunk_labels}"
            " --extension {suffix:q}"
            " {log}"
        )

.. |nl| raw:: html