CNVKIT BATCH

https://img.shields.io/badge/wrapper_version-v9.2.0-10785b https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/cnvkit/batch?label=version%20update%20pull%20requests&color=1cb481

Run the CNVkit pipeline on one or more BAM files to call cnv alterations. Or use it to generate a new reference file

URL: https://cnvkit.readthedocs.io/en/stable/pipeline.html

Example

This wrapper can be used in the following way:

rule cnvkit_batch_build_reference:
    input:
        bam=["mapped/a.bam"],
        bai=["mapped/a.bam.bai"],
        fasta="ref/ref.fa",
        antitarget="ref/a.antitarget.bed",
        target="ref/a.target.bed",
    output:
        reference="cnvkit/reference.cnn",
    log:
        "cnvkit/a.reference.cnn.log",
    params:
        method='hybrid',  # optional
        extra="--target-avg-size 10",  # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v9.2.0/bio/cnvkit/batch"

rule cnvkit_batch:
    input:
        bam=["mapped/a.bam"],
        bai=["mapped/a.bam.bai"],
        reference="ref/reference.cnn",
    output:
        antitarget_coverage="cnvkit/a.antitargetcoverage.cnn",
        bins="cnvkit/a.bintest.cns",
        regions="cnvkit/a.cnr",
        segments="cnvkit/a.cns",
        segments_called="cnvkit/a.call.cns",
        target_coverage="cnvkit/a.targetcoverage.cnn",
    log:
        "cnvkit/a.antitargetcoverage.cnn.log",
    params:
        method='hybrid',  # optional
        extra="",  # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v9.2.0/bio/cnvkit/batch"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • cnvkit=0.9.12

  • snakemake-wrapper-utils=0.8.0

Input/Output

Input:

  • bam: one or more bam files

  • reference: copy reference file, when calling samples

  • fasta: reference gneome, when building new reference

  • antitarget: bed antitarget file, when building new reference

  • target: bed target file, when building new reference

  • mappability: mappability file, when building new reference

Output:

  • a set of cnvkit generated files, multiple types of cns, cnr and cnn files

  • or a new reference file

Params

  • method: Optional, if no value is set cnvkits default value will be use

  • extra: Any additional arguments to pass

Authors

  • Patrik Smeds

Code

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"

from os import listdir
from os.path import join
from tempfile import TemporaryDirectory
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import move_files


log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)

input_bam_files = f"{snakemake.input.bam}"

target = ""
antitarget = ""
ref_fasta = ""
reference_cnv = ""
mappability = ""

create_reference = snakemake.output.get("reference", False)
if create_reference:
    input_bam_files = f"-n {input_bam_files}"
    ref_fasta = f"-f {snakemake.input.fasta}"
    target = f"-t {snakemake.input.target}"
    antitarget = f"-a {snakemake.input.antitarget}"

    if "mappability" in snakemake.input:
        mappability = f"-g {snakemake.input.mappability}"

    if len(snakemake.output) > 1:
        exception_message = (
            "Only one output expected when creating a reference, multiple output "
            f"files defined as output: [{snakemake.output}]"
        )
        raise Exception(exception_message)
else:
    reference_cnv = f"-r {snakemake.input.reference}"

method = snakemake.params.get("method", "hybrid")
if method:
    method = f"-m {method}"

extra = snakemake.params.get("extra", "")

with TemporaryDirectory() as tmpdirname:
    if create_reference:
        output = f"--output-reference {join(tmpdirname, 'reference.cnn')}"
    else:
        output = f"-d {tmpdirname} "
    shell(
        "(cnvkit.py batch {input_bam_files} "
        "{reference_cnv} "
        "{method} "
        "{ref_fasta} "
        "{target} "
        "{antitarget} "
        "{output} "
        "{extra}) {log}"
    )

    # actual generated files
    temp_files = listdir(tmpdirname)

    # mapping of file suffixes to snakemake.output attributes
    file_map = {
        ".antitargetcoverage.cnn": "antitarget_coverage",
        ".bintest.cns": "bins",
        ".cnr": "regions",
        ".call.cns": "segments_called",
        ".targetcoverage.cnn": "target_coverage",
        ".cns": "segments",
        "reference.cnn": "reference",
    }

    mapping = {}

    # find matches btw generated files and snakemake output
    for suffix, attr in file_map.items():
        if not snakemake.output.get(attr):
            continue
        for file in temp_files:
            if file.endswith(suffix):
                # Skip ambiguous matches
                if attr == "segments" and any(
                    x in file for x in ["call.cns", "bintest.cns"]
                ):
                    continue
                mapping[attr] = join(tmpdirname, file)
                break  # stop after first match

    for file in move_files(snakemake, mapping):
        shell("{file} {log}")