CNVKIT BATCH

https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/cnvkit/batch?label=version%20update%20pull%20requests

Run the CNVkit pipeline on one or more BAM files to call cnv alterations. Or use it to generate a new reference file

URL: https://cnvkit.readthedocs.io/en/stable/pipeline.html

Example

This wrapper can be used in the following way:

rule cnvkit_batch_build_reference:
    input:
        bam=["mapped/a.bam"],
        bai=["mapped/a.bam.bai"],
        fasta="ref/ref.fa",
        antitarget="ref/a.antitarget.bed",
        target="ref/a.target.bed",
    output:
        reference="cnvkit/reference.cnn",
    log:
        "cnvkit/a.reference.cnn.log",
    params:
        method='hybrid',  # optional
        extra="--target-avg-size 10",  # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v3.8.0-1-g149ef14/bio/cnvkit/batch"

rule cnvkit_batch:
    input:
        bam=["mapped/a.bam"],
        bai=["mapped/a.bam.bai"],
        reference="ref/reference.cnn",
    output:
        antitarget_coverage="cnvkit/a.antitargetcoverage.cnn",
        bins="cnvkit/a.bintest.cns",
        regions="cnvkit/a.cnr",
        segments="cnvkit/a.cns",
        segments_called="cnvkit/a.call.cns",
        target_coverage="cnvkit/a.targetcoverage.cnn",
    log:
        "cnvkit/a.antitargetcoverage.cnn.log",
    params:
        method='hybrid',  # optional
        extra="",  # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v3.8.0-1-g149ef14/bio/cnvkit/batch"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • cnvkit=0.9.10

Input/Output

Input:

  • bam: one or more bam files

  • reference: copy reference file, when calling samples

  • fasta: reference gneome, when building new reference

  • antitarget: bed antitarget file, when building new reference

  • target: bed target file, when building new reference

  • mappability: mappability file, when building new reference

Output:

  • a set of cnvkit generated files, multiple types of cns, cnr and cnn files

  • or a new reference file

Params

  • method: Optional, if no value is set cnvkits default value will be use

  • extra: Any additional arguments to pass

Authors

  • Patrik Smeds

Code

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"

import logging
from os import listdir
from os.path import basename
from os.path import dirname
from os.path import join
from tempfile import TemporaryDirectory
import shutil
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

input_bam_files = f"{snakemake.input.bam}"

target = ""
antitarget = ""
ref_fasta = ""
reference_cnv = ""
mappability = ""

create_reference = snakemake.output.get("reference", False)
if create_reference:
    input_bam_files = f"-n {input_bam_files}"
    ref_fasta = f"-f {snakemake.input.fasta}"
    target = f"-t {snakemake.input.target}"
    antitarget = f"-a {snakemake.input.antitarget}"

    if "mappability" in snakemake.input:
        mappability = f"-g {snakemake.input.mappability}"

    if len(snakemake.output) > 1:
        exception_message = (
            "Only one output expected when creating a reference, multiple output "
            f"files defined as output: [{snakemake.output}]"
        )
        raise Exception(exception_message)
else:
    reference_cnv = f"-r {snakemake.input.reference}"

method = snakemake.params.get("method", "hybrid")
if method:
    method = f"-m {method}"

extra = snakemake.params.get("extra", "")

with TemporaryDirectory() as tmpdirname:
    if create_reference:
        output = f"--output-reference {join(tmpdirname, 'reference.cnn')}"
        output_list = [snakemake.output.reference]
    else:
        output_list = [value for key, value in snakemake.output.items()]
        output = f"-d {tmpdirname} "
    shell(
        "(cnvkit.py batch {input_bam_files} "
        "{reference_cnv} "
        "{method} "
        "{ref_fasta} "
        "{target} "
        "{antitarget} "
        "{output} "
        "{extra}) {log}"
    )

    for output_file in output_list:
        filename = basename(output_file)
        parent = dirname(output_file)

        try:
            shutil.copy2(join(tmpdirname, filename), output_file)
        except FileNotFoundError as e:
            temp_files = listdir(tmpdirname)
            logging.error(
                f"Couldn't locate file {basename} possible files are {[basename(f) for f in temp_files]}"
            )
            raise e