CNVKIT BATCH
Run the CNVkit pipeline on one or more BAM files to call cnv alterations. Or use it to generate a new reference file
URL: https://cnvkit.readthedocs.io/en/stable/pipeline.html
Example
This wrapper can be used in the following way:
rule cnvkit_batch_build_reference:
input:
bam=["mapped/a.bam"],
bai=["mapped/a.bam.bai"],
fasta="ref/ref.fa",
antitarget="ref/a.antitarget.bed",
target="ref/a.target.bed",
output:
reference="cnvkit/reference.cnn",
log:
"cnvkit/a.reference.cnn.log",
params:
method='hybrid', # optional
extra="--target-avg-size 10", # optional
resources:
mem_mb=1024,
wrapper:
"v9.2.0/bio/cnvkit/batch"
rule cnvkit_batch:
input:
bam=["mapped/a.bam"],
bai=["mapped/a.bam.bai"],
reference="ref/reference.cnn",
output:
antitarget_coverage="cnvkit/a.antitargetcoverage.cnn",
bins="cnvkit/a.bintest.cns",
regions="cnvkit/a.cnr",
segments="cnvkit/a.cns",
segments_called="cnvkit/a.call.cns",
target_coverage="cnvkit/a.targetcoverage.cnn",
log:
"cnvkit/a.antitargetcoverage.cnn.log",
params:
method='hybrid', # optional
extra="", # optional
resources:
mem_mb=1024,
wrapper:
"v9.2.0/bio/cnvkit/batch"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
cnvkit=0.9.12snakemake-wrapper-utils=0.8.0
Input/Output
Input:
bam: one or more bam filesreference: copy reference file, when calling samplesfasta: reference gneome, when building new referenceantitarget: bed antitarget file, when building new referencetarget: bed target file, when building new referencemappability: mappability file, when building new reference
Output:
a set of cnvkit generated files, multiple types of cns, cnr and cnn files
or a new reference file
Params
method: Optional, if no value is set cnvkits default value will be useextra: Any additional arguments to pass
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"
from os import listdir
from os.path import join
from tempfile import TemporaryDirectory
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import move_files
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
input_bam_files = f"{snakemake.input.bam}"
target = ""
antitarget = ""
ref_fasta = ""
reference_cnv = ""
mappability = ""
create_reference = snakemake.output.get("reference", False)
if create_reference:
input_bam_files = f"-n {input_bam_files}"
ref_fasta = f"-f {snakemake.input.fasta}"
target = f"-t {snakemake.input.target}"
antitarget = f"-a {snakemake.input.antitarget}"
if "mappability" in snakemake.input:
mappability = f"-g {snakemake.input.mappability}"
if len(snakemake.output) > 1:
exception_message = (
"Only one output expected when creating a reference, multiple output "
f"files defined as output: [{snakemake.output}]"
)
raise Exception(exception_message)
else:
reference_cnv = f"-r {snakemake.input.reference}"
method = snakemake.params.get("method", "hybrid")
if method:
method = f"-m {method}"
extra = snakemake.params.get("extra", "")
with TemporaryDirectory() as tmpdirname:
if create_reference:
output = f"--output-reference {join(tmpdirname, 'reference.cnn')}"
else:
output = f"-d {tmpdirname} "
shell(
"(cnvkit.py batch {input_bam_files} "
"{reference_cnv} "
"{method} "
"{ref_fasta} "
"{target} "
"{antitarget} "
"{output} "
"{extra}) {log}"
)
# actual generated files
temp_files = listdir(tmpdirname)
# mapping of file suffixes to snakemake.output attributes
file_map = {
".antitargetcoverage.cnn": "antitarget_coverage",
".bintest.cns": "bins",
".cnr": "regions",
".call.cns": "segments_called",
".targetcoverage.cnn": "target_coverage",
".cns": "segments",
"reference.cnn": "reference",
}
mapping = {}
# find matches btw generated files and snakemake output
for suffix, attr in file_map.items():
if not snakemake.output.get(attr):
continue
for file in temp_files:
if file.endswith(suffix):
# Skip ambiguous matches
if attr == "segments" and any(
x in file for x in ["call.cns", "bintest.cns"]
):
continue
mapping[attr] = join(tmpdirname, file)
break # stop after first match
for file in move_files(snakemake, mapping):
shell("{file} {log}")