CNVKIT BATCH
Run the CNVkit pipeline on one or more BAM files to call cnv alterations. Or use it to generate a new reference file
URL: https://cnvkit.readthedocs.io/en/stable/pipeline.html
Example
This wrapper can be used in the following way:
rule cnvkit_batch_build_reference:
input:
bam=["mapped/a.bam"],
bai=["mapped/a.bam.bai"],
fasta="ref/ref.fa",
antitarget="ref/a.antitarget.bed",
target="ref/a.target.bed",
output:
reference="cnvkit/reference.cnn",
log:
"cnvkit/a.reference.cnn.log",
params:
method='hybrid', # optional
extra="--target-avg-size 10", # optional
resources:
mem_mb=1024,
wrapper:
"v3.9.0-14-g476823b/bio/cnvkit/batch"
rule cnvkit_batch:
input:
bam=["mapped/a.bam"],
bai=["mapped/a.bam.bai"],
reference="ref/reference.cnn",
output:
antitarget_coverage="cnvkit/a.antitargetcoverage.cnn",
bins="cnvkit/a.bintest.cns",
regions="cnvkit/a.cnr",
segments="cnvkit/a.cns",
segments_called="cnvkit/a.call.cns",
target_coverage="cnvkit/a.targetcoverage.cnn",
log:
"cnvkit/a.antitargetcoverage.cnn.log",
params:
method='hybrid', # optional
extra="", # optional
resources:
mem_mb=1024,
wrapper:
"v3.9.0-14-g476823b/bio/cnvkit/batch"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
cnvkit=0.9.11
Input/Output
Input:
bam
: one or more bam filesreference
: copy reference file, when calling samplesfasta
: reference gneome, when building new referenceantitarget
: bed antitarget file, when building new referencetarget
: bed target file, when building new referencemappability
: mappability file, when building new reference
Output:
a set of cnvkit generated files, multiple types of cns, cnr and cnn files
or a new reference file
Params
method
: Optional, if no value is set cnvkits default value will be useextra
: Any additional arguments to pass
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"
import logging
from os import listdir
from os.path import basename
from os.path import dirname
from os.path import join
from tempfile import TemporaryDirectory
import shutil
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
input_bam_files = f"{snakemake.input.bam}"
target = ""
antitarget = ""
ref_fasta = ""
reference_cnv = ""
mappability = ""
create_reference = snakemake.output.get("reference", False)
if create_reference:
input_bam_files = f"-n {input_bam_files}"
ref_fasta = f"-f {snakemake.input.fasta}"
target = f"-t {snakemake.input.target}"
antitarget = f"-a {snakemake.input.antitarget}"
if "mappability" in snakemake.input:
mappability = f"-g {snakemake.input.mappability}"
if len(snakemake.output) > 1:
exception_message = (
"Only one output expected when creating a reference, multiple output "
f"files defined as output: [{snakemake.output}]"
)
raise Exception(exception_message)
else:
reference_cnv = f"-r {snakemake.input.reference}"
method = snakemake.params.get("method", "hybrid")
if method:
method = f"-m {method}"
extra = snakemake.params.get("extra", "")
with TemporaryDirectory() as tmpdirname:
if create_reference:
output = f"--output-reference {join(tmpdirname, 'reference.cnn')}"
output_list = [snakemake.output.reference]
else:
output_list = [value for key, value in snakemake.output.items()]
output = f"-d {tmpdirname} "
shell(
"(cnvkit.py batch {input_bam_files} "
"{reference_cnv} "
"{method} "
"{ref_fasta} "
"{target} "
"{antitarget} "
"{output} "
"{extra}) {log}"
)
for output_file in output_list:
filename = basename(output_file)
parent = dirname(output_file)
try:
shutil.copy2(join(tmpdirname, filename), output_file)
except FileNotFoundError as e:
temp_files = listdir(tmpdirname)
logging.error(
f"Couldn't locate file {basename} possible files are {[basename(f) for f in temp_files]}"
)
raise e