BCFTOOLS NORM
Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows.
URL: http://www.htslib.org/doc/bcftools.html#norm
Example
This wrapper can be used in the following way:
rule norm_vcf:
    input:
        "{prefix}.bcf",
        #ref="genome.fasta" # optional reference (will be translated into the -f option)
    output:
        "{prefix}.norm.vcf", # can also be .bcf, corresponding --output-type parameter is inferred automatically
    log:
        "{prefix}.norm.log",
    params:
        extra="--rm-dup none", # optional
        #uncompressed_bcf=False,
    wrapper:
        "v5.3.0-16-g710597c/bio/bcftools/norm"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The uncompressed_bcf param specifies that BCF output should be written uncompressed; it is ignored for other output formats.
The extra param passes additional arguments to bcftools norm, except --threads, -f/--fasta-ref, -o/--output, and -O/--output-type, which the wrapper sets itself.
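The two notes above can be illustrated with a small Python sketch. This is not the code used by snakemake-wrapper-utils; the function names `infer_output_type` and `check_extra` are hypothetical, and the extension-to-type mapping is an assumption based on the bcftools `-O` values (`b` compressed BCF, `u` uncompressed BCF, `z` compressed VCF, `v` uncompressed VCF).

```python
import shlex

# Flags the wrapper sets itself (taken from the note above).
RESERVED_FLAGS = {
    "--threads", "-f", "--fasta-ref", "-o", "--output", "-O", "--output-type",
}


def infer_output_type(path, uncompressed_bcf=False):
    """Guess a bcftools -O value from the output file extension (assumed mapping)."""
    if path.endswith(".vcf.gz"):
        return "z"
    if path.endswith(".vcf"):
        return "v"
    if path.endswith(".bcf"):
        # uncompressed_bcf only matters for BCF output
        return "u" if uncompressed_bcf else "b"
    raise ValueError(f"Cannot infer output type from {path!r}")


def check_extra(extra):
    """Raise ValueError if `extra` contains a flag managed by the wrapper."""
    for token in shlex.split(extra):
        flag = token.split("=", 1)[0]  # also handles the --flag=value form
        if flag in RESERVED_FLAGS:
            raise ValueError(f"{flag} must not be passed via params.extra")
    return extra
```

The check is deliberately simple: it compares whole tokens, so a short option glued to its value (for example `-fgenome.fasta`) would not be detected.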
Software dependencies
bcftools=1.21
snakemake-wrapper-utils=0.6.2
Code
__author__ = "Dayne Filer"
__copyright__ = "Copyright 2019, Dayne Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts

# Options derived from the rule itself (e.g. threads, reference, output file and type)
bcftools_opts = get_bcftools_opts(snakemake, parse_memory=False)
extra = snakemake.params.get("extra", "")
# Redirect both stdout and stderr to the rule's log file
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("bcftools norm {bcftools_opts} {extra} {snakemake.input[0]} {log}")
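
To see roughly what the final shell call expands to, the following sketch joins the same pieces in the same order as the template above. The helper `build_norm_command` is hypothetical, and the option string and log redirection passed to it are illustrative values, not output captured from `get_bcftools_opts` or `log_fmt_shell`.

```python
def build_norm_command(input_path, bcftools_opts, extra="", log_redirect=""):
    """Join the pieces in the same order as the wrapper's shell() template."""
    parts = ["bcftools norm", bcftools_opts, extra, input_path, log_redirect]
    # Skip empty pieces so no double spaces appear in the command
    return " ".join(p for p in parts if p)


cmd = build_norm_command(
    "sample.bcf",
    bcftools_opts="--threads 1 -o sample.norm.vcf -O v",  # illustrative only
    extra="--rm-dup none",
    log_redirect="> sample.norm.log 2>&1",  # illustrative only
)
print(cmd)
```

Because the wrapper-managed options come before `extra`, anything passed through `extra` is appended to the options the wrapper has already set rather than replacing them.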