FREEBAYES¶
Call small genomic variants with freebayes.
Example¶
This wrapper can be used in the following way:
rule freebayes:
input:
ref="genome.fasta",
# you can have a list of samples here
samples="mapped/{sample}.bam",
# the matching BAI indexes have to present for freebayes
indexes="mapped/{sample}.bam.bai",
# optional BED file specifying chromosomal regions on which freebayes
# should run, e.g. all regions that show coverage
#regions="path/to/region-file.bed"
output:
"calls/{sample}.vcf", # either .vcf or .bcf
log:
"logs/freebayes/{sample}.log",
params:
extra="", # optional parameters
chunksize=100000, # reference genome chunk size for parallelization (default: 100000)
normalize=False, # optional flag to use bcftools norm to normalize indels (Valid params are -a, -f, -m, -D or -d)
threads: 2
wrapper:
"v1.25.0-12-ge2215f2e/bio/freebayes"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional arguments for freebayes.
- The `uncompressed_bcf`param allows for uncompressed BCF output.
- The optional normalize param allows to use bcftools norm to normalize indels. When set one of the following params must be passed: -a, -f, -m, -D or -d
- The chunkzise param allows setting reference genome chunk size for parallelization (default: 100000)
- For more inforamtion see, https://github.com/freebayes/freebayes
Software dependencies¶
freebayes=1.3.6
bcftools=1.16
parallel=20230122
bedtools=2.30.0
sed=4.8
Authors¶
- Johannes Köster
- Felix Mölder
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster, Felix Mölder, Christopher Schröder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
from tempfile import TemporaryDirectory
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
params = snakemake.params.get("extra", "")
norm = snakemake.params.get("normalize", False)
# Infer output format
uncompressed_bcf = snakemake.params.get("uncompressed_bcf", False)
out_name, out_ext = path.splitext(snakemake.output[0])
if out_ext == ".vcf":
out_format = "v"
elif out_ext == ".bcf":
if uncompressed_bcf:
out_format = "u"
else:
out_format = "b"
elif out_ext == ".gz":
out_name, out_ext = path.splitext(out_name)
if out_ext == ".vcf":
out_format = "z"
else:
raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")
else:
raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")
pipe = ""
if norm:
pipe = f"| bcftools norm {norm} --output-type {out_format} -"
else:
pipe = f"| bcftools view --output-type {out_format} -"
if snakemake.threads == 1:
freebayes = "freebayes"
else:
chunksize = snakemake.params.get("chunksize", 100000)
regions = (
"<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format(
snakemake=snakemake, chunksize=chunksize
)
)
if snakemake.input.get("regions", ""):
regions = (
"<(bedtools intersect -a "
r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
"{regions}) -b {snakemake.input.regions} | "
r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
).format(regions=regions, snakemake=snakemake)
freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format(
snakemake=snakemake, regions=regions
)
with TemporaryDirectory() as tempdir:
shell(
"({freebayes} {params} -f {snakemake.input.ref}"
" {snakemake.input.samples} | bcftools sort -T {tempdir} - {pipe} > {snakemake.output[0]}) {log}"
)