FREEBAYES¶
Call small genomic variants with freebayes.
Example¶
This wrapper can be used in the following way:
rule freebayes:
input:
ref="genome.fasta",
# you can have a list of samples here
samples="mapped/{sample}.bam",
# the matching BAI indexes have to present for freebayes
indexes="mapped/{sample}.bam.bai"
# optional BED file specifying chromosomal regions on which freebayes
# should run, e.g. all regions that show coverage
#regions="/path/to/region-file.bed"
output:
"calls/{sample}.vcf" # either .vcf or .bcf
log:
"logs/freebayes/{sample}.log"
params:
extra="", # optional parameters
chunksize=100000, # reference genome chunk size for parallelization (default: 100000)
normalize=False, # flag to use bcftools norm to normalize indels
threads: 2
wrapper:
"0.73.0/bio/freebayes"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
freebayes=1.3.1
bcftools=1.11
parallel=20190522
bedtools>=2.29
sed=4.7
Authors¶
- Johannes Köster
- Felix Mölder
Code¶
__author__ = "Johannes Köster, Felix Mölder, Christopher Schröder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
params = snakemake.params.get("extra", "")
norm = snakemake.params.get("normalize", False)
assert norm in [True, False]
pipe = ""
if snakemake.output[0].endswith(".bcf"):
if norm:
pipe = "| bcftools norm -Ob -"
else:
pipe = "| bcftools view -Ob -"
elif norm:
pipe = "| bcftools norm -"
if snakemake.threads == 1:
freebayes = "freebayes"
else:
chunksize = snakemake.params.get("chunksize", 100000)
regions = (
"<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format(
snakemake=snakemake, chunksize=chunksize
)
)
if snakemake.input.get("regions", ""):
regions = (
"<(bedtools intersect -a "
r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
"{regions}) -b {snakemake.input.regions} | "
r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
).format(regions=regions, snakemake=snakemake)
freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format(
snakemake=snakemake, regions=regions
)
shell(
"({freebayes} {params} -f {snakemake.input.ref}"
" {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}"
)