FGBIO FILTERCONSENSUSREADS
Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
Example
This wrapper can be used in the following way:
rule FilterConsensusReads:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.filtered.bam"
    params:
        extra="",
        min_base_quality=2,
        min_reads=[2, 2, 2],
        ref="genome.fasta"
    log:
        "logs/fgbio/filterconsensusreads/{sample}.log"
    threads: 1
    wrapper:
        "v3.0.2/bio/fgbio/filterconsensusreads"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
min_base_quality: a single value (Int). Consensus bases with a quality below this threshold are masked (converted to N). (default: 5) Note that the wrapper code raises a ValueError unless an Int is provided.
min_reads: an array of Ints, min length 1, max length 3. Minimum number of reads that need to support a UMI. For BAM files processed with CallMolecularConsensusReads, one value is required. For BAM files processed with CallDuplexConsensusReads, up to three values can be provided; if fewer than three are given, the last value is repeated. The first value applies to the final consensus sequence, and the last two apply to the two single-strand consensus sequences.
For more information, see http://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
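The behaviour described in the notes above can be illustrated with a small, self-contained Python sketch. This is not fgbio's implementation: `expand_min_reads` and `mask_low_quality` are hypothetical helper names introduced here, and the sketch only mirrors what the notes state (repeat the last `min_reads` value until there are three entries, and replace bases below the quality threshold with N).

```python
def expand_min_reads(min_reads):
    """Pad a list of 1 to 3 values to exactly three by repeating the last one.

    Hypothetical helper: mirrors the rule stated in the notes, not fgbio code.
    """
    if not 1 <= len(min_reads) <= 3:
        raise ValueError("min_reads must contain between 1 and 3 values")
    return list(min_reads) + [min_reads[-1]] * (3 - len(min_reads))


def mask_low_quality(bases, quals, min_base_quality):
    """Replace every base whose quality is below the threshold with 'N'.

    Hypothetical helper: `bases` is a string, `quals` a list of ints
    of the same length.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    return "".join(
        base if qual >= min_base_quality else "N"
        for base, qual in zip(bases, quals)
    )
```

With these definitions, `expand_min_reads([5, 2])` gives `[5, 2, 2]`: the first value would apply to the final consensus and the repeated value to both single-strand consensus sequences.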
Software dependencies
fgbio=2.1.0
Input/Output
Input:
bam file (exactly one)
reference genome (supplied via params.ref in the example and the wrapper code, not as an input file)
Output:
filtered bam file
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

shell.executable("bash")

# Redirect stderr (but not stdout) to the rule's log file.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

# min_base_quality is mandatory and must be an int.
min_base_quality = snakemake.params.get("min_base_quality", None)
if not isinstance(min_base_quality, int):
    raise ValueError("min_base_quality needs to be provided as an Int!")

# min_reads is mandatory and must be a list with one to three entries.
min_reads = snakemake.params.get("min_reads", None)
if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
    raise ValueError(
        "min_reads needs to be provided as list of Ints, min length 1, max length 3!"
    )

ref = snakemake.params.get("ref", None)
if ref is None:
    raise ValueError("A reference needs to be provided!")

# Exactly one input BAM is expected ("or" so that extra inputs are rejected).
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

# Exactly one output BAM is expected.
bam_output = snakemake.output[0]
if not isinstance(bam_output, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(bam_output) + "!")

shell(
    "fgbio FilterConsensusReads"
    " -i {bam_input}"
    " -o {bam_output}"
    " -r {ref}"
    " --min-reads {min_reads}"
    " --min-base-quality {min_base_quality}"
    " {extra_params}"
    " {log}"
)
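As a rough illustration of the command line the wrapper assembles, the sketch below validates the parameters and returns the command as a string instead of running it. `build_command` is a hypothetical helper, not part of the wrapper. It is slightly stricter than the wrapper code above in that it also checks every element of `min_reads`. Joining the list with spaces is an assumption about how the `{min_reads}` placeholder is rendered in the shell command.

```python
import shlex


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def build_command(bam_input, bam_output, ref, min_reads, min_base_quality, extra=""):
    """Return the fgbio FilterConsensusReads command as a single string.

    Hypothetical helper for illustration; the log redirection that the
    wrapper appends is left out.
    """
    if not _is_int(min_base_quality):
        raise ValueError("min_base_quality needs to be provided as an Int!")
    if (
        not isinstance(min_reads, list)
        or not 1 <= len(min_reads) <= 3
        or not all(_is_int(n) for n in min_reads)
    ):
        raise ValueError(
            "min_reads needs to be provided as list of Ints, "
            "min length 1, max length 3!"
        )
    parts = [
        "fgbio",
        "FilterConsensusReads",
        "-i", shlex.quote(bam_input),
        "-o", shlex.quote(bam_output),
        "-r", shlex.quote(ref),
        # Assumption: list values are passed as space-separated arguments.
        "--min-reads", *(str(n) for n in min_reads),
        "--min-base-quality", str(min_base_quality),
    ]
    if extra:
        parts.append(extra)
    return " ".join(parts)
```

Called with the values from the example rule, this helper produces a command containing `--min-reads 2 2 2 --min-base-quality 2`. Returning a string rather than executing it keeps the sketch easy to inspect and test without fgbio installed.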