PYEFFGENOMESIZE
Calculate the effective genome size from genome annotations (GTF and BED) and, optionally, a BAM file.
URL: https://github.com/bioinfo-pf-curie/TMB/tree/master?tab=readme-ov-file#pyeffgenomesizepy
Example
This wrapper can be used in the following way:
rule test_pyeffgenomesize_minimal:
    input:
        bed="small.bed",
        gtf="small.gtf",
    output:
        txt="minimal.txt",
    threads: 1
    log:
        "test_pyeffgenomesize_minimal.log",
    params:
        extra="--filterNonCoding --verbose",
    wrapper:
        "v7.6.0/bio/tmb/pyeffgenomesize"
rule test_pyeffgenomesize_complete:
    input:
        bed="small.bed",
        gtf="small.gtf",
        # Optional input:
        bam="small.sorted.bam",
    output:
        txt="complete.txt",
        # Optional output:
        intersect="complete.bed",
        # Optional outputs that require the BAM input:
        regions="complete.regions.bed.gz",
        thresholds="complete.thresholds.bed",
    threads: 1
    log:
        "test_pyeffgenomesize_complete.log",
    params:
        extra="--filterNonCoding --verbose",
    wrapper:
        "v7.6.0/bio/tmb/pyeffgenomesize"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
This tool handles GENCODE-formatted GFF/GTF files. To process annotations from any other source (Ensembl, RefSeq, …), make sure the transcript_type key exists in the GTF and is never empty.
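The check described in this note can be done before running the rule. The following is a minimal sketch, not part of the wrapper: the function name `lines_missing_transcript_type` is hypothetical, it only understands the GTF attribute syntax (`key "value";`, not GFF3 `key=value`), and it assumes that gene-level records do not need a transcript_type.

```python
import re

# Matches a GTF attribute such as: transcript_type "protein_coding";
TRANSCRIPT_TYPE = re.compile(r'transcript_type "([^"]*)"')


def lines_missing_transcript_type(gtf_lines):
    """Return 1-based numbers of lines lacking a non-empty transcript_type."""
    missing = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record, ignored here
        if fields[2] == "gene":
            continue  # assumption: gene records carry no transcript_type
        match = TRANSCRIPT_TYPE.search(fields[8])
        if match is None or not match.group(1).strip():
            missing.append(number)
    return missing
```

Any iterable of lines works as input, so an open file handle on an uncompressed GTF can be passed directly. An empty result suggests the annotation satisfies the requirement above, under the stated assumptions.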
Software dependencies
tmb=1.5.0
snakemake-wrapper-utils=0.8.0
Input/Output
Input:
gtf: Path to a GTF/GFF3 formatted genome annotation
bed: Path to BED formatted genome intervals to operate on
bam: Optional path to a BAM file
Output:
txt: Path to effective genome size result
thresholds: Optional path to mosdepth intermediate intervals
intersect: Optional path to bedtools intermediate intervals
regions: Optional path to mosdepth intermediate coverage results
Params
extra: Optional parameters, except IO parameters, threading, and --mosdepth.
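To illustrate the restriction on extra, here is a small sketch that reports flags the wrapper already sets on its own. The helper name `reserved_flags_in` is hypothetical and not part of the wrapper. The flag list is taken from the wrapper code, and handling of the `--flag=value` form is an assumption about how the tool parses its arguments.

```python
import shlex

# Flags the wrapper passes by itself (as seen in the wrapper code).
RESERVED_FLAGS = {"--bam", "--bed", "--gtf", "--oprefix", "--thread", "--mosdepth"}


def reserved_flags_in(extra):
    """Return the reserved flags found in an `extra` string, sorted."""
    found = set()
    for token in shlex.split(extra):
        flag = token.split("=", 1)[0]  # also catch a --flag=value spelling
        if flag in RESERVED_FLAGS:
            found.add(flag)
    return sorted(found)
```

For the examples above, `reserved_flags_in("--filterNonCoding --verbose")` finds nothing to report, whereas a string containing `--mosdepth` or `--bam` would be flagged.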
Code
# coding: utf-8
"""This Snakemake wrapper runs pyEffGenomeSize.py"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2025, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake_wrapper_utils.snakemake import move_files, is_arg
from snakemake.shell import shell
from tempfile import TemporaryDirectory

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

# Optional IO files
bam = snakemake.input.get("bam")
if bam:
    extra += f" --bam '{bam}' --mosdepth"

# pyEffGenomeSize does not erase temporary files
# using a temporary directory to handle them:
with TemporaryDirectory() as tempdir:
    optional_output = {}
    thresholds = snakemake.output.get("thresholds")
    if thresholds:
        optional_output["thresholds"] = f"{tempdir}/snake_result.thresholds.bed"
        extra += " --saveIntermediates"

    regions = snakemake.output.get("regions")
    if regions:
        if not is_arg("--saveIntermediates", extra):
            extra += " --saveIntermediates"
        optional_output["regions"] = f"{tempdir}/snake_result.regions.bed.gz"

    intersect = snakemake.output.get("intersect")
    if intersect:
        optional_output["intersect"] = f"{tempdir}/snake_result.intersect.bed"

    shell(
        "pyEffGenomeSize.py"
        " --thread {snakemake.threads}"
        " --bed {snakemake.input.bed:q}"
        " --gtf {snakemake.input.gtf:q}"
        " {extra}"
        " --oprefix {tempdir}/snake_result"
        " > {snakemake.output.txt:q}"
        " {log}"
    )

    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    for move_cmd in move_files(snakemake, optional_output):
        shell("{move_cmd} {log}")
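The mapping between optional outputs and prefix-based intermediate files can be tried outside Snakemake. The sketch below mirrors the logic of the wrapper code as a pure function. The name `plan_optional_outputs` is hypothetical, and the function is an illustration rather than part of the wrapper.

```python
def plan_optional_outputs(requested, prefix):
    """Mirror the wrapper's mapping of optional outputs to prefix-based files.

    requested: names of the optional outputs declared in the rule.
    prefix: value handed to --oprefix.
    Returns (mapping, needs_save_intermediates).
    """
    suffixes = {
        "thresholds": ".thresholds.bed",
        "regions": ".regions.bed.gz",
        "intersect": ".intersect.bed",
    }
    mapping = {
        name: prefix + suffix
        for name, suffix in suffixes.items()
        if name in requested
    }
    # In the wrapper code, only thresholds and regions add --saveIntermediates.
    needs_flag = bool({"thresholds", "regions"} & set(requested))
    return mapping, needs_flag
```

Keeping this decision in a side-effect-free function makes it easy to see which files the wrapper expects under the temporary prefix before `move_files` relocates them to the paths chosen in the rule.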