PYEFFGENOMESIZE

Calculate the effective genome size from a BAM file and genome annotations.

URL: https://github.com/bioinfo-pf-curie/TMB/tree/master?tab=readme-ov-file#pyeffgenomesizepy

Example

This wrapper can be used in the following way:

rule test_pyeffgenomesize_minimal:
    input:
        bed="small.bed",
        gtf="small.gtf",
    output:
        txt="minimal.txt",
    threads: 1
    log:
        "test_pyeffgenomesize_minimal.log",
    params:
        extra="--filterNonCoding --verbose",
    wrapper:
        "v7.6.0/bio/tmb/pyeffgenomesize"


rule test_pyeffgenomesize_complete:
    input:
        bed="small.bed",
        gtf="small.gtf",
        # Optional input:
        bam="small.sorted.bam",
    output:
        txt="complete.txt",
        # Optional output
        intersect="complete.bed",
        # Optional outputs that require the BAM input:
        regions="complete.regions.bed.gz",
        thresholds="complete.thresholds.bed",
    threads: 1
    log:
        "test_pyeffgenomesize_complete.log",
    params:
        extra="--filterNonCoding --verbose",
    wrapper:
        "v7.6.0/bio/tmb/pyeffgenomesize"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

This tool expects GENCODE-formatted GFF/GTF annotations. To process annotations from any other source (Ensembl, RefSeq, …), make sure the transcript_type key exists in the GTF and is never empty.
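One way to check an annotation before running the wrapper is sketched below. It is a minimal, hypothetical helper rather than part of the tool: the function name, the regular expression, and the default list of checked feature types are assumptions made for illustration, and it only understands GTF-style attributes (key "value";), not GFF3 key=value pairs.

```python
import re

# Matches GTF-style attributes such as: transcript_type "protein_coding";
_TRANSCRIPT_TYPE = re.compile(r'transcript_type\s+"([^"]*)"')


def find_missing_transcript_type(lines, features=("transcript", "exon", "CDS")):
    """Return 1-based line numbers of records lacking a non-empty transcript_type.

    `features` is an assumed set of feature types worth checking; adjust it
    to match the records present in your annotation.
    """
    missing = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in features:
            continue  # not a 9-column record, or a feature type we do not check
        match = _TRANSCRIPT_TYPE.search(fields[8])
        if match is None or not match.group(1).strip():
            missing.append(number)
    return missing
```

Passing an open file handle, for example `find_missing_transcript_type(open("small.gtf"))`, returns an empty list when every checked record carries a non-empty transcript_type.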

Software dependencies

  • tmb=1.5.0

  • snakemake-wrapper-utils=0.8.0

Input/Output

Input:

  • gtf: Path to a GTF/GFF3 formatted genome annotation

  • bed: Path to BED formatted genome intervals to operate on

  • bam: Optional path to a BAM file

Output:

  • txt: Path to effective genome size result

  • thresholds: Optional path to mosdepth intermediate intervals

  • intersect: Optional path to bedtools intermediate intervals

  • regions: Optional path to mosdepth intermediate coverage results

Params

  • extra: Optional parameters passed to pyEffGenomeSize.py, except IO parameters, threading, and --mosdepth, which the wrapper sets itself.
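As an illustration of this restriction, the sketch below flags reserved options in an extra string before it reaches the wrapper. The helper and its name are hypothetical. The set of reserved flags is an assumption read off the wrapper code shown on this page (--bed, --gtf, --bam, --oprefix, --thread, --mosdepth), and the wrapper itself performs no such check.

```python
import shlex

# Assumed list: flags that the wrapper code on this page sets by itself.
RESERVED_FLAGS = {"--bed", "--gtf", "--bam", "--oprefix", "--thread", "--mosdepth"}


def reserved_flags_in(extra):
    """Return the sorted reserved flags found in an `extra` parameter string."""
    tokens = shlex.split(extra)
    # Also catch the --flag=value spelling by keeping only the part before "=".
    names = {token.split("=", 1)[0] for token in tokens}
    return sorted(names & RESERVED_FLAGS)
```

An empty result means the string only contains options the wrapper leaves to the user, such as the "--filterNonCoding --verbose" value used in the examples above.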

Authors

  • Thibault Dayris

Code

# coding: utf-8

"""This Snakemake wrapper runs pyEffGenomeSize.py"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2025, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake_wrapper_utils.snakemake import move_files, is_arg
from snakemake.shell import shell
from tempfile import TemporaryDirectory

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

# Optional IO files
bam = snakemake.input.get("bam")
if bam:
    extra += f" --bam '{bam}' --mosdepth"


# pyEffGenomeSize does not delete its temporary files, so they are
# written under a temporary directory that is removed on exit:
with TemporaryDirectory() as tempdir:
    optional_output = {}
    thresholds = snakemake.output.get("thresholds")
    if thresholds:
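        optional_output["thresholds"] = f"{tempdir}/snake_result.thresholds.bed"
        # Avoid passing the flag twice if the user already set it in `extra`
        if not is_arg("--saveIntermediates", extra):
            extra += " --saveIntermediates"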

    regions = snakemake.output.get("regions")
    if regions:
        if not is_arg("--saveIntermediates", extra):
            extra += " --saveIntermediates"
        optional_output["regions"] = f"{tempdir}/snake_result.regions.bed.gz"

    intersect = snakemake.output.get("intersect")
    if intersect:
        optional_output["intersect"] = f"{tempdir}/snake_result.intersect.bed"

    shell(
        "pyEffGenomeSize.py"
        " --thread {snakemake.threads}"
        " --bed {snakemake.input.bed:q}"
        " --gtf {snakemake.input.gtf:q}"
        " {extra}"
        " --oprefix {tempdir}/snake_result"
        " > {snakemake.output.txt:q}"
        " {log}"
    )
    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    for move_cmd in move_files(snakemake, optional_output):
        shell("{move_cmd} {log}")
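The optional-output handling above can be restated without the snakemake object, which may help when reasoning about which files the wrapper expects to find under its output prefix. This is a simplified sketch, not the wrapper's implementation: the function name and the set-based input are hypothetical, the file suffixes are copied from the code above, and the flag check is a plain whitespace split rather than is_arg.

```python
def plan_optional_outputs(requested, extra="", prefix="snake_result"):
    """Map requested optional outputs to the files expected under `prefix`.

    Returns the mapping and the `extra` string, with --saveIntermediates
    appended once if a mosdepth intermediate was requested.
    """
    # Suffixes as they appear in the wrapper code above.
    suffixes = {
        "thresholds": ".thresholds.bed",
        "regions": ".regions.bed.gz",
        "intersect": ".intersect.bed",
    }
    # Outputs for which the wrapper adds --saveIntermediates.
    needs_intermediates = {"thresholds", "regions"}

    planned = {}
    for name, suffix in suffixes.items():
        if name not in requested:
            continue
        planned[name] = prefix + suffix
        if name in needs_intermediates and "--saveIntermediates" not in extra.split():
            extra += " --saveIntermediates"
    return planned, extra.strip()
```

For instance, requesting only intersect leaves extra untouched, while requesting regions or thresholds appends the flag a single time.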