PYTRF

Tandem Repeat (TR) finding and extraction toolkit. Supports finding exact short TRs (findstr), generic TRs (findgtr), approximate TRs (findatr), and extracting sequences (extract).

URL: https://pytrf.readthedocs.io/en/latest/usage.html

Example

This wrapper can be used in the following way:

########################################################
# Snakefile for pytrf wrapper
########################################################


# SAMPLE RULE: Find exact short tandem repeats (STRs/SSRs)
rule pytrf_findstr:
    input:
        seq="demo_data/{sample}.fasta",
    output:
        "results/{sample}_findstr.csv",
    log:
        "logs/{sample}.log",
    params:
        subcommand="findstr",
        extra="-r 5 1 3 3 3 3",
    wrapper:
        "v9.9.0/bio/pytrf"


# SAMPLE RULE: Find exact STRs with default parameters
rule pytrf_findstr_defaults:
    input:
        seq="demo_data/small_test.fasta",
    output:
        "results/small_test_findstr_defaults.tsv",
    log:
        "logs/small_test_defaults.log",
    params:
        subcommand="findstr",
    wrapper:
        "v9.9.0/bio/pytrf"


# SAMPLE RULE: Find generic tandem repeats
rule pytrf_findgtr:
    input:
        seq="demo_data/{sample}.fasta",
    output:
        "results/{sample}_findgtr.tsv",
    log:
        "logs/{sample}.log",
    params:
        subcommand="findgtr",
        extra="-m 3 -r 1",  # min-motif=3, min-repeat=1
    wrapper:
        "v9.9.0/bio/pytrf"


# SAMPLE RULE: Find approximate tandem repeats
rule pytrf_findatr:
    input:
        seq="demo_data/{sample}.fasta",
    output:
        "results/{sample}_findatr.tsv",
    log:
        "logs/{sample}.log",
    params:
        subcommand="findatr",
        extra="-m 3 -M 10",  # min-motif-size=3, max-motif-size=10
    wrapper:
        "v9.9.0/bio/pytrf"


# SAMPLE RULE: Extract TR sequences (NOT WORKING - see meta.yaml notes)
rule pytrf_extract:
    input:
        seq="demo_data/small_test_extract.fasta",  # sequence fasta
        repeat="demo_data/small_test_extract.tsv",  # repeat file from findstr/findgtr/findatr
    output:
        "results/small_test_extract.tsv",
    log:
        "logs/small_test_extract.log",
    params:
        subcommand="extract",
        extra="-l 150",  # flank-length=150
    wrapper:
        "v9.9.0/bio/pytrf"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Subcommands:
- findstr: Find exact short tandem repeats (STRs/microsatellites)
- findgtr: Find exact generic tandem repeats
- findatr: Find approximate/imperfect tandem repeats
- extract: NOT WORKING - Extract TR sequences and flanking sequence (bug in PyTRF 1.4.2, see known issues)

All subcommands require the named input ‘seq’ (FASTA/FASTQ).
The extract subcommand additionally requires the named input ‘repeat’ (TSV/CSV output of findstr/findgtr/findatr).

Bioconda package: https://bioconda.github.io/recipes/pytrf/README.html
GitHub repository: https://github.com/lmdu/pytrf
License: MIT License
Disclaimer: This is a minimal implementation supporting basic functionality. PyTRF is not a Python binding to TRF; it is an independent tool.

Testing:
The extract test is skipped in this wrapper until an upstream patch is released.

Known issues:
- PyTRF 1.4.2 has a bug in findstr bed output format (https://github.com/lmdu/pytrf/issues/7)
- PyTRF 1.4.2 has a bug in extract command (https://github.com/lmdu/pytrf/issues/6)

Software dependencies

  • pytrf=1.5.0

  • snakemake-wrapper-utils=0.8.0

Input/Output

Input:

  • seq: FASTA or FASTQ file (supports gzip compression)

  • repeat: For extract only - TSV/CSV file from findstr/findgtr/findatr

Output:

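  • Output file (required). The format is inferred from the file extension and must be one the chosen subcommand supports: findstr (tsv, csv, bed, gff); findgtr and findatr (tsv, csv, gff); extract (tsv, csv, fasta).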
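
The extension check can be sketched in plain Python. This is a minimal illustration rather than the wrapper's own code: `infer_format` and `check_output` are hypothetical names, and `infer_format` is a simplified stand-in for `get_format` from snakemake-wrapper-utils, which may treat compressed or multi-part extensions differently. The format table is copied from the wrapper source in the Code section.

```python
from pathlib import Path

# Formats accepted per subcommand, copied from the wrapper's FORMAT_SUPPORT table.
FORMAT_SUPPORT = {
    "findstr": {"tsv", "csv", "bed", "gff"},
    "findgtr": {"tsv", "csv", "gff"},
    "findatr": {"tsv", "csv", "gff"},
    "extract": {"tsv", "csv", "fasta"},
}


def infer_format(path: str) -> str:
    """Hypothetical helper: last file suffix, lowercased, without the dot."""
    return Path(path).suffix.lstrip(".").lower()


def check_output(subcommand: str, path: str) -> str:
    """Return the inferred format, or raise if the subcommand cannot write it."""
    fmt = infer_format(path)
    supported = FORMAT_SUPPORT[subcommand]
    if fmt not in supported:
        raise ValueError(
            f"Unsupported format '{fmt}' for pytrf {subcommand}. "
            f"Supported formats: {', '.join(sorted(supported))}"
        )
    return fmt


print(check_output("findstr", "results/a_findstr.csv"))  # csv
```

Under these assumptions, an output such as `results/a_findgtr.bed` would be rejected for findgtr, because bed appears only in the findstr entry of the table.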

Params

  • subcommand: PyTRF subcommand to run [findstr, findgtr, findatr, extract].

  • extra: Additional command-line arguments passed to pytrf.
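
Some flags are managed by the wrapper and are rejected if they appear in extra: -f/--out-format and -o/--out-file for every subcommand, plus -r/--repeat-file for extract. The sketch below illustrates that rule under stated assumptions: `reserved_flags_in` is a hypothetical helper using `shlex`, not the `is_arg` function from snakemake-wrapper-utils that the wrapper actually calls, and its tokenization may differ from `is_arg` in edge cases.

```python
import shlex

# Flags the wrapper sets itself (taken from the wrapper's validation code).
RESERVED = {"-f", "--out-format", "-o", "--out-file"}
# Additionally reserved for 'extract', where the repeat file comes from input.repeat.
EXTRACT_RESERVED = {"-r", "--repeat-file"}


def reserved_flags_in(extra: str, subcommand: str) -> list:
    """Hypothetical helper: list the reserved flags found in an 'extra' string."""
    # Split like a shell would, and treat '--flag=value' as '--flag'.
    tokens = {tok.split("=", 1)[0] for tok in shlex.split(extra)}
    blocked = set(RESERVED)
    if subcommand == "extract":
        blocked |= EXTRACT_RESERVED
    return sorted(tokens & blocked)


print(reserved_flags_in("-r 5 1 3 3 3 3", "findstr"))   # []
print(reserved_flags_in("-l 150 -o x.tsv", "extract"))  # ['-o']
```

Note that -r is only reserved for extract; the findstr example above passes -r in extra without conflict.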

Authors

  • Muhammad Rohan Ali Asmat

Code

"""
Snakemake Wrapper for PyTRF
------------------------------------------------------
Tandem repeat finding and extraction toolkit.
Supports: findstr, findgtr, findatr, extract subcommands.
"""

from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_format, is_arg

# Configuration variables
VALID_SUBCOMMANDS = {"findstr", "findgtr", "findatr", "extract"}
FORMAT_SUPPORT = {
    "findstr": {"tsv", "csv", "bed", "gff"},
    "findgtr": {"tsv", "csv", "gff"},
    "findatr": {"tsv", "csv", "gff"},
    "extract": {"tsv", "csv", "fasta"},
}

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

# Get subcommand type and validate
if snakemake.params.subcommand not in VALID_SUBCOMMANDS:
    raise ValueError(
        f"Invalid subcommand '{snakemake.params.subcommand}'. "
        f"Valid options: {', '.join(sorted(VALID_SUBCOMMANDS))}"
    )

# Get repeat file (extract only)
repeat_file = ""
if snakemake.params.subcommand == "extract":
    repeat_file = f"-r {snakemake.input.repeat}"
    if is_arg("-r", extra) or is_arg("--repeat-file", extra):
        raise ValueError(
            "Repeat file is provided as input.repeat. "
            "Do not specify -r/--repeat-file in params.extra"
        )

# Infer and validate output format
out_format = get_format(snakemake.output[0])
supported_formats = FORMAT_SUPPORT[snakemake.params.subcommand]
if out_format not in supported_formats:
    raise ValueError(
        f"Unsupported format '{out_format}' for pytrf {snakemake.params.subcommand}. "
        f"Supported formats: {', '.join(sorted(supported_formats))}"
    )

# Validate: block format and output flags
if (
    is_arg("-f", extra)
    or is_arg("--out-format", extra)
    or is_arg("-o", extra)
    or is_arg("--out-file", extra)
):
    raise ValueError(
        "Output format is inferred and output path is provided through Snakemake. "
        "Do not specify -f/--out-format or -o/--out-file in params.extra"
    )

# Execute
shell(
    "pytrf {snakemake.params.subcommand}"
    " {snakemake.input.seq}"
    " {repeat_file}"
    " {extra}"
    " -f {out_format}"
    " -o {snakemake.output[0]}"
    " {log}"
)
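
To see what the wrapper ends up running, the command assembly can be mirrored in a small standalone function. This is an illustrative sketch only: `build_command` is a hypothetical name, the argument order follows the `shell(...)` call above, and the log redirection that `snakemake.log_fmt_shell` appends is deliberately left out.

```python
def build_command(subcommand, seq, output, out_format, extra="", repeat=None):
    """Hypothetical helper: rebuild the pytrf command line without log redirection."""
    parts = ["pytrf", subcommand, seq]
    if repeat:
        # Only 'extract' receives a repeat file, passed via -r.
        parts += ["-r", repeat]
    if extra:
        parts.append(extra)
    parts += ["-f", out_format, "-o", output]
    return " ".join(parts)


# Corresponds to the pytrf_findstr rule with the wildcard sample="a".
print(
    build_command(
        "findstr",
        "demo_data/a.fasta",
        "results/a_findstr.csv",
        "csv",
        extra="-r 5 1 3 3 3 3",
    )
)
# pytrf findstr demo_data/a.fasta -r 5 1 3 3 3 3 -f csv -o results/a_findstr.csv
```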