PYTRF
Tandem Repeat (TR) finding and extraction toolkit. Supports finding exact short TRs (findstr), generic TRs (findgtr), approximate TRs (findatr), and extracting sequences (extract).
URL: https://pytrf.readthedocs.io/en/latest/usage.html
Example
This wrapper can be used in the following way:
########################################################
# Snakefile for pytrf wrapper
########################################################
# SAMPLE RULE: Find exact short tandem repeats (STRs/SSRs)
rule pytrf_findstr:
    input:
        seq="demo_data/{sample}.fasta",
    output:
        "results/{sample}_findstr.csv",
    log:
        "logs/{sample}.log",
    params:
        subcommand="findstr",
        extra="-r 5 1 3 3 3 3",
    wrapper:
        "v9.9.0/bio/pytrf"
# SAMPLE RULE: Find exact STRs with default parameters
rule pytrf_findstr_defaults:
    input:
        seq="demo_data/small_test.fasta",
    output:
        "results/small_test_findstr_defaults.tsv",
    log:
        "logs/small_test_defaults.log",
    params:
        subcommand="findstr",
    wrapper:
        "v9.9.0/bio/pytrf"
# SAMPLE RULE: Find generic tandem repeats
rule pytrf_findgtr:
    input:
        seq="demo_data/{sample}.fasta",
    output:
        "results/{sample}_findgtr.tsv",
    log:
        "logs/{sample}.log",
    params:
        subcommand="findgtr",
        extra="-m 3 -r 1",  # min-motif=3, min-repeat=1
    wrapper:
        "v9.9.0/bio/pytrf"
# SAMPLE RULE: Find approximate tandem repeats
rule pytrf_findatr:
    input:
        seq="demo_data/{sample}.fasta",
    output:
        "results/{sample}_findatr.tsv",
    log:
        "logs/{sample}.log",
    params:
        subcommand="findatr",
        extra="-m 3 -M 10",  # min-motif-size=3, max-motif-size=10
    wrapper:
        "v9.9.0/bio/pytrf"
# SAMPLE RULE: Extract TR sequences (NOT WORKING - see meta.yaml notes)
rule pytrf_extract:
    input:
        seq="demo_data/small_test_extract.fasta",  # sequence fasta
        repeat="demo_data/small_test_extract.tsv",  # repeat file from findstr/findgtr/findatr
    output:
        "results/small_test_extract.tsv",
    log:
        "logs/small_test_extract.log",
    params:
        subcommand="extract",
        extra="-l 150",  # flank-length=150
    wrapper:
        "v9.9.0/bio/pytrf"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
Subcommands:
- findstr: Find exact short tandem repeats (STRs/microsatellites)
- findgtr: Find exact generic tandem repeats
- findatr: Find approximate/imperfect tandem repeats
- extract: NOT WORKING - Extract TR sequences and flanking sequence (bug in PyTRF 1.4.2, see known issues)
All commands require named input ‘seq’ (fasta/fastq).
Extract also requires ‘repeat’ (tsv/csv from findstr/findgtr/findatr).
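The input requirements above map onto a pytrf command line roughly as follows. This is a minimal sketch that assumes the argument order used by the wrapper code on this page; `build_pytrf_command` is a hypothetical helper written for illustration and is not part of the wrapper or of PyTRF.

```python
import shlex

VALID_SUBCOMMANDS = {"findstr", "findgtr", "findatr", "extract"}


def build_pytrf_command(subcommand, seq, out_file, out_format, repeat=None, extra=""):
    """Assemble a pytrf argument list (hypothetical helper, illustration only)."""
    if subcommand not in VALID_SUBCOMMANDS:
        raise ValueError(f"Invalid subcommand '{subcommand}'")
    if subcommand == "extract" and repeat is None:
        raise ValueError("extract requires a repeat file (input.repeat)")
    # Every subcommand takes the sequence file as its positional argument.
    cmd = ["pytrf", subcommand, seq]
    # Only extract receives the repeat file, passed with -r.
    if subcommand == "extract":
        cmd += ["-r", repeat]
    # User-supplied options come next, then the format and output path.
    cmd += shlex.split(extra)
    cmd += ["-f", out_format, "-o", out_file]
    return cmd


print(build_pytrf_command("findgtr", "demo.fasta", "demo.tsv", "tsv", extra="-m 3 -r 1"))
```

Returning a list rather than a single string keeps each argument separate, which avoids quoting problems if the command is later passed to `subprocess.run`.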
Bioconda package: https://bioconda.github.io/recipes/pytrf/README.html
GitHub repository: https://github.com/lmdu/pytrf
License: MIT License
Disclaimer: This is a minimal implementation supporting basic functionality. pytrf is not a Python binding to TRF - it’s an independent tool.
Testing:
The wrapper's tests skip the extract rule until an upstream patch is released.
Known issues:
- PyTRF 1.4.2 has a bug in findstr bed output format (https://github.com/lmdu/pytrf/issues/7)
- PyTRF 1.4.2 has a bug in extract command (https://github.com/lmdu/pytrf/issues/6)
Software dependencies
- pytrf=1.5.0
- snakemake-wrapper-utils=0.8.0
Input/Output
Input:
- seq: FASTA or FASTQ file (supports gzip compression)
- repeat: For extract only - TSV/CSV file from findstr/findgtr/findatr
Output:
Output file (required). Format auto-detected from file extension.
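Format detection from the file extension can be pictured as below. This is a simplified sketch, not the wrapper's actual logic: the wrapper calls `get_format` from snakemake-wrapper-utils, whereas `infer_output_format` here is a hypothetical stand-in that only looks at the final suffix. The table of supported formats is copied from the wrapper code on this page.

```python
from pathlib import Path

# Formats accepted per subcommand, as listed in the wrapper source.
FORMAT_SUPPORT = {
    "findstr": {"tsv", "csv", "bed", "gff"},
    "findgtr": {"tsv", "csv", "gff"},
    "findatr": {"tsv", "csv", "gff"},
    "extract": {"tsv", "csv", "fasta"},
}


def infer_output_format(path, subcommand):
    """Return the format implied by the file suffix (hypothetical helper)."""
    suffix = Path(path).suffix.lstrip(".").lower()
    supported = FORMAT_SUPPORT[subcommand]
    if suffix not in supported:
        raise ValueError(
            f"Unsupported format '{suffix}' for pytrf {subcommand}. "
            f"Supported formats: {', '.join(sorted(supported))}"
        )
    return suffix


print(infer_output_format("results/sample_findstr.csv", "findstr"))
```

Under this sketch, an output named `results/sample_findgtr.bed` would be rejected, because `bed` is listed only for findstr.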
Params
- subcommand: PyTRF subcommand to run [findstr, findgtr, findatr, extract].
- extra: Additional command-line arguments passed to pytrf.
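Because the wrapper sets the output path and format itself, the corresponding flags must stay out of `extra`. The check can be sketched as follows; `find_reserved_flags` is a hypothetical helper for illustration and is not the `is_arg` function the wrapper imports from snakemake-wrapper-utils. The flag names are taken from the wrapper code on this page.

```python
# Flags the wrapper sets itself; the names come from the wrapper source.
RESERVED_FLAGS = {"-f", "--out-format", "-o", "--out-file"}


def find_reserved_flags(extra, reserved=RESERVED_FLAGS):
    """Return reserved flags found in an extra-arguments string, sorted."""
    # Treat "--flag=value" like "--flag value" before splitting on whitespace.
    tokens = extra.replace("=", " ").split()
    return sorted(set(tokens) & set(reserved))


for extra in ("-m 3 -M 10", "-m 3 --out-format=csv"):
    clashes = find_reserved_flags(extra)
    status = "ok" if not clashes else f"rejected ({', '.join(clashes)})"
    print(f"{extra!r}: {status}")
```

For the extract subcommand the same idea would apply to `-r`/`--repeat-file`, which the wrapper fills in from `input.repeat`.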
Code
"""
Snakemake Wrapper for PyTRF
------------------------------------------------------
Tandem repeat finding and extraction toolkit.
Supports: findstr, findgtr, findatr, extract subcommands.
"""
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_format, is_arg
# Configuration variables
VALID_SUBCOMMANDS = {"findstr", "findgtr", "findatr", "extract"}
FORMAT_SUPPORT = {
    "findstr": {"tsv", "csv", "bed", "gff"},
    "findgtr": {"tsv", "csv", "gff"},
    "findatr": {"tsv", "csv", "gff"},
    "extract": {"tsv", "csv", "fasta"},
}
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Get subcommand type and validate
if snakemake.params.subcommand not in VALID_SUBCOMMANDS:
    raise ValueError(
        f"Invalid subcommand '{snakemake.params.subcommand}'. "
        f"Valid options: {', '.join(sorted(VALID_SUBCOMMANDS))}"
    )
# Get repeat file (extract only)
repeat_file = ""
if snakemake.params.subcommand == "extract":
    repeat_file = f"-r {snakemake.input.repeat}"
    if is_arg("-r", extra) or is_arg("--repeat-file", extra):
        raise ValueError(
            "Repeat file is provided as input.repeat. "
            "Do not specify -r/--repeat-file in params.extra"
        )
# Infer and validate output format
out_format = get_format(snakemake.output[0])
# Look up the supported formats once so the error message can list them
supported_formats = FORMAT_SUPPORT[snakemake.params.subcommand]
if out_format not in supported_formats:
    raise ValueError(
        f"Unsupported format '{out_format}' for pytrf {snakemake.params.subcommand}. "
        f"Supported formats: {', '.join(sorted(supported_formats))}"
    )
# Validate: block format and output flags
if (
    is_arg("-f", extra)
    or is_arg("--out-format", extra)
    or is_arg("-o", extra)
    or is_arg("--out-file", extra)
):
    raise ValueError(
        "Output format is inferred and output path is provided through Snakemake. "
        "Do not specify -f/--out-format or -o/--out-file in params.extra"
    )
# Execute
shell(
    "pytrf {snakemake.params.subcommand}"
    " {snakemake.input.seq}"
    " {repeat_file}"
    " {extra}"
    " -f {out_format}"
    " -o {snakemake.output[0]}"
    " {log}"
)