TRF
Wrapper for Tandem Repeats Finder (TRF), a tool that identifies tandem repeats in DNA sequences, for easy integration into Snakemake workflows.
URL: https://tandem.bu.edu/trf/home
Example
This wrapper can be used in the following way:
# SAMPLE RULE 1: The params directive is required.
# Optional flags go in the extra parameter exactly
# as they would be typed in a terminal command. This
# example uses the parameters and flag options officially
# recommended on the TRF website. Also, use the directory()
# function to specify the output folder.
rule run_trf_basic:
    input:
        sample="demo_data/{sample}.fasta",
    output:
        directory("trf_output/{sample}"),
    log:
        "logs/{sample}.log",
    params:
        match=2,
        mismatch=7,
        delta=7,
        pm=80,
        pi=10,
        minscore=50,
        maxperiod=500,
        extra="-f -d -m",
    wrapper:
        "v9.4.2/bio/trf"
Note that input, output and log file paths can be chosen freely.
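The seven numeric params are positional in TRF, so their order matters more than their names. Below is a minimal sketch, assuming a plain dict in place of Snakemake's params object, of how the example rule's values line up on the TRF command line. `build_trf_argv` and `TRF_ORDER` are hypothetical names, and unlike the wrapper this sketch passes the input path as written rather than relative to the output directory.

```python
import shlex

# Positional order TRF expects, mirroring TRF_PARAMS in the wrapper code.
TRF_ORDER = ["match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod"]


def build_trf_argv(fasta: str, params: dict) -> list[str]:
    """Assemble a TRF argument vector from a dict shaped like the rule's params."""
    missing = [name for name in TRF_ORDER if name not in params]
    if missing:
        raise ValueError(f"Missing required parameters: {', '.join(missing)}")
    argv = ["trf", fasta]
    # Numeric parameters go first, in TRF's fixed positional order.
    argv += [str(params[name]) for name in TRF_ORDER]
    # Optional flags follow, split the way a shell would split them.
    argv += shlex.split(params.get("extra", ""))
    return argv


example_params = {
    "match": 2, "mismatch": 7, "delta": 7, "pm": 80, "pi": 10,
    "minscore": 50, "maxperiod": 500, "extra": "-f -d -m",
}
print(shlex.join(build_trf_argv("demo_data/A.fasta", example_params)))
# prints: trf demo_data/A.fasta 2 7 7 80 10 50 500 -f -d -m
```

Building a list rather than a string keeps each argument separate, which avoids quoting surprises if a path ever contains spaces.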
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
Flags are specified using the ‘extra’ param (e.g., ‘-d -h’). Write the -l flag with a space, as in -l 29. The TRF documentation also allows -l=29, but the current TRF version behaves abnormally with that form, so avoid it. The wrapper passes flags through unchanged, so this behaviour comes from TRF itself.
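Because the wrapper forwards `extra` verbatim, the -l spacing has to be right in the rule itself. If flag strings come from elsewhere (a config file, say), a small helper like the one below could rewrite `-l=N` into `-l N` first. `normalize_l_flag` is a hypothetical helper for illustration; it is not part of the wrapper.

```python
import shlex


def normalize_l_flag(extra: str) -> str:
    """Rewrite any '-l=N' token into the space-separated form '-l N'."""
    tokens = []
    for token in shlex.split(extra):
        if token.startswith("-l="):
            # Split '-l=29' into the two tokens '-l' and '29'.
            tokens.extend(["-l", token[len("-l="):]])
        else:
            tokens.append(token)
    # Re-join with shell-safe quoting.
    return shlex.join(tokens)


print(normalize_l_flag("-d -h -l=29"))  # prints: -d -h -l 29
```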
One or more output files are produced, depending on the selected flags, just as when running TRF directly.
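Which file types to expect can be read off the flags. The sketch below is a hypothetical helper, not present in the wrapper; it models only -d, -m and -h as described under Params (HTML is produced unless -h is given) and ignores every other flag, including -ngs.

```python
import shlex


def expected_output_types(extra: str) -> set[str]:
    """Predict which TRF output file types the given flags should produce.

    Only -d, -m and -h are modelled; all other flags are ignored.
    """
    flags = set(shlex.split(extra))
    types = set()
    if "-d" in flags:
        types.add("dat")
    if "-m" in flags:
        types.add("mask")
    if "-h" not in flags:
        types.add("html")
    return types


print(sorted(expected_output_types("-f -d -m")))  # prints: ['dat', 'html', 'mask']
```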
For Developers:
- GitHub Repository: https://github.com/Benson-Genomics-Lab/TRF
- Bioconda Package: https://prefix.dev/channels/bioconda/packages/trf
Limitations or Future Work:
Note: As this is a wrapper for the TRF utility, it inherits any limitations or defects of TRF. The allowed values listed below are deduced from TRF's own resources; this wrapper does not validate them on top of TRF, but simply reflects TRF's behaviour.
Software dependencies
trf=4.10.0rc2
Input/Output
Input:
fasta: A DNA sequence file in FASTA format to be analyzed by Tandem Repeats Finder.
Output:
directory: The output must be declared with directory(), as in the example rule above. Depending on the selected flags, it will contain one or more of: a data file (*.dat), a masked sequence file (*.mask), or HTML file(s) (*.html).
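Because the output is a directory rather than named files, downstream rules have to discover what TRF wrote. A minimal sketch that groups a finished output directory by the extensions listed above; `summarize_trf_outputs` is a hypothetical name and makes no assumption about TRF's exact file naming beyond the suffix.

```python
from collections import defaultdict
from pathlib import Path


def summarize_trf_outputs(output_dir: str) -> dict[str, list[str]]:
    """Group the files in a TRF output directory by type (dat, mask, html)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(output_dir).iterdir()):
        # Keep only regular files with one of the documented extensions.
        if path.is_file() and path.suffix in {".dat", ".mask", ".html"}:
            groups[path.suffix.lstrip(".")].append(path.name)
    return dict(groups)
```

Usage: `summarize_trf_outputs("trf_output/A")` returns a dict such as `{"dat": [...], "html": [...]}`, with only the types actually present as keys.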
Params
- match: Match weight (allowed: >= 1)
- mismatch: Mismatch penalty (allowed: 3, 5, 7)
- delta: Indel penalty (allowed: 3, 5, 7)
- pm: Match probability percentage (allowed: 75, 80)
- pi: Indel probability percentage (allowed: 10, 20)
- minscore: Minimum alignment score to report (allowed: >= 1)
- maxperiod: Maximum period size to report (allowed: >= 1 and <= 2000)
- extra: Optional command-line flags to pass to Tandem Repeats Finder (TRF). These flags are appended after the 7 required numeric parameters. Supported flags: -m (generate masked sequence file), -f (include flanking sequence), -d (produce .dat file), -h (suppress HTML output), -l <n> (set maximum tandem repeat size; allowed: >= 1, preferred: >= 1 and <= 29), -ngs (more compact .dat output on multisequence files), -u (usage), -v (version). Provide flags as a quoted string, e.g., ‘-d -h’.
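As the Limitations note says, the wrapper passes these values to TRF unchecked. A user who wants to catch typos before a job starts could run a check like the hypothetical one below against the allowed values listed above. It encodes this page's list only, not anything enforced by TRF or the wrapper.

```python
# Allowed values as listed in the Params section (not enforced by the wrapper).
ALLOWED = {
    "mismatch": {3, 5, 7},
    "delta": {3, 5, 7},
    "pm": {75, 80},
    "pi": {10, 20},
}


def check_trf_params(params: dict) -> list[str]:
    """Return human-readable problems found in a TRF params dict (empty if none)."""
    problems = []
    # Parameters restricted to a fixed set of values.
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name}={params[name]} not in {sorted(allowed)}")
    # Parameters with a lower bound of 1.
    for name in ("match", "minscore", "maxperiod"):
        if name in params and params[name] < 1:
            problems.append(f"{name}={params[name]} must be >= 1")
    # maxperiod also has an upper bound.
    if "maxperiod" in params and params["maxperiod"] > 2000:
        problems.append(f"maxperiod={params['maxperiod']} must be <= 2000")
    return problems


recommended = {"match": 2, "mismatch": 7, "delta": 7, "pm": 80,
               "pi": 10, "minscore": 50, "maxperiod": 500}
print(check_trf_params(recommended))  # prints: []
print(check_trf_params({**recommended, "mismatch": 6}))
# prints: ['mismatch=6 not in [3, 5, 7]']
```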
Code
"""
Snakemake Wrapper for TRF (Tandem Repeats Finder)
------------------------------------------------------
Takes all parameters required by the TRF tool
and runs it to produce the requested output.
"""
import os
from pathlib import Path
from typing import Any
from snakemake.shell import shell
###########################################################################
###################### Constants or Function Definitions ##################
###########################################################################
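# Required positional TRF parameters, in the exact order TRF expects them.
# The boolean values are unused placeholders; only the key order matters.
TRF_PARAMS = {
    "match": False,
    "mismatch": False,
    "delta": False,
    "pm": False,
    "pi": False,
    "minscore": False,
    "maxperiod": False,
}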
def get_params_and_flags_string(
    snakemake_params: Any, trf_params: dict[str, bool]
) -> str:
    """
    Build the part of the TRF command line that follows the input file:
    the seven required numeric parameters (in TRF_PARAMS order), followed
    by any optional flags given via 'extra'. The returned string starts
    with a space so it can be appended directly to 'trf <input>'.
    """
    collected_params = {}
    flags_string = ""
    expected_params = set(trf_params.keys())
    # Sort user-supplied params into required TRF params and optional flags.
    for key, value in snakemake_params.items():
        if key.lower() in expected_params:
            collected_params[key.lower()] = value
        elif key.lower() == "extra":
            flags_string = value
        else:
            print(f"[TRF-WARNING] Unknown parameter '{key}' will be ignored")
    # All seven numeric parameters are positional in TRF, so none may be missing.
    provided_params = set(collected_params.keys())
    missing_params = expected_params - provided_params
    if missing_params:
        raise ValueError(
            f"Missing required parameters: {', '.join(sorted(missing_params))}. "
            f"Required parameters are: {', '.join(sorted(expected_params))}"
        )
    # Emit values in TRF's positional order, not the order used in the rule.
    collected_params_and_flags = "".join(
        f" {collected_params[k]}" for k in trf_params
    )
    if flags_string:
        collected_params_and_flags += " " + flags_string
    return collected_params_and_flags
###########################################################################
###################### Main Flow Starts Here ##############################
###########################################################################
# Setting up log redirect.
log_redirect = ""  # pylint: disable=invalid-name
if snakemake.log and snakemake.log[0]:
    try:
        log_file = Path(snakemake.log[0]).resolve()
        log_file.parent.mkdir(parents=True, exist_ok=True)
        # Use an absolute log path: the wrapper later changes into the output
        # directory, where a relative log path would no longer resolve.
        snakemake.log = str(log_file)
        log_redirect = snakemake.log_fmt_shell(stdout=True, stderr=True)
        print(f"[TRF-INFO] Logging redirected to: {log_file}")
    except OSError as e:  # also covers PermissionError
        print(f"[TRF-WARNING] Failed to set up logging: {e}")
        log_redirect = ""  # pylint: disable=invalid-name
else:
    print("[TRF-INFO] No log file provided, so output goes to the console.")
# Getting and validating the input file and output directory.
try:
    input_file = Path(snakemake.input[0]).resolve()
    output_dir = Path(snakemake.output[0]).resolve()
except (IndexError, TypeError) as e:
    raise ValueError(f"Input/output specification error: {e}") from e
# Changing to the output directory so TRF writes its files there.
try:
    output_dir.mkdir(parents=True, exist_ok=True)
    os.chdir(output_dir)
    print(f"[TRF-INFO] Working in output directory: {output_dir}")
except OSError as e:  # also covers PermissionError
    raise RuntimeError(
        f"Failed to create/access output directory '{output_dir}': {e}"
    ) from e
# Building the command for TRF. The input path is made relative to the
# new working directory (the output directory).
try:
    relative_input = os.path.relpath(input_file, output_dir)
    cmd = f"trf {relative_input}"  # pylint: disable=invalid-name
except ValueError as e:
    raise RuntimeError(f"Failed to compute relative path: {e}") from e
try:
    collected_trf_params_and_flags = get_params_and_flags_string(
        snakemake.params, TRF_PARAMS
    )
    cmd += collected_trf_params_and_flags
except ValueError as e:
    raise RuntimeError(f"Parameter validation failed: {e}") from e
# Running the command and printing status.
print(f"[TRF-INFO] Ready to run TRF command: {cmd}")
try:
    shell(f"{cmd} {log_redirect}")
    print("[TRF-INFO] Snakemake TRF wrapper completed actions.")
except Exception as e:
    raise RuntimeError(f"TRF command execution failed: {e}") from e
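The wrapper changes into the output directory before running TRF, presumably so that TRF's output files land there, which is why it recomputes the input path with `os.path.relpath`. A stand-alone sketch of that step with made-up absolute paths; `trf_command` is a hypothetical name, and `positional` stands in for the string returned by `get_params_and_flags_string` (note its leading space).

```python
import os


def trf_command(input_file: str, output_dir: str, positional: str) -> str:
    """Build the command the wrapper would run from inside output_dir."""
    # After chdir(output_dir), the input must be addressed relative to it.
    relative_input = os.path.relpath(input_file, output_dir)
    return f"trf {relative_input}{positional}"


print(trf_command("/work/demo_data/A.fasta", "/work/trf_output/A",
                  " 2 7 7 80 10 50 500 -f -d -m"))
# prints: trf ../../demo_data/A.fasta 2 7 7 80 10 50 500 -f -d -m
```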