TRF


Wrapper for Tandem Repeats Finder (TRF), a tool to identify tandem repeats in DNA sequences, enabling easy integration into Snakemake workflows.

URL: https://tandem.bu.edu/trf/home

Example

This wrapper can be used in the following way:

# SAMPLE RULE 1: Run with the mandatory params directive.
# Put flags (including optional ones) in the extra parameter exactly
# as they would be typed in a terminal command. This example uses
# the parameters and flag options officially recommended on the
# TRF website. Also, use the directory() function to specify
# the output folder.
rule run_trf_basic:
    input:
        sample="demo_data/{sample}.fasta",
    output:
        directory("trf_output/{sample}"),
    log:
        "logs/{sample}.log",
    params:
        match=2,
        mismatch=7,
        delta=7,
        pm=80,
        pi=10,
        minscore=50,
        maxperiod=500,
        extra="-f -d -m",
    wrapper:
        "v9.4.2/bio/trf"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Flags are specified using the ‘extra’ param (e.g., ‘-d -h’). Write the -l flag with a space, as in -l 29. The TRF documentation also allows -l=29, but with the current TRF version that form behaves abnormally, so avoid it. The wrapper passes flags to TRF unchanged, so its behaviour follows the utility.
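If flag strings are assembled programmatically, the `-l=<n>` form can be rewritten before it reaches the rule. The following is a minimal sketch, not part of the wrapper; `normalize_l_flag` is a hypothetical helper name.

```python
import re


def normalize_l_flag(extra: str) -> str:
    """Rewrite '-l=<n>' as '-l <n>' in a TRF flag string (hypothetical helper)."""
    # Only match '-l=<digits>' when it stands alone as a whitespace-delimited token.
    return re.sub(r"(?<!\S)-l=(\d+)(?!\S)", r"-l \1", extra)


print(normalize_l_flag("-d -h -l=29"))  # prints: -d -h -l 29
```

The result can then be used as the value of the ‘extra’ param.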

As with TRF itself, one or more types of output file are produced, depending on the flags selected.

For Developers:
- GitHub Repository: https://github.com/Benson-Genomics-Lab/TRF
- Bioconda Package: https://prefix.dev/channels/bioconda/packages/trf


Limitations or Future Work:
Note: As this is a wrapper for the TRF utility, it inherits any limitations or defects TRF has. Also, the allowed values listed below are deduced from TRF’s resources; the wrapper does not validate them itself but simply reflects TRF’s behaviour.

Software dependencies

  • trf=4.10.0rc2

Input/Output

Input:

  • fasta: A DNA sequence file in FASTA format to be analyzed by Tandem Repeats Finder.

Output:

  • directory: This must be specified in the way shown in the rule snippet above. The final contents can include one or more of: a data file (*.dat), a masked sequence file (*.mask), or an HTML file (*.html).
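Because the contents of the output directory depend on the flags used, a downstream step may want to check which kinds of file were actually produced. The sketch below groups files by the three suffixes named above; `summarize_trf_outputs` and the `TRF_SUFFIXES` mapping are hypothetical names, not part of the wrapper.

```python
from pathlib import Path

# Suffixes taken from the output description above; the labels are illustrative.
TRF_SUFFIXES = {".dat": "data", ".mask": "masked sequence", ".html": "HTML"}


def summarize_trf_outputs(output_dir: str) -> dict[str, list[str]]:
    """Group the files in a TRF output directory by kind (hypothetical helper)."""
    summary: dict[str, list[str]] = {kind: [] for kind in TRF_SUFFIXES.values()}
    for path in sorted(Path(output_dir).iterdir()):
        kind = TRF_SUFFIXES.get(path.suffix)
        # Skip subdirectories and files with any other suffix.
        if kind is not None and path.is_file():
            summary[kind].append(path.name)
    return summary
```

For example, `summarize_trf_outputs("trf_output/sample1")` returns a dictionary with one list of file names per kind, where `sample1` is a placeholder sample name.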

Params

  • match: Match weight (allowed: >= 1)

  • mismatch: Mismatch penalty (allowed: 3, 5, 7)

  • delta: Indel penalty (allowed: 3, 5, 7)

  • pm: Match probability percentage (allowed: 75, 80)

  • pi: Indel probability percentage (allowed: 10, 20)

  • minscore: Minimum alignment score to report (allowed: >= 1)

  • maxperiod: Maximum period size to report (allowed: >= 1 and <= 2000)

  • extra: Optional command-line flags to pass to Tandem Repeats Finder (TRF).
    These flags are appended after the 7 required numeric parameters. Supported flags (effect in parentheses): -m (generate masked sequence file), -f (include flanking sequence), -d (produce .dat file), -h (suppress HTML output), -l <n> (set max tandem repeat size, allowed: >=1, preferred: >=1 and <=29), -ngs (more compact .dat output on multisequence files), -u (usage), -v (version). Provide flags as a quoted string, e.g., ‘-d -h’.
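Since the wrapper does not validate these values itself, a workflow can check them before the rule runs. The following is a minimal sketch that encodes only the allowed values listed above; `check_trf_params` and the two lookup tables are hypothetical names, not part of the wrapper or of TRF.

```python
# Allowed values copied from the parameter list above; the wrapper itself
# does not run this check (this helper is hypothetical).
ALLOWED_VALUES = {
    "mismatch": {3, 5, 7},
    "delta": {3, 5, 7},
    "pm": {75, 80},
    "pi": {10, 20},
}
# (low, high) bounds, inclusive; None means no upper bound.
ALLOWED_RANGES = {
    "match": (1, None),
    "minscore": (1, None),
    "maxperiod": (1, 2000),
}


def check_trf_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means every value is allowed."""
    problems = []
    for name, allowed in ALLOWED_VALUES.items():
        if name not in params:
            problems.append(f"{name}: missing")
        elif params[name] not in allowed:
            problems.append(f"{name}: {params[name]} not in {sorted(allowed)}")
    for name, (low, high) in ALLOWED_RANGES.items():
        if name not in params:
            problems.append(f"{name}: missing")
            continue
        value = params[name]
        if value < low or (high is not None and value > high):
            problems.append(f"{name}: {value} out of range")
    return problems


example = dict(match=2, mismatch=7, delta=7, pm=80, pi=10, minscore=50, maxperiod=500)
print(check_trf_params(example))  # prints: []
```

Returning a list rather than raising on the first problem lets the caller report every out-of-range value at once.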

Authors

  • Muhammad Rohan Ali Asmat

Code

"""
Snakemake Wrapper for TRF (Tandem Repeats Finder)
------------------------------------------------------
Takes all the parameters required by the TRF tool
and runs it to produce the desired output.
"""

import os
from pathlib import Path
from typing import Any

from snakemake.shell import shell

###########################################################################
###################### Constants or Function Definitions ##################
###########################################################################

TRF_PARAMS = {
    "match": False,
    "mismatch": False,
    "delta": False,
    "pm": False,
    "pi": False,
    "minscore": False,
    "maxperiod": False,
}


def get_params_and_flags_string(
    snakemake_params: Any, trf_params: dict[str, bool]
) -> str:
    """
    This function returns a string after building partial command to run trf utility.
    It includes the value of parameters and flags provided by the user.
    """
    collected_params = {}
    flags_string = ""
    expected_params = set(trf_params.keys())

    for key, value in snakemake_params.items():
        if key.lower() in expected_params:
            collected_params[key.lower()] = value
        elif key.lower() == "extra":
            flags_string = value
        else:
            print(f"[TRF-WARNING] Unknown parameter '{key}' will be ignored")

    provided_params = set(collected_params.keys())
    missing_params = expected_params - provided_params

    if missing_params:
        raise ValueError(
            f"Missing required parameters: {', '.join(sorted(missing_params))}. "
            f"Required parameters are: {', '.join(sorted(expected_params))}"
        )

    collected_params_and_flags = ""
    ordered_data = {
        k: collected_params[k] for k in trf_params.keys() if k in collected_params
    }
    for _, items in ordered_data.items():
        collected_params_and_flags += f" {items}"

    if flags_string:
        collected_params_and_flags += " " + flags_string

    return collected_params_and_flags


###########################################################################
###################### Main Flow Starts Here ##############################
###########################################################################

# Setting up log redirect.
log_redirect = ""  # pylint: disable=invalid-name
if snakemake.log and snakemake.log[0]:
    try:
        log_file = Path(snakemake.log[0]).resolve()
        log_file.parent.mkdir(parents=True, exist_ok=True)
        snakemake.log = str(log_file)
        log_redirect = snakemake.log_fmt_shell(stdout=True, stderr=True)
        print(f"[TRF-INFO] Logging redirected to: {log_file}")
    except (OSError, PermissionError) as e:
        print(f"[TRF-WARNING] Failed to set up logging: {e}")
        log_redirect = ""  # pylint: disable=invalid-name
else:
    print("[TRF-INFO] No logging file provided, so outputting to console.")


# Getting & Validating input File.
try:
    input_file = Path(snakemake.input[0]).resolve()
    output_dir = Path(snakemake.output[0]).resolve()
except (IndexError, TypeError) as e:
    raise ValueError(f"Input/output specification error: {e}") from e

# Changing to output directory
try:
    output_dir.mkdir(parents=True, exist_ok=True)
    os.chdir(output_dir)
    print(f"[TRF-INFO] Working in output directory: {output_dir}")
except (OSError, PermissionError) as e:
    raise RuntimeError(
        f"Failed to create/access output directory '{output_dir}': {e}"
    ) from e

# Building Command for TRF
try:
    relative_input = os.path.relpath(input_file, output_dir)
    cmd = f"trf {relative_input}"  # pylint: disable=invalid-name
except ValueError as e:
    raise RuntimeError(f"Failed to compute relative path: {e}") from e

try:
    collected_trf_params_and_flags = get_params_and_flags_string(
        snakemake.params, TRF_PARAMS
    )
    cmd += collected_trf_params_and_flags
except ValueError as e:
    raise RuntimeError(f"Parameter validation failed: {e}") from e

# Running Command & printing status.
print(f"[TRF-INFO] Ready to run TRF command: {cmd}")
try:
    shell(f"{cmd} {log_redirect}")
    print("[TRF-INFO] Snakemake TRF wrapper completed actions.")
except Exception as e:
    raise RuntimeError(f"TRF command execution failed: {e}") from e
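
The command assembly above can be tried outside Snakemake. The following is a simplified sketch that takes a plain dictionary in place of `snakemake.params` and only builds the command string without running TRF; `build_trf_command` and `TRF_ORDER` are hypothetical names, not part of the wrapper.

```python
import os

# Positional order of the seven numeric parameters, as in TRF_PARAMS above.
TRF_ORDER = ("match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod")


def build_trf_command(input_file: str, output_dir: str, params: dict) -> str:
    """Build the TRF command line the way the wrapper does (simplified sketch)."""
    missing = [key for key in TRF_ORDER if key not in params]
    if missing:
        raise ValueError(f"Missing required parameters: {', '.join(sorted(missing))}")
    # The wrapper runs TRF from inside the output directory, so the input
    # path is expressed relative to that directory.
    relative_input = os.path.relpath(input_file, output_dir)
    parts = ["trf", relative_input] + [str(params[key]) for key in TRF_ORDER]
    if params.get("extra"):
        parts.append(params["extra"])
    return " ".join(parts)


params = dict(maxperiod=500, match=2, mismatch=7, delta=7, pm=80, pi=10,
              minscore=50, extra="-f -d -m")
print(build_trf_command("demo_data/s1.fasta", "trf_output/s1", params))
```

The numeric values are always emitted in TRF's positional order, regardless of the order in which they were supplied, and the ‘extra’ flags come last.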