.. _`bio/gffread`:

GFFREAD
=======


.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gffread?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gffread

Validate, filter, convert and perform various other operations on GFF/GTF files with Gffread.


**URL**: http://ccb.jhu.edu/software/stringtie/gff.shtml

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule test_gffread:
        input:
            fasta="genome.fasta",
            annotation="annotation.gtf",
            # ids="",  # Optional path to records to keep
            # nids="",  # Optional path to records to drop
            # seq_info="",  # Optional path to sequence information
            # sort_by="",  # Optional path to the ordered list of reference sequences
            # attr="",  # Optional annotation attributes to keep.
            # chr_replace="",  # Optional path to <original_ref_ID> <new_ref_ID>
        output:
            records="transcripts.fa",
            # dupinfo="",  # Optional path to clustering/merging information
        threads: 1
        log:
            "logs/gffread.log",
        params:
            extra="",
        wrapper:
            "v3.0.1/bio/gffread"

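Note that input, output and log file paths can be chosen freely, as long as the ``annotation`` and ``records`` file extensions match one of the formats the wrapper recognizes.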

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.


Notes
-----

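Input and output formats are automatically detected from the file extensions:

* ``annotation``: ``.gtf`` (no extra flag), ``.bed`` (adds ``--in-bed``) or ``.tlf`` (adds ``--in-tlf``).
* ``records``: ``.gtf``, ``.gff`` or ``.gff3`` (adds ``-T``), ``.bed`` (adds ``--bed``), ``.tlf`` (adds ``--tlf``), or ``.fasta``, ``.fa`` or ``.fna`` (written with ``-w`` instead of ``-o``).

Any other extension raises a ``ValueError``.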
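
The extension checks performed by the wrapper can be sketched as a standalone function. This is only an illustration that mirrors the wrapper code listed in the Code section; the name ``detect_format_flags`` is hypothetical and is not part of the wrapper.

.. code-block:: python

    def detect_format_flags(annotation, records):
        """Return (format_flags, out_flag) for the given file names.

        Hypothetical helper mirroring the wrapper's extension checks.
        """
        flags = []

        # Input format: GTF needs no flag, BED and TLF each need one
        if annotation.endswith(".bed"):
            flags.append("--in-bed")
        elif annotation.endswith(".tlf"):
            flags.append("--in-tlf")
        elif not annotation.endswith(".gtf"):
            raise ValueError("Unknown annotation format")

        # Output format: FASTA is written with -w, everything else with -o
        out_flag = "-o"
        if records.endswith((".gtf", ".gff", ".gff3")):
            flags.append("-T")
        elif records.endswith(".bed"):
            flags.append("--bed")
        elif records.endswith(".tlf"):
            flags.append("--tlf")
        elif records.endswith((".fasta", ".fa", ".fna")):
            out_flag = "-w"
        else:
            raise ValueError("Unknown records format")

        return " ".join(flags), out_flag


    # File names taken from the example rule
    print(detect_format_flags("annotation.gtf", "transcripts.fa"))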

Software dependencies
---------------------

* ``gffread=0.12.7``

Input/Output
------------
**Input:**

* ``fasta``: Path to genome file (FASTA formatted).
* ``annotation``: Path to genome annotation (GTF, BED or TLF formatted). The wrapper rejects any extension other than ``.gtf``, ``.bed`` or ``.tlf``.
* ``ids``: Optional path to records/transcripts to keep. Cannot be combined with ``nids``.
* ``nids``: Optional path to records/transcripts to discard. Cannot be combined with ``ids``.
* ``seq_info``: Optional path to sequence information, a TSV-formatted text file containing ``<seq-name> <seq-length> <seq-description>``.
* ``sort_by``: Optional path to a text file containing the ordered list of reference sequences.
* ``attr``: Optional path to a text file containing a comma-separated list of annotation attributes to keep. Only accepted when ``records`` has a ``.gtf``, ``.gff`` or ``.gff3`` extension.
* ``chr_replace``: Optional path to a TSV-formatted text file containing ``<original_ref_ID> <new_ref_ID>``.

**Output:**

* ``records``: Path to genome sequence/annotation in the requested format, containing the requested information.
* ``dupinfo``: Optional path to clustering/merging information.
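
As an illustration of the two-column ``chr_replace`` table listed among the inputs, the following sketch writes one from a Python dictionary. The helper name ``write_chr_replace``, the file name and the example sequence names are hypothetical; only the ``<original_ref_ID> <new_ref_ID>`` layout comes from the input description.

.. code-block:: python

    import csv


    def write_chr_replace(mapping, path):
        """Write one tab-separated '<original_ref_ID> <new_ref_ID>' pair per line."""
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            for original, new in mapping.items():
                writer.writerow([original, new])


    # Hypothetical renaming from bare chromosome names to "chr"-prefixed names
    write_chr_replace({"1": "chr1", "2": "chr2", "MT": "chrM"}, "chr_replace.tsv")

The resulting file could then be declared as ``chr_replace="chr_replace.tsv"`` in the ``input`` section of the rule.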


Authors
-------

* Thibault Dayris


Code
----

.. code-block:: python

    __author__ = "Thibault Dayris"
    __copyright__ = "Copyright 2023, Thibault Dayris"
    __mail__ = "thibault.dayris@gustaveroussy.fr"
    __license__ = "MIT"


    from snakemake.shell import shell

    extra = snakemake.params.get("extra", "")
    log = snakemake.log_fmt_shell(stdout=False, stderr=True)

    annotation = snakemake.input.annotation
    records = snakemake.output.records

    # Input format control
    if annotation.endswith(".bed"):
        extra += " --in-bed "
    elif annotation.endswith(".tlf"):
        extra += " --in-tlf "
    elif annotation.endswith(".gtf"):
        pass
    else:
        raise ValueError("Unknown annotation format")

    # In most cases, output can be specified with -o
    out_flag = " -o "

    # Output format control
    if records.endswith((".gtf", ".gff", ".gff3")):
        extra += " -T "
    elif records.endswith(".bed"):
        extra += " --bed "
    elif records.endswith(".tlf"):
        extra += " --tlf "
    elif records.endswith((".fasta", ".fa", ".fna")):
        # Fasta output must be specified with -w
        out_flag = " -w "
    else:
        raise ValueError("Unknown records format")


    # Optional input files
    ids = snakemake.input.get("ids", "")
    if ids:
        extra += f" --ids {ids} "

    nids = snakemake.input.get("nids", "")
    if nids:
        if ids:
            raise ValueError(
                "Provide either sequences ids to keep, or to drop."
                " Or else, an empty file is produced."
            )
        extra += f" --nids {nids} "

    seq_info = snakemake.input.get("seq_info", "")
    if seq_info:
        extra += f" -s {seq_info} "

    sort_by = snakemake.input.get("sort_by", "")
    if sort_by:
        extra += f" --sort-by {sort_by} "

    attr = snakemake.input.get("attr", "")
    if attr:
        if not records.endswith((".gtf", ".gff", ".gff3")):
            raise ValueError(
                "GTF attributes specified in input, "
                "but records are not in GTF/GFF format."
            )
        extra += f" --attrs {attr} "

    chr_replace = snakemake.input.get("chr_replace", "")
    if chr_replace:
        extra += f" -m {chr_replace} "


    # Optional output files
    dupinfo = snakemake.output.get("dupinfo", "")
    if dupinfo:
        extra += f" -d {dupinfo} "


    shell(
        "gffread {extra} "
        "{out_flag} {records} "
        "-g {snakemake.input.fasta} "
        "{annotation} "
        "{log} "
    )
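
The checks on optional inputs can be tried outside Snakemake with a small stand-in. The sketch below is a simplification under stated assumptions: ``optional_flags`` is a hypothetical function, a plain dictionary takes the place of ``snakemake.input``, and only the ``ids``/``nids`` and ``attr`` checks are reproduced, with the same flag strings the wrapper builds.

.. code-block:: python

    GTF_LIKE = (".gtf", ".gff", ".gff3")


    def optional_flags(inputs, records):
        """Build the optional-input part of the command line.

        Hypothetical stand-in: `inputs` is a plain dict replacing snakemake.input.
        """
        flags = []

        ids = inputs.get("ids", "")
        nids = inputs.get("nids", "")
        # Keeping and dropping IDs at the same time is refused, as in the wrapper
        if ids and nids:
            raise ValueError("Provide either IDs to keep or IDs to drop, not both.")
        if ids:
            flags.append(f"--ids {ids}")
        if nids:
            flags.append(f"--nids {nids}")

        attr = inputs.get("attr", "")
        if attr:
            # Attribute filtering is only accepted for GTF/GFF output
            if not records.endswith(GTF_LIKE):
                raise ValueError("Attributes given, but records are not GTF/GFF.")
            flags.append(f"--attrs {attr}")

        return " ".join(flags)


    # "keep.txt" is a hypothetical file name
    print(optional_flags({"ids": "keep.txt"}, "out.bed"))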


.. |nl| raw:: html

   <br>