.. _`bio/gffread`:

GFFREAD
=======

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gffread?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gffread

Validate, filter, convert and perform various other operations on GFF/GTF files with Gffread.

**URL**: http://ccb.jhu.edu/software/stringtie/gff.shtml

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule test_gffread:
        input:
            fasta="genome.fasta",
            annotation="annotation.gtf",
            # ids="",  # Optional path to records to keep
            # nids="",  # Optional path to records to drop
            # seq_info="",  # Optional path to sequence information
            # sort_by="",  # Optional path to the ordered list of reference sequences
            # attr="",  # Optional annotation attributes to keep
            # chr_replace="",  # Optional path to a TSV file passed to gffread -m
        output:
            records="transcripts.fa",
            # dupinfo="",  # Optional path to clustering/merging information
        threads: 1
        log:
            "logs/gffread.log",
        params:
            extra="",
        wrapper:
            "v3.0.1/bio/gffread"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

Input and output formats are detected automatically from the file extensions.

Software dependencies
---------------------

* ``gffread=0.12.7``

Input/Output
------------

**Input:**

* ``fasta``: Path to genome file (FASTA formatted).
* ``annotation``: Path to genome annotation (GTF, BED or TLF formatted; the wrapper code below rejects any other extension).
* ``ids``: Optional path to records/transcripts to keep.
* ``nids``: Optional path to records/transcripts to discard. Cannot be combined with ``ids``.
* ``seq_info``: Optional path to sequence information, a TSV-formatted text file passed to gffread's ``-s`` option.
* ``sort_by``: Optional path to a text file containing the ordered list of reference sequences.
* ``attr``: Optional text file containing a comma-separated list of annotation attributes to keep. Only valid when ``records`` is in GTF/GFF format.
* ``chr_replace``: Optional path to a TSV-formatted text file passed to gffread's ``-m`` option.

**Output:**

* ``records``: Path to genome sequence/annotation in the requested format, containing the requested information.
* ``dupinfo``: Optional path to clustering/merging information.

Authors
-------

* Thibault Dayris

Code
----

.. code-block:: python

    __author__ = "Thibault Dayris"
    __copyright__ = "Copyright 2023, Thibault Dayris"
    __mail__ = "thibault.dayris@gustaveroussy.fr"
    __license__ = "MIT"

    from snakemake.shell import shell

    extra = snakemake.params.get("extra", "")
    log = snakemake.log_fmt_shell(stdout=False, stderr=True)
    annotation = snakemake.input.annotation
    records = snakemake.output.records

    # Input format control
    if annotation.endswith(".bed"):
        extra += " --in-bed "
    elif annotation.endswith(".tlf"):
        extra += " --in-tlf "
    elif annotation.endswith(".gtf"):
        pass
    else:
        raise ValueError("Unknown annotation format")

    # In most cases, output can be specified with -o
    out_flag = " -o "

    # Output format control
    if records.endswith((".gtf", ".gff", ".gff3")):
        # -T switches gffread's main output from GFF3 (its default) to GTF
        extra += " -T "
    elif records.endswith(".bed"):
        extra += " --bed "
    elif records.endswith(".tlf"):
        extra += " --tlf "
    elif records.endswith((".fasta", ".fa", ".fna")):
        # Fasta output must be specified with -w
        out_flag = " -w "
    else:
        raise ValueError("Unknown records format")

    # Optional input files
    ids = snakemake.input.get("ids", "")
    if ids:
        extra += f" --ids {ids} "

    nids = snakemake.input.get("nids", "")
    if nids:
        # Keeping and dropping records at the same time is rejected
        if ids:
            raise ValueError(
                "Provide either sequences ids to keep, or to drop."
                " Or else, an empty file is produced."
            )
        extra += f" --nids {nids} "

    seq_info = snakemake.input.get("seq_info", "")
    if seq_info:
        extra += f" -s {seq_info} "

    sort_by = snakemake.input.get("sort_by", "")
    if sort_by:
        extra += f" --sort-by {sort_by} "

    attr = snakemake.input.get("attr", "")
    if attr:
        # Attribute filtering only makes sense for GTF/GFF output
        if not records.endswith((".gtf", ".gff", ".gff3")):
            raise ValueError(
                "GTF attributes specified in input, "
                "but records are not in GTF/GFF format."
            )
        extra += f" --attrs {attr} "

    chr_replace = snakemake.input.get("chr_replace", "")
    if chr_replace:
        extra += f" -m {chr_replace} "

    # Optional output files
    dupinfo = snakemake.output.get("dupinfo", "")
    if dupinfo:
        extra += f" -d {dupinfo} "

    shell(
        "gffread {extra} "
        "{out_flag} {records} "
        "-g {snakemake.input.fasta} "
        "{annotation} "
        "{log} "
    )
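The extension-based format detection in the wrapper code above can be previewed outside Snakemake with a small standalone function. This is only an illustrative sketch: the helper name ``gffread_format_flags`` is hypothetical and not part of the wrapper, and it mirrors only the input/output format branches shown above, not the optional inputs.

.. code-block:: python

    from typing import Tuple


    def gffread_format_flags(annotation: str, records: str) -> Tuple[str, str]:
        """Return (format_flags, out_flag) following the wrapper's extension checks.

        Hypothetical helper for illustration; not part of the wrapper itself.
        """
        flags = []

        # Input format: BED and TLF need an explicit flag, GTF needs none.
        if annotation.endswith(".bed"):
            flags.append("--in-bed")
        elif annotation.endswith(".tlf"):
            flags.append("--in-tlf")
        elif not annotation.endswith(".gtf"):
            raise ValueError("Unknown annotation format")

        # Output format: -o is used except for FASTA, which is written with -w.
        out_flag = "-o"
        if records.endswith((".gtf", ".gff", ".gff3")):
            flags.append("-T")
        elif records.endswith(".bed"):
            flags.append("--bed")
        elif records.endswith(".tlf"):
            flags.append("--tlf")
        elif records.endswith((".fasta", ".fa", ".fna")):
            out_flag = "-w"
        else:
            raise ValueError("Unknown records format")

        return " ".join(flags), out_flag


    if __name__ == "__main__":
        # The example rule above: GTF annotation in, FASTA records out.
        print(gffread_format_flags("annotation.gtf", "transcripts.fa"))
        # A format conversion: BED annotation in, GTF records out.
        print(gffread_format_flags("annotation.bed", "annotation.gtf"))

Calling the helper with the file names from a rule shows in advance whether the wrapper would accept them, and which gffread flags the chosen extensions would select.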