NGSDERIVE
Backward-computes information from next-generation sequencing data and annotates splice junctions.
URL: https://github.com/stjudecloud/ngsderive
Example
This wrapper can be used in the following way:
rule test_ngsderive_endedness:
    input:
        ngs="A.rg.bam",
    output:
        tsv="A.endedness.tsv",
    log:
        "ngsderive/endedness.log",
    params:
        command="endedness",
        extra="--n-reads 2",
    wrapper:
        "v3.12.2/bio/ngsderive"


rule test_ngsderive_junction_annotation:
    input:
        ngs="A.rg.bam",
        gene_model="annotation.sorted.gtf.gz",
    output:
        tsv="A.junctions.tsv",
        junction_dir=directory("junctions"),
    log:
        "ngsderive/junctions.log",
    params:
        command="junction-annotation",
        extra="--min-intron 2 --consider-unannotated-references-novel",
    wrapper:
        "v3.12.2/bio/ngsderive"


rule test_ngsderive_junction_annotation_list:
    input:
        ngs="A.rg.bam",
        gene_model="annotation.sorted.gtf.gz",
    output:
        tsv="A.junctions_list.tsv",
        junction_dir=["junctions/A.rg.bam.junctions.tsv"],
    log:
        "ngsderive/junctions.log",
    params:
        command="junction-annotation",
        extra="--min-intron 2 --consider-unannotated-references-novel",
    wrapper:
        "v3.12.2/bio/ngsderive"


rule test_ngsderive_strandeness:
    input:
        ngs="A.rg.bam",
        gene_model="annotation.sorted.gtf.gz",
    output:
        tsv="A.strandedness.tsv",
    log:
        "ngsderive/strand.log",
    params:
        command="strandedness",
        extra="--verbose --minimum-reads-per-gene 2 --n-genes 1",
    wrapper:
        "v3.12.2/bio/ngsderive"


rule test_ngsderive_encoding:
    input:
        ngs="A.rg.bam",
    output:
        tsv="A.encoding.tsv",
    log:
        "ngsderive/encoding.log",
    params:
        command="encoding",
        extra="--n-reads 2",
    wrapper:
        "v3.12.2/bio/ngsderive"


rule test_ngsderive_instrument:
    input:
        ngs="A.rg.bam",
    output:
        tsv="A.instrument.tsv",
    log:
        "ngsderive/instrument.log",
    params:
        command="instrument",
        extra="--n-reads 2 --verbose",
    wrapper:
        "v3.12.2/bio/ngsderive"


rule test_ngsderive_readlen:
    input:
        ngs="A.rg.bam",
    output:
        tsv="A.readlen.tsv",
    log:
        "ngsderive/readlen.log",
    params:
        command="readlen",
        extra="--majority-vote-cutoff 10 --n-reads 2",
    wrapper:
        "v3.12.2/bio/ngsderive"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
GTF/GFF will be automatically sorted and tabix-indexed by ngsderive if needed.
Software dependencies
ngsderive=4.0.0
Input/Output
Input:
ngs: Path to BAM/SAM/Fastq file. SAM/BAM files should be indexed.
gene_model: Path to sorted GTF/GFF file. Should be tabix indexed.
Output:
tsv: Path to output file
junction_dir: Optional path to junction directory, or list of paths to junction files with a common prefix
Params
command: Name of the ngsderive subcommand
extra: Optional parameters, besides -o, -g and --junction-files-dir, which the wrapper sets itself
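To make the mapping between the rule keys and the final command line concrete, here is a minimal sketch in plain Python. `build_ngsderive_command` is a hypothetical helper introduced only for illustration; it is not part of the wrapper or of ngsderive. It mirrors the way the wrapper code on this page concatenates its arguments and executes nothing.

```python
def build_ngsderive_command(command, ngs, tsv, gene_model="", junction_dir="", extra=""):
    """Assemble an ngsderive command line in the same order as the wrapper.

    Hypothetical helper for illustration only: it returns a string and
    does not run ngsderive.
    """
    parts = ["ngsderive", command]
    if extra:
        # Free-form options from params.extra come first
        parts.append(extra)
    if gene_model:
        # Only added when input.gene_model is provided
        parts += ["--gene-model", gene_model]
    if junction_dir:
        # Only added when output.junction_dir is provided
        parts += ["--junction-files-dir", junction_dir]
    # Positional input file, then the output table
    parts += [ngs, "--outfile", tsv]
    return " ".join(parts)


if __name__ == "__main__":
    print(
        build_ngsderive_command(
            "endedness", "A.rg.bam", "A.endedness.tsv", extra="--n-reads 2"
        )
    )
```

For the `test_ngsderive_endedness` rule above, this sketch yields `ngsderive endedness --n-reads 2 A.rg.bam --outfile A.endedness.tsv`; the real wrapper additionally appends the log redirection.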
Code
# coding: utf-8

__author__ = "Thibault Dayris"
__mail__ = "thibault.dayris@gustaveroussy.fr"
__copyright__ = "Copyright 2024, Thibault Dayris"
__license__ = "MIT"

from os.path import commonprefix, dirname
from snakemake.shell import shell
from warnings import warn

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Optional gene model (GTF/GFF), only passed when provided as input
gene_model = snakemake.input.get("gene_model", "")
if gene_model:
    gene_model = f"--gene-model {gene_model}"

# Optional junction output: a directory, or a list of files sharing a prefix
junction_dir = snakemake.output.get("junction_dir", "")
if isinstance(junction_dir, list):
    junction_dir = commonprefix([dirname(fp) for fp in junction_dir])
    if not junction_dir:
        warn(
            "No common prefix was found within the list of "
            "files given as `junction_dir`. Falling "
            "back to default ngsderive value"
        )

if junction_dir:
    junction_dir = f"--junction-files-dir {junction_dir}"

shell(
    "ngsderive {snakemake.params.command} "
    "{extra} {gene_model} {junction_dir} "
    "{snakemake.input.ngs} "
    "--outfile {snakemake.output.tsv} "
    "{log} "
)
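
When `junction_dir` is given as a list of files, the wrapper derives the directory with `os.path.commonprefix` applied to the `dirname` of each path. The sketch below reproduces only that step so its behaviour can be checked in isolation; `resolve_junction_dir` is a hypothetical name used for illustration and the multi-directory paths are made up.

```python
from os.path import commonprefix, dirname


def resolve_junction_dir(paths):
    """Return the character-wise common prefix of the parent directories.

    Hypothetical helper mirroring the list branch of the wrapper code.
    """
    return commonprefix([dirname(fp) for fp in paths])


if __name__ == "__main__":
    # Single file, as in the test_ngsderive_junction_annotation_list rule
    print(resolve_junction_dir(["junctions/A.rg.bam.junctions.tsv"]))

    # Several files in the same directory share that directory
    print(resolve_junction_dir(["junctions/A.tsv", "junctions/B.tsv"]))

    # Files in sibling directories: the prefix is compared character by
    # character, so the result may be a partial directory name
    print(resolve_junction_dir(["run1/junctions/A.tsv", "run2/junctions/B.tsv"]))

    # No shared prefix at all: empty string, which triggers the warning
    print(repr(resolve_junction_dir(["a/x.tsv", "b/y.tsv"])))
```

Because `commonprefix` works on characters rather than path components, listing junction files that all live in one directory is the safest way to use the list form. `os.path.commonpath` compares whole path components instead, should a stricter check be wanted in a custom workflow.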