NGS-BITS SAMPLESIMILARITY

Calculate several metrics that measure sample similarity.

URL: https://github.com/imgag/ngs-bits/blob/master/doc/tools/SampleSimilarity/index.md

Example

This wrapper can be used in the following way:

rule test_ngsbits_samplesimilarity:
    input:
        # ref="", # Optional path to fasta.fai file
        # regions="", # Optional path to regions of interest (bed)
        samples=expand("{sample}.vcf", sample=("a", "b")),
    output:
        "similarity.tsv",
    threads: 1
    log:
        "samplesimilarity.log",
    params:
        extra="-build hg19",
    wrapper:
        "v5.5.2-17-g33d5b76/bio/ngsbits/samplesimilarity"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • ngs-bits=2024_11

Input/Output

Input:

  • samples: list of paths to vcf/vcf.gz files, or list of paths to bam/sam/cram files

  • ref: Optional path to reference genome index file (FAI). Required for CRAM input.

  • regions: Optional path to regions of interest (BED).

Output:

  • Path to the output TSV file containing the similarity results

Params

  • extra: Optional command-line parameters. Do not pass input/output options or -mode here; the wrapper sets those itself.
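The wrapper picks -mode from the file extensions of samples (see the Code section), which is why it should not appear in extra. The sketch below reproduces that check as a standalone function so the outcome can be predicted for a given input list. The name infer_mode is hypothetical and is not part of the wrapper.

```python
def infer_mode(paths):
    """Mirror the wrapper's extension check (hypothetical helper).

    Returns "vcf" if every path is a VCF, "bam" if every path is
    SAM/BAM/CRAM, and "gsvar" for anything else, including mixed lists.
    """
    names = [str(p) for p in paths]
    if all(n.endswith((".vcf", ".vcf.gz")) for n in names):
        return "vcf"
    if all(n.endswith((".sam", ".bam", ".cram")) for n in names):
        return "bam"
    return "gsvar"


if __name__ == "__main__":
    print(infer_mode(["a.vcf", "b.vcf.gz"]))  # all VCF
    print(infer_mode(["a.bam", "b.cram"]))    # all alignments
    print(infer_mode(["a.vcf", "b.bam"]))     # mixed list falls back to gsvar
```

Because the check is all-or-nothing, a list that mixes VCF and BAM files is treated as GSvar input, so it is safest to keep one file type per rule.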

Authors

  • Thibault Dayris

Code

# coding: utf-8

"""Snakemake wrapper for NGS-bits SampleSimilarity"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2024, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

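import shlex

from snakemake.shell import shell

# Redirect both stdout and stderr to the rule's log file.
log = snakemake.log_fmt_shell(
    stdout=True,
    stderr=True,
)
extra = snakemake.params.get("extra", "")

# The ":q" format spec is only understood by Snakemake's shell() formatter,
# not by Python f-strings, so quote the optional paths explicitly.
ref = snakemake.input.get("ref")
if ref:
    extra += f" -ref {shlex.quote(str(ref))}"

roi = snakemake.input.get("regions")
if roi:
    extra += f" -roi {shlex.quote(str(roi))}"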

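# Choose -mode from the file extensions of the sample list.
# Anything that is not uniformly VCF or SAM/BAM/CRAM falls back to GSvar.
input_files = snakemake.input.get("samples")
if all(str(i).endswith((".vcf", ".vcf.gz")) for i in input_files):
    extra += " -mode vcf"
elif all(str(i).endswith((".sam", ".bam", ".cram")) for i in input_files):
    extra += " -mode bam"
else:
    extra += " -mode gsvar"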

shell(
    "SampleSimilarity"
    " -in {input_files}"
    " {extra}"
    " -out {snakemake.output:q}"
    " {log}"
)
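To see roughly what command line the wrapper ends up running, the assembly can be sketched outside Snakemake. This is an illustrative approximation, not the wrapper itself: preview_command and its parameters are hypothetical, the log redirection is omitted, and sample paths are quoted with shlex.quote, which the wrapper's -in {input_files} does not do.

```python
import shlex


def preview_command(samples, output, mode, extra="", ref=None, regions=None):
    """Build an approximate SampleSimilarity command line (hypothetical helper)."""
    args = ["SampleSimilarity", "-in"]
    args += [shlex.quote(str(s)) for s in samples]
    if extra:
        args.append(extra)  # user-supplied options are passed through verbatim
    if ref:
        args += ["-ref", shlex.quote(str(ref))]
    if regions:
        args += ["-roi", shlex.quote(str(regions))]
    args += ["-mode", mode, "-out", shlex.quote(str(output))]
    return " ".join(args)


if __name__ == "__main__":
    # Same inputs as the example rule at the top of this page.
    print(preview_command(["a.vcf", "b.vcf"], "similarity.tsv", "vcf",
                          extra="-build hg19"))
```

Quoting matters mainly for paths that contain spaces or shell metacharacters; plain names such as a.vcf pass through unchanged.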