SALSA2

A tool to scaffold long-read assemblies with Hi-C data

URL: https://github.com/marbl/SALSA

Example

This wrapper can be used in the following way:

rule salsa2:
    input:
        fas="{sample}.fasta",
        fai="{sample}.fasta.fai",
        bed="{sample}.bed",
    output:
        agp="out/{sample}.agp",
        fas="out/{sample}.fas",
    log:
        "logs/salsa2/{sample}.log",
    params:
        enzyme="CTTAAG",  # optional
        extra="--clean yes",  # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v1.15.2/bio/salsa2"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

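  • The extra param passes additional arguments through to run_pipeline.py.
  • The enzyme param is optional; when set, it is passed as --enzyme.
  • An optional gfa input, when provided, is passed as --gfa.
  • Both outputs (agp and fas) are optional; only the ones declared in the rule are written.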
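
How optional settings end up on the command line can be sketched in plain Python. The helper name build_salsa_command and the file names are hypothetical, chosen only for illustration; the flags are the ones used by the wrapper code shown in the Code section. Quoting with shlex.quote is an addition of this sketch and is not something the wrapper does.

```python
import shlex


def build_salsa_command(fas, fai, bed, outdir, enzyme="", gfa="", extra=""):
    """Assemble a run_pipeline.py command line, skipping unset options.

    Hypothetical helper for illustration; not part of the wrapper.
    """
    parts = [
        "run_pipeline.py",
        "--assembly", shlex.quote(fas),
        "--length", shlex.quote(fai),
        "--bed", shlex.quote(bed),
    ]
    if enzyme:  # optional restriction enzyme sequence
        parts += ["--enzyme", shlex.quote(enzyme)]
    if gfa:  # optional assembly graph
        parts += ["--gfa", shlex.quote(gfa)]
    if extra:  # free-form arguments, passed through verbatim
        parts.append(extra)
    parts += ["--output", shlex.quote(outdir)]
    return " ".join(parts)


cmd = build_salsa_command(
    "asm.fasta", "asm.fasta.fai", "hic.bed", "tmp_out",
    enzyme="CTTAAG", extra="--clean yes",
)
print(cmd)
```

Leaving enzyme, gfa, or extra empty simply drops the corresponding part of the command, which mirrors how the wrapper treats unset params and inputs.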

Software dependencies

  • salsa2=2.3

Input/Output

Input:

  • BED file
  • FASTA file
  • FASTA index file
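
Before launching a long run, it may help to confirm that the three inputs describe the same set of contigs. The sketch below is a minimal pre-flight check, not part of the wrapper: it assumes the usual tab-separated layouts (contig name and length in the first two columns of the .fai file, contig name in the first column of the BED file) and that a BED record naming a contig absent from the index indicates mismatched inputs. The function names and toy data are made up for illustration.

```python
import os
import tempfile


def read_fai_lengths(fai_path):
    """Return {contig_name: length} from a FASTA index (.fai) file."""
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            if not line.strip():
                continue
            name, length = line.rstrip("\n").split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def bed_contigs_missing_from_fai(bed_path, lengths):
    """Return sorted contig names used in the BED file but absent from the index."""
    missing = set()
    with open(bed_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            contig = line.split("\t", 1)[0]
            if contig not in lengths:
                missing.add(contig)
    return sorted(missing)


# Tiny made-up inputs, for illustration only.
with tempfile.TemporaryDirectory() as tmpdir:
    fai_path = os.path.join(tmpdir, "toy.fasta.fai")
    bed_path = os.path.join(tmpdir, "toy.bed")
    with open(fai_path, "w") as handle:
        handle.write("contig1\t1000\t9\t60\t61\ncontig2\t500\t1035\t60\t61\n")
    with open(bed_path, "w") as handle:
        handle.write("contig1\t10\t110\tread1/1\t60\t+\n")
        handle.write("contig3\t5\t105\tread1/2\t60\t-\n")

    lengths = read_fai_lengths(fai_path)
    missing = bed_contigs_missing_from_fai(bed_path, lengths)

print(lengths)
print(missing)
```

An empty result from bed_contigs_missing_from_fai means every BED record refers to a contig listed in the index.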

Output:

  • scaffolded assembly (FASTA format)
  • scaffolded assembly (AGP format)

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"


import tempfile
from snakemake.shell import shell


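# `snakemake` is injected by Snakemake when the wrapper runs.
extra = snakemake.params.get("extra", "")
# Redirect both stdout and stderr to the rule's log file.
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Optional restriction enzyme sequence; omitted from the command if unset.
enzyme = snakemake.params.get("enzyme", "")
if enzyme:
    enzyme = f"--enzyme {enzyme}"

# Optional assembly graph (GFA); omitted from the command if not provided.
gfa = snakemake.input.get("gfa", "")
if gfa:
    gfa = f"--gfa {gfa}"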


# Run SALSA in a temporary directory so intermediate files are discarded.
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "run_pipeline.py"
        " --assembly {snakemake.input.fas}"
        " --length {snakemake.input.fai}"
        " --bed {snakemake.input.bed}"
        " {enzyme}"
        " {gfa}"
        " {extra}"
        " --output {tmpdir}"
        " {log}"
    )

    # Copy only the final outputs that the rule declares.
    if snakemake.output.get("agp"):
        shell("cat {tmpdir}/scaffolds_FINAL.agp > {snakemake.output.agp}")
    if snakemake.output.get("fas"):
        shell("cat {tmpdir}/scaffolds_FINAL.fasta > {snakemake.output.fas}")
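
The wrapper's pattern of running in a temporary directory and keeping only the declared outputs can be tried without SALSA installed. The following is a minimal sketch under stated assumptions: collect_outputs is a hypothetical helper, placeholder files stand in for the scaffolder's results, and shutil.copyfile is used in place of the wrapper's cat redirection.

```python
import os
import shutil
import tempfile


def collect_outputs(tmpdir, requested):
    """Copy finished files out of tmpdir.

    `requested` maps a produced file name to its destination path;
    an empty destination means the output was not declared.
    Hypothetical helper for illustration; not part of the wrapper.
    """
    copied = []
    for produced_name, destination in requested.items():
        if not destination:  # output not declared, skip it
            continue
        dest_dir = os.path.dirname(destination)
        if dest_dir:
            os.makedirs(dest_dir, exist_ok=True)
        shutil.copyfile(os.path.join(tmpdir, produced_name), destination)
        copied.append(destination)
    return copied


with tempfile.TemporaryDirectory() as final_dir:
    with tempfile.TemporaryDirectory() as tmpdir:
        # Stand-in for the scaffolder: write the two final files plus an
        # intermediate file that should not survive the run.
        for name in ("scaffolds_FINAL.agp", "scaffolds_FINAL.fasta", "intermediate.txt"):
            with open(os.path.join(tmpdir, name), "w") as handle:
                handle.write(f"placeholder for {name}\n")

        copied = collect_outputs(
            tmpdir,
            {
                "scaffolds_FINAL.agp": os.path.join(final_dir, "out", "sample.agp"),
                "scaffolds_FINAL.fasta": "",  # FASTA output not requested
            },
        )
    # The temporary working directory is gone here; only copied files remain.
    kept = sorted(os.listdir(os.path.join(final_dir, "out")))

print(kept)
```

Only the requested AGP file ends up in the destination directory; the undeclared FASTA output and the intermediate file are removed together with the temporary directory.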