SALSA2
A tool to scaffold long read assemblies with Hi-C data
URL: https://github.com/marbl/SALSA
Example
This wrapper can be used in the following way:
rule salsa2:
    input:
        fas="{sample}.fasta",
        fai="{sample}.fasta.fai",
        bed="{sample}.bed",
    output:
        agp="out/{sample}.agp",
        fas="out/{sample}.fas",
    log:
        "logs/salsa2/{sample}.log",
    params:
        enzyme="CTTAAG",  # optional
        extra="--clean yes",  # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v5.7.0/bio/salsa2"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The extra param passes additional program arguments through to run_pipeline.py unchanged.
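As a rough illustration of how the optional values end up on the command line, the sketch below mirrors the flag handling in the wrapper code on this page. The helper names `optional_flag` and `build_command` are hypothetical and are not part of the wrapper; only the flags themselves (`--assembly`, `--length`, `--bed`, `--enzyme`, `--gfa`, `--output`) are taken from it.

```python
def optional_flag(flag, value):
    """Return '--flag value' when value is set, otherwise an empty string."""
    return f"--{flag} {value}" if value else ""


def build_command(fas, fai, bed, outdir, enzyme="", gfa="", extra=""):
    """Assemble a run_pipeline.py command line (hypothetical helper)."""
    parts = [
        "run_pipeline.py",
        f"--assembly {fas}",
        f"--length {fai}",
        f"--bed {bed}",
        optional_flag("enzyme", enzyme),  # only added when an enzyme is given
        optional_flag("gfa", gfa),        # only added when a GFA file is given
        extra,                            # appended verbatim
        f"--output {outdir}",
    ]
    # Drop empty fragments so unset options leave no stray spaces.
    return " ".join(p for p in parts if p)


print(build_command("s.fasta", "s.fasta.fai", "s.bed", "tmp",
                    enzyme="CTTAAG", extra="--clean yes"))
```

With the params from the example rule, this prints a command containing `--enzyme CTTAAG --clean yes`; leaving both params unset simply omits them.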
Software dependencies
salsa2=2.3
Input/Output
Input:
BED file
FASTA file
FASTA index file
GFA file (optional, passed as the gfa input)
Output:
scaffolded assembly (FASTA format)
scaffolded assembly (AGP format)
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"

import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Optional restriction enzyme; the flag is only added when the param is set
enzyme = snakemake.params.get("enzyme", "")
if enzyme:
    enzyme = f"--enzyme {enzyme}"

# Optional GFA input; the flag is only added when the input is provided
gfa = snakemake.input.get("gfa", "")
if gfa:
    gfa = f"--gfa {gfa}"

# SALSA2 writes into a temporary directory that is removed afterwards
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "run_pipeline.py"
        " --assembly {snakemake.input.fas}"
        " --length {snakemake.input.fai}"
        " --bed {snakemake.input.bed}"
        " {enzyme}"
        " {gfa}"
        " {extra}"
        " --output {tmpdir}"
        " {log}"
    )

    # Copy only the requested final files out before the directory is deleted
    if snakemake.output.get("agp"):
        shell("cat {tmpdir}/scaffolds_FINAL.agp > {snakemake.output.agp}")
    if snakemake.output.get("fas"):
        shell("cat {tmpdir}/scaffolds_FINAL.fasta > {snakemake.output.fas}")
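The wrapper runs the tool inside a temporary directory and copies only the requested final files out before that directory is removed. The pattern can be tried without SALSA2 installed using the minimal sketch below. The function `collect_outputs` and the `produce` callback are hypothetical stand-ins for the `run_pipeline.py` call, and `shutil.copyfile` is used here in place of the wrapper's `cat` redirection.

```python
import shutil
import tempfile
from pathlib import Path


def collect_outputs(produce, wanted):
    """Run produce(tmpdir), then copy selected files to their destinations.

    produce: callable that writes files into the given directory
             (hypothetical stand-in for the real tool invocation).
    wanted:  mapping of file name inside tmpdir -> destination path,
             where a falsy destination means "not requested".
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        produce(Path(tmpdir))
        for name, dest in wanted.items():
            if not dest:
                continue  # output not requested, skip it
            Path(dest).parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(Path(tmpdir) / name, dest)
    # tmpdir and everything not copied out is gone at this point
```

Everything the tool writes besides the copied files is discarded, which is why only the final AGP and FASTA files appear as outputs of the rule.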