PURGE_DUPS GET_SEQS

https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/purge_dups/get_seqs?label=version%20update%20pull%20requests

Purge haplotigs and overlaps in an assembly based on read depth

URL: https://github.com/dfguan/purge_dups

Example

This wrapper can be used in the following way:

rule purge_dups_get_seqs:
    input:
        fas="genome.fasta",
        bed="purge_dups.bed",
    output:
        hap="out/get_seqs.hap.fasta",
        purged="out/get_seqs.purged.fasta",
    log:
        "logs/get_seqs.log",
    params:
        extra="",
    threads: 1
    wrapper:
        "v3.6.0-3-gc8272d7/bio/purge_dups/get_seqs"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • The extra param allows for additional program arguments.

Software dependencies

  • purge_dups=1.2.6

Input/Output

Input:

  • BED file

  • FASTA file

Output:

  • purged FASTA file

Authors

  • Filipe Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"


import tempfile
from pathlib import Path
from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


if Path(snakemake.input.bed).stat().st_size:
    with tempfile.TemporaryDirectory() as tmpdir:
        shell(
            "get_seqs {extra} -p {tmpdir}/out {snakemake.input.bed} {snakemake.input.fas} {log}"
        )

        if snakemake.output.get("hap"):
            shell("cat {tmpdir}/out.hap.fa > {snakemake.output.hap}")
        if snakemake.output.get("purged"):
            shell("cat {tmpdir}/out.purged.fa > {snakemake.output.purged}")
else:
    # If BED file empty, copy input to output since `get_seqs` will segfault
    log = Path(snakemake.log[0])
    log.write_text(
        "WARN: Input BED file is empty. Input FASTA file will be copied to output."
    )
    shell("cp {snakemake.input.fas} {snakemake.output.hap}")
    Path(snakemake.output.purged).touch()