PURGE_DUPS GET_SEQS
Purge haplotigs and overlaps in an assembly based on read depth
URL: https://github.com/dfguan/purge_dups
Example
This wrapper can be used in the following way:
rule purge_dups_get_seqs:
input:
fas="genome.fasta",
bed="purge_dups.bed",
output:
hap="out/get_seqs.hap.fasta",
purged="out/get_seqs.purged.fasta",
log:
"logs/get_seqs.log",
params:
extra="",
threads: 1
wrapper:
"v5.0.1/bio/purge_dups/get_seqs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The extra param allows for additional program arguments.
Software dependencies
purge_dups=1.2.6
Input/Output
Input:
BED file
FASTA file
Output:
purged FASTA file
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
if Path(snakemake.input.bed).stat().st_size:
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"get_seqs {extra} -p {tmpdir}/out {snakemake.input.bed} {snakemake.input.fas} {log}"
)
if snakemake.output.get("hap"):
shell("cat {tmpdir}/out.hap.fa > {snakemake.output.hap}")
if snakemake.output.get("purged"):
shell("cat {tmpdir}/out.purged.fa > {snakemake.output.purged}")
else:
# If BED file empty, copy input to output since `get_seqs` will segfault
log = Path(snakemake.log[0])
log.write_text(
"WARN: Input BED file is empty. Input FASTA file will be copied to output."
)
shell("cp {snakemake.input.fas} {snakemake.output.hap}")
Path(snakemake.output.purged).touch()