MASHMAP
Compute local alignment boundaries between long DNA sequences with MashMap
URL: https://github.com/marbl/MashMap
Example
This wrapper can be used in the following way:
rule test_mashmap:
input:
ref="reference.fasta.gz", # This can be a txt file with a path to a fasta-file per line
query="read.fasta.gz",
output:
"mashmap.out",
threads: 2
params:
extra="-s 1000 --pi 99",
log:
"logs/mashmap.log",
wrapper:
"v5.7.0/bio/mashmap"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
input.ref may be either a path to a fasta file or a text file containing a list of paths to several fasta files.
input.query may be either a path to a fastq file or a text file containing a list of paths to several fastq files.
Software dependencies
mashmap=3.1.3
gsl=2.7
gzip=1.13
Input/Output
Input:
ref
: Path to reference filequery
: Path to query file (fasta, fastq)
Output:
Path to the alignment file
Params
extra
: Optional parameters for MashMap
Code
#!/usr/bin/python3.8
# coding: utf-8
""" Snakemake wrapper for MashMap """
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
max_threads = snakemake.threads
# Handling input file types (either a fasta file, or a text file with a list of paths to fasta files)
ref = snakemake.input["ref"]
if ref.endswith(".txt"):
ref = f"--refList {ref}"
elif ref.endswith(".gz"):
ref = f"--ref <( gzip --decompress --stdout {ref} )"
max_threads -= 1
else:
ref = f"--ref {ref}"
if max_threads < 1:
raise ValueError(
"Reference fasta on-the-fly g-unzipping consumed one thread."
f" Please increase the number of available threads by {1 - max_threads}."
)
# Handling query file format (either a fastq file or a text file with a list of fastq files)
query = snakemake.input["query"]
if query.endswith(".txt"):
query = f"--queryList {query}"
else:
query = f"--query {query}"
shell(
"mashmap "
"{ref} "
"{query} "
"--output {snakemake.output} "
"--threads {snakemake.threads} "
"{extra} "
"{log}"
)