MASHMAP

Compute local alignment boundaries between long DNA sequences with MashMap

URL: https://github.com/marbl/MashMap

Example

This wrapper can be used in the following way:

rule test_mashmap:
    input:
        ref="reference.fasta.gz",  # This can be a txt file with a path to a fasta-file per line
        query="read.fasta.gz",
    output:
        "mashmap.out",
    threads: 2
    params:
        extra="-s 1000 --pi 99",
    log:
        "logs/mashmap.log",
    wrapper:
        "v1.7.0/bio/mashmap"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • input.ref may be either a path to a fasta file or a text file containing a list of paths to several fasta files.
  • input.query may be either a path to a fastq file or a text file containing a list of paths to several fastq files.

Software dependencies

  • mashmap=2.0
  • gsl=2.7
  • gzip=1.11

Input/Output

Input:

  • ref: Path to reference file
  • query: Path to query file (fasta, fastq)

Output:

  • Path to the alignment file

Params

  • extra: Optional parameters for MashMap

Authors

  • Thibault Dayris

Code

#!/usr/bin/python3.8
# coding: utf-8

""" Snakemake wrapper for MashMap """

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
max_threads = snakemake.threads

# Handling input file types (either a fasta file, or a text file with a list of paths to fasta files)
ref = snakemake.input["ref"]
if ref.endswith(".txt"):
    ref = f"--refList {ref}"
elif ref.endswith(".gz"):
    ref = f"--ref <( gzip --decompress --stdout {ref} )"
    max_threads -= 1
else:
    ref = f"--ref {ref}"

if max_threads < 1:
    raise ValueError(
        "Reference fasta on-the-fly g-unzipping consumed one thread."
        f" Please increase the number of available threads by {1 - max_threads}."
    )


# Handling query file format (either a fastq file or a text file with a list of fastq files)
query = snakemake.input["query"]
if query.endswith(".txt"):
    query = f"--queryList {query}"
else:
    query = f"--query {query}"

shell(
    "mashmap "
    "{ref} "
    "{query} "
    "--output {snakemake.output} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{log}"
)