MEGAHIT
MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single genome assembly (small or mammalian size) and single-cell assembly. Each input option can be specified multiple times, and input files may be plain text or compressed with gz/bz2.
URL: https://github.com/voutcn/megahit
Example
This wrapper can be used in the following way:
container: "docker://continuumio/miniconda3:4.4.10"

rule run_megahit:
    input:
        reads=["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
    output:
        contigs="assembly/final.contigs.fasta",
    benchmark:
        "logs/benchmarks/assembly/megahit.txt"
    params:
        # all parameters are optional
        extra="--min-count 10 --k-list 21,29,39,59,79,99,119,141",
    log:
        "logs/megahit.log",
    threads: 8
    resources:
        mem_mb=250000,
    wrapper:
        "v7.6.0/bio/megahit"

rule download_test_reads:
    output:
        ["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
    log:
        "logs/download.log",
    shell:
        "(wget -O - https://zenodo.org/record/3992790/files/test_reads.tar.gz | tar -xzf -) > {log} 2>&1"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
megahit=1.2.9
snakemake-wrapper-utils=0.8.0
Input/Output
Input:
reads: list of reads in FASTQ format
r1: forward reads
r2: reverse reads
interleaved: interleaved reads
unpaired: unpaired reads
Output:
contigs: output file with contigs
log: log file
json: options json file
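The way the named inputs map onto MEGAHIT's command-line flags can be sketched as a small standalone function. This is an illustrative simplification rather than the wrapper's own code: the function name `build_input_args` and the example file names are hypothetical, while the flags `-1`, `-2`, `--12` and `--read` are the ones the wrapper passes to MEGAHIT.

```python
def build_input_args(r1=None, r2=None, interleaved=None, unpaired=None):
    """Translate named read inputs into MEGAHIT flags (illustrative helper)."""
    args = []
    # Paired-end reads are only used when both mates are given.
    if r1 and r2:
        args += ["-1", r1, "-2", r2]
    # Interleaved paired-end reads go to --12.
    if interleaved:
        args += ["--12", interleaved]
    # Single-end (unpaired) reads go to --read.
    if unpaired:
        args += ["--read", unpaired]
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical file names, used only for demonstration.
    print(build_input_args(r1="s1_R1.fastq.gz", r2="s1_R2.fastq.gz"))
    print(build_input_args(interleaved="s1_il.fastq.gz", unpaired="s1_se.fastq.gz"))
```

When only an unnamed or `reads` list is supplied, the wrapper falls back to positions: the first two entries are treated as the pair, a third as interleaved reads, and a fourth as unpaired reads.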
Code
"""Snakemake wrapper for megahit."""
__author__ = "Jie Zhu @alienzj, Filipe G. Vieira @fgvieira"
__copyright__ = "Copyright 2025, Jie Zhu, Filipe G. Vieira"
__email__ = "alienchuj@gmail.com"
__license__ = "MIT"

import tempfile
from pathlib import Path

from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem

# parse params
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
memory_requirements = get_mem(snakemake, out_unit="KiB") * 1024

# parse short reads
reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input
input_arg = ""

# handle named inputs if available
if hasattr(snakemake.input, "r1") and hasattr(snakemake.input, "r2"):
    input_arg += f" -1 {snakemake.input.r1} -2 {snakemake.input.r2}"
elif len(reads) >= 2:
    input_arg += f" -1 {reads[0]} -2 {reads[1]}"

# handle interleaved reads if specified
if hasattr(snakemake.input, "interleaved"):
    input_arg += f" --12 {snakemake.input.interleaved}"
elif len(reads) >= 3 and not hasattr(snakemake.input, "r1"):
    input_arg += f" --12 {reads[2]}"

# handle additional reads if specified
if hasattr(snakemake.input, "unpaired"):
    input_arg += f" --read {snakemake.input.unpaired}"
elif len(reads) >= 4 and not hasattr(snakemake.input, "r1"):
    input_arg += f" --read {reads[3]}"

# run megahit
with tempfile.TemporaryDirectory() as tmpdir:
    output_tmpdir = Path(tmpdir) / "temp"

    shell(
        "megahit"
        " -t {snakemake.threads}"
        " -m {memory_requirements}"
        " -o {output_tmpdir}"
        " {input_arg}"
        " {extra}"
        " {log}"
    )

    # Ensure user can name each file according to their need
    output_mapping = {
        "contigs": f"{output_tmpdir}/final.contigs.fa",
        "json": f"{output_tmpdir}/options.json",
        "log": f"{output_tmpdir}/log",
    }

    for output_key, temp_file in output_mapping.items():
        output_path = snakemake.output.get(output_key)
        if output_path:
            shell("cp --verbose {temp_file:q} {output_path:q} {log}")
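
The last part of the wrapper follows a "run in a temporary directory, then copy out what was requested" pattern, which is what lets output paths be chosen freely. A minimal standalone sketch of that pattern is shown below, using only the Python standard library. The names `collect_outputs` and `fake_megahit` are hypothetical; `fake_megahit` is a stand-in that merely writes the three file names the wrapper expects, and it is not the real assembler. The output directory is a not-yet-existing subdirectory of the temporary directory, presumably because MEGAHIT expects to create its output directory itself.

```python
import shutil
import tempfile
from pathlib import Path


def collect_outputs(run_tool, requested):
    """Run a tool in a temporary directory and copy requested files out.

    run_tool: callable that writes its results into the directory it is given.
    requested: mapping of output key -> destination path chosen by the user.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        outdir = Path(tmpdir) / "temp"  # does not exist yet; the tool creates it
        run_tool(outdir)
        produced = {
            "contigs": outdir / "final.contigs.fa",
            "json": outdir / "options.json",
            "log": outdir / "log",
        }
        copied = {}
        # Copy must happen inside the `with` block, before tmpdir is removed.
        for key, src in produced.items():
            dest = requested.get(key)
            if dest:
                Path(dest).parent.mkdir(parents=True, exist_ok=True)
                shutil.copy(src, dest)
                copied[key] = Path(dest)
        return copied


def fake_megahit(outdir):
    """Hypothetical stand-in for the assembler, for illustration only."""
    outdir.mkdir(parents=True)
    (outdir / "final.contigs.fa").write_text(">contig_0\nACGT\n")
    (outdir / "options.json").write_text("{}")
    (outdir / "log").write_text("done\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = Path(workdir) / "assembly" / "final.contigs.fasta"
        print(collect_outputs(fake_megahit, {"contigs": str(target)}))
```

Only the keys present in `requested` are copied, which mirrors how the wrapper skips `json` and `log` when the rule does not declare them as outputs.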