.. _`bio/spades/metaspades`:
METASPADES
==========
.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/spades/metaspades?label=version%20update%20pull%20requests
:target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/spades/metaspades
Assemble metagenome with metaspades. For more information see the `Spades documentation `_.
Metagenome assembly uses a lot of computational resources. Spades is told to restart from a previous checkpont if the file `params.txt` exist in the output directory. In this way one can use snakemake with `--restart-times` to automatically restart the assembly.
Input of metaspades should be at least one paired-end library (=2 fastq files) optionally merged reads as a third fastq file might be supplied and singleton reads as a 4th input file. Long reads can also be input as `pacbio` or `nanopore` input argument. To distinguish short from long reads. Use the `reads` as name for the short reads.
Example
-------
This wrapper can be used in the following way:
.. code-block:: python
container: "docker://continuumio/miniconda3:4.4.10"
rule run_metaspades:
input:
reads=["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
output:
contigs="assembly/contigs.fasta",
scaffolds="assembly/scaffolds.fasta",
dir=directory("assembly/intermediate_files"),
benchmark:
"logs/benchmarks/assembly/spades.txt"
params:
# all parameters are optional
k="auto",
extra="--only-assembler",
log:
"log/spades.log",
threads: 8
resources:
mem_mem=250000,
time=60 * 24,
wrapper:
"v3.0.1/bio/spades/metaspades"
rule download_test_reads:
output:
["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
log:
"log/download.log",
shell:
" wget https://zenodo.org/record/3992790/files/test_reads.tar.gz >> {log} 2>&1 ; "
" tar -xzf test_reads.tar.gz >> {log} 2>&1"
Note that input, output and log file paths can be chosen freely.
When running with
.. code-block:: bash
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
---------------------
* ``spades=3.15.5``
* ``python=3.12.0``
Authors
-------
* Silas Kieser
* Anton Korobeynikov
Code
----
.. code-block:: python
"""Snakemake wrapper for metaspades."""
__author__ = "Silas Kieser @silask"
__copyright__ = "Copyright 2021, Silas Kieser"
__email__ = "silas.kieser@gmail.com"
__license__ = "MIT"
import os, shutil
from snakemake.shell import shell
# infer output directory
if hasattr(snakemake.output, "dir"):
output_dir = snakemake.output.dir
else:
# get output_dir file from output
if hasattr(snakemake.output, "contigs"):
output_file = snakemake.output.contigs
elif hasattr(snakemake.output, "scaffolds"):
output_file = snakemake.output.scaffolds
else:
output_file = snakemake.output[0]
output_dir = os.path.split(output_file)[0]
# parse params
extra = snakemake.params.get("extra", "")
kmers = snakemake.params.get("k", "'auto'")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if hasattr(snakemake.resources, "mem_mb"):
mem_gb = snakemake.resources.mem_mb // 1000
memory_requirements = f" --memory {mem_gb}"
else:
memory_requirements = ""
if not os.path.exists(os.path.join(output_dir, "params.txt")):
# parse short reads
if hasattr(snakemake.input, "reads"):
reads = snakemake.input.reads
else:
reads = snakemake.input
assert (
len(reads) > 1
), "Metaspades needs a paired end library. This means you should supply at least 2 fastq files in the rule input."
assert (
type(reads[0]) == str
), f"Metaspades allows only 1 library. Therefore reads need to be strings got {reads}"
input_arg = " --pe1-1 {0} --pe1-2 {1} ".format(*reads)
if len(reads) >= 3:
input_arg += " --pe1-m {2}".format(*reads)
if len(reads) >= 4:
input_arg += " --pe1-s {3}".format(*reads)
# parse long reads
for longread_name in ["pacbio", "nanopore"]:
if hasattr(snakemake.input, longread_name):
input_arg += " --{name} {}".format(name=longread_name, **snakemake.input)
shell(
"spades.py --meta "
" --threads {snakemake.threads} "
" {memory_requirements} "
" -o {output_dir} "
" -k {kmers} "
" {input_arg} "
" {extra} "
" > {snakemake.log[0]} 2>&1 "
)
else:
# params.txt file exitst already I restart from previous run
shell(
"echo '\n\nRestart Spades \n Remove pipline_state file copy files to force copy files if necessary.' >> {log[0]}"
)
shell("rm -f {output_dir}/pipeline_state/stage_*_copy_files 2>> {log}")
shell(
"spades.py --meta "
" --restart-from last "
" --threads {threads} "
" {memory_requirements} "
" -o {output_dir} "
" >> {snakemake.log[0]} 2>&1 "
)
# Rename/ move output files
Output_key_mapping = {
"contigs": "contigs.fasta",
"scaffolds": "scaffolds.fasta",
"graph": "assembly_graph_with_scaffolds.gfa",
}
has_named_output = False
for key in Output_key_mapping:
if hasattr(snakemake.output, key):
has_named_output = True
file_produced = os.path.join(output_dir, Output_key_mapping[key])
file_renamed = getattr(snakemake.output, key)
if file_produced != file_renamed:
shutil.move(file_produced, file_renamed)
if not has_named_output:
file_produced = os.path.join(output_dir, "contigs.fasta")
file_renamed = snakemake.output[0]
if file_produced != file_renamed:
shutil.move(file_produced, file_renamed)
.. |nl| raw:: html