NANOSIM-H

NanoSim-H is a simulator of Oxford Nanopore reads that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of Nanopore sequencing technology.

Example

This wrapper can be used in the following way:

rule nanosimh:
    input:
        "{sample}.fa"
    output:
        reads = "{sample}.simulated.fa",
        log = "{sample}.simulated.log",
        errors = "{sample}.simulated.errors.txt"
    params:
        extra = "",
        num_reads = 10,
        perfect_reads = True,
        min_read_len = 10,
    log:
        "logs/nanosim-h/test/{sample}.log"
    wrapper:
        "0.75.0-13-g0997adf/bio/nanosim-h"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • nanosim-h==1.1.0.4

Authors

  • Michael Hall

Code

"""Snakemake wrapper for NanoSim-H."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


def is_header(query):
    return query.startswith(">")


def get_length_of_longest_sequence(fh):
    current_length = 0
    all_lengths = []
    for line in fh:
        if not is_header(line):
            current_length += len(line.rstrip())
        else:
            all_lengths.append(current_length)
            current_length = 0
    all_lengths.append(current_length)

    return max(all_lengths)


# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
prefix = snakemake.params.get("prefix", snakemake.output.reads.rpartition(".")[0])
num_reads = snakemake.params.get("num_reads", 10000)
profile = snakemake.params.get("profile", "ecoli_R9_2D")
perfect_reads = snakemake.params.get("perfect_reads", False)
min_read_len = snakemake.params.get("min_read_len", 50)
max_read_len = snakemake.params.get("max_read_len", 0)

# need to do this as the default read length of infinity can cause nanosim-h to
# hang if the reference is short
if max_read_len == 0:
    with open(snakemake.input[0]) as fh:
        max_read_len = get_length_of_longest_sequence(fh)

perfect_reads_flag = "--perfect " if perfect_reads else ""
# Formats the log redrection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "nanosim-h {extra} "
    "{perfect_reads_flag} "
    "--max-len {max_read_len} "
    "--min-len {min_read_len} "
    "--profile {profile} "
    "--number {num_reads} "
    "--out-pref {prefix} "
    "{snakemake.input} {log}"
)