MINIMAP2


A versatile pairwise aligner for genomic and spliced nucleotide sequences.

URL: https://lh3.github.io/minimap2

Example

This wrapper can be used in the following way:

rule minimap2_paf:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"],
    output:
        "aligned/{input1}_aln.paf",
    log:
        "logs/minimap2/{input1}.log",
    params:
        extra="-x map-pb",  # optional
        sorting="coordinate",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"


rule minimap2_sam:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"],
    output:
        "aligned/{input1}_aln.sam",
    log:
        "logs/minimap2/{input1}.log",
    params:
        extra="-x map-pb",  # optional
        sorting="none",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"


rule minimap2_sam_sorted:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"],
    output:
        "aligned/{input1}_aln.sorted.sam",
    log:
        "logs/minimap2/{input1}.log",
    params:
        extra="-x map-pb",  # optional
        sorting="queryname",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"


rule minimap2_bam_sorted:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"],
    output:
        "aligned/{input1}_aln.sorted.bam",
    log:
        "logs/minimap2/{input1}.log",
    params:
        extra="-x map-pb",  # optional
        sorting="coordinate",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"

rule minimap2_ubam_paf:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query="query/reads.bam",
    output:
        "aligned/{input1}_aln.ubam.paf",
    log:
        "logs/minimap2/{input1}.ubam.log",
    params:
        extra="-x map-pb",  # optional
        sorting="coordinate",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"


rule minimap2_ubam_sam:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query="query/reads.bam",
    output:
        "aligned/{input1}_aln.ubam.sam",
    log:
        "logs/minimap2/{input1}.ubam.log",
    params:
        extra="-x map-pb",  # optional
        sorting="none",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"


rule minimap2_ubam_sam_sorted:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query="query/reads.bam",
    output:
        "aligned/{input1}_aln.sorted.ubam.sam",
    log:
        "logs/minimap2/{input1}.ubam.log",
    params:
        extra="-x map-pb",  # optional
        sorting="queryname",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"


rule minimap2_ubam_bam_sorted:
    input:
        target="target/{input1}.mmi",  # can be either genome index or genome fasta
        query="query/reads.bam",
    output:
        "aligned/{input1}_aln.sorted.ubam.bam",
    log:
        "logs/minimap2/{input1}.ubam.log",
    params:
        extra="-x map-pb",  # optional
        sorting="coordinate",  # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v3.9.0/bio/minimap2/aligner"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • The extra param allows for additional arguments for minimap2.

  • The sorting param enables sorting of the output (it has no effect on PAF output) and can be ‘none’, ‘queryname’ or ‘coordinate’.

  • The sort_extra param allows for extra arguments for samtools sort.
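
How the output format and the sorting param interact can be sketched in plain Python. This is only an illustration: choose_pipe_cmd is a hypothetical helper name, the output format is assumed to be already known as an upper-case string, and the extra samtools options that the wrapper derives through snakemake-wrapper-utils are left out.

```python
def choose_pipe_cmd(out_format: str, sorting: str = "none", sort_extra: str = "") -> str:
    """Return the samtools stage piped after minimap2 (hypothetical helper)."""
    if out_format == "PAF":
        # minimap2 writes PAF itself, so the sorting param is not consulted.
        return ""
    if sorting == "none":
        # SAM needs no conversion; other formats go through samtools view.
        return "" if out_format == "SAM" else "| samtools view -h"
    if sorting in ("coordinate", "queryname"):
        parts = ["| samtools sort"]
        if sort_extra:
            parts.append(sort_extra)
        if sorting == "queryname":
            # Sorting by read name is requested with -n.
            parts.append("-n")
        return " ".join(parts)
    raise ValueError(f"Unexpected value for params.sorting: {sorting}")


for fmt, mode in [("PAF", "coordinate"), ("SAM", "none"), ("BAM", "none"), ("BAM", "queryname")]:
    print(fmt, mode, repr(choose_pipe_cmd(fmt, mode)))
```

Any value other than the three listed above raises a ValueError, so a typo in the rule fails early instead of silently producing unsorted output.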

Software dependencies

  • minimap2=2.28

  • samtools=1.20

  • snakemake-wrapper-utils=0.6.2

Input/Output

Input:

  • FASTA/FASTQ file(s) or a single unaligned BAM file

  • reference genome (FASTA) or minimap2 index (.mmi)

Output:

  • PAF/SAM/BAM/CRAM file
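
The wrapper reads the format from the file extension, for the query as well as for the output, so the extension is the one part of a path that is not arbitrary. The sketch below shows the idea with guess_format, a hypothetical stand-in for infer_out_format from snakemake-wrapper-utils. It assumes the format is simply the upper-cased final extension, and the real helper may behave differently.

```python
from pathlib import Path


def guess_format(file_name: str) -> str:
    """Upper-cased final extension of a path (stand-in, not the library code)."""
    return Path(file_name).suffix.lstrip(".").upper()


for name in [
    "aligned/sample_aln.paf",
    "aligned/sample_aln.sorted.sam",
    "aligned/sample_aln.sorted.ubam.bam",
    "query/reads.bam",
]:
    print(name, "->", guess_format(name))
```

Only the last extension counts in this sketch, which is why names such as sample_aln.sorted.ubam.bam are still treated as BAM.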

Authors

  • Tom Poorten

  • Michael Hall

  • Filipe G. Vieira

Code

__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"


from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import infer_out_format
from snakemake_wrapper_utils.samtools import get_samtools_opts


samtools_opts = get_samtools_opts(
    snakemake, parse_output=False, param_name="sort_extra"
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")

if isinstance(snakemake.input.query, list):
    in_ext = infer_out_format(snakemake.input.query[0])
    if in_ext == "BAM" and len(snakemake.input.query) > 1:
        raise ValueError("uBAM input mode only supports a single uBAM file")
else:
    in_ext = infer_out_format(snakemake.input.query)

pre_cmd = ""
query = ""
if in_ext == "BAM":
    # convert uBAM to fastq keeping all tags
    pre_cmd = f'samtools fastq -T "*" {snakemake.input.query} |'
    # tell minimap2 to parse tags from fastq header
    extra += " -y"
    query = "-"
else:
    query = snakemake.input.query

out_ext = infer_out_format(snakemake.output[0])

pipe_cmd = ""
if out_ext != "PAF":
    # Add option for SAM output
    extra += " -a"

    # Determine which pipe command to use for converting to bam or sorting.
    if sort == "none":
        if out_ext != "SAM":
            # Simply convert to output format using samtools view.
            pipe_cmd = f"| samtools view -h {samtools_opts}"

    elif sort in ["coordinate", "queryname"]:
        # Add name flag if needed.
        if sort == "queryname":
            sort_extra += " -n"

        # Sort alignments.
        pipe_cmd = f"| samtools sort {sort_extra} {samtools_opts}"

    else:
        raise ValueError(f"Unexpected value for params.sorting: {sort}")

shell(
    "({pre_cmd}"
    " minimap2"
    " -t {snakemake.threads}"
    " {extra} "
    " {snakemake.input.target}"
    " {query}"
    " {pipe_cmd}"
    " > {snakemake.output[0]}"
    ") {log}"
)
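
To get a feel for what the shell() call expands to, the input side can be sketched as a pure function that returns the command string instead of running it. build_command and its parameters are hypothetical names used for illustration only. The sketch detects formats with a simple extension check, takes the samtools stage as a ready-made pipe_cmd argument, and omits the log redirection, so it is not the wrapper's exact command line.

```python
def build_command(target, queries, output, threads=1, extra="", pipe_cmd=""):
    """Compose a minimap2 pipeline string (illustrative; nothing is executed)."""
    queries = [queries] if isinstance(queries, str) else list(queries)
    is_ubam = queries[0].lower().endswith(".bam")
    if is_ubam and len(queries) > 1:
        raise ValueError("uBAM input mode only supports a single uBAM file")

    pre_cmd = ""
    query = " ".join(queries)
    if is_ubam:
        # Stream the uBAM as FASTQ with all tags kept; minimap2 then reads stdin
        # and -y makes it copy the tags from the FASTQ header to the output.
        pre_cmd = f'samtools fastq -T "*" {queries[0]} | '
        extra = f"{extra} -y".strip()
        query = "-"
    if not output.lower().endswith(".paf"):
        # Every non-PAF output needs SAM records from minimap2.
        extra = f"{extra} -a".strip()

    parts = [f"{pre_cmd}minimap2 -t {threads}", extra, target, query, pipe_cmd, f"> {output}"]
    return " ".join(p for p in parts if p)


print(
    build_command(
        "target/ref.mmi",
        "query/reads.bam",
        "aligned/ref_aln.ubam.paf",
        threads=3,
        extra="-x map-pb",
    )
)
```

Returning a string keeps the sketch safe to run without minimap2 or samtools installed, and makes it easy to compare the pipeline for FASTA and uBAM inputs side by side.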