HISAT2 INDEX


Create index files for HISAT2 with hisat2-build.

URL: http://daehwankimlab.github.io/hisat2/manual/

Example

This wrapper can be used in the following way:

rule hisat2_index:
    input:
        fasta="{genome}.fasta",
    output:
        multiext(
            "hisat2_index/{genome}",
            ".1.ht2",
            ".2.ht2",
            ".3.ht2",
            ".4.ht2",
            ".5.ht2",
            ".6.ht2",
            ".7.ht2",
            ".8.ht2",
        ),
    params:
        extra="",
    log:
        "logs/hisat2_index_{genome}.log",
    threads: 2
    wrapper:
        "v5.6.1-7-g2ff6d79/bio/hisat2/index"


rule hisat2_indexL:
    input:
        fasta="{genome}.fasta",
    output:
        multiext(
            "hisat2_index/{genome}",
            ".1.ht2l",
            ".2.ht2l",
            ".3.ht2l",
            ".4.ht2l",
            ".5.ht2l",
            ".6.ht2l",
            ".7.ht2l",
            ".8.ht2l",
        ),
    params:
        extra="--large-index",
    log:
        "logs/hisat2_indexL_{genome}.log",
    threads: 2
    wrapper:
        "v5.6.1-7-g2ff6d79/bio/hisat2/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • hisat2=2.2.1

Input/Output

Input:

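  • fasta: FASTA file, or list of FASTA files, to index. A value that contains no "." is instead passed to hisat2-build as comma-separated literal sequences (its -c option).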

Output:

  • List of output index file paths. hisat2-build writes eight index files, with the .ht2 extension for small genomes or the .ht2l extension for large genomes (longer than about 4 billion nucleotides). hisat2-build picks the index type automatically, but the output paths declared in the rule must use the matching extension (.ht2 or .ht2l); otherwise Snakemake reports missing output files. To force a large index regardless of genome size, set extra="--large-index" and use .ht2l as the output file extension, as in the hisat2_indexL rule above.
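Because the eight file names follow a fixed pattern, the output list can be generated rather than typed out. The sketch below is a hypothetical helper, not part of the wrapper: the name `hisat2_index_files` and its `large` flag are illustrative, and it only mirrors the .1 to .8 naming used in the example rules above.

```python
from typing import List


def hisat2_index_files(prefix: str, large: bool = False) -> List[str]:
    """Return the eight index file paths hisat2-build writes for `prefix`.

    Hypothetical helper: large=True stands for a large index (for example
    one built with --large-index), whose files end in .ht2l.
    """
    ext = ".ht2l" if large else ".ht2"
    # hisat2-build numbers its index files 1 through 8
    return [f"{prefix}.{i}{ext}" for i in range(1, 9)]


if __name__ == "__main__":
    for path in hisat2_index_files("hisat2_index/genome", large=True):
        print(path)
```

Since a Snakefile is Python, a helper like this could be defined at the top of the Snakefile and used directly as a rule's output, e.g. hisat2_index_files("hisat2_index/{genome}", large=True), in place of the multiext(...) call.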

Params

  • extra: additional parameters that will be passed to hisat2-build.

Authors

  • Joël Simoneau

  • Hugo Tavares

Code

"""Snakemake wrapper for HISAT2 index"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

# Shell redirection that sends stdout and stderr to the log file
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Optional extra arguments for hisat2-build (empty string by default)
extra = snakemake.params.get("extra", "")

# Accept a single FASTA file or a list of FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input -> a FASTA file or a sequence is required"
fasta = [fasta] if isinstance(fasta, str) else list(fasta)
input_seq = ",".join(fasta)
# Values without a "." are treated as literal sequences (hisat2-build -c)
if all("." not in f for f in fasta):
    input_seq = "-c " + input_seq

# Index basename: common prefix of all output paths, minus the trailing "."
prefix = os.path.commonprefix(snakemake.output).rstrip(".")

shell("hisat2-build --threads {snakemake.threads} {input_seq} {extra} {prefix} {log}")
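To see what command the wrapper ends up running, the argument-building steps can be reproduced outside Snakemake. This is an illustrative re-implementation, not the wrapper itself: the function names are hypothetical, the `{log}` redirection is left out, and the "no dot means literal sequence" check is applied to every list item, which is an assumption about the intended behavior for lists.

```python
import os
from typing import List, Union


def build_input_arg(fasta: Union[str, List[str]]) -> str:
    """Build hisat2-build's <reference_in> argument (illustrative)."""
    items = [fasta] if isinstance(fasta, str) else list(fasta)
    joined = ",".join(items)
    # Same heuristic as the wrapper: values without a "." are treated as
    # literal sequences, which hisat2-build reads when given -c
    if all("." not in item for item in items):
        return "-c " + joined
    return joined


def index_prefix(outputs: List[str]) -> str:
    """Derive the index basename from the declared output files."""
    # commonprefix compares strings character by character, so for
    # "idx/g.1.ht2", "idx/g.2.ht2", ... it returns "idx/g."
    return os.path.commonprefix(outputs).rstrip(".")


def build_command(fasta, outputs: List[str], threads: int = 1, extra: str = "") -> str:
    # Argument order follows the wrapper's shell() call
    parts = [
        "hisat2-build",
        f"--threads {threads}",
        build_input_arg(fasta),
        extra,
        index_prefix(outputs),
    ]
    # Drop empty pieces (such as an empty `extra`) before joining
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    outs = [f"hisat2_index/genome.{i}.ht2" for i in range(1, 9)]
    print(build_command("genome.fasta", outs, threads=2))
```

One consequence of the heuristic is worth keeping in mind: a FASTA path with no "." anywhere in it (for example, a file without an extension) would be passed with -c and read as a sequence rather than as a file.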