HISAT2 INDEX
Create index files for hisat2.
URL: http://daehwankimlab.github.io/hisat2/manual/
Example
This wrapper can be used in the following way:
rule hisat2_index:
    input:
        fasta="{genome}.fasta",
    output:
        multiext(
            "hisat2_index/{genome}",
            ".1.ht2",
            ".2.ht2",
            ".3.ht2",
            ".4.ht2",
            ".5.ht2",
            ".6.ht2",
            ".7.ht2",
            ".8.ht2",
        ),
    params:
        extra="",
    log:
        "logs/hisat2_index_{genome}.log",
    threads: 2
    wrapper:
        "v5.8.0/bio/hisat2/index"


rule hisat2_indexL:
    input:
        fasta="{genome}.fasta",
    output:
        multiext(
            "hisat2_index/{genome}",
            ".1.ht2l",
            ".2.ht2l",
            ".3.ht2l",
            ".4.ht2l",
            ".5.ht2l",
            ".6.ht2l",
            ".7.ht2l",
            ".8.ht2l",
        ),
    params:
        extra="--large-index",
    log:
        "logs/hisat2_indexL_{genome}.log",
    threads: 2
    wrapper:
        "v5.8.0/bio/hisat2/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
hisat2=2.2.1
Input/Output
Input:
fasta
: FASTA file, or list of FASTA files, to index. A value that contains no "." is passed to hisat2-build as a literal sequence via -c.
Output:
List of output index file paths. The hisat2-build command generates 8 files with the .ht2 extension for small genomes and the .ht2l extension for large genomes (greater than ~4 Gbp). hisat2-build usually chooses the index type automatically, but the output file extension you declare (.ht2 or .ht2l) must match your genome size. If needed, you can force the creation of a large index by setting extra="--large-index" and using .ht2l as the output file extension.
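The naming scheme described above can be sketched in plain Python. This is only an illustration: the helper name hisat2_index_files and its large argument are hypothetical and not part of the wrapper, and the prefix is taken from the example rules.

```python
def hisat2_index_files(prefix, large=False):
    """Return the eight index file paths expected for `prefix`.

    Hypothetical helper, not part of the wrapper: it only mirrors the
    naming scheme described above (.ht2 for small, .ht2l for large).
    """
    suffix = "ht2l" if large else "ht2"
    return [f"{prefix}.{i}.{suffix}" for i in range(1, 9)]


small = hisat2_index_files("hisat2_index/genome")
large = hisat2_index_files("hisat2_index/genome", large=True)

print(small[0], small[-1])
print(large[0], large[-1])
```

A list built this way is equivalent to the multiext(...) call in the example rules; whichever form is used, the extension has to match the index type that hisat2-build actually writes.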
Params
extra
: additional parameters that will be passed to hisat2-build.
Code
"""Snakemake wrapper for HISAT2 index"""
__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
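# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file or a sequence is required"
# Normalise to a list so single and multiple inputs are handled the same way
fasta = [fasta] if isinstance(fasta, str) else list(fasta)
input_seq = ""
# Without any "." the input is assumed to be literal sequences, not file names
if not any("." in entry for entry in fasta):
    input_seq += "-c "
input_seq += ",".join(fasta)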
# get common prefix
prefix = os.path.commonprefix(snakemake.output).rstrip(".")
shell("hisat2-build --threads {snakemake.threads} {input_seq} {extra} {prefix} {log}")
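The prefix derivation in the wrapper can be tried outside Snakemake. The sketch below assumes the output list from the example rule with the genome wildcard filled in as "genome"; the variable names are illustrative only.

```python
import os

# Output paths as declared in the example rule (wildcard filled in by hand)
outputs = [f"hisat2_index/genome.{i}.ht2" for i in range(1, 9)]

# commonprefix compares character by character, so it stops at the first
# position where the paths differ (the index number) and keeps the dot
common = os.path.commonprefix(outputs)
prefix = common.rstrip(".")

print(common)
print(prefix)

# With a single declared output the "common prefix" is the whole path,
# which would not be a usable index basename
single = os.path.commonprefix(outputs[:1]).rstrip(".")
print(single)
```

Because the basename is inferred from what the output paths share, declaring all eight index files (as the example rules do) keeps the inferred prefix equal to the intended one.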