PYFAIDX
Pythonic indexing, retrieval, and in-place modification of FASTA files using a samtools compatible index.
URL: https://github.com/mdshw5/pyfaidx?tab=readme-ov-file#cli-script-faidx
Example
This wrapper can be used in the following way:
rule test_pyfaidx_out_fasta:
    input:
        fasta="sequence.fasta",
        bed="interval.bed",
    output:
        "retrieved.fasta",
    log:
        "test_pyfaidx.log",
    params:
        extra="",
        regions="",
    wrapper:
        "v9.4.2/bio/pyfaidx"


rule test_pyfaidx_index_fasta:
    input:
        fasta="sequence.fasta",
        bed="interval.bed",
    output:
        "sequence.fasta.fai",
    log:
        "test_pyfaidx_index_fasta.log",
    params:
        extra="",
        regions="",
    wrapper:
        "v9.4.2/bio/pyfaidx"


rule test_pyfaidx_out_sizes:
    input:
        fasta="sequence.fasta",
        bed="interval.bed",
    output:
        "retrieved.chrom",
    params:
        extra="",
        regions="",
    log:
        "test_pyfaidx_out_sizes.log",
    wrapper:
        "v9.4.2/bio/pyfaidx"


rule test_pyfaidx_out_bed:
    input:
        fasta="sequence.fasta",
        bed="interval.bed",
    output:
        "retrieved.bed",
    params:
        extra="",
        regions="",
    log:
        "test_pyfaidx_out_bed.log",
    wrapper:
        "v9.4.2/bio/pyfaidx"


rule test_pyfaidx_fetch_regions:
    input:
        #bed="interval.bed",
        fasta="sequence.fasta",
    output:
        "regions.fa",
    params:
        extra="",
        regions="seq1",
    log:
        "test_pyfaidx_fetch_regions.log",
    wrapper:
        "v9.4.2/bio/pyfaidx"


rule test_pyfaidx_fetch_list_regions:
    input:
        #bed="interval.bed",
        fasta="sequence.fasta",
    output:
        "list_regions.fa",
    params:
        extra="",
        regions=["seq1", "seq2"],
    log:
        "test_pyfaidx_fetch_list_regions.log",
    wrapper:
        "v9.4.2/bio/pyfaidx"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The --transform parameter is automatically inferred from the output file path. This tool automatically creates a fasta index if the output file is fasta formatted. If no index exists alongside the input fasta file, it will be created automatically.
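The inference described above can be illustrated with a minimal sketch. It mirrors the branches of the wrapper code shown in the Code section, but the suffix table is an assumption made for illustration: the wrapper itself delegates format detection to get_format from snakemake-wrapper-utils, whose exact rules may differ. The names SUFFIX_TO_FORMAT, FORMAT_TO_TRANSFORM and output_args are hypothetical.

```python
from pathlib import Path
from typing import List

# Hypothetical suffix table (assumption): the real wrapper relies on
# snakemake_wrapper_utils.snakemake.get_format for this step.
SUFFIX_TO_FORMAT = {
    ".fai": "fai",
    ".fa": "fasta",
    ".fasta": "fasta",
    ".bed": "bed",
    ".chrom": "chrom",
    ".nuc": "nuc",
}

# --transform value per detected format, as in the wrapper's branches.
# Formats missing from this table need no --transform argument.
FORMAT_TO_TRANSFORM = {
    "bed": "bed",
    "chrom": "chromsizes",
    "nuc": "nucleotide",
}


def output_args(path: str) -> List[str]:
    """Return the faidx output arguments inferred from an output path."""
    fmt = SUFFIX_TO_FORMAT.get(Path(path).suffix.lower())
    if fmt is None:
        raise ValueError(f"invalid output file format: {path}")
    if fmt == "fai":
        # Index output: the wrapper passes no --out argument at all.
        return []
    args = ["--out", path]
    if fmt in FORMAT_TO_TRANSFORM:
        args += ["--transform", FORMAT_TO_TRANSFORM[fmt]]
    return args


if __name__ == "__main__":
    for name in ("retrieved.fasta", "retrieved.bed", "retrieved.chrom"):
        print(name, output_args(name))
```

Under these assumptions, only the output file name needs to change to switch between fasta, BED and chromosome sizes output.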
Software dependencies
pyfaidx=0.9.0.4
snakemake-wrapper-utils=0.8.0
Input/Output
Input:
fasta: Path to a sequence fasta file
bed: Path to BED intervals (optional)
Output:
Path to the modified sequences/intervals
Params
extra: Optional parameters besides --transform, --bed and --out.
regions: Optional region, or list of regions, to retrieve from the fasta file
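As a rough illustration of how these parameters end up on the command line, the sketch below assembles a faidx call from extra and regions. It is not the wrapper's implementation: the wrapper formats a shell string directly, while this sketch uses shlex for quoting. The function name build_faidx_command and its signature are hypothetical.

```python
import shlex
from typing import Iterable, List, Union


def build_faidx_command(
    fasta: str,
    out_args: List[str],
    regions: Union[str, Iterable[str]] = "",
    extra: str = "",
) -> str:
    """Assemble a faidx command line (illustrative helper, not wrapper code)."""
    # Accept either a single region string or a list of regions.
    if isinstance(regions, str):
        region_list = regions.split()
    else:
        region_list = list(regions)
    # Same argument order as the wrapper: extra, output, fasta, regions.
    cmd = ["faidx", *shlex.split(extra), *out_args, fasta, *region_list]
    return shlex.join(cmd)


if __name__ == "__main__":
    print(
        build_faidx_command(
            "sequence.fasta",
            ["--out", "list_regions.fa"],
            regions=["seq1", "seq2"],
        )
    )
```

An empty regions value simply contributes no positional arguments, which matches the rules above that set regions="".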
Code
# coding: utf-8

"""Snakemake-wrapper for pyfaidx"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2025, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake_wrapper_utils.snakemake import get_format
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=False)

# Optional BED intervals
bed = snakemake.input.get("bed", "")
if bed:
    extra += f" --bed {bed}"

# Infer the output arguments from the format of the first output file
out = str(snakemake.output[0])
fmt = get_format(out)
if fmt == "fai":
    # Index output: no --out argument is passed
    out = ""
elif fmt == "fasta":
    out = f"--out {out}"
elif fmt == "bed":
    out = f"--out {out} --transform bed"
elif fmt == "chrom":
    out = f"--out {out} --transform chromsizes"
elif fmt == "nuc":
    out = f"--out {out} --transform nucleotide"
else:
    raise ValueError(f"invalid output file format: {out}")

# Optional region, or list of regions, passed after the fasta path
regions = snakemake.params.get("regions", "")

shell("faidx {extra} {out} {snakemake.input.fasta} {regions} {log}")