BUSCO
Assess assembly and annotation completeness with BUSCO
Example
This wrapper can be used in the following way:
rule run_busco:
input:
"protein.fasta",
output:
short_json="txome_busco/short_summary.json",
short_txt="txome_busco/short_summary.txt",
full_table="txome_busco/full_table.tsv",
miss_list="txome_busco/busco_missing.tsv",
dataset_dir=directory("resources/busco_downloads"),
log:
"logs/proteins_busco.log",
params:
mode="proteins",
lineage="stramenopiles_odb10",
# optional parameters
extra="",
threads: 8
wrapper:
"v5.7.0/bio/busco"
rule run_busco_euk:
input:
"protein.fasta",
output:
out_dir=directory("txome_busco/euk"),
dataset_dir=directory("resources/busco_downloads"),
log:
"logs/proteins_busco_euk.log",
params:
mode="proteins",
# optional parameters
extra="--auto-lineage-euk",
threads: 8
wrapper:
"v5.7.0/bio/busco"
rule run_busco_prok:
input:
"protein.fasta",
output:
out_dir=directory("txome_busco/prok"),
dataset_dir=directory("resources/busco_downloads"),
log:
"logs/proteins_busco_prok.log",
params:
mode="proteins",
# optional parameters
extra="--auto-lineage-prok",
threads: 8
wrapper:
"v5.7.0/bio/busco"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The lineage parameter sets the lineage dataset ot use (optional). In auto-lineage mode, output or dataset folders need to be retrieved as a whole, since it is not possible to infer the output file names (they depend on the best lineage match).
Software dependencies
busco=5.8.2
Input/Output
Input:
Path to assembly fasta
Output:
out_dir
: Path to annotation quality filesdataset_dir
: Optional path to dataset directoryshort_txt
: Optional path to plain text results summary. Requires parameter lineage.short_json
: Optional path to json formatted results summary. Requires parameter lineage.full_table
: Optional path to TSV formatted results. Requires parameter lineage.miss_list
: Contains a list of missing BUSCOs. Requires parameter lineage.
Params
lineage
: Assembly lineage.mode
: Either genome, transcriptome, and proteinsextra
: Optional parameters besides mode –lineage, –cpu and IO files.
Code
"""Snakemake wrapper for BUSCO assessment"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode in [
"genome",
"transcriptome",
"proteins",
], "invalid run mode: only 'genome', 'transcriptome' or 'proteins' allowed"
lineage = lineage_opt = snakemake.params.get("lineage", "")
if lineage_opt:
lineage_opt = f"--lineage {lineage_opt}"
with tempfile.TemporaryDirectory() as tmpdir:
dataset_dir = snakemake.input.get("dataset_dir", "")
if not dataset_dir:
dataset_dir = f"{tmpdir}/dataset"
shell(
"busco"
" --cpu {snakemake.threads}"
" --in {snakemake.input}"
" --mode {mode}"
" {lineage_opt}"
" {extra}"
" --download_path {dataset_dir}"
" --out_path {tmpdir}"
" --out output"
" {log}"
)
if snakemake.output.get("short_txt"):
assert lineage, "parameter 'lineage' is required to output 'short_tsv'"
shell(
"cat {tmpdir}/output/short_summary.specific.{lineage}.output.txt > {snakemake.output.short_txt:q}"
)
if snakemake.output.get("short_json"):
assert lineage, "parameter 'lineage' is required to output 'short_json'"
shell(
"cat {tmpdir}/output/short_summary.specific.{lineage}.output.json > {snakemake.output.short_json:q}"
)
if snakemake.output.get("full_table"):
assert lineage, "parameter 'lineage' is required to output 'full_table'"
shell(
"cat {tmpdir}/output/run_{lineage}/full_table.tsv > {snakemake.output.full_table:q}"
)
if snakemake.output.get("miss_list"):
assert lineage, "parameter 'lineage' is required to output 'miss_list'"
shell(
"cat {tmpdir}/output/run_{lineage}/missing_busco_list.tsv > {snakemake.output.miss_list:q}"
)
if snakemake.output.get("out_dir"):
shell("mv {tmpdir}/output {snakemake.output.out_dir:q}")
if snakemake.output.get("dataset_dir"):
shell("mv {dataset_dir} {snakemake.output.dataset_dir:q}")