.. _`bio/busco`: BUSCO ===== .. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/busco?label=version%20update%20pull%20requests :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/busco Assess assembly and annotation completeness with BUSCO **URL**: https://busco.ezlab.org/ Example ------- This wrapper can be used in the following way: .. code-block:: python rule run_busco: input: "protein.fasta", output: short_json="txome_busco/short_summary.json", short_txt="txome_busco/short_summary.txt", full_table="txome_busco/full_table.tsv", miss_list="txome_busco/busco_missing.tsv", dataset_dir=directory("resources/busco_downloads"), log: "logs/proteins_busco.log", params: mode="proteins", lineage="stramenopiles_odb10", # optional parameters extra="", threads: 8 wrapper: "v3.0.1/bio/busco" rule run_busco_euk: input: "protein.fasta", output: out_dir=directory("txome_busco/euk"), dataset_dir=directory("resources/busco_downloads"), log: "logs/proteins_busco_euk.log", params: mode="proteins", # optional parameters extra="--auto-lineage-euk", threads: 8 wrapper: "v3.0.1/bio/busco" rule run_busco_prok: input: "protein.fasta", output: out_dir=directory("txome_busco/prok"), dataset_dir=directory("resources/busco_downloads"), log: "logs/proteins_busco_prok.log", params: mode="proteins", # optional parameters extra="--auto-lineage-prok", threads: 8 wrapper: "v3.0.1/bio/busco" Note that input, output and log file paths can be chosen freely. When running with .. code-block:: bash snakemake --use-conda the software dependencies will be automatically deployed into an isolated environment before execution. Notes ----- * The `lineage` parameter sets the lineage dataset ot use (optional). In `auto-lineage` mode, `output` or `dataset` folders need to be retrieved as a whole, since it is not possible to infer the output file names (they depend on the best lineage match). Software dependencies --------------------- * ``busco=5.5.0`` Input/Output ------------ **Input:** * Path to assembly fasta **Output:** * ``out_dir``: Path to annotation quality files * ``dataset_dir``: Optional path to dataset directory * ``short_txt``: Optional path to plain text results summary. Requires parameter `lineage`. * ``short_json``: Optional path to json formatted results summary. Requires parameter `lineage`. * ``full_table``: Optional path to TSV formatted results. Requires parameter `lineage`. * ``miss_list``: Contains a list of missing BUSCOs. Requires parameter `lineage`. Params ------ * ``lineage``: Assembly lineage. * ``mode``: Either `genome`, `transcriptome`, and `proteins` * ``extra``: Optional parameters besides `mode` `--lineage`, `--cpu` and IO files. Authors ------- * Tessa Pierce * Filipe G. Vieira Code ---- .. code-block:: python """Snakemake wrapper for BUSCO assessment""" __author__ = "Tessa Pierce" __copyright__ = "Copyright 2018, Tessa Pierce" __email__ = "ntpierce@gmail.com" __license__ = "MIT" import tempfile from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) extra = snakemake.params.get("extra", "") mode = snakemake.params.get("mode") assert mode in [ "genome", "transcriptome", "proteins", ], "invalid run mode: only 'genome', 'transcriptome' or 'proteins' allowed" lineage = lineage_opt = snakemake.params.get("lineage", "") if lineage_opt: lineage_opt = f"--lineage {lineage_opt}" with tempfile.TemporaryDirectory() as tmpdir: dataset_dir = snakemake.input.get("dataset_dir", "") if not dataset_dir: dataset_dir = f"{tmpdir}/dataset" shell( "busco" " --cpu {snakemake.threads}" " --in {snakemake.input}" " --mode {mode}" " {lineage_opt}" " {extra}" " --download_path {dataset_dir}" " --out_path {tmpdir}" " --out output" " {log}" ) if snakemake.output.get("short_txt"): assert lineage, "parameter 'lineage' is required to output 'short_tsv'" shell( "cat {tmpdir}/output/short_summary.specific.{lineage}.output.txt > {snakemake.output.short_txt:q}" ) if snakemake.output.get("short_json"): assert lineage, "parameter 'lineage' is required to output 'short_json'" shell( "cat {tmpdir}/output/short_summary.specific.{lineage}.output.json > {snakemake.output.short_json:q}" ) if snakemake.output.get("full_table"): assert lineage, "parameter 'lineage' is required to output 'full_table'" shell( "cat {tmpdir}/output/run_{lineage}/full_table.tsv > {snakemake.output.full_table:q}" ) if snakemake.output.get("miss_list"): assert lineage, "parameter 'lineage' is required to output 'miss_list'" shell( "cat {tmpdir}/output/run_{lineage}/missing_busco_list.tsv > {snakemake.output.miss_list:q}" ) if snakemake.output.get("out_dir"): shell("mv {tmpdir}/output {snakemake.output.out_dir:q}") if snakemake.output.get("dataset_dir"): shell("mv {dataset_dir} {snakemake.output.dataset_dir:q}") .. |nl| raw:: html