Contributing

We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, we suggest the following procedure:

  1. Fork the repository: https://github.com/snakemake/snakemake-wrappers

  2. Clone your fork locally.

  3. Locally, create a new branch: git checkout -b my-new-snakemake-wrapper

  4. Commit your contributions to that branch and push them to your fork: git push -u origin my-new-snakemake-wrapper

  5. Create a pull request.

The pull request will be reviewed and included as quickly as possible. If your pull request does not get a review quickly, you can @mention (see https://github.blog/2011-03-23-mention-somebody-they-re-notified/) previous contributors to a particular wrapper (find them with git blame) or regular contributors who you think might be able to give a review.

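In general, always take inspiration from existing wrappers. Beyond that, contributions should provide the files described in the following sections: a meta.yaml file, an environment.yaml file, a wrapper.py or wrapper.R file, a test/Snakefile file, and a test case in test.py.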

conda/mamba environment for development

To have all the tools you need for developing and testing wrappers in one single conda/mamba environment:

  1. Install miniforge.

  2. Set up the channels as described for bioconda.

  3. Create an environment with the necessary dependencies:

mamba create -n snakemake-wrappers-development -c conda-forge -c bioconda snakemake snakefmt snakedeploy black mamba pytest

  4. Activate the environment with:

mamba activate snakemake-wrappers-development

meta.yaml file

This file describes the wrapper and how to use it.

The general file syntax is YAML. Text / strings (values in a YAML mapping or sequence) can use reStructuredText syntax.

The following fields are available to use in the wrapper meta.yaml file. All, except those marked optional, should be provided. Especially make sure to include a URL of the respective tool’s documentation.

  • name: The name of the wrapper.

  • description: A description of what the wrapper does.

  • url: URL of the wrapped tool's webpage.

  • authors: A sequence of names of the people who have contributed to the wrapper.

  • input: A mapping or sequence of required inputs for the wrapper.

  • output: A mapping or sequence of output(s) from the wrapper.

  • params (optional): A mapping of parameters that can be used in the wrapper’s params directive. If no parameters are used for the wrapper, this field can be omitted.

  • notes (optional): Anything of note that does not fit into the scope of the other fields.

You can add a newline to the rendered text in these fields with the addition of |nl|.

Example

name: seqtk mergepe
description: Interleave two paired-end FASTA/Q files
url: https://github.com/lh3/seqtk
authors:
  - Michael Hall
input:
  - paired fastq files - can be compressed.
output:
  - >
    a single, interleaved FASTA/Q file. By default, the output will be compressed,
    use the param ``compress_lvl`` to change this.
params:
  compress_lvl: >
    Regulate the speed of compression using the specified digit,
    where 1 indicates the fastest compression method (less compression)
    and 9 indicates the slowest compression method (best compression).
    0 is no compression. 11 gives a few percent better compression at a severe cost
    in execution time, using the zopfli algorithm. The default is 6.
notes: Multiple threads can be used during compression of the output file with ``pigz``.
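
The required fields listed above can be checked mechanically before opening a pull request. The following is a minimal sketch, not part of the repository tooling: check_meta is a hypothetical helper, and it assumes the meta.yaml file has already been parsed into a Python dict by a YAML parser.

```python
# Field names are taken from the list above; the checker itself is hypothetical.
REQUIRED_FIELDS = ("name", "description", "url", "authors", "input", "output")


def check_meta(meta):
    """Return a list of problems found in a parsed meta.yaml mapping."""
    problems = []
    for field in REQUIRED_FIELDS:
        # A field counts as missing if it is absent or empty.
        if not meta.get(field):
            problems.append(f"missing required field: {field}")
    return problems


# A shortened version of the seqtk example, as a YAML parser would return it.
example = {
    "name": "seqtk mergepe",
    "description": "Interleave two paired-end FASTA/Q files",
    "url": "https://github.com/lh3/seqtk",
    "authors": ["Michael Hall"],
    "input": ["paired fastq files - can be compressed."],
    "output": ["a single, interleaved FASTA/Q file."],
}
print(check_meta(example))
```

The optional fields (params, notes) are deliberately not checked, since they may be omitted.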

environment.yaml file

This file needs to list all the software that the wrapper code needs to run successfully.

For all software following semantic versioning conventions, specify (and thus pin) the major and minor version, but leave the patch version unspecified. Also, unless this is needed to work around version incompatibilities not properly handled by the conda packages themselves, only specify the actual software needed and let conda/mamba determine the dependencies.
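
The pinning rule above can be illustrated with a small helper. This is only a sketch: pin_major_minor is a hypothetical function, and the full version number 2.58.1 is an assumed example value.

```python
def pin_major_minor(package, version):
    """Turn a full version such as '2.58.1' into a 'package =2.58' spec."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got '{version}'")
    # Keep major and minor, drop the patch version and anything after it.
    return f"{package} ={parts[0]}.{parts[1]}"


print(pin_major_minor("bioconductor-biomart", "2.58.1"))
```

Leaving the patch version open lets conda/mamba pick up bug-fix releases, while the pinning file described further down records the exact versions.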

To make sure that conda/mamba knows where to look for the package, include a list of all of the conda channels that the software and its dependencies require. This will usually include conda-forge, as it contains many essential libraries that other packages and tools depend on. This channel should usually be specified first, to make sure it takes precedence (snakemake asks users to run conda config --set channel_priority strict). In addition, you may need to include other sustainable, community-maintained channels (like bioconda). As the last channel specification, always include nodefaults. This avoids software dependency conflicts between the conda-forge channel and the default channels, which should no longer be needed.
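
The channel ordering described above can be expressed as a short check. This is a sketch under the assumption that the channels list has already been read from environment.yaml; check_channels is a hypothetical helper, not an existing tool.

```python
def check_channels(channels):
    """Return a list of deviations from the recommended channel order."""
    problems = []
    if not channels or channels[0] != "conda-forge":
        problems.append("conda-forge should usually be the first channel")
    if not channels or channels[-1] != "nodefaults":
        problems.append("nodefaults should be the last channel")
    return problems


print(check_channels(["conda-forge", "bioconda", "nodefaults"]))
print(check_channels(["bioconda", "conda-forge"]))
```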

Finally, make sure to run snakedeploy pin-conda-envs environment.yaml on the finished environment specification. This will generate a file called environment.linux-64.pin.txt with all the dependency versions determined by conda/mamba, ensuring that a particular wrapper version will always generate the exact same environment with the exact package versions from this file. You should include this pinning file in the pull request for your wrapper.

Example

channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - bioconductor-biomart =2.58
  - r-nanoparquet =0.3
  - r-tidyverse =2.0

wrapper.py or wrapper.R file

This is the actual code that the wrapper executes. It is handled like an external script in snakemake, so you have the respective snakemake objects available.

Please ensure that the wrapper:

  • can deal with arbitrary input: and output: paths and filenames

  • redirects stdout and stderr to log files specified by the log: directive (typical boilerplate code can for example be found in this knowledge base)

  • automatically infers command line arguments wherever possible (for example based on file extensions in input: and output:)

  • passes on the threads value, if the used tool(s) allow(s) it

  • writes any temporary files to a unique hidden folder in the working directory, or (better) stores them where the Python function tempfile.gettempdir() points (this also means that using any Python tempfile default behavior works)

  • is formatted according to the language’s standards (for Python, format it with black: black wrapper.py)
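
Several of the points above can be sketched together in plain Python. This is an illustration under stated assumptions, not a real wrapper: mytool and its flags are hypothetical, and in an actual wrapper.py the values would come from snakemake.input, snakemake.output, snakemake.log and snakemake.threads (existing wrappers commonly build the log redirection with snakemake.log_fmt_shell(stdout=True, stderr=True)).

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical flags of an imaginary command line tool called "mytool";
# a real wrapper would use the flags of the tool it wraps.
OUTPUT_FORMAT_FLAGS = {".bcf": "--output-type b", ".vcf": "--output-type v"}


def build_command(input_path, output_path, log_path, threads, tmpdir, extra=""):
    """Assemble a shell command from the values a wrapper receives."""
    # Infer the output format from the file extension instead of asking for it.
    suffix = Path(output_path).suffix
    if suffix not in OUTPUT_FORMAT_FLAGS:
        raise ValueError(f"cannot infer output format from '{suffix}'")
    parts = [
        "mytool",
        f"--threads {threads}",  # pass on the threads value
        OUTPUT_FORMAT_FLAGS[suffix],
        f"--temp-dir {shlex.quote(str(tmpdir))}",
        extra,
        f"--output {shlex.quote(str(output_path))}",
        shlex.quote(str(input_path)),  # quoting copes with arbitrary paths
        f"> {shlex.quote(str(log_path))} 2>&1",  # stdout and stderr to the log
    ]
    return " ".join(part for part in parts if part)


# TemporaryDirectory() creates its folder below tempfile.gettempdir() by default
# and removes it again when the block is left.
with tempfile.TemporaryDirectory() as tmpdir:
    print(build_command("calls/a.bcf", "sorted/a.sorted.bcf", "logs/a.log", 4, tmpdir))
```

Building the command in one function keeps the inference logic testable without running the tool itself.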

For repeatedly needed functionality you can use the snakemake-wrapper-utils. Use what is available or create new functionality there, whenever you start repeating functions across wrappers. Examples of this are:

  1. The command line argument parsing for a software tool like samtools where you create one wrapper each for a number of different subcommands that share the main arguments. See the samtools.py utility functions for the respective functionality.

  2. The handling of recurring Java options, for example things like memory handling. See java.py for the respective functionality.

To use snakemake-wrapper-utils, you have to include them as a dependency in your environment.yaml file and import the respective function(s) in your wrapper.py or wrapper.R file (for example from snakemake_wrapper_utils.java import get_java_opts).
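
To illustrate why such shared helpers are worthwhile, here is a minimal sketch of the kind of logic that would otherwise be repeated in every Java-based wrapper. It is explicitly not the implementation of get_java_opts from snakemake-wrapper-utils: sketch_java_opts and its parameters are hypothetical, and only the JVM options -Xmx and -Djava.io.tmpdir are real.

```python
def sketch_java_opts(mem_mb=None, tmpdir=None, extra=""):
    """Collect recurring JVM options in one place (hypothetical helper)."""
    opts = []
    # Only set the maximum heap size if the user did not set it via extra.
    if mem_mb is not None and "-Xmx" not in extra:
        opts.append(f"-Xmx{mem_mb}M")
    # Point the JVM to the temporary directory unless already specified.
    if tmpdir is not None and "java.io.tmpdir" not in extra:
        opts.append(f"-Djava.io.tmpdir={tmpdir}")
    return " ".join(opts)


print(sketch_java_opts(mem_mb=4096, tmpdir="/tmp"))
```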

test/Snakefile file

In a subfolder called test, create a Snakefile with example invocations of the wrapper. These examples should comprehensively showcase the available functionality of the wrapper, as they serve as both the copy-pasteable examples rendered in the documentation, and the test cases run in the continuous integration testing (make sure to include calls to the rules in test.py, see test.py tests file). If these rules need any input data, you can also include minimal (small) testing data in the test/ folder (also check existing wrappers for suitable data).

When writing the Snakefile, please ensure that:

  • rule names in the examples are in snake_case and descriptive (they should explain what the rule does, or match the tool’s purpose or name; for example map_reads for a step that maps reads)

  • it is formatted correctly by running snakefmt (snakefmt Snakefile)

  • it also passes linting, see Linting

  • all example rules in your test/Snakefile have an invocation as a test case in test.py, see test.py tests file

  • possible settings for all keywords like input:, output:, params:, threads:, etc. are explained in a short comment wherever this is feasible (provide longer explanations in the meta.yaml file)

  • a sensible default is provided for threads:, if more than one thread can be used by the wrapper
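
The naming convention from the list above can be checked with a few lines of Python. This is a rough sketch, not a replacement for snakefmt or linting: non_snake_case_rules is a hypothetical helper, and it assumes that rules are declared at the start of a line as "rule name:".

```python
import re

# Matches rule declarations such as "rule map_reads:" at the start of a line.
RULE_NAME = re.compile(r"^rule\s+(\w+)\s*:", re.MULTILINE)
# Lowercase words and digits, separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_rules(snakefile_text):
    """Return the names of all rules that are not written in snake_case."""
    names = RULE_NAME.findall(snakefile_text)
    return [name for name in names if not SNAKE_CASE.match(name)]


example = "rule map_reads:\n    threads: 4\n\nrule SortBCF:\n    threads: 1\n"
print(non_snake_case_rules(example))
```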

test.py tests file

Every example rule listed in a test/Snakefile file should be included as a test case in test.py. The easiest way is usually to duplicate an existing test and adapt it to your newly added example rule.

When done editing, make sure that test.py still follows black formatting standards (see Formatting).

Example

@skip_if_not_modified
def test_bcftools_sort():
    run(
        "bio/bcftools/sort",
        ["snakemake", "--cores", "1", "--use-conda", "-F", "a.sorted.bcf"],
    )

Formatting

Please ensure Python files such as test.py and wrapper.py are formatted with black. Additionally, please format your test Snakefile with snakefmt.

Linting

Please lint your test Snakefile with:

snakemake -s <path/to/wrapper/test/Snakefile> --lint

Testing locally

If you want to debug your contribution locally before creating a pull request, ensure you have the conda/mamba environment for development installed and activated.

Afterwards, from the main directory of the repo, you can run the test(s) for your contribution by specifying an expression that matches the name(s) of your test(s) via the -k option of pytest:

pytest test.py -v -k your_test

If you also want to test the docs generation locally, create another environment and activate it:

mamba create -n test-snakemake-wrapper-docs -c conda-forge sphinx sphinx_rtd_theme pyyaml sphinx-copybutton sphinxawesome_theme myst-parser
mamba activate test-snakemake-wrapper-docs

Then, enter the respective directory and build the docs:

cd docs
make html

If the build succeeds, you can open the main page at docs/_build/html/index.html in a web browser. If you want to start fresh, you can clean up the build with make clean.