.. _contributing:

Contributing
============

We invite anybody to contribute to the Snakemake Wrapper Repository.
If you want to contribute, we suggest the following procedure:

#. Fork the repository: https://github.com/snakemake/snakemake-wrappers
#. Clone your fork locally.
#. Locally, create a new branch: ``git checkout -b my-new-snakemake-wrapper``
#. Commit your contributions to that branch and push them to your fork: ``git push -u origin my-new-snakemake-wrapper``
#. Create a pull request.

The pull request will be reviewed and included as quickly as possible.
If your pull request does not get a review quickly, you can `@mention ` previous contributors to a particular wrapper (``git blame``) or regular contributors that you think might be able to give a review.

In general, always take inspiration from existing wrappers.
Beyond that, contributions should:

* provide the following files:

  * ``meta.yaml`` (wrapper description), see :ref:`meta`
  * ``environment.yaml`` (required software), see :ref:`environment`
  * ``environment.linux-64.pin.txt`` (autogenerated pinning of the software), see :ref:`environment`
  * ``wrapper.py`` or ``wrapper.R`` (actual wrapper code), see :ref:`wrapper`
  * ``test/Snakefile`` (minimal test cases and copy-pasteable examples), see :ref:`Snakefile`

* amend ``test.py`` to call all of the testing rules provided in ``test/Snakefile``, see :ref:`test`
* ensure consistent:

  * `formatting`_ of Python files
  * `linting`_ of Snakefiles

.. _development environment:

``conda``/``mamba`` environment for development
-----------------------------------------------

To have all the tools you need for developing and testing wrappers in one single ``conda``/``mamba`` environment:

1. `Install miniforge `_.
2. Set up the channels as `described for bioconda `_.
3. Create an environment with the necessary dependencies:

   .. code-block:: bash

      mamba create -n snakemake-wrappers-development -c conda-forge -c bioconda snakemake snakefmt snakedeploy black mamba pytest

4. Activate the environment with:

   .. code-block:: bash

      mamba activate snakemake-wrappers-development

.. _meta:

``meta.yaml`` file
-------------------

This file describes the wrapper and how to use it.
The general file syntax is `YAML`_.
Text / strings (values in a `YAML`_ `mapping`_ or `sequence`_) can use `reStructuredText`_ syntax.

The following fields are available to use in the wrapper ``meta.yaml`` file.
All fields, except those marked optional, should be provided.
In particular, make sure to include a URL of the respective tool's documentation.

* **name**: The name of the wrapper.
* **description**: A description of what the wrapper does.
* **url**: URL to the wrapper tool webpage.
* **authors**: A `sequence`_ of names of the people who have contributed to the wrapper.
* **input**: A `mapping`_ or `sequence`_ of required inputs for the wrapper.
* **output**: A `mapping`_ or `sequence`_ of output(s) from the wrapper.
* **params** (optional): A `mapping`_ of parameters that can be used in the wrapper's ``params`` directive. If no parameters are used for the wrapper, this field can be omitted.
* **notes** (optional): Anything of note that does not fit into the scope of the other fields.

You can add a newline to the rendered text in these fields with the addition of ``|nl|``.

Example
^^^^^^^

.. code-block:: yaml

   name: seqtk mergepe
   description: Interleave two paired-end FASTA/Q files
   url: https://github.com/lh3/seqtk
   authors:
     - Michael Hall
   input:
     - paired fastq files - can be compressed.
   output:
     - >
       a single, interleaved FASTA/Q file. By default, the output will be compressed,
       use the param ``compress_lvl`` to change this.
   params:
     compress_lvl: >
       Regulate the speed of compression using the specified digit,
       where 1 indicates the fastest compression method (less compression)
       and 9 indicates the slowest compression method (best compression).
       0 is no compression. 11 gives a few percent better compression at a severe cost
       in execution time, using the zopfli algorithm. The default is 6.
   notes: Multiple threads can be used during compression of the output file with ``pigz``.

.. _YAML: https://yaml.org/spec/1.2.2/
.. _sequence: https://yaml.org/spec/1.2.2/#21-collections
.. _mapping: https://yaml.org/spec/1.2.2/#21-collections
.. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html

.. _environment:

``environment.yaml`` file
-------------------------

This file needs to list all the software that the wrapper code needs to run successfully.
For all software following `semantic versioning `_ conventions, specify (and thus pin) the major and minor version, but leave the patch version unspecified.
Also, only specify the actual software needed and let ``conda``/``mamba`` determine the dependencies, unless you need to pin a dependency to work around version incompatibilities that the conda packages themselves do not handle properly.

To make sure that ``conda``/``mamba`` knows where to look for the package, include a list of all of the conda channels that the software and its dependencies require.
This will usually include `conda-forge `_, as it contains many essential libraries that other packages and tools depend on.
This channel should usually be specified first, to make sure it takes precedence (``snakemake`` asks users to run ``conda config --set channel_priority strict``).
In addition, you may need to include other sustainable, community-maintained channels (like `bioconda `_).
As the last channel specification, always include ``nodefaults``.
This avoids software dependency conflicts between the ``conda-forge`` channel and the ``defaults`` channels, which should no longer be needed.

Finally, make sure to run ``snakedeploy pin-conda-envs environment.yaml`` on the finished environment specification.
This will generate a file called ``environment.linux-64.pin.txt`` with all the dependency versions determined by ``conda``/``mamba``, ensuring that a particular wrapper version will always generate the exact same environment with the exact package versions from this file.
You should include this pinning file in the pull request for your wrapper.

Example
^^^^^^^

.. code-block:: yaml

   channels:
     - conda-forge
     - bioconda
     - nodefaults
   dependencies:
     - bioconductor-biomart =2.58
     - r-nanoparquet =0.3
     - r-tidyverse =2.0

.. _wrapper:

``wrapper.py`` or ``wrapper.R`` file
------------------------------------

This is the actual code that the wrapper executes.
It is handled like an `external script in snakemake `_, so you have the respective ``snakemake`` objects available.

Please ensure that the wrapper:

* can deal with arbitrary ``input:`` and ``output:`` paths and filenames
* redirects ``stdout`` and ``stderr`` to log files specified by the ``log:`` directive (typical boilerplate code can for example be found in `this knowledge base `_)
* automatically infers command line arguments wherever possible (for example based on file extensions in ``input:`` and ``output:``)
* passes on the ``threads`` value, if the used tool(s) allow(s) it
* writes any temporary files to a unique hidden folder in the working directory, or (better) stores them where the Python function `tempfile.gettempdir() `_ points (this also means that using any Python tempfile default behavior works)
* is formatted according to the language's standards (for Python, format it with |black|_: ``black wrapper.py``)

For repeatedly needed functionality you can use the `snakemake-wrapper-utils `_.
Use what is available or create new functionality there whenever you start repeating functions across wrappers.
Examples of this are:

1. The command line argument parsing for a software tool like ``samtools`` where you create one wrapper each for a number of different subcommands that share the main arguments.
   See the `samtools.py `_ utility functions for the respective functionality.
2. The handling of recurring Java options, for example things like memory handling.
   See `java.py `_ for the respective functionality.

To use ``snakemake-wrapper-utils``, you have to include them as a dependency in your :ref:`environment` definition file and import the respective function(s) in your :ref:`wrapper` script (for example ``from snakemake_wrapper_utils.java import get_java_opts``).

.. _snakefile:

``test/Snakefile`` file
-----------------------

In a subfolder called ``test``, create a ``Snakefile`` with example invocations of the wrapper.
These examples should comprehensively showcase the available functionality of the wrapper, as they serve as both the copy-pasteable examples rendered in the documentation and the test cases run in the continuous integration testing (make sure to include calls to the rules in ``test.py``, see :ref:`test`).
If these rules need any input data, you can also include minimal (small) testing data in the ``test/`` folder (also check existing wrappers for suitable data).

When writing the ``Snakefile``, please ensure that:

* rule names in the examples are in `snake_case `_ and descriptive (they should explain what the rule does, or match the tool's purpose or name; for example ``map_reads`` for a step that maps reads)
* it is formatted correctly by running |snakefmt|_ (``snakefmt Snakefile``)
* it also passes linting, see :ref:`linting`
* all example rules in your ``test/Snakefile`` have an invocation as a test case in ``test.py``, see :ref:`test`
* wherever you can do this with a short comment, you explain possible settings for all keywords like ``input:``, ``output:``, ``params:``, ``threads:``, etc. (provide longer explanations in the :ref:`meta` file)
* you provide a sensible default for ``threads:``, if more than one thread can be used by the wrapper

.. _test:

``test.py`` tests file
----------------------

Every example rule listed in the :ref:`snakefile` should be included as a test case in ``test.py``.
The easiest way is usually to duplicate an existing test and adapt it to your newly added example rule.
When done editing, make sure that ``test.py`` :ref:`formatting` still follows |black|_ standards.

Example
^^^^^^^

.. code-block:: python

   @skip_if_not_modified
   def test_bcftools_sort():
       run(
           "bio/bcftools/sort",
           ["snakemake", "--cores", "1", "--use-conda", "-F", "a.sorted.bcf"],
       )

.. _formatting:

Formatting
----------

Please ensure Python files such as ``test.py`` and ``wrapper.py`` are formatted with |black|_.
Additionally, please format your test ``Snakefile`` with |snakefmt|_.

.. |black| replace:: ``black``
.. _black: https://github.com/psf/black
.. |snakefmt| replace:: ``snakefmt``
.. _snakefmt: https://github.com/snakemake/snakefmt

.. _linting:

Linting
-------

Please `lint`_ your test ``Snakefile`` with::

   snakemake -s --lint

.. _lint: https://snakemake.readthedocs.io/en/stable/snakefiles/writing_snakefiles.html#best-practices

Testing locally
---------------

If you want to debug your contribution locally before creating a pull request, ensure you have the :ref:`development environment` installed and activated.
Afterwards, from the main directory of the repo, you can run the test(s) for your contribution by `specifying an expression `_ that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``:

.. code-block:: bash

   pytest test.py -v -k your_test

If you also want to test the docs generation locally, create another environment and activate it:

.. code-block:: bash

   mamba create -n test-snakemake-wrapper-docs -c conda-forge sphinx sphinx_rtd_theme pyyaml sphinx-copybutton sphinxawesome_theme myst-parser
   mamba activate test-snakemake-wrapper-docs

Then, enter the respective directory and build the docs:

.. code-block:: bash

   cd docs
   make html

If the build completes without errors, you can open the main page at ``docs/_build/html/index.html`` in a web browser.
If you want to start fresh, you can clean up the build with ``make clean``.

.. |mamba| replace:: ``mamba``
.. _mamba: https://github.com/mamba-org/mamba
.. |conda| replace:: ``conda``
.. _conda: https://conda.io
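
Before running the tests, it can also help to check that your wrapper directory contains the files that this page asks contributions to provide.
The following is only a minimal sketch and not part of the repository: the helper name ``missing_wrapper_files`` and the two constants are hypothetical, the pinning file name is the one given in the ``environment.yaml`` section, and ``bio/bcftools/sort`` is simply the wrapper path used in the ``test.py`` example.

.. code-block:: python

   from pathlib import Path

   # Files every wrapper directory is expected to contain (assumption: names
   # as given in the sections of this page, relative to the wrapper directory).
   REQUIRED_FILES = (
       "meta.yaml",
       "environment.yaml",
       "environment.linux-64.pin.txt",
       "test/Snakefile",
   )
   # At least one of these wrapper scripts has to be present.
   WRAPPER_SCRIPTS = ("wrapper.py", "wrapper.R")


   def missing_wrapper_files(wrapper_dir):
       """Return the expected files that are absent from ``wrapper_dir``."""
       root = Path(wrapper_dir)
       missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
       # The wrapper script may be either Python or R, so report it only
       # if neither variant exists.
       if not any((root / name).is_file() for name in WRAPPER_SCRIPTS):
           missing.append(" or ".join(WRAPPER_SCRIPTS))
       return missing


   if __name__ == "__main__":
       # Hypothetical usage: run from the main directory of the repo.
       for name in missing_wrapper_files("bio/bcftools/sort"):
           print(f"missing: {name}")

An empty result only means that the files exist. It says nothing about their content, so the formatting, linting, and ``pytest`` steps are still needed.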