NGS-DISAMBIGUATE

Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.

Example

This wrapper can be used in the following way:

rule disambiguate:
    input:
        a="mapped/{sample}.a.bam",
        b="mapped/{sample}.b.bam"
    output:
        a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
        b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
        a_disambiguated='disambiguate/{sample}.graft.bam',
        b_disambiguated='disambiguate/{sample}.host.bam',
        summary='qc/disambiguate/{sample}.txt'
    params:
        algorithm="bwa",
        # optional: Prefix to use for output. If omitted, a
        # suitable value is guessed from the output paths. Prefix
        # is used for the intermediate output paths, as well as
        # sample name in summary file.
        prefix="{sample}",
        extra=""
    wrapper:
        "0.75.0/bio/ngs-disambiguate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • ngs-disambiguate==2016.11.10
  • bamtools==2.4.0

Input/Output

Input:

  • species a bam file (name sorted)
  • species b bam file (name sorted)

Output:

  • bam file with ambiguous alignments for species a
  • bam file with ambiguous alignments for species b
  • bam file with unambiguous alignments for species a
  • bam file with unambiguous alignments for species b

Authors

  • Julian de Ruiter

Code

"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")

output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain)
# additional suffixes. This is better than using a random id as prefix,
# the prefix is also used as the sample name in the summary file.
if prefix is None:
    prefix = path.splitext(path.basename(snakemake.output.summary))[0]

# Run command.
shell(
    "ngs_disambiguate"
    " {extra}"
    " -o {output_dir}"
    " -s {prefix}"
    " -a {snakemake.params.algorithm}"
    " {snakemake.input.a}"
    " {snakemake.input.b}"
)

# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)

output_map = {
    output_base + ".ambiguousSpeciesA.bam": snakemake.output.a_ambiguous,
    output_base + ".ambiguousSpeciesB.bam": snakemake.output.b_ambiguous,
    output_base + ".disambiguatedSpeciesA.bam": snakemake.output.a_disambiguated,
    output_base + ".disambiguatedSpeciesB.bam": snakemake.output.b_disambiguated,
    output_base + "_summary.txt": snakemake.output.summary,
}

for src, dest in output_map.items():
    if src != dest:
        shell("mv {src} {dest}")