PRESEQ LC_EXTRAP
preseq estimates the library complexity of existing sequencing data and uses that estimate to predict the yield of future experiments based on their design.
URL: https://github.com/smithlabcode/preseq
Example
This wrapper can be used in the following way:
rule preseq_lc_extrap_bam:
    input:
        "samples/{sample}.sorted.bam"
    output:
        "test_bam/{sample}.lc_extrap"
    params:
        "-v"  # optional parameters
    log:
        "logs/test_bam/{sample}.log"
    wrapper:
        "v3.6.0-3-gc8272d7/bio/preseq/lc_extrap"

rule preseq_lc_extrap_bed:
    input:
        "samples/{sample}.sorted.bed"
    output:
        "test_bed/{sample}.lc_extrap"
    params:
        "-v"  # optional parameters
    log:
        "logs/test_bed/{sample}.log"
    wrapper:
        "v3.6.0-3-gc8272d7/bio/preseq/lc_extrap"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
preseq=3.2.0
Input/Output
Input:
BED files containing duplicates, sorted by chromosome, start position, end position and finally strand, OR
BAM files containing duplicates, sorted with bamtools or samtools sort.
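One way to express a BED sort by chromosome, start, end and strand is a plain tuple key in Python. This is a minimal sketch and not part of the wrapper: the function name `bed_sort_key` and the sample records are hypothetical, and it assumes tab-separated BED lines with the strand in the sixth column. Duplicate records are kept, since the input is expected to contain them.

```python
def bed_sort_key(line):
    """Return a sort key (chromosome, start, end, strand) for one BED line."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Strand is the sixth BED column; assume "+" when the column is absent.
    strand = fields[5] if len(fields) > 5 else "+"
    return (chrom, start, end, strand)


# Hypothetical records, deliberately out of order and containing a duplicate position.
records = [
    "chr2\t100\t150\tr4\t0\t+",
    "chr1\t200\t250\tr3\t0\t-",
    "chr1\t100\t150\tr1\t0\t+",
    "chr1\t100\t150\tr2\t0\t+",
]

# sorted() is stable, so records with identical keys keep their relative order.
sorted_records = sorted(records, key=bed_sort_key)
```

Chromosome names are compared as strings here, so `chr10` sorts before `chr2`; this sketch only requires that the ordering is consistent.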
Output:
lc_extrap (.lc_extrap)
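To inspect the resulting file downstream, a small reader along the following lines may help. It is a sketch under assumptions: it treats the output as a tab-separated table with a header row, and the column names `TOTAL_READS` and `EXPECTED_DISTINCT` are assumed here, so check the header of your own file. The helper name `read_lc_extrap` is hypothetical and the numbers are made up for illustration.

```python
import csv
import io


def read_lc_extrap(handle):
    """Parse a tab-separated table with a header row into a list of dicts of floats."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{name: float(value) for name, value in row.items()} for row in reader]


# Made-up numbers for illustration only; not real preseq output.
example = io.StringIO(
    "TOTAL_READS\tEXPECTED_DISTINCT\n"
    "0\t0\n"
    "1000000.0\t950000.0\n"
    "2000000.0\t1700000.0\n"
)
rows = read_lc_extrap(example)

# Fraction of reads expected to be distinct at the deepest extrapolated point.
last = rows[-1]
distinct_fraction = last["EXPECTED_DISTINCT"] / last["TOTAL_READS"]
```

With a real file, the same function can be called on an open file handle, for example one opened on the path given in the rule's `output`.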
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = ""
# For BAM input, add preseq's -bam flag unless the input path already contains "-bam".
if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
    if "-bam" not in (snakemake.input[0]):
        params = "-bam "

shell(
    "(preseq lc_extrap {params} {snakemake.params} {snakemake.input[0]} -output {snakemake.output[0]}) {log}"
)
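To see which command line the wrapper ends up with for a given input, the logic above can be reproduced outside Snakemake. The following is a minimal sketch, not the wrapper itself: `build_lc_extrap_command` is a hypothetical helper, the file paths are examples, and the log redirection added by `log_fmt_shell` is left out.

```python
import os


def build_lc_extrap_command(input_path, extra, output_path):
    """Assemble the preseq command string the way the wrapper code does."""
    bam_flag = ""
    # Same test as the wrapper: BAM extension, and "-bam" not already in the path.
    if os.path.splitext(input_path)[-1] == ".bam" and "-bam" not in input_path:
        bam_flag = "-bam "
    return f"preseq lc_extrap {bam_flag} {extra} {input_path} -output {output_path}"


# Example paths following the rules shown earlier, with sample name "a".
bam_cmd = build_lc_extrap_command("samples/a.sorted.bam", "-v", "test_bam/a.lc_extrap")
bed_cmd = build_lc_extrap_command("samples/a.sorted.bed", "-v", "test_bed/a.lc_extrap")
```

Only the BAM input picks up the `-bam` flag; the BED input is passed to `preseq lc_extrap` unchanged apart from the user-supplied parameters.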