CSVTK


Perform various operations over CSV/TSV tables.

URL: https://bioinf.shenwei.me/csvtk/

Example

This wrapper can be used in the following way:

### Concatenation subcommand ###
rule test_csvtk_cat:
    input:
        table=["table.csv", "right.csv"],
    output:
        "csvtk/cat.csv",
    log:
        "logs/cat.log",
    params:
        subcommand="cat",
        extra="",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Summary subcommand ###
rule test_csvtk_summary_csv:
    input:
        table="table.csv",
    output:
        "csvtk/summary_csv.csv",
    log:
        "logs/summary_csv.log",
    params:
        subcommand="summary",
        extra="--fields s1,s3",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


rule test_csvtk_summary_tsv:
    input:
        table="table.tsv",
    output:
        "csvtk/summary_tsv.csv",
    log:
        "logs/summary_tsv.log",
    params:
        subcommand="summary",
        extra="--fields s1,s3",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Frequency subcommand ###
rule test_csvtk_frequency:
    input:
        table="table.csv",
    output:
        "csvtk/frequency.csv",
    log:
        "logs/frequency.log",
    params:
        subcommand="freq",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Headers subcommand ###
rule test_csvtk_headers:
    input:
        table="table.csv",
    output:
        "csvtk/headers.csv",
    log:
        "logs/headers.log",
    params:
        subcommand="headers",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Join subcommand ###
rule test_csvtk_join:
    input:
        table=["table.csv", "right.csv"],
    output:
        "csvtk/join.csv",
    log:
        "logs/join.log",
    params:
        subcommand="join",
        extra="--fields gene_id",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Sample subcommand ###
rule test_csvtk_sample:
    input:
        table="table.csv",
    output:
        "csvtk/sample.csv",
    log:
        "logs/sample.log",
    params:
        subcommand="sample",
        extra="-s 123 -p 0.5",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Grep subcommand ###
rule test_csvtk_grep:
    input:
        table="table.csv",
    output:
        "csvtk/grep.csv",
    log:
        "logs/grep.log",
    params:
        subcommand="grep",
        extra="--fields gene_id --pattern ENSG[0-9]+",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Cut subcommand ###
rule test_csvtk_cut:
    input:
        table="table.csv",
    output:
        "csvtk/cut.csv",
    log:
        "logs/cut.log",
    params:
        subcommand="cut",
        extra="-f 2",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Sort subcommand ###
rule test_csvtk_sort:
    input:
        table="table.csv",
    output:
        "csvtk/sort.csv",
    log:
        "logs/sort.log",
    params:
        subcommand="sort",
        extra="--keys 1",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Split subcommand ###
rule test_csvtk_split:
    input:
        table="table.csv",
    output:
        directory("csvtk/split"),
    log:
        "logs/split.log",
    params:
        subcommand="split",
        extra="-f gene_id",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Stats subcommand ###
rule test_csvtk_stats:
    input:
        table="table.csv",
    output:
        "csvtk/stats.txt",
    log:
        "logs/stats.log",
    params:
        subcommand="stats",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"


### Uniq subcommand ###
rule test_csvtk_uniq:
    input:
        table="table.csv",
    output:
        "csvtk/uniq.txt",
    log:
        "logs/uniq.log",
    params:
        subcommand="uniq",
        extra="-f gene_id",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • csvtk=0.37.0

  • snakemake-wrapper-utils=0.8.0

Input/Output

Input:

  • Path to one or more CSV/TSV files.

Output:

  • Path to the result file or directory.

Params

  • extra: Optional arguments for csvtk (for TSV input files, --tabs is set automatically; for TSV output files, --out-tabs is set automatically).

  • subcommand: csvtk subcommand among cat, count, fixlengths, flatten, fmt, freq, headers, index, input, join, sample, search, select, slice, sort, split, stats, or table.
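
The automatic delimiter handling mentioned under extra can be illustrated with a small Python sketch. It mirrors the logic shown in the Code section, but guess_format and delimiter_flags are hypothetical names introduced here for illustration; the wrapper itself relies on get_format from snakemake-wrapper-utils, which may treat some extensions differently.

```python
from pathlib import Path


def guess_format(path):
    """Hypothetical stand-in for get_format: the last extension, lower-cased."""
    return Path(path).suffix.lstrip(".").lower()


def delimiter_flags(inputs, outputs):
    """Return the csvtk delimiter flags implied by the file extensions."""
    flags = []
    # Paths without an extension are skipped. all() over an empty sequence is
    # True, so a lone extension-less path (e.g. a directory) also adds the flag.
    if all(guess_format(p) == "tsv" for p in inputs if Path(p).suffix):
        flags.append("--tabs")
    if all(guess_format(p) == "tsv" for p in outputs if Path(p).suffix):
        flags.append("--out-tabs")
    return flags


print(delimiter_flags(["table.tsv"], ["csvtk/summary_tsv.csv"]))  # ['--tabs']
print(delimiter_flags(["table.csv"], ["csvtk/cut.csv"]))  # []
```

Under this logic a mix of CSV and TSV inputs adds no flag, so such cases would probably need an explicit delimiter option in extra.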

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2024, Filipe G. Vieira"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_format

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
subcommand = snakemake.params["subcommand"]
extra = snakemake.params.get("extra", "")


# Input file delimiter: add --tabs when every input with an extension is TSV
in_files = [Path(in_file) for in_file in snakemake.input]
if all(get_format(in_file) == "tsv" for in_file in in_files if in_file.suffix):
    extra += " --tabs"


# Output file delimiter: add --out-tabs when every output with an extension is TSV
out_files = [Path(out_file) for out_file in snakemake.output]
if all(get_format(out_file) == "tsv" for out_file in out_files if out_file.suffix):
    extra += " --out-tabs"


shell(
    "csvtk {subcommand} --num-cpus {snakemake.threads} {extra} --out-file {snakemake.output} {snakemake.input} {log}"
)
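
To see what command line the shell call above roughly produces, the following sketch assembles the same pieces into a string. build_command is a hypothetical helper introduced for illustration only; it omits the log redirection and any shell quoting, and it drops empty parts, whereas the wrapper's format string would leave an extra space when extra is empty.

```python
def build_command(subcommand, threads, extra, outputs, inputs):
    """Assemble a csvtk command line in the same order as the wrapper."""
    parts = [
        "csvtk",
        subcommand,
        "--num-cpus",
        str(threads),
        extra.strip(),
        "--out-file",
        *outputs,
        *inputs,
    ]
    # Skip empty parts (e.g. an empty extra) to avoid double spaces.
    return " ".join(part for part in parts if part)


# Values taken from the test_csvtk_summary_tsv rule, with --tabs appended
# as the wrapper would do for a TSV input.
print(
    build_command(
        "summary",
        1,
        "--fields s1,s3 --tabs",
        ["csvtk/summary_tsv.csv"],
        ["table.tsv"],
    )
)
# csvtk summary --num-cpus 1 --fields s1,s3 --tabs --out-file csvtk/summary_tsv.csv table.tsv
```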