CSVTK

Perform various operations over CSV/TSV tables.

URL: https://bioinf.shenwei.me/csvtk/

Example

This wrapper can be used in the following way:

### Concatenation subcommand ###
rule test_csvtk_cat:
    input:
        table=["table.csv", "right.csv"],
    output:
        "csvtk/cat.csv",
    log:
        "logs/cat.log",
    params:
        subcommand="cat",
        extra="",
    threads: 1
    wrapper:
        "v3.9.0/utils/csvtk"


### Summary subcommand ###
rule test_csvtk_summary:
    input:
        table="table.csv",
    output:
        "csvtk/summary.csv",
    log:
        "logs/summary_csv.log",
    params:
        subcommand="summary",
        extra="--fields s1,s3",
    threads: 1
    wrapper:
        "v3.9.0/utils/csvtk"


use rule test_csvtk_summary as test_csvtk_summary_tsv_input with:
    input:
        table="table.tsv",
    output:
        "csvtk/summary_tsv.csv",
    log:
        "logs/summary_tsv.log",


### Frequency subcommand ###
use rule test_csvtk_summary as test_csvtk_frequency with:
    output:
        "csvtk/frequency.csv",
    log:
        "logs/frequency.log",
    params:
        subcommand="freq",


### Headers subcommand ###
use rule test_csvtk_summary as test_csvtk_headers with:
    output:
        "csvtk/headers.csv",
    log:
        "logs/headers.log",
    params:
        subcommand="headers",


### Join subcommand ###
use rule test_csvtk_cat as test_csvtk_join with:
    output:
        "csvtk/join.csv",
    log:
        "logs/join.log",
    params:
        subcommand="join",
        col1="gene_id",
        col2="gene_id",


### Sample subcommand ###
use rule test_csvtk_summary as test_csvtk_sample with:
    output:
        "csvtk/sample.csv",
    log:
        "logs/sample.log",
    params:
        subcommand="sample",
        extra="-s 123 -p 0.5",


### Grep subcommand ###
use rule test_csvtk_summary as test_csvtk_grep with:
    output:
        "csvtk/grep.csv",
    log:
        "logs/grep.log",
    params:
        subcommand="grep",
        extra="--fields gene_id --use-regexp --pattern 'ENSG[0-9]+'",


### Cut subcommand ###
use rule test_csvtk_summary as test_csvtk_cut with:
    output:
        "csvtk/cut.csv",
    log:
        "logs/cut.log",
    params:
        subcommand="cut",
        extra="-f 2",


### Sort subcommand ###
use rule test_csvtk_summary as test_csvtk_sort with:
    output:
        "csvtk/sort.csv",
    log:
        "logs/sort.log",
    params:
        subcommand="sort",
        extra="--keys 1",


### Split subcommand ###
use rule test_csvtk_summary as test_csvtk_split with:
    output:
        directory("csvtk/split"),
    log:
        "logs/split.log",
    params:
        subcommand="split",
        extra="-f gene_id",


### Stats subcommand ###
use rule test_csvtk_summary as test_csvtk_stats with:
    output:
        "csvtk/stats.txt",
    log:
        "logs/stats.log",
    params:
        subcommand="stats",


### Uniq subcommand ###
use rule test_csvtk_summary as test_csvtk_uniq with:
    output:
        "csvtk/uniq.txt",
    log:
        "logs/uniq.log",
    params:
        subcommand="uniq",
        extra="-f gene_id",

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

csvtk=0.30.0

Input/Output

Input:

Path to one or more CSV/TSV tables.

Output:

Path to the result file or directory.

Params

extra: Optional arguments for csvtk. For TSV files (.tsv or .tsv.gz), --tabs (input) and --out-tabs (output) are set automatically.
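subcommand: csvtk subcommand, such as cat, count, fixlengths, flatten, fmt, freq, headers, index, input, join, sample, search, select, slice, sort, split, stats, or table. The value is passed to csvtk unchanged, so the other subcommands used in the examples above (summary, grep, cut, uniq) are given the same way.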
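
The automatic delimiter handling can be illustrated with a short Python sketch. It mirrors the suffix check in the wrapper code listed under Code; the helper names is_tsv and delimiter_flags are hypothetical and are not part of the wrapper.

```python
def is_tsv(path):
    # A path counts as TSV if it ends in .tsv, optionally followed by .gz
    return str(path).removesuffix(".gz").endswith(".tsv")


def delimiter_flags(inputs, outputs):
    """Return the csvtk delimiter flags implied by the file extensions."""
    flags = []
    # --tabs is added only when every input is a TSV file
    if all(is_tsv(p) for p in inputs):
        flags.append("--tabs")
    # --out-tabs is added only when every output is a TSV file
    if all(is_tsv(p) for p in outputs):
        flags.append("--out-tabs")
    return flags


print(delimiter_flags(["table.tsv"], ["csvtk/summary_tsv.csv"]))    # ['--tabs']
print(delimiter_flags(["table.csv", "right.tsv"], ["out.tsv.gz"]))  # ['--out-tabs']
```

With mixed CSV and TSV inputs no --tabs flag is added, so csvtk reads all inputs with its default comma delimiter.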

Authors

Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2024, Filipe G. Vieira"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
subcommand = snakemake.params["subcommand"]
extra = snakemake.params.get("extra", "")

# Input TSV delimiter: add --tabs when every input is a (possibly gzipped) TSV
if all(str(path).removesuffix(".gz").endswith(".tsv") for path in snakemake.input):
    extra += " --tabs"


# Output TSV delimiter: add --out-tabs when every output is a (possibly gzipped) TSV
if all(str(path).removesuffix(".gz").endswith(".tsv") for path in snakemake.output):
    extra += " --out-tabs"


shell(
    "csvtk {subcommand} --num-cpus {snakemake.threads} {extra} --out-file {snakemake.output} {snakemake.input} {log}"
)
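
To preview what the shell call above expands to, the template can be reproduced in plain Python. This is a sketch only: build_csvtk_command is a hypothetical helper, it omits the log redirection that log_fmt_shell appends, and it tokenises extra with shlex.split, whereas the wrapper hands the raw string to the shell.

```python
import shlex


def build_csvtk_command(subcommand, inputs, output, threads=1, extra=""):
    """Return the csvtk arguments in the order used by the wrapper's template."""
    args = ["csvtk", subcommand, "--num-cpus", str(threads)]
    # Assumption: extra is tokenised the way a POSIX shell would tokenise it
    args += shlex.split(extra)
    args += ["--out-file", str(output)]
    args += [str(path) for path in inputs]
    return args


# Parameters taken from the test_csvtk_summary rule
command = build_csvtk_command(
    "summary",
    inputs=["table.csv"],
    output="csvtk/summary.csv",
    extra="--fields s1,s3",
)
print(shlex.join(command))
# csvtk summary --num-cpus 1 --fields s1,s3 --out-file csvtk/summary.csv table.csv
```

Because extra is inserted before --out-file and the input paths, any subcommand-specific options it carries reach csvtk ahead of the file arguments.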