CSVTK
Perform various operations over CSV/TSV tables.
URL: https://bioinf.shenwei.me/csvtk/
Example
This wrapper can be used in the following way:
### Concatenation subcommand ###
rule test_csvtk_cat:
    input:
        table=["table.csv", "right.csv"],
    output:
        "csvtk/cat.csv",
    log:
        "logs/cat.log",
    params:
        subcommand="cat",
        extra="",
    threads: 1
    wrapper:
        "v3.9.0/utils/csvtk"
### Summary subcommand ###
rule test_csvtk_summary:
    input:
        table="table.csv",
    output:
        "csvtk/summary.csv",
    log:
        "logs/summary_csv.log",
    params:
        subcommand="summary",
        extra="--fields s1,s3",
    threads: 1
    wrapper:
        "v3.9.0/utils/csvtk"


use rule test_csvtk_summary as test_csvtk_summary_tsv_input with:
    input:
        table="table.tsv",
    output:
        "csvtk/summary_tsv.csv",
    log:
        "logs/summary_tsv.log",
### Frequency subcommand ###
use rule test_csvtk_summary as test_csvtk_frequency with:
    output:
        "csvtk/frequency.csv",
    log:
        "logs/frequency.log",
    params:
        subcommand="freq",


### Headers subcommand ###
use rule test_csvtk_summary as test_csvtk_headers with:
    output:
        "csvtk/headers.csv",
    log:
        "logs/headers.log",
    params:
        subcommand="headers",


### Join subcommand ###
use rule test_csvtk_cat as test_csvtk_join with:
    output:
        "csvtk/join.csv",
    log:
        "logs/join.log",
    params:
        subcommand="join",
        col1="gene_id",
        col2="gene_id",
### Sample subcommand ###
use rule test_csvtk_summary as test_csvtk_sample with:
    output:
        "csvtk/sample.csv",
    log:
        "logs/sample.log",
    params:
        subcommand="sample",
        extra="-s 123 -p 0.5",


### Grep subcommand ###
use rule test_csvtk_summary as test_csvtk_grep with:
    output:
        "csvtk/grep.csv",
    log:
        "logs/grep.log",
    params:
        subcommand="grep",
        extra="--fields gene_id --pattern ENSG[0-9]+",


### Cut subcommand ###
use rule test_csvtk_summary as test_csvtk_cut with:
    output:
        "csvtk/cut.csv",
    log:
        "logs/cut.log",
    params:
        subcommand="cut",
        extra="-f 2",


### Sort subcommand ###
use rule test_csvtk_summary as test_csvtk_sort with:
    output:
        "csvtk/sort.csv",
    log:
        "logs/sort.log",
    params:
        subcommand="sort",
        extra="--keys 1",
### Split subcommand ###
use rule test_csvtk_summary as test_csvtk_split with:
    output:
        directory("csvtk/split"),
    log:
        "logs/split.log",
    params:
        subcommand="split",
        extra="-f gene_id",


### Stats subcommand ###
use rule test_csvtk_summary as test_csvtk_stats with:
    output:
        "csvtk/stats.txt",
    log:
        "logs/stats.log",
    params:
        subcommand="stats",


### Uniq subcommand ###
use rule test_csvtk_summary as test_csvtk_uniq with:
    output:
        "csvtk/uniq.txt",
    log:
        "logs/uniq.log",
    params:
        subcommand="uniq",
        extra="-f gene_id",
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
csvtk=0.30.0
Input/Output
Input:
Path to CSV/TSV table.
Output:
Path to the result file or directory.
Params
extra
: Optional arguments for csvtk (for TSV files, the delimiter options --tabs and --out-tabs are set automatically).
subcommand
: csvtk subcommand among cat, count, fixlengths, flatten, fmt, frequency, headers, index, input, join, sample, search, select, slice, sort, split, stats, or table
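The automatic delimiter handling mentioned under extra can be illustrated outside Snakemake. The following is a minimal sketch, not the wrapper itself: the function name delimiter_flags is hypothetical, and it assumes that a file counts as TSV when its name ends in .tsv, optionally followed by .gz, which mirrors the checks in the wrapper code in the Code section.

```python
def delimiter_flags(inputs, outputs):
    """Return the csvtk delimiter flags implied by the file extensions.

    Hypothetical helper for illustration; the real wrapper appends the
    flags directly to the `extra` string.
    """

    def all_tsv(paths):
        # A path counts as TSV if it ends in ".tsv" after stripping ".gz".
        # Assumption: an empty list never triggers a flag (the wrapper does
        # not guard against this case).
        return bool(paths) and all(
            str(p).removesuffix(".gz").endswith(".tsv") for p in paths
        )

    flags = []
    if all_tsv(inputs):
        flags.append("--tabs")  # read input as tab-delimited
    if all_tsv(outputs):
        flags.append("--out-tabs")  # write output as tab-delimited
    return flags


print(delimiter_flags(["table.tsv"], ["csvtk/summary_tsv.csv"]))  # ['--tabs']
print(delimiter_flags(["table.csv", "right.csv"], ["csvtk/cat.csv"]))  # []
print(delimiter_flags(["a.tsv.gz", "b.tsv"], ["out.tsv"]))  # ['--tabs', '--out-tabs']
```

Note that the flags are only added when every input (or every output) is a TSV file; a mix of CSV and TSV inputs gets no flag, so the delimiter would have to be passed through extra in that case.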
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2024, Filipe G. Vieira"
__license__ = "MIT"
from pathlib import Path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
subcommand = snakemake.params["subcommand"]
extra = snakemake.params.get("extra", "")
# Input TSV delimiter: add --tabs when every input is a (possibly gzipped) TSV
if len(snakemake.input) == 1:
    if str(snakemake.input).removesuffix(".gz").endswith(".tsv"):
        extra += " --tabs"
elif all(input.removesuffix(".gz").endswith(".tsv") for input in snakemake.input):
    extra += " --tabs"

# Output TSV delimiter: add --out-tabs when every output is a (possibly gzipped) TSV
if len(snakemake.output) == 1:
    if str(snakemake.output).removesuffix(".gz").endswith(".tsv"):
        extra += " --out-tabs"
elif all(output.removesuffix(".gz").endswith(".tsv") for output in snakemake.output):
    extra += " --out-tabs"

shell(
    "csvtk {subcommand} --num-cpus {snakemake.threads} {extra} --out-file {snakemake.output} {snakemake.input} {log}"
)
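To see what the shell template above expands to for a given rule, the command can be assembled by hand. This is a rough sketch under stated assumptions: build_command is a hypothetical helper, the log redirection that snakemake.log_fmt_shell appends is left out, and extra is split on whitespace without any shell quoting.

```python
def build_command(subcommand, inputs, output, threads=1, extra=""):
    """Assemble the csvtk command line the wrapper would run (log part omitted)."""
    parts = ["csvtk", subcommand, "--num-cpus", str(threads)]
    # Naive whitespace split; arguments containing spaces are not handled.
    parts += extra.split()
    # The output path precedes the input paths, as in the wrapper's template.
    parts += ["--out-file", output, *inputs]
    return " ".join(parts)


# Values taken from the test_csvtk_summary rule in the Example section.
print(
    build_command(
        "summary", ["table.csv"], "csvtk/summary.csv", extra="--fields s1,s3"
    )
)
# csvtk summary --num-cpus 1 --fields s1,s3 --out-file csvtk/summary.csv table.csv
```

Because extra is passed to the shell verbatim, patterns with shell metacharacters (such as the brackets in the grep example) may need quoting inside the extra string.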