CSVTK
Perform various operations over CSV/TSV tables.
URL: https://bioinf.shenwei.me/csvtk/
Example
This wrapper can be used in the following way:
### Concatenation subcommand ###
rule test_csvtk_cat:
    input:
        table=["table.csv", "right.csv"],
    output:
        "csvtk/cat.csv",
    log:
        "logs/cat.log",
    params:
        subcommand="cat",
        extra="",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Summary subcommand ###
rule test_csvtk_summary_csv:
    input:
        table="table.csv",
    output:
        "csvtk/summary_csv.csv",
    log:
        "logs/summary_csv.log",
    params:
        subcommand="summary",
        extra="--fields s1,s3",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
rule test_csvtk_summary_tsv:
    input:
        table="table.tsv",
    output:
        "csvtk/summary_tsv.csv",
    log:
        "logs/summary_tsv.log",
    params:
        subcommand="summary",
        extra="--fields s1,s3",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Frequency subcommand ###
rule test_csvtk_frequency:
    input:
        table="table.csv",
    output:
        "csvtk/frequency.csv",
    log:
        "logs/frequency.log",
    params:
        subcommand="freq",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Headers subcommand ###
rule test_csvtk_headers:
    input:
        table="table.csv",
    output:
        "csvtk/headers.csv",
    log:
        "logs/headers.log",
    params:
        subcommand="headers",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Join subcommand ###
rule test_csvtk_join:
    input:
        table=["table.csv", "right.csv"],
    output:
        "csvtk/join.csv",
    log:
        "logs/join.log",
    params:
        subcommand="join",
        col1="gene_id",
        col2="gene_id",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Sample subcommand ###
rule test_csvtk_sample:
    input:
        table="table.csv",
    output:
        "csvtk/sample.csv",
    log:
        "logs/sample.log",
    params:
        subcommand="sample",
        extra="-s 123 -p 0.5",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Grep subcommand ###
rule test_csvtk_grep:
    input:
        table="table.csv",
    output:
        "csvtk/grep.csv",
    log:
        "logs/grep.log",
    params:
        subcommand="grep",
        extra="--fields gene_id --pattern ENSG[0-9]+",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Cut subcommand ###
rule test_csvtk_cut:
    input:
        table="table.csv",
    output:
        "csvtk/cut.csv",
    log:
        "logs/cut.log",
    params:
        subcommand="cut",
        extra="-f 2",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Sort subcommand ###
rule test_csvtk_sort:
    input:
        table="table.csv",
    output:
        "csvtk/sort.csv",
    log:
        "logs/sort.log",
    params:
        subcommand="sort",
        extra="--keys 1",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Split subcommand ###
rule test_csvtk_split:
    input:
        table="table.csv",
    output:
        directory("csvtk/split"),
    log:
        "logs/split.log",
    params:
        subcommand="split",
        extra="-f gene_id",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Stats subcommand ###
rule test_csvtk_stats:
    input:
        table="table.csv",
    output:
        "csvtk/stats.txt",
    log:
        "logs/stats.log",
    params:
        subcommand="stats",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
### Uniq subcommand ###
rule test_csvtk_uniq:
    input:
        table="table.csv",
    output:
        "csvtk/uniq.txt",
    log:
        "logs/uniq.log",
    params:
        subcommand="uniq",
        extra="-f gene_id",
    threads: 1
    wrapper:
        "v9.5.0/utils/csvtk"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
csvtk=0.37.0
snakemake-wrapper-utils=0.8.0
Input/Output
Input:
Path to CSV/TSV file.
Output:
Path to result file / directory
Params
extra: Optional arguments for csvtk (for TSV files, the delimiter is set automatically via --tabs for input and --out-tabs for output).
subcommand: csvtk subcommand among cat, count, fixlengths, flatten, fmt, frequency, headers, index, input, join, sample, search, select, slice, sort, split, stats, or table.
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2024, Filipe G. Vieira"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_format

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
subcommand = snakemake.params["subcommand"]
extra = snakemake.params.get("extra", "")

# Input file delimiter
in_files = (
    [Path(snakemake.input[0])]
    if len(snakemake.input) == 1
    else [Path(in_file) for in_file in snakemake.input]
)
if all(get_format(in_file) == "tsv" for in_file in in_files if in_file.suffix):
    extra += " --tabs"

# Output file delimiter
out_files = (
    [Path(snakemake.output[0])]
    if len(snakemake.output) == 1
    else [Path(out_file) for out_file in snakemake.output]
)
if all(get_format(out_file) == "tsv" for out_file in out_files if out_file.suffix):
    extra += " --out-tabs"

shell(
    "csvtk {subcommand} --num-cpus {snakemake.threads} {extra} --out-file {snakemake.output} {snakemake.input} {log}"
)
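
The delimiter handling in the wrapper code can be tried outside Snakemake with a small sketch like the one below. It is an illustration only, not part of the wrapper: guess_format is a hypothetical stand-in for snakemake_wrapper_utils.snakemake.get_format (assumed here to return the lowercased file extension, ignoring a trailing .gz), and delimiter_flags is a hypothetical helper name that mirrors the two all(...) checks.

```python
from pathlib import Path


def guess_format(path: Path) -> str:
    """Hypothetical stand-in for get_format: assumed to return the
    lowercased extension, skipping a trailing ".gz" if present."""
    suffixes = [s.lower().lstrip(".") for s in path.suffixes]
    if suffixes and suffixes[-1] == "gz":
        suffixes = suffixes[:-1]
    return suffixes[-1] if suffixes else ""


def delimiter_flags(inputs, outputs, extra=""):
    """Mirror the wrapper's checks: append --tabs when every input with a
    suffix is TSV, and --out-tabs when every output with a suffix is TSV."""
    in_files = [Path(p) for p in inputs]
    out_files = [Path(p) for p in outputs]
    # Paths without a suffix are filtered out; all() over an empty
    # sequence is True, so a suffix-less path (e.g. a directory) adds the flag.
    if all(guess_format(f) == "tsv" for f in in_files if f.suffix):
        extra += " --tabs"
    if all(guess_format(f) == "tsv" for f in out_files if f.suffix):
        extra += " --out-tabs"
    return extra


if __name__ == "__main__":
    print(repr(delimiter_flags(["table.tsv"], ["csvtk/summary_tsv.csv"], "--fields s1,s3")))
    print(repr(delimiter_flags(["table.csv"], ["csvtk/split"], "-f gene_id")))
```

Under these assumptions, mixed CSV and TSV inputs add no input flag, and an output path without an extension, such as a directory, passes the check vacuously and receives --out-tabs.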