Comments (2)
Nice use case. My first thought is to wonder whether there is enough commonality in these patterns to build a tool around. More examples would shed light on this. But if it turned out that the flexibility of awk or sed is needed, then it might be best to leave these tasks to those tools and to custom scripts.
from tsv-utils.
That's a good point, and I'm not unsympathetic to it at all. If I hit more examples, I'll try to remember to outline them here.
I'll note up front that I really don't like sed/awk for this sort of thing, because they're general-purpose, line-oriented tools. It's fine if there's something like "cores" to anchor on for extracting the numbers and splitting them (and I think you rightly surmise that I wasn't necessarily looking to extract the column name in the same operation), but for the more general case? They're clunky. Being aware of columns, as the tsv-utils tools are, is extremely powerful and useful.
Just doodling here, but something like:
tsv-filter --split 1:_:cores,threads
...could be helpful. Or maybe something like regex substitution via capture groups:
tsv-filter --split 1:'([0-9]+)cores_([0-9]+)threads':cores,threads
...if we continue with my original example. (The column selector is needed in the more general case where multiple columns contain the delimiter of interest, a colon for example, but you only want to split one of them because the other is something like a timestamp.)
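To make the proposed semantics concrete, here is a minimal Python sketch. It is not tsv-utils code: `--split` is a proposal rather than an existing tsv-filter option, and `split_column` with its parameters is a hypothetical name. It assumes the first line is a header and that the split replaces the selected column in place. Pass `delim` for the plain `1:_:cores,threads` form, or `pattern` for the capture-group form. Only the selected field is touched, so a colon in some other column (a timestamp, say) is left alone.

```python
import re
from typing import Iterable, Iterator, List, Optional


def split_column(
    lines: Iterable[str],
    field: int,
    names: List[str],
    delim: Optional[str] = None,
    pattern: Optional[str] = None,
) -> Iterator[str]:
    """Split one TSV field (1-based) into len(names) new fields, in place.

    Hypothetical sketch of the proposed --split FIELD:DELIM:NAMES idea.
    Give `delim` for a plain delimiter split, or `pattern` for a regex whose
    capture groups become the new fields. Every other field is untouched.
    """
    if (delim is None) == (pattern is None):
        raise ValueError("pass exactly one of delim or pattern")
    if field < 1:
        raise ValueError("field numbers start at 1")
    regex = re.compile(pattern) if pattern is not None else None
    idx = field - 1  # convert the 1-based field number to a list index
    for lineno, line in enumerate(lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if idx >= len(cols):
            raise ValueError(f"line {lineno}: not enough fields")
        if lineno == 1:
            # Assumed header line: the source column name becomes `names`.
            parts = list(names)
        elif regex is not None:
            m = regex.fullmatch(cols[idx])
            if m is None:
                raise ValueError(f"line {lineno}: {cols[idx]!r} does not match")
            parts = list(m.groups(default=""))
        else:
            parts = cols[idx].split(delim)
        if len(parts) != len(names):
            raise ValueError(
                f"line {lineno}: expected {len(names)} parts, got {len(parts)}"
            )
        yield "\t".join(cols[:idx] + parts + cols[idx + 1:])


rows = ["config\ttime_ms", "8cores_2threads\t120", "16cores_4threads\t250"]
for out in split_column(rows, 1, ["cores", "threads"],
                        pattern=r"([0-9]+)cores_([0-9]+)threads"):
    print(out)
```

With the regex form, the row `8cores_2threads` / `120` becomes `8` / `2` / `120` under a `cores` / `threads` / `time_ms` header. The plain `_` split would instead give `8cores` and `2threads`, which is why an anchor like "cores" matters when you want bare numbers.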
Broadly, I'd characterise this class of problem as "normalisation", which also covers other transformations on columns. (For example, some existing tools report measurements in whole seconds, so I want to multiply those by 1000, or divide the millisecond metrics by 1000, so that the two can be compared properly. ...This might be a separate ER (enhancement request)?)
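The unit normalisation could be sketched the same way. This hypothetical `scale_column` helper is not a tsv-utils feature. It multiplies one 1-based numeric field by a factor and can optionally rename the header, so whole seconds can be brought in line with millisecond metrics (factor 1000) or the reverse (factor 0.001). As before, it assumes the first line is a header.

```python
from typing import Iterable, Iterator, Optional


def scale_column(
    lines: Iterable[str],
    field: int,
    factor: float,
    new_name: Optional[str] = None,
) -> Iterator[str]:
    """Multiply one numeric TSV field (1-based) by `factor`.

    Hypothetical normalisation step: factor=1000 turns whole seconds into
    milliseconds, factor=0.001 turns milliseconds into seconds.
    """
    if field < 1:
        raise ValueError("field numbers start at 1")
    idx = field - 1
    for lineno, line in enumerate(lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if idx >= len(cols):
            raise ValueError(f"line {lineno}: not enough fields")
        if lineno == 1:
            # Assumed header line: optionally rename the column to its new unit.
            if new_name is not None:
                cols[idx] = new_name
        else:
            value = float(cols[idx]) * factor
            # Print integral results without a trailing ".0".
            cols[idx] = str(int(value)) if value.is_integer() else repr(value)
        yield "\t".join(cols)


runs = ["run\telapsed_s", "a\t2", "b\t1.5"]
for out in scale_column(runs, 2, 1000, new_name="elapsed_ms"):
    print(out)
```

Scaling by 0.001 in binary floating point can leave rounding noise in the last digits, so a real tool would probably want some control over output precision.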
Related Issues (20)
- AUR package with LTO & PGO enabled
- How to best use the code as a library?
- Improve tsv-pretty lookahead logic [tsv-pretty mistake in column formatting.]
- bufferedByLine does not work with File due to @safe <> @system conflict
- Issue with installing on Windows 10 using D / build failure
- tsv-summarize: Slice SummarizerBase._operators when invoking std.algorithm.each
- Inconsistent newline handling on Windows
- Status of Windows build
- Bulding tsv-utils with LTO and PGO on Archlinux
- Homebrew install
- Package tsv-utils for conda(-forge)?
- No linux release assets for v2.2.1
- -bash: ./tsv-pretty: cannot execute binary file
- Ability to produce proper CSV files
- Sort using column names
- tsv-append: limit number of rows per file? [feature request]
- Error [tsv-filter]: Not enough fields in line. File: c.tsv, Line: 1425063
- ENH: Add ARM64 build assets for native functionality on M1 macs (the future)
- Q: any API doc? how to skip empty field in csvReader?
- Updated benchmarks including qsv