Add some way to split a field (tsv-utils issue, OPEN)

ebay commented on August 14, 2024
Add some way to split a field

from tsv-utils.

Comments (2)

jondegenhardt commented on August 14, 2024

Nice use case. My first thought is to wonder if there is enough commonality in these patterns to develop a tool around. More examples would shed light on this. But if it turned out that the flexibility of awk or sed is needed, then it might be best to leave these tasks to those tools and custom scripts.

Llammissar commented on August 14, 2024

That's a good point, and I'm not unsympathetic to it at all. If I hit more examples, I'll try to remember to outline them here.

I'll note up front that I really don't like sed/awk for this sort of thing because they're general-purpose, line-oriented tools. It's fine if there's something like "cores" to anchor on for extracting numbers and splitting them (and I think you rightly surmise that I wasn't necessarily looking to extract the column name in the same operation), but for the more general case? They're clunky; awareness of columns is extremely powerful and useful.

Just doodling here, but something like:
tsv-filter --split 1:_:cores,threads
...could be helpful. Or maybe something like regex substitution via capture groups:
tsv-filter --split 1:'([0-9]+)cores_([0-9]+)threads':cores,threads
...if we continue looking at my original example. (The column selector is necessary for the more general case that you have multiple columns with the delimiter of interest -- colon, for example -- but you only want to split one of them and the other is something like a timestamp.)
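To make the two doodles above concrete, here is a minimal Python sketch of what each proposed form would do to a single row. Nothing here is an existing tsv-utils feature: `split_by_delimiter` and `split_by_regex` are hypothetical helper names, and the sample row (a field like `4cores_8threads` followed by a timestamp) is an assumed stand-in for the original example.

```python
import re

def split_by_delimiter(row, col, delimiter, count):
    """Replace the 1-based field `col` with `count` fields split on `delimiter`."""
    i = col - 1
    parts = row[i].split(delimiter)
    if len(parts) != count:
        raise ValueError(f"expected {count} parts in {row[i]!r}, got {len(parts)}")
    return row[:i] + parts + row[i + 1:]

def split_by_regex(row, col, pattern, count):
    """Replace the 1-based field `col` with the capture groups of `pattern`."""
    i = col - 1
    match = re.fullmatch(pattern, row[i])
    if match is None or len(match.groups()) != count:
        raise ValueError(f"{row[i]!r} does not yield {count} groups for {pattern!r}")
    return row[:i] + list(match.groups()) + row[i + 1:]

# Assumed sample row: only field 1 is split; the colon-separated
# timestamp in field 2 is left untouched because the column is explicit.
row = "4cores_8threads\t12:30:05\t17.2".split("\t")

print(split_by_delimiter(row, 1, "_", 2))
print(split_by_regex(row, 1, r"([0-9]+)cores_([0-9]+)threads", 2))
```

The delimiter form keeps the "cores"/"threads" suffixes in the values, while the capture-group form strips them, which is the practical difference between the two proposals.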

Broadly, I think I'd characterise this class of problem as "normalisation", which also includes other transformations on columns. (For example, some existing tools produce measures in whole seconds, so I want to multiply that by 1000, or divide the millisecond metrics by the same, so they can be compared properly. ...This might be a separate ER?)
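The seconds-to-milliseconds case can be sketched the same way. This is only an illustration in Python, not a tsv-utils option: `scale_field` is a hypothetical helper, and the row contents are assumed.

```python
def scale_field(row, col, factor):
    """Multiply the whole-number value in the 1-based field `col` by `factor`."""
    i = col - 1
    scaled = int(row[i]) * factor
    return row[:i] + [str(scaled)] + row[i + 1:]

# Assumed sample: field 2 holds a duration in whole seconds.
row = ["build", "3"]
print(scale_field(row, 2, 1000))  # duration expressed in milliseconds
```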
