Comments (11)

jquartel commented on June 15, 2024

I deal with numerical tuples quite a lot and sometimes I'll have two columns that I need to compute together to make a new tuple. So I might have in Col1 "1.0 0.8 -0.1", and in Col2 "0.0 -0.2 0.9" and I need to make a new column with the vector sum of these, so I'd create a column from Col1 and use
forEachIndex(value.split(" "),i,c1,with(cells.Col2.value.split(" "),c2,c1.toNumber()+c2[i].toNumber())).join(" ")
which you can see is a bit clunky and asymmetric.

But now that you mention loops (and combining more arrays than one), I realise there is actually a reasonably symmetric (albeit a bit more verbose with the indexing) construct I could use instead. That is:
with(value.split(" "), c1, with(cells.Col2.value.split(" "), c2, forRange(0,3,1,i,c1[i].toNumber()+c2[i].toNumber()))).join(" ")
and this is effectively a loop construct that works with any number of arrays. Although the with controls aren't strictly necessary here, I do sometimes need arithmetic expressions that are a bit more complicated and being able to refer to them with a bound variable rather than the full 'split' deconstruction is handy.
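For readers more at home in a general-purpose language, the steps in the two GREL expressions above (split, convert to numbers, add by index, join) can be sketched in Python roughly as follows. The function name `vector_sum` is hypothetical and chosen only for illustration; it is not part of GREL or OpenRefine.

```python
def vector_sum(col1: str, col2: str, sep: str = " ") -> str:
    """Element-wise sum of two separator-delimited numeric tuples.

    Hypothetical helper for illustration only; it mirrors the GREL
    steps split -> toNumber -> add -> join.
    """
    c1 = [float(x) for x in col1.split(sep)]
    c2 = [float(x) for x in col2.split(sep)]
    # Index-based loop, like forRange(0, 3, 1, i, ...) in the second
    # expression, but using the length of c1 instead of a hard-coded 3.
    return sep.join(str(c1[i] + c2[i]) for i in range(len(c1)))


print(vector_sum("1 2 3", "4 5 6"))  # prints "5.0 7.0 9.0"
```

As in the GREL version, this sketch assumes both cells hold the same number of elements; a shorter second cell would raise an `IndexError`.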

It was really thinking about the similar problem of zipping strings (the example in #5440) that made me wonder if a more general control might be useful. But yes, I see that maybe something that only works with two arrays is weirdly specific (though I do think it may be pretty common).

from openrefine.

tfmorris commented on June 15, 2024

The correct zip issue reference is #5440. Python's zip(*iterables) function accepts an arbitrary number of iterables.
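As a small illustration of that point, here is how Python's built-in `zip` behaves with three inputs. The sample lists are made up for the example, and nothing here implies that a GREL function would have to behave identically.

```python
a1 = ["a", "b", "c"]
a2 = ["1", "2", "3"]
a3 = ["x", "y", "z"]

# zip groups the i-th elements of every iterable it is given.
triples = list(zip(a1, a2, a3))
print(triples)  # [('a', '1', 'x'), ('b', '2', 'y'), ('c', '3', 'z')]

# The same call with the arrays collected in a list and unpacked with *.
arrays = [a1, a2, a3]
same_triples = list(zip(*arrays))

# Combining each tuple into one value, one result per position.
joined = ["".join(t) for t in triples]
print(joined)  # ['a1x', 'b2y', 'c3z']
```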

More generally, programming language design is a pretty nuanced art, so it's helpful to have examples in other languages which can be used as a template -- and preferably from one or a small number of languages, so we don't end up with a polyglot mix of Python, R, PHP, and Rust syntaxes.

thadguidry commented on June 15, 2024

Why only 2 arrays? Why not 5 or 25? This feels more like a need for better pipelining of operations (loops?) and currying. I'm not even sure that we'd want to do something like this within GREL syntax specifically. It might be better outside of GREL and take another form. If it were part of GREL, it might be better to break it off into another control structure entirely. Maybe something like Groovy's spread-map operator. I dunno.

thadguidry commented on June 15, 2024

Could you instead rewrite your needs and expressions with some real-world data and put them into a numbered list of the operations you would like to perform on the data and its structure? Currently, it's really hard to understand as written. Don't worry about the GREL syntax you want or think you might like... just give us the problem you have in paragraphs and pictures if you can, and then describe, as a numbered list of simple sentences, the order in which the data would need to be changed and where, based on whatever criteria, rules, loops, whatever.

wetneb commented on June 15, 2024

Intuitively I'd do that via a zip function: forEach(zip(e1, e2), pair, some_function_of(pair[0], pair[1])).

thadguidry commented on June 15, 2024

It seems that the term "zip" came into computer science around the late 1990s, from what I can tell. I was not familiar with it when #5440 was filed and had no comment there, since no reference for the algorithm or its behavior was given, which is why I asked here for an explanation of how the algorithm would work in practice. But now, reading around, I can see that zipping can also be looked at almost like a pivot or transpose function: zipping translates sequences into sequences in which, if visualized in two dimensions, the rows and columns are swapped?

Hmm, perhaps it might make sense for our user base to label this new GREL function for zip-like behavior as a transpose function, in the interest of keeping terminology familiar from data tools rather than from computer science or programming? I'd definitely avoid labelling it as innerjoin, which it basically could also be called if one specific behavior trait was needed. Interestingly enough, the Groovy programming language has GQuery (also see link below), which brings SQL-like joining that can mimic zip-like behavior.

@wetneb We'd also have to make some hard choices on that kind of joining behavior, since zip implementations and libraries have different strengths and weaknesses. For example, would the stopping policy be to stop at the end of the shortest list?

Thoughts?
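To make the stopping-policy question concrete, here is a short Python sketch of two policies that the Python standard library offers. It is shown only as a point of comparison; which policy (if any) GREL should adopt is exactly the open question, and the sample lists are made up.

```python
from itertools import zip_longest

short = [1, 2]
longer = [10, 20, 30]

# Policy 1: stop at the shortest input (the built-in zip default).
stop_at_shortest = list(zip(short, longer))
print(stop_at_shortest)  # [(1, 10), (2, 20)]

# Policy 2: run to the longest input, padding the missing values.
pad_to_longest = list(zip_longest(short, longer, fillvalue=None))
print(pad_to_longest)  # [(1, 10), (2, 20), (None, 30)]
```

A third policy is to treat unequal lengths as an error: in Python 3.10 and later, `zip(short, longer, strict=True)` raises a `ValueError` in that case.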

thadguidry commented on June 15, 2024

Forgot to add, I am not opposed to GREL adding a simple zip function and labeling it as such... but I'm always on the lookout for what the use cases are and if there's broader applicability that might be hiding. We could certainly down the road add zip and transpose and map functions and more as necessary, as long as things can work together cohesively and solve problems and feel natural for most of our users.

jquartel commented on June 15, 2024

Yes, I had to look up 'zip' myself as I'm not so familiar with Python, and yes, it's a kind of transpose operation (though it's a little asymmetric as it will turn two or more arrays into one array of arrays). The example on #5440 is specifically about string-concatenation of corresponding elements of (two) arrays of strings.

I think @wetneb's description for
forEach(zip(e1, e2), pair, some_function_of(pair[0], pair[1]))
is very nice but unless zip can work with more than two arguments then we're a bit limited. We could get around this by creating a transpose function which specifically works on an array of arrays, so
trans([a1, a2, a3, ...]) gives [[a1[0], a2[0], a3[0], ...], [a1[1], a2[1], a3[1], ...], ...]

With this, I could solve my vector combination with
forEach(trans([cells.Col1.value.split(" "),cells.Col2.value.split(" ")]),v,v[0].toNumber()+v[1].toNumber()).join(" ")
And the string concatenation from #5440 would look exactly the same but without the toNumber() calls, and splitting/joining with a comma separator rather than space.
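A rough Python sketch of what such a `trans` function might do, applied to the comma-separated string case. `trans` is the hypothetical function proposed above, not an existing GREL function, and stopping at the shortest inner array is an assumption made here because the proposal does not specify it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def trans(arrays: Sequence[Sequence[T]]) -> List[List[T]]:
    """Transpose an array of arrays.

    Hypothetical stand-in for the proposed GREL function.
    Assumption: the result is as long as the shortest inner array.
    """
    return [list(group) for group in zip(*arrays)]


# String concatenation of corresponding elements, comma-separated,
# following the shape of the forEach(trans([...]), v, ...) expression.
col1 = "a,b,c"
col2 = "x,y,z"
pairs = trans([col1.split(","), col2.split(",")])
result = ",".join(v[0] + v[1] for v in pairs)
print(result)  # prints "ax,by,cz"
```

Because `trans` takes a single array of arrays, the same expression extends to three or more columns without any change to the function.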

As an aside, you can probably see that one of my problems with numerical tuples is having to call toNumber on each element. This prompted #4935.

jquartel commented on June 15, 2024

Ah yes sorry about the typo on the reference.
And yes, a zip function (or whatever it may be called) would certainly simplify my case adequately.

jquartel commented on June 15, 2024

So perhaps we can close this ticket as a bad idea (or at least a redundant one with #5440), but before we do I'd like to throw out one more idea. If we aren't averse to allowing variable numbers of arguments, suppose we just extended the current functionality of forEach to apply to multiple arrays like this:
forEach(a1, v1, a2, v2, ... , e)
where v1, v2, ... bind to the nth element of a1, a2, ..., respectively, and n iterates over the length of a1 (or of the shortest of the arrays).
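The semantics described above could be sketched in Python as follows. This is only an illustration of the idea: `for_each_multi` is a hypothetical name, the bound variables v1, v2, ... are modelled as the parameters of a callable, and the sketch takes the "shortest of the arrays" option for the iteration length.

```python
from typing import Any, Callable, List, Sequence


def for_each_multi(arrays: Sequence[Sequence[Any]],
                   e: Callable[..., Any]) -> List[Any]:
    """Evaluate e once per index n, passing the n-th element of each array.

    Hypothetical model of forEach(a1, v1, a2, v2, ..., e).
    Assumption: iteration stops at the length of the shortest array.
    """
    n_max = min((len(a) for a in arrays), default=0)
    return [e(*(a[n] for a in arrays)) for n in range(n_max)]


# Two arrays of unequal length: the extra element of the first is ignored.
print(for_each_multi([[1, 2, 3], [10, 20]], lambda v1, v2: v1 + v2))  # [11, 22]

# Three arrays work the same way.
print(for_each_multi([["a", "b"], ["c", "d"], ["e", "f"]],
                     lambda v1, v2, v3: v1 + v2 + v3))  # ['ace', 'bdf']
```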

wetneb commented on June 15, 2024

I would argue for the zip function to accept a variable number of arguments: zip(firstList, secondList, thirdList).
This way, the control could be kept as it currently is.
