Comments (11)
I deal with numerical tuples quite a lot and sometimes I'll have two columns that I need to compute together to make a new tuple. So I might have in Col1 "1.0 0.8 -0.1", and in Col2 "0.0 -0.2 0.9" and I need to make a new column with the vector sum of these, so I'd create a column from Col1 and use
forEachIndex(value.split(" "),i,c1,with(cells.Col2.value.split(" "),c2,c1.toNumber()+c2[i].toNumber())).join(" ")
which you can see is a bit clunky and asymmetric.
But now that you mention loops (and combining more arrays than one), I realise there is actually a reasonably symmetric (albeit a bit more verbose with the indexing) construct I could use instead. That is:
with(value.split(" "), c1, with(cells.Col2.value.split(" "), c2, forRange(0,3,1,i,c1[i].toNumber()+c2[i].toNumber()))).join(" ")
and this is effectively a loop construct that works with any number of arrays. Although the 'with' controls aren't strictly necessary here, I do sometimes need arithmetic expressions that are a bit more complicated, and being able to refer to the arrays with a bound variable rather than the full 'split' deconstruction is handy.
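For readers who think more readily in Python, the index-based construct above corresponds roughly to the sketch below. The name vector_sum, the rounding step, and the choice to stop at the shortest tuple are illustrative assumptions, not part of GREL or OpenRefine.

```python
def vector_sum(*cells: str, sep: str = " ") -> str:
    """Element-wise sum of any number of separator-delimited numeric tuples."""
    # Split each cell into a list of floats (the GREL split + toNumber step).
    arrays = [[float(x) for x in cell.split(sep)] for cell in cells]
    # Assumption: iterate only as far as the shortest tuple.
    n = min(len(a) for a in arrays)
    # Index-based loop, mirroring forRange(0, n, 1, i, c1[i] + c2[i] + ...).
    sums = [sum(a[i] for a in arrays) for i in range(n)]
    # Round to hide floating-point noise such as 0.6000000000000001.
    return sep.join(str(round(s, 10)) for s in sums)


result = vector_sum("1.0 0.8 -0.1", "0.0 -0.2 0.9")  # "1.0 0.6 0.8"
```

Because the loop indexes into every array, adding a third or fourth column only means passing another argument.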
It was really thinking about the similar problem of zipping strings (the example in #5440) that made me wonder if a more general control might be useful. But yes, I see that maybe something that only works with two arrays is weirdly specific (though I do think it may be pretty common).
from openrefine.
The correct zip issue reference is #5440. Python's zip(*iterables) function accepts an arbitrary number of iterables.
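As a small illustration of that Python behaviour (the sample lists are made up for the example):

```python
xs = [1.0, 0.8, -0.1]
ys = [0.0, -0.2, 0.9]
zs = [0.5, 0.5, 0.5]

# zip takes any number of iterables and yields one tuple per position.
triples = list(zip(xs, ys, zs))
# [(1.0, 0.0, 0.5), (0.8, -0.2, 0.5), (-0.1, 0.9, 0.5)]

# The unpacking form zip(*iterables) does the same for a list of lists.
same = list(zip(*[xs, ys, zs]))
```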
More generally, programming language design is a pretty nuanced art, so it's helpful to have examples in other languages which can be used as a template -- and preferably one or a small number of languages, so we don't end up with a polyglot mix of Python, R, PHP, and Rust syntaxes.
from openrefine.
Why only 2 arrays? Why not 5 or 25? This feels more like a need for better pipelining of operations (loops?) and currying. I'm not even sure that we'd want to do something like this within GREL syntax specifically. It might be better outside of GREL and take another form. If it was part of GREL, it might be better to break it off into another control structure entirely. Maybe something like Groovy's spread-map operator. I dunno.
from openrefine.
Could you instead rewrite your needs and expressions with some real-world data and put them into a numbered list of operations you would like to perform on the data and its structure? Currently, it's really hard to understand as written. Don't worry about the GREL syntax you want or think you might like... just give us the problem you have in paragraphs and pictures if you can, and then describe, in a numbered list of simple sentences, the order of what/where the data would need to be changed based on whatever criteria, rules, loops, whatever.
from openrefine.
Intuitively I'd do that via a zip function: forEach(zip(e1, e2), pair, some_function_of(pair[0], pair[1])).
from openrefine.
It seems that the term zip came into computer science around the later 1990s or thereabouts, from what I can tell? I was not familiar with it, so I had no comment on the original ask in #5440, as no reference for the algorithm or its behavior was given, which is why I asked here for an explanation of how the algorithm would work in practice. But now, reading around, I can see this zip or zipping behavior can also be looked at almost like a pivot or transpose function, where essentially zipping translates sequences into sequences where, if visualized in two dimensions, the rows and columns are swapped?
Hmm, perhaps it might make sense for our user base to label this new GREL function for zip-like behavior as a transpose function, in the interest of keeping to terminology that is more likely to be familiar to them from data tools than from computer science or programming? I'd definitely avoid labelling it as innerjoin, which it basically could also be called if one specific behavior trait was needed. Interestingly enough, the Groovy programming language has GQuery (also see link below), which brings SQL-like joining to mimic zip-like behavior.
@wetneb We'd also have to make some hard choices on that kind of joining behavior as well, since zip implementations or libraries have different strengths and weaknesses. Like would the stopping policy of the algorithm be that it would stop after the shortest list?
Thoughts?
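To make the stopping-policy question concrete, here is how Python's standard library exposes two of the possible choices. This only illustrates the options; it says nothing about what GREL should pick, and the sample lists are made up.

```python
from itertools import zip_longest

a = ["a", "b", "c"]
b = ["1", "2"]

# Policy 1: the built-in zip stops after the shortest input.
shortest = list(zip(a, b))
# [('a', '1'), ('b', '2')]

# Policy 2: zip_longest runs to the longest input and pads with fillvalue.
longest = list(zip_longest(a, b, fillvalue=""))
# [('a', '1'), ('b', '2'), ('c', '')]
```

A third policy, raising an error on unequal lengths, would be the strictest choice and the easiest to reason about when the columns are expected to line up.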
from openrefine.
Forgot to add, I am not opposed to GREL adding a simple zip function and labeling it as such... but I'm always on the lookout for what the use cases are and if there's broader applicability that might be hiding. We could certainly down the road add zip and transpose and map functions and more as necessary, as long as things can work together cohesively and solve problems and feel natural for most of our users.
from openrefine.
Yes, I had to look up 'zip' myself as I'm not so familiar with Python, and yes, it's a kind of transpose operation (though it's a little asymmetric as it will turn two or more arrays into one array of arrays). The example on #5440 is specifically about string concatenation of corresponding elements of (two) arrays of strings.
I think @wetneb's description for
forEach(zip(e1, e2), pair, some_function_of(pair[0], pair[1]))
is very nice but unless zip can work with more than two arguments then we're a bit limited. We could get around this by creating a transpose function which specifically works on an array of arrays, so
trans([a1, a2, a3, ...])
gives [[a1[0], a2[0], a3[0], ...], [a1[1], a2[1], a3[1], ...], ...]
With this, I could solve my vector combination with
forEach(trans([cells.Col1.value.split(" "),cells.Col2.value.split(" ")]),v,v[0].toNumber()+v[1].toNumber()).join(" ")
And the string concatenation from #5440 would look exactly the same but without the toNumber() calls, and splitting/joining with a comma separator rather than space.
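A rough Python sketch of the proposed trans function and both uses follows. trans is hypothetical (it does not exist in GREL), stopping at the shortest array is an assumption, and the sample strings for the concatenation case are invented for illustration.

```python
def trans(arrays):
    """Hypothetical transpose of an array of arrays.

    [[a1...], [a2...], ...] -> [[a1[0], a2[0], ...], [a1[1], a2[1], ...], ...]
    Assumption: stops at the shortest inner array.
    """
    return [list(group) for group in zip(*arrays)]


# Vector combination: split on spaces, add element-wise, join again.
col1 = "1.0 0.8 -0.1".split(" ")
col2 = "0.0 -0.2 0.9".split(" ")
vector_sum = " ".join(
    str(round(float(v[0]) + float(v[1]), 10)) for v in trans([col1, col2])
)  # "1.0 0.6 0.8"

# String concatenation: same shape, no number conversion, comma separator.
left = "a,b,c".split(",")
right = "x,y,z".split(",")
joined = ",".join(v[0] + v[1] for v in trans([left, right]))  # "ax,by,cz"
```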
As an aside, you can probably see that one of my problems with numerical tuples is having to call toNumber on each element. This prompted #4935.
from openrefine.
Ah yes sorry about the typo on the reference.
And yes, a zip function (or whatever it may be called) would certainly simplify my case adequately.
from openrefine.
So perhaps we can close this ticket as a bad idea (or at least a redundant one with #5440), but before we do I'd like to throw out one more idea. If we aren't averse to allowing variable numbers of arguments, suppose we just extended the current functionality of forEach to apply to multiple arrays like this:
forEach(a1, v1, a2, v2, ... , e)
where v1, v2, ... bind to the nth element of a1, a2, ... , respectively, and n iterates over the length of a1 (or of the shortest of the arrays).
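In Python terms, the idea might look something like the sketch below. The helper for_each is hypothetical, and since Python has no equivalent of GREL's variable-binding arguments, the lambda parameters stand in for v1, v2, ...; iteration stops at the shortest array, which is one of the two options mentioned above.

```python
def for_each(*args):
    """for_each(a1, a2, ..., fn): apply fn to the nth elements of every array."""
    # The last argument plays the role of the expression e; the rest are arrays.
    *arrays, fn = args
    # zip stops at the shortest array (assumed stopping policy).
    return [fn(*values) for values in zip(*arrays)]


# Two arrays, with v1 and v2 bound as lambda parameters.
sums = for_each([1, 2, 3], [10, 20, 30], lambda v1, v2: v1 + v2)  # [11, 22, 33]

# With a single array this reduces to an ordinary forEach.
doubled = for_each([1, 2, 3], lambda v: v * 2)  # [2, 4, 6]
```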
from openrefine.
I would argue for the zip function to accept a variable number of arguments: zip(firstList, secondList, thirdList).
This way, the control could be kept as it currently is.
from openrefine.