nickslevine / zebras
Data analysis library for JavaScript built with Ramda
Hi @nickslevine, zebras looks very nice and promising!
Curious whether there is a reason behind checking in node_modules? https://github.com/nickslevine/zebras/tree/master/node_modules
Currently
(static) toCSV(df, filepath) → {undefined}
Should be
(static) toCSV(filepath, df) → {undefined}
Convert zebras.js to an AMD module so that z = require('zebras') will work on Observable, rather than having to do z = require('https://bundle.run/zebras').
Eg.
"COL A
Some info1", "COL B
Some Info2"
"data 1", "data 2"
It would be cool to have a way to map groupby objects. It could preserve the same keys and map the values, just like ramda's map, or just get the values to get a dataframe-like result, roughly like:
function apply(fn, df) {
  return R.pipe(
    R.mapObjIndexed((value, index) => fn(value, index)),
    R.values
  )(df)
}
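To make the two behaviours concrete without depending on Ramda, here is a minimal plain-JavaScript sketch. The names mapGroups and applyGroups are hypothetical (they are not zebras functions), and the groupby object is assumed to be a plain object that maps each group key to an array of rows.

```javascript
// Hypothetical helpers, not part of zebras. A "groupby object" is assumed
// to be a plain object of the form { groupKey: [row, row, ...] }.

// Keep the same keys and map each group's rows through fn(rows, key).
function mapGroups(fn, groups) {
  const out = {};
  for (const [key, rows] of Object.entries(groups)) {
    out[key] = fn(rows, key);
  }
  return out;
}

// Drop the keys and return only the mapped values, as an array.
function applyGroups(fn, groups) {
  return Object.values(mapGroups(fn, groups));
}

const groups = {
  Monday: [{ Day: 'Monday', value: 10 }, { Day: 'Monday', value: 7 }],
  Tuesday: [{ Day: 'Tuesday', value: 5 }],
};

// Summarise each group as one row holding the key and the total.
const total = (rows, key) => ({
  Day: key,
  total: rows.reduce((sum, row) => sum + row.value, 0),
});

console.log(mapGroups(total, groups));   // keys preserved
console.log(applyGroups(total, groups)); // dataframe-like array of rows
```

The second form is the one that yields a dataframe-like result, since each group collapses to a single row.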
It's not obvious that these methods print something as a string. Wouldn't it be more useful and clearer to have them return a subset of df (similarly to filter)?
Then printing of head or tail could be done via Z.print(Z.head(10, df)), which makes the intent clearer. Or it could be encapsulated in print-specific helpers: printHead and printTail.
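A minimal sketch of that split, assuming a dataframe is an array of row objects. The head, tail and print functions below are hypothetical stand-ins written for illustration, not the current zebras implementations.

```javascript
// Hypothetical sketch: head and tail return a subset of the dataframe
// (an array of row objects) instead of printing it.
const head = (n, df) => df.slice(0, n);

// slice(-0) would return the whole array, so n = 0 is handled explicitly.
const tail = (n, df) => (n > 0 ? df.slice(-n) : []);

// Printing becomes a separate, explicit step.
const print = (df) => console.table(df);

const df = [{ a: 1 }, { a: 2 }, { a: 3 }, { a: 4 }];
print(head(2, df));
print(tail(2, df));
```

Because both functions return plain arrays, they compose with filter-style functions and can still be printed when needed.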
Side note, there is a console.log() lingering in tail:
Line 228 in b670068
Right now Zebras is intended for use either in a Jupyter notebook or in ObservableHQ. But there might be value in allowing it to be used in the browser as well, so that some of its methods can be used inside web applications, etc.
It seems that it wouldn't be too difficult to make it happen, the main blocker right now seems to be that everything is in one file and it uses the file system:
Line 1 in b670068
One idea is to modularise it, exposing methods as individual exports, similarly to how Ramda is structured. This also improves maintainability going forward, instead of cramming more functionality into one file.
It would also be advisable, maybe as a next step and not right away, to introduce a build step to produce different kinds of dist bundles (es6, cjs, umd).
I think it would be great to have tests. It would be a good next step, before further planned refactoring, to have peace of mind that nothing gets broken in the process.
If you haven't already, I could add a testing setup with mocha and start adding tests until we get full coverage.
What do you think?
Change z.sortByCol(columnName, direction, df) so that it accepts an array of column names and sorts a df by the specified columns, giving precedence to columns listed earlier, as in pandas.
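One way the requested behaviour could look, as a plain-JavaScript sketch. The name sortByCols is hypothetical, and applying a single direction to every column is an assumption (pandas also allows one direction per column).

```javascript
// Hypothetical sketch, not the zebras implementation.
// columns: array of column names, earlier names take precedence.
// direction: 'asc' or 'desc', applied to every column (an assumption).
function sortByCols(columns, direction, df) {
  const sign = direction === 'desc' ? -1 : 1;
  // Copy first so the input dataframe is not mutated.
  return [...df].sort((rowA, rowB) => {
    for (const col of columns) {
      if (rowA[col] < rowB[col]) return -1 * sign;
      if (rowA[col] > rowB[col]) return 1 * sign;
    }
    return 0; // equal on every listed column
  });
}

const df = [
  { city: 'B', year: 2020 },
  { city: 'A', year: 2021 },
  { city: 'A', year: 2019 },
];

// Sort by city first, then by year within each city.
console.log(sortByCols(['city', 'year'], 'asc', df));
```

Later columns are only consulted when all earlier columns compare equal, which is what gives earlier columns precedence.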
Implement function to join dataframes on columns. See pandas implementation here.
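As a starting point for discussion, here is a minimal sketch of an inner join on one shared column. The name innerJoin and its argument order are hypothetical. Unlike pandas, which adds suffixes to clashing column names, this sketch simply lets the right-hand value win.

```javascript
// Hypothetical sketch of an inner join on one shared column.
// Dataframes are assumed to be arrays of row objects.
function innerJoin(on, left, right) {
  // Index the right-hand rows by their join value.
  const index = new Map();
  for (const row of right) {
    const key = row[on];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(row);
  }
  // Emit one merged row per matching (left, right) pair.
  const out = [];
  for (const leftRow of left) {
    for (const rightRow of index.get(leftRow[on]) || []) {
      // On clashing column names the right-hand value wins (a simplification).
      out.push({ ...leftRow, ...rightRow });
    }
  }
  return out;
}

const people = [
  { id: 1, name: 'a' },
  { id: 2, name: 'b' },
  { id: 3, name: 'c' },
];
const scores = [
  { id: 1, score: 10 },
  { id: 3, score: 30 },
  { id: 3, score: 31 },
];

console.log(innerJoin('id', people, scores));
```

Rows without a match on either side are dropped. Left, right and outer joins would need extra handling for unmatched rows.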
This issue appears fixed in the code but not in the npm release: #35
The last package release was prior to that commit - https://www.npmjs.com/package/zebras
Possible to get a new release?
Hi,
It seems that you can't do a pipe like this:
const data = [{"Day": "Monday", "value": 10}, {"Day": "Tuesday", "value": 5}, {"Day": "Monday", "value": 7}]
Z.pipe([
  Z.groupBy(s => s.Day),
  Z.gbSum('value') // Uncaught TypeError: g.call is not a function
])(data)
Is it because the gbSum function (and the other gb functions) are not curried?
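To show what currying would change, here is a self-contained sketch using hypothetical stand-ins for pipe, groupBy and gbSum, written in plain JavaScript so it runs without zebras or Ramda. The output shape of gbSum here (rows with group and sum fields) is an assumption, not taken from the zebras source.

```javascript
// Hypothetical stand-ins, not the zebras functions.

// pipe takes an array of unary functions and applies them left to right.
const pipe = (fns) => (input) => fns.reduce((acc, fn) => fn(acc), input);

// groupBy(keyFn) returns a function awaiting the dataframe.
const groupBy = (keyFn) => (df) => {
  const groups = {};
  for (const row of df) {
    const key = keyFn(row);
    (groups[key] = groups[key] || []).push(row);
  }
  return groups;
};

// A curried gbSum: gbSum(col) returns a function awaiting the groupby
// object, so it can sit inside a pipe like any other step.
const gbSum = (col) => (groups) =>
  Object.entries(groups).map(([group, rows]) => ({
    group,
    sum: rows.reduce((total, row) => total + row[col], 0),
  }));

const data = [
  { Day: 'Monday', value: 10 },
  { Day: 'Tuesday', value: 5 },
  { Day: 'Monday', value: 7 },
];

const result = pipe([groupBy((s) => s.Day), gbSum('value')])(data);
console.log(result);
```

If gbSum is not curried, gbSum('value') is evaluated immediately with a missing second argument, and whatever it returns is not a function the pipe can call. That would be consistent with the error above.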
Super cool with the domain!
It would be great to ensure HTTPS is enabled, maybe even enforced. It should be supported by GitHub Pages, as documented here: https://blog.github.com/2018-05-01-github-pages-custom-domains-https/
First off, I love this tool, would love to contribute!
Personally, I like to work with GitHub issues rather than a TODO.md file, since they are easier to reference in PRs and comments. Contributors could also comment on an issue if they want to work on it, which is a little harder to do in a committed file. Also, tagging issues with feature or bug can be pretty helpful!
Excel surrounds these cases with quotation marks when it writes a CSV. Zebras seems to just ignore these rows of data. This could be a user error. Please comment if you know a fix.
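For reference, quoted fields that span several lines can be handled by tracking whether the parser is currently inside quotes. The following is a minimal hypothetical sketch, not the zebras CSV reader. It assumes comma-separated input and skips spaces that appear before a field, as in the example with a space after the comma.

```javascript
// Hypothetical minimal CSV parser (not the zebras reader). A quoted field
// may contain commas, newlines and doubled quotes ("").
function parseCSV(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // a doubled quote is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and newlines are kept verbatim
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n') {
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else if (ch === '\r' || (ch === ' ' && field === '')) {
      // Skip carriage returns and spaces that precede a field.
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text has no trailing newline.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const csv = '"COL A\nSome info1", "COL B\nSome Info2"\n"data 1", "data 2"\n';
console.log(parseCSV(csv));
```

The key point is that a newline only ends a row when it appears outside quotes, so a multi-line header cell stays in one field.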
Add z.describe(arr), which returns a df of summary statistics, including mean, std, min, max, count and number of unique values.
See the pandas implementation here.
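A rough sketch of what such a function might compute for a numeric array. The field names and the one-row return shape are assumptions. The standard deviation uses the sample formula (n - 1 denominator), which matches the pandas default, but that choice is also an assumption here.

```javascript
// Hypothetical sketch of describe(arr) for a non-empty numeric array.
// Returns a one-row dataframe (an array holding a single row object).
function describe(arr) {
  const count = arr.length;
  const mean = arr.reduce((sum, x) => sum + x, 0) / count;
  const squaredDiffs = arr.reduce((sum, x) => sum + (x - mean) ** 2, 0);
  // Sample standard deviation; undefined (NaN) for fewer than two values.
  const std = count > 1 ? Math.sqrt(squaredDiffs / (count - 1)) : NaN;
  return [{
    count,
    unique: new Set(arr).size,
    mean,
    std,
    min: Math.min(...arr),
    max: Math.max(...arr),
  }];
}

const summary = describe([2, 4, 4, 6]);
console.log(summary);
```

Empty arrays and non-numeric values are not handled in this sketch and would need explicit treatment.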
Perhaps this should just go in the docs, but "1,200.30" turns into 1. Thanks, parseFloat!
Observe:
> z.parseNums(['amount'], [{amount: '1,200.30'}])
[ { amount: 1 } ]
There's a US-centric way to handle this (https://stackoverflow.com/a/11665949/1024811) or a much heavier I18N way to do it (https://stackoverflow.com/a/42000120/1024811), or people could just be warned that columns should not have commas in them.
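The US-centric option can be sketched as follows. The helper names parseUSNumber and parseNumsUS are hypothetical, and the sketch assumes a comma is always a thousands separator and never a decimal separator.

```javascript
// parseFloat stops at the first character that cannot be part of a number,
// so the comma truncates the value.
console.log(parseFloat('1,200.30')); // 1

// Hypothetical US-centric helper: drop thousands separators before parsing.
// It assumes "," is never used as a decimal separator.
function parseUSNumber(text) {
  return parseFloat(String(text).replace(/,/g, ''));
}

// Hypothetical variant of parseNums built on that helper.
function parseNumsUS(cols, df) {
  return df.map((row) => {
    const out = { ...row }; // copy so the input rows are not mutated
    for (const col of cols) out[col] = parseUSNumber(row[col]);
    return out;
  });
}

console.log(parseNumsUS(['amount'], [{ amount: '1,200.30' }]));
```

This would give wrong results for locales that write 1.200,30, which is why the heavier I18N approach exists.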
Write high-level roadmap document sketching out goals and potential priorities.
Very slow. Tried it on a 0.5 MB / ~12,000-row CSV; it took ~20 seconds.