Issues from nickslevine/zebras

Why check `node_modules` into the repo?

Hi @nickslevine, zebras look very nice and promising! 👍

Curious whether there is a reason behind checking in node_modules? https://github.com/nickslevine/zebras/tree/master/node_modules

[fix] - fix wrong function declaration in toCSV.js.html & index.d.ts

PROBLEM Inconsistent example of Z.toCSV()

Currently

(static) toCSV(df, filepath) → {undefined}

Should be

(static) toCSV(filepath, df) → {undefined}

Convert to AMD module

Convert zebras.js to an AMD module so that z = require('zebras') will work on Observable, rather than having to do z = require('https://bundle.run/zebras').

Does not handle multiline column headers

Eg.

"COL A
Some info1", "COL B
Some Info2"
"data 1", "data 2"

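One way to support headers like these is to parse character by character and treat newlines inside double quotes as part of the field. The sketch below is only an illustration under that assumption: `parseCSV` is a hypothetical helper, not the zebras reader, and it ignores edge cases such as alternative delimiters.

```javascript
// Hypothetical helper, not part of zebras: a small CSV parser that keeps
// newlines inside double-quoted fields instead of ending the row there.
function parseCSV(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let wasQuoted = false; // true once the current field has been quoted

  const endField = () => {
    // Unquoted fields are trimmed; quoted fields are kept verbatim.
    row.push(wasQuoted ? field : field.trim());
    field = "";
    wasQuoted = false;
  };
  const endRow = () => {
    endField();
    rows.push(row);
    row = [];
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote ("")
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // includes newlines inside quotes
      }
    } else if (ch === '"' && field.trim() === "") {
      inQuotes = true;
      wasQuoted = true;
      field = ""; // drop whitespace before the opening quote
    } else if (ch === ",") {
      endField();
    } else if (ch === "\n") {
      endRow();
    } else if (ch !== "\r" && !wasQuoted) {
      field += ch;
    }
  }
  // Flush the last row if the input does not end with a newline.
  if (field !== "" || row.length > 0 || wasQuoted) endRow();
  return rows;
}

const parsed = parseCSV('"COL A\nSome info1", "COL B\nSome Info2"\n"data 1", "data 2"\n');
// parsed[0] holds the two multiline headers, parsed[1] the data row.
```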
Implement map for groupby objects

It would be cool to have a way to map groupby objects. It could preserve the same keys and map the values, just like ramda's map, or just get the values to get a dataframe-like result, roughly like:

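const R = require("ramda")

// Apply fn(value, key) to each group, then drop the keys to get an array of results.
function apply(fn, df){
  return R.pipe(
    R.mapObjIndexed((value, key) => fn(value, key)),
    R.values
  )(df)
}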

Question about `head` and `tail`

It's not obvious that these methods print a string rather than return data. Wouldn't it be more useful and clearer to have them return a subset of df (similarly to filter)?

Printing the head or tail could then be done via Z.print(Z.head(10, df)), which makes the intent clearer. Or it could be encapsulated in print-specific helpers: printHead and printTail.
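A rough sketch of what that proposal could look like, assuming a dataframe is an array of row objects. The names `curry2` and `printHead` and the use of `console.table` in place of Z.print are illustrative assumptions, not the current zebras API.

```javascript
// Hypothetical variants, not the current zebras API: head and tail return a
// subset of the dataframe (an array of row objects) instead of printing.
const curry2 = (fn) => (a, b) => (b === undefined ? (b2) => fn(a, b2) : fn(a, b));

const head = curry2((n, df) => df.slice(0, n));
const tail = curry2((n, df) => (n > 0 ? df.slice(-n) : []));

// Print-specific helper built on top; console.table stands in for Z.print.
const printHead = curry2((n, df) => console.table(head(n, df)));

const df = [{ a: 1 }, { a: 2 }, { a: 3 }, { a: 4 }];
const firstTwo = head(2, df); // first two rows
const lastTwo = tail(2)(df); // last two rows, using the curried form
```

Keeping the data-returning functions separate from the printing helpers means head and tail can be composed in a pipe like any other transformation.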

Side note: there is a console.log() lingering in tail:

zebras/zebras.js

Line 228 in b670068

console.log(print(truncated))

Make zebras browser friendly

Right now zebras is intended for use in either a Jupyter notebook or ObservableHQ. But there might be value in allowing it to be used in the browser as well, so that some of its methods can be used inside web applications, etc.

It seems that it wouldn't be too difficult to make this happen. The main blocker right now seems to be that everything is in one file and it uses the file system:

zebras/zebras.js

Line 1 in b670068

const fs = require("fs")

One idea would be to modularise it, exposing methods as individual exports, similarly to how Ramda is structured. This also has the perk of improved maintainability going forward, instead of cramming more functionality into one file.

It would also be advisable, maybe as a next step rather than right away, to introduce a build step that produces different kinds of dist bundles (es6, cjs, umd).

Add tests

I think it would be great to have tests. It would be a good next step, before further planned refactoring, to have peace of mind that nothing gets broken in the process.

If you haven't already, I could add a testing setup with mocha and start adding tests until we get full coverage.

What do you think?

Sort df by multiple columns

Change z.sortByCol(columnName, direction, df) so that it accepts an array of column names and sorts a df by the specified columns, giving precedence to columns listed earlier, as in pandas.
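A minimal sketch of the requested behaviour, assuming a dataframe is an array of row objects and that direction is either "asc" or "desc". `sortByCols` is a hypothetical name and this is not the existing z.sortByCol implementation.

```javascript
// Hypothetical helper, not the current z.sortByCol: sorts by several columns,
// with columns listed earlier taking precedence.
function sortByCols(columnNames, direction, df) {
  const sign = direction === "desc" ? -1 : 1;
  // Copy first so the input dataframe is not mutated.
  return [...df].sort((rowA, rowB) => {
    for (const col of columnNames) {
      if (rowA[col] < rowB[col]) return -1 * sign;
      if (rowA[col] > rowB[col]) return 1 * sign;
    }
    return 0; // equal on every listed column
  });
}

const cities = [
  { city: "B", n: 2 },
  { city: "A", n: 3 },
  { city: "A", n: 1 },
];
const sorted = sortByCols(["city", "n"], "asc", cities);
// Rows are ordered by city first, then by n within the same city.
```

This sketch applies one direction to all columns; pandas also allows a separate direction per column, which would need an array of directions instead.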

Implement merge

Implement function to join dataframes on columns. See pandas implementation here.

Add different types of joins and ability to join on multiple columns to merge function
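As a starting point for discussion, here is a sketch of an inner and left join on one or more columns, assuming dataframes are arrays of row objects. The `merge` signature and the `how` parameter are assumptions loosely modelled on pandas, not an existing zebras function, and overlapping non-key columns simply take the left-hand value rather than getting suffixes.

```javascript
// Hypothetical sketch, not the zebras implementation: join two dataframes
// (arrays of row objects) on one or more columns.
function merge(left, right, on, how = "inner") {
  // Build a composite key from the join columns.
  const keyOf = (row) => JSON.stringify(on.map((col) => row[col]));

  // Index the right-hand rows by their join key.
  const index = new Map();
  for (const row of right) {
    const key = keyOf(row);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(row);
  }

  const result = [];
  for (const row of left) {
    const matches = index.get(keyOf(row)) || [];
    // On overlapping column names the left-hand value wins.
    for (const match of matches) result.push({ ...match, ...row });
    // A left join keeps unmatched left rows as they are.
    if (matches.length === 0 && how === "left") result.push({ ...row });
  }
  return result;
}

const leftDf = [
  { id: 1, year: 2020, x: "a" },
  { id: 2, year: 2020, x: "b" },
];
const rightDf = [{ id: 1, year: 2020, y: 10 }];

const inner = merge(leftDf, rightDf, ["id", "year"]);
const leftJoin = merge(leftDf, rightDf, ["id", "year"], "left");
```

Right and outer joins could follow the same pattern by also tracking which right-hand rows were never matched.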

(latest npm package) Does not import data with commas in it

This issue appears fixed in the code but not in the npm release:
#35

The last package release was prior to that commit - https://www.npmjs.com/package/zebras

Possible to get a new release?

Build out groupBy functionality

See pandas documentation here and here.

This is a stand-in for a more detailed set of issues once I've broken them up into chunks.

gbCount, gbMean etc. not curried

Hi,

It seems that you can't do a pipe like this:

const data = [{"Day": "Monday", "value": 10}, {"Day": "Tuesday", "value": 5}, {"Day": "Monday", "value": 7}]

Z.pipe([
  Z.groupBy(s => s.Day),
  Z.gbSum('value') // Uncaught TypeError: g.call is not a function
])(data)

Is it because the gbSum function (and the other gb functions) is not curried?
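To illustrate what currying would allow, here is a self-contained sketch in plain JavaScript. `curry2`, `groupBy`, `gbSum` and `pipe` below are hypothetical stand-ins written for this example, and the `{ group, sum }` output shape is an assumption rather than the confirmed zebras format.

```javascript
// Hypothetical stand-ins for Z.groupBy, Z.gbSum and Z.pipe, written in plain
// JavaScript to show how curried gb functions would behave in a pipe.
const curry2 = (fn) => (a, b) => (b === undefined ? (b2) => fn(a, b2) : fn(a, b));

// Group rows into an object keyed by keyFn(row).
const groupBy = curry2((keyFn, df) =>
  df.reduce((groups, row) => {
    const key = keyFn(row);
    (groups[key] = groups[key] || []).push(row);
    return groups;
  }, {})
);

// Return one row per group with the sum of the given column.
const gbSum = curry2((col, groups) =>
  Object.entries(groups).map(([group, rows]) => ({
    group,
    sum: rows.reduce((total, row) => total + row[col], 0),
  }))
);

// Left-to-right function composition over an array of functions.
const pipe = (fns) => (input) => fns.reduce((acc, fn) => fn(acc), input);

const data = [
  { Day: "Monday", value: 10 },
  { Day: "Tuesday", value: 5 },
  { Day: "Monday", value: 7 },
];

// gbSum("value") returns a function waiting for the grouped data.
const result = pipe([groupBy((s) => s.Day), gbSum("value")])(data);
```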

Add https to zebrasjs.com

Super cool with the domain!

It would be great to ensure HTTPS is enabled, maybe even enforced. It should be supported by GitHub Pages, as documented here: https://blog.github.com/2018-05-01-github-pages-custom-domains-https/

Now there is this unpleasant warning:

Proposal: Move TODO.md to issues on GitHub

First off, I love this tool, would love to contribute!

Personally, I like to work with GitHub issues rather than a TODO.md file, since they're easier to reference in PRs/comments. Contributors could also comment on an issue if they want to work on it, which is a little harder to do in a committed file. Also, tagging issues with feature or bug can be pretty helpful!

> z.parseNums(['amount'], [{amount: '1,200.30'}])
[ { amount: 1 } ]

There's a US-centric way to handle this (https://stackoverflow.com/a/11665949/1024811) or a much heavier I18N way to do it (https://stackoverflow.com/a/42000120/1024811), or people could just be warned that columns should not have commas in them.
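The US-centric option could look roughly like the sketch below, which strips commas before parsing. `parseNumsUS` is a hypothetical name, not the current z.parseNums, and it would misread locales that use a comma as the decimal separator.

```javascript
// Hypothetical helper, not the current z.parseNums: the US-centric approach of
// removing thousands separators before calling parseFloat.
function parseNumsUS(columns, df) {
  return df.map((row) => {
    const parsed = { ...row }; // copy so the input rows are not mutated
    for (const col of columns) {
      if (typeof row[col] === "string") {
        parsed[col] = parseFloat(row[col].replace(/,/g, ""));
      }
    }
    return parsed;
  });
}

const rows = parseNumsUS(["amount"], [{ amount: "1,200.30" }]);
// rows[0].amount is now the number 1200.3 instead of 1.
```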
