hypdb's Introduction

HypDB

The core HypDB module lives in the HypDB/ directory. A web UI demo that demonstrates the capabilities of HypDB lives in the demo/ directory.

PyPI

Our package is published on PyPI here.

Paper

Our paper (published in VLDB 2018 in Rio de Janeiro) can be found here.

Contributing

We follow Angular-style commit message guidelines.

To work on an issue, branch off from master and give the branch a unique, descriptive name. We may open a PR at any stage of solving an issue, but should request a code review only when it might be ready to merge back into master. We can close the respective issue once the PR has been merged.

hypdb's People

Contributors

coreycole, dependabot[bot], ptrbrn, pzli3

hypdb's Issues

feat(client): page structure

As per Babak:

We should have the rewrite query answer right below the naive query answer graph. Then the explanation. Then the graph and rewritten query come at the bottom of the page.

I think by "rewrite query" he means the further group by most responsible. I'm not sure what he means by the explanation. The coarse and fine-grained analysis for the most responsible? The graph needs to be the last thing on the page with nothing next to it (it's really tricky to position the graph).

It seems to me that the page can be laid out as follows:

  • row 1 - naive query, put naive query answer to right on same row
  • row 2 - further group by query, put query answer to right on same row
  • row 3 - total effect query, put query answer to right on same row
  • row 4 - direct effect query, put query answer to right on same row
  • row 5 - coarse and fine-grained analysis
  • row 6 - causal dag

Can you clarify/confirm my proposed page order, @bsalimi?

feat(bias): make grouping attribute optional

Right now, it seems that the grouping attribute is required because there is always a pandas call to do the group by. If there is no grouping attribute, we should modify the bias query to just compare all treated to all control.
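A minimal sketch of how the group by could be made optional, assuming the data sits in a pandas DataFrame. The function name `naive_avg_difference`, its parameters, and the sample columns are hypothetical and not taken from the HypDB code:

```python
from typing import Optional, Sequence

import pandas as pd


def naive_avg_difference(
    df: pd.DataFrame,
    treatment: str,
    outcome: str,
    groupby: Optional[Sequence[str]] = None,
) -> pd.DataFrame:
    """Average outcome per treatment level, optionally split by grouping attributes."""
    # With no grouping attribute, group on the treatment column alone,
    # which compares all treated rows to all control rows.
    keys = list(groupby or []) + [treatment]
    return df.groupby(keys)[outcome].mean().reset_index()


# Hypothetical sample data for illustration only.
flights = pd.DataFrame(
    {
        "carrier": ["AA", "AA", "UA", "UA"],
        "airport": ["JFK", "JFK", "SFO", "JFK"],
        "delayed": [1, 0, 1, 1],
    }
)
overall = naive_avg_difference(flights, treatment="carrier", outcome="delayed")
by_airport = naive_avg_difference(flights, "carrier", "delayed", groupby=["airport"])
```

Treating "no grouping attribute" as an empty key list keeps a single code path, so the bias query would not need a separate branch for the ungrouped case.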

refactor(hypdb): move core hypdb code into separate top level folder

I think we should avoid creating two separate repositories for testing and issue tracking reasons.

We might want the structure to be:

  • demo
    • client
    • server
  • hypdb
  • PyPI configuration files

The hypdb directory would be where all the core code would be, but the PyPI configuration files would all be top level. When we make the project and prepare it for release on PyPI, we would omit the demo code and make the release as small as we can. I don't think it needs to be in a separate repository to do this.

docs: digest the jupyter notebook and add comments

Definitely make note of the inputs and outputs of the different functions, the order in which they will be called, and the different stages/groups that we can call them in.

I'm assuming the basic average differences can come first so the user can see the results of their query before HypDB comes in and explains it away.

feat(server/client): change dag library

The new library will most likely run on the server and send an image to the client. We need to research which one to use.

This will replace the current dag library on the frontend.

bug(server): generated queries need line length limit

In order to have enough room to display bar charts on the same line as their respective queries, especially for data with multi-level treatment, we need to limit the maximum length of query lines.
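One possible way to cap line length, sketched with the standard library's `textwrap`. The helper name `wrap_query` and the default width of 40 are assumptions. The output is meant for display only, because a break can land inside a quoted string literal that contains spaces:

```python
import textwrap


def wrap_query(query: str, max_width: int = 40) -> str:
    """Wrap each line of a generated query at whitespace.

    A single token longer than max_width is left intact rather than split.
    """
    wrapped_lines = []
    for line in query.splitlines():
        # Continuation lines are indented so the clause structure stays readable.
        wrapped_lines.extend(
            textwrap.wrap(
                line,
                width=max_width,
                subsequent_indent="    ",
                break_long_words=False,
            )
            or [""]  # textwrap.wrap returns [] for an empty line; keep the blank line
        )
    return "\n".join(wrapped_lines)


query = (
    "SELECT carrier, avg(delayed) FROM flights "
    "WHERE airport IN ('JFK', 'SFO', 'ORD', 'LAX') GROUP BY carrier"
)
wrapped = wrap_query(query, max_width=40)
```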

See the attached screenshot (2018-08-24, 8:03 pm).

tools(hypdb): add travis CI

Once we have linting, type checking and automated tests all set up, we should run these in Travis. This will be helpful in the future when people would like to contribute to the project.

Blocked by #57 #58 #59

feat(client): query validation

After the user inputs a query, we should have client-side validation that will throw an error if the input query is invalid. For the first iteration of this feature, we don't have to worry about making the error messages very custom/helpful.

Blocked by #5

feat(client/server): upload csv file

An uploaded file should be saved on the server in some directory that is listed in the .gitignore.

The file should be converted to JSON upon upload. The first line is expected to contain the column headers, and these headers should be confirmed with the user before uploading.
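The conversion step could look roughly like the following sketch, which uses only the standard library. The function name `csv_to_json` and the returned tuple shape are assumptions. `csv.DictReader` treats the first line as the header row and leaves every value as a string:

```python
import csv
import io
import json
from typing import List, Tuple


def csv_to_json(csv_text: str) -> Tuple[List[str], str]:
    """Return (column headers, JSON array of row objects) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is read from the first line; it is None for empty input.
    headers = list(reader.fieldnames or [])
    if not headers:
        raise ValueError("CSV has no header row")
    rows = list(reader)
    return headers, json.dumps(rows)


headers, payload = csv_to_json("carrier,delayed\nAA,1\nUA,0\n")
```

Returning the headers separately would let the client show them to the user for confirmation before the JSON is stored.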

chore: setup docker for demo

For people to easily try out the demo without needing to install numerous dependencies, we will use Docker (just like ZaliQL). This doesn't seem to be on the critical path quite yet, so I'm leaving it out of project v1.

bug(server): bar chart data incorrect

Right now, the ATE only returns the bar chart data for the most fine-grained chart. If I query with grouping attribute carrier I get an array of bar chart data to display. This should really be a 2D array of length 1, where the 0th element is the array I'm getting.

If I query with grouping attributes carrier and origin I should get a 2D array of length 2, where the 0th element is the array I got with only carrier as the grouping attribute, and the 1st element is the more fine-grained ATE (the array we currently get when calling with 2 grouping attributes).

In summary, we need to return the ATE at each level of grouping, starting with the first grouping attribute alone. The result will always be a 2D array with length >= 1.
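The expected response shape can be sketched as one bar-chart array per prefix of the grouping attributes. Everything named here is hypothetical: `bar_charts_per_level`, the `compute_bars` callable standing in for the real ATE computation, and the dictionary shape of a bar:

```python
from typing import Callable, List, Sequence

BarData = List[dict]  # one chart's bars; the element shape is an assumption


def bar_charts_per_level(
    grouping_attributes: Sequence[str],
    compute_bars: Callable[[List[str]], BarData],
) -> List[BarData]:
    """Return one bar-chart array per prefix of the grouping attributes."""
    # Assumption: at least one grouping attribute is given, so length >= 1.
    if not grouping_attributes:
        raise ValueError("at least one grouping attribute is required")
    return [
        compute_bars(list(grouping_attributes[: i + 1]))
        for i in range(len(grouping_attributes))
    ]


def stub_bars(attrs: List[str]) -> BarData:
    """Stand-in for the real ATE computation; just echoes its input."""
    return [{"group_by": attrs}]


charts = bar_charts_per_level(["carrier", "origin"], stub_bars)
```

With `carrier` and `origin`, `charts[0]` holds the coarse chart for `carrier` alone and `charts[1]` the more fine-grained chart for both attributes.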

research(client): decide on graph data viz library

We need to find a graph visualization library that we think will fit our requirements for both the minimum viable demo and later iterations where we'd like to show more fine-grained animations describing the covariate discovery algorithm.

feat(client): query input

A standard query will look something like this:

SELECT avg(departure_delay)
FROM flights
WHERE airport='JFK'
GROUP BY airline

We still need to decide exactly how we want users to enter something like this. To keep it simple, I'm thinking:

  • drop-down menu for the outcome column (departure_delay)
  • drop-down menu for the csv file
  • text entry for the WHERE clause with an arbitrary number of AND and OR
  • drop-down menu for the grouping_attribute column (airline)
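As a rough sketch, the four inputs above could be assembled into the query text as follows. The function name `build_query` and its parameters are hypothetical, and the selected csv file is assumed to map to a table name. The string is built for display; plain interpolation like this is not safe for executing untrusted input:

```python
from typing import Optional


def build_query(
    outcome: str,
    table: str,
    grouping_attribute: str,
    where: Optional[str] = None,
) -> str:
    """Assemble the display text of the query from the form inputs."""
    lines = [f"SELECT avg({outcome})", f"FROM {table}"]
    # Emit the WHERE clause only when the user actually typed a condition.
    if where and where.strip():
        lines.append(f"WHERE {where.strip()}")
    lines.append(f"GROUP BY {grouping_attribute}")
    return "\n".join(lines)


full = build_query("departure_delay", "flights", "airline", "airport='JFK'")
no_filter = build_query("departure_delay", "flights", "airline")
```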

bug(frontend): query misc.

  • query text not updating after selecting outcome
  • query not clearing after changing file
  • where clause shows up in query text even when no condition is provided
