uwdata / draco

Visualization Constraints and Weight Learning

Home Page: https://uwdata.github.io/draco/

License: BSD 3-Clause "New" or "Revised" License

Python 38.79% Shell 1.91% JavaScript 5.39% TypeScript 53.91%
answer-set-programming vega visualization-constraints visualization-recommendation

draco's Issues

Task

task - mark
task - channel

Refactor code

We should avoid relying on files in our APIs, because file-based APIs make it really hard to write in-memory algorithms. Instead, we should move in-memory objects around and handle IO separately. I added a few TODOs in the code to show where I see problems.
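Here is a minimal sketch of the split, assuming queries arrive as Vega-Lite-style dicts. `Query`, `query_from_dict` and `load_query` are hypothetical names, not existing Draco functions. The parsing logic is a pure function on in-memory objects, and only the thin wrapper opens files.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Query:
    """Hypothetical in-memory query: an optional mark plus a list of encodings."""

    mark: Optional[str]
    encodings: List[Dict[str, Any]] = field(default_factory=list)


def query_from_dict(spec: Dict[str, Any]) -> Query:
    """Pure function: builds a Query from an already-parsed spec, no IO."""
    encodings = [
        {"channel": channel, **enc}
        for channel, enc in spec.get("encoding", {}).items()
    ]
    return Query(mark=spec.get("mark"), encodings=encodings)


def load_query(path: Path) -> Query:
    """Thin IO wrapper: the only function here that touches the file system."""
    with open(path) as f:
        return query_from_dict(json.load(f))
```

With this split, in-memory algorithms (for example, ones that generate or mutate queries) can call `query_from_dict` directly and never need to write temporary files.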

Running 'run_pipeline.sh examples/ab.json' produces a wrong result

Running run_pipeline.sh examples/ab.json produces an incorrect spec:

In the generated file, the field of encoding e0 is "e1", but "e1" is an encoding ID, not a data field.

The field should be either a or b.

FYI, here is the generated ASP program:

% ====== Data definitions ======
fieldtype(a,string).
cardinality(a,3).

fieldtype(b,number).
cardinality(b,6).

% ====== Query constraints ======
mark(bar).

encoding(e0).
channel(e0,x).
:- not field(e0,_).
:- not type(e0,_).
%0 { log(e0) } 1.
%0 { zero(e0) } 1.

encoding(e1).
channel(e1,y).
:- not field(e1,_).
type(e1,quantitative).
aggregate(e1,max).
%0 { log(e1) } 1.
%0 { zero(e1) } 1.

Generated full spec:

{
    "$schema": "https://vega.github.io/schema/vega-lite/v2.0.json",
    "data": {
        "url": "examples/data/ab.csv"
    },
    "encoding": {
        "x": {
            "field": "e1",
            "type": "ordinal"
        },
        "y": {
            "aggregate": "max",
            "field": "b",
            "scale": {
                "zero": true
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
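A quick regression check for this bug could compare every encoded field against the data fields. Here those are `a` and `b`, taken from the data definitions above. This is a sketch: `invalid_fields` is a hypothetical helper, not part of Draco.

```python
from typing import Any, Dict, List, Set


def invalid_fields(spec: Dict[str, Any], data_fields: Set[str]) -> List[str]:
    """Return the channels whose "field" is not a column of the data set."""
    bad = []
    for channel, enc in spec.get("encoding", {}).items():
        name = enc.get("field")
        # Encodings without a field (e.g. a bare count) are skipped.
        if name is not None and name not in data_fields:
            bad.append(channel)
    return bad


# The relevant part of the generated spec above.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "e1", "type": "ordinal"},
        "y": {"aggregate": "max", "field": "b", "type": "quantitative"},
    },
}
print(invalid_fields(spec, {"a", "b"}))  # ['x']
```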

Get bugs

Get a list of wrong predictions and make them quick to view.
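One way to get that list assumes a linear cost model. Each chart is summarized by counts of soft-constraint violations, and each labeled pair records which chart is worse. The feature names and `wrong_predictions` are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]  # soft-constraint name -> violation count


def cost(weights: Dict[str, float], features: Features) -> float:
    """Linear cost: weighted sum of soft-constraint violation counts."""
    return sum(weights.get(name, 0.0) * count for name, count in features.items())


def wrong_predictions(
    weights: Dict[str, float],
    pairs: List[Tuple[str, Features, Features]],
) -> List[str]:
    """Return the ids of labeled pairs (id, worse, better) the model gets wrong."""
    wrong = []
    for pair_id, worse, better in pairs:
        # Lower cost means better, so a correct model gives cost(better) < cost(worse).
        if cost(weights, better) >= cost(weights, worse):
            wrong.append(pair_id)
    return wrong


weights = {"bar_without_zero": 2.0, "log_scale": 1.0}
pairs = [
    ("p1", {"bar_without_zero": 1}, {"log_scale": 1}),
    ("p2", {"log_scale": 1}, {"bar_without_zero": 1}),
]
print(wrong_predictions(weights, pairs))  # ['p2']
```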

Create pairs

We need pairs of visualizations that we can ask users to label. These pairs should cover interesting correlations.

Active learning

Figure out which pairs of visualizations we should ask an expert to label.
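A common starting point is uncertainty sampling. This is a swapped-in technique, not something decided in this issue. The idea is to ask about the pairs the current model is least sure of, meaning the ones whose cost difference is closest to zero. The sketch assumes a linear cost model over hypothetical violation-count features.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Candidate = Tuple[str, Features, Features]  # (pair id, chart a, chart b)


def margin(weights: Dict[str, float], a: Features, b: Features) -> float:
    """Cost of chart a minus cost of chart b under a linear model."""
    names = set(a) | set(b)
    return sum(weights.get(n, 0.0) * (a.get(n, 0.0) - b.get(n, 0.0)) for n in names)


def most_uncertain(
    weights: Dict[str, float], candidates: List[Candidate], k: int
) -> List[str]:
    """Uncertainty sampling: the k pairs whose cost difference is closest to 0."""
    ranked = sorted(candidates, key=lambda c: abs(margin(weights, c[1], c[2])))
    return [pair_id for pair_id, _, _ in ranked[:k]]


weights = {"f1": 1.0, "f2": 3.0}
candidates = [
    ("p1", {"f1": 1.0}, {"f2": 1.0}),  # margin -2
    ("p2", {"f1": 3.0}, {"f2": 1.0}),  # margin 0: the model cannot tell them apart
    ("p3", {"f2": 1.0}, {}),           # margin 3
]
print(most_uncertain(weights, candidates, k=2))  # ['p2', 'p1']
```

After each round of labels, the weights would be re-learned and the ranking recomputed.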

Rename methods

For example, load_from_vl_json should be renamed to load_query_from_json.
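A non-breaking rename could keep the old name as a deprecated alias that forwards to the new one. The signature here (a JSON string in, a dict out) is an assumption for illustration, and the real function may take different arguments.

```python
import json
import warnings
from typing import Any, Dict


def load_query_from_json(text: str) -> Dict[str, Any]:
    """New, descriptive name (hypothetical signature: takes a JSON string)."""
    return json.loads(text)


def load_from_vl_json(text: str) -> Dict[str, Any]:
    """Old name, kept as a thin alias so existing callers keep working."""
    warnings.warn(
        "load_from_vl_json is deprecated; use load_query_from_json",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this alias
    )
    return load_query_from_json(text)
```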

Add importance to data fields

Users should be able to express whether a field is considered important or not.

Options for adding this are:

  • A flag to indicate whether a field is important
  • A score
  • A total order
  • Preferences (foo > bar)
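The flag, score and total-order options can all be reduced to pairwise preferences, so a single internal representation might cover all four options. This sketch works under that assumption. The `prefer_field/2` predicate is hypothetical and does not exist in Draco's constraints.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def prefs_from_order(order: List[str]) -> List[Tuple[str, str]]:
    """Total order (most important first) -> all implied pairs (a > b)."""
    return list(combinations(order, 2))


def prefs_from_scores(scores: Dict[str, float]) -> List[Tuple[str, str]]:
    """Scores -> pairs (a > b) wherever a scores strictly higher; ties give no pair.
    A boolean flag is the special case of scores 1 (important) and 0."""
    return [(a, b) for a in scores for b in scores if scores[a] > scores[b]]


def to_facts(prefs: List[Tuple[str, str]]) -> str:
    """Render preference pairs as ASP facts (hypothetical predicate name)."""
    return "\n".join(f"prefer_field({a},{b})." for a, b in prefs)


print(to_facts(prefs_from_order(["a", "b", "c"])))
```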

Evaluation

  • Can we learn all good visualizations? If we miss important soft constraints, we cannot recommend certain charts.
  • Can we recommend the top 10 visualizations from http://viziometrics.org/?
  • Do we beat CompassQL in human ratings?
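For the top-10 question, one simple metric is recall at k: the fraction of reference charts that show up in our top k recommendations. This sketch assumes each chart is identified by a canonical string, such as a normalized spec. `recall_at_k` is a hypothetical helper.

```python
from typing import Iterable, List


def recall_at_k(recommended: List[str], reference: Iterable[str], k: int = 10) -> float:
    """Fraction of reference charts that appear in the top-k recommendations."""
    ref = set(reference)
    if not ref:
        return 0.0  # nothing to recover
    hits = ref & set(recommended[:k])
    return len(hits) / len(ref)


print(recall_at_k(["c1", "c2", "c3"], ["c2", "c4"], k=2))  # 0.5
```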

Fix scale.zero and scale.type: log

We should parse this spec correctly. Right now, we expect zero and log not to be nested under scale, but in Vega-Lite they are (scale.zero and scale.type: log).

{
    "encoding": {
        "x": {
            "bin": 10,
            "field": "horsepower",
            "type": "quantitative"
        },
        "y": {
            "aggregate": "count",
            "scale": {
                "zero": true
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
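The helpers below read the nested form and turn it into the zero/log atoms that appear in the generated ab.json program, e.g. zero(e0). `scale_flags` and `scale_facts` are hypothetical helpers. They report only flags that are set explicitly and do not model Vega-Lite's default for zero.

```python
from typing import Any, Dict, List, Tuple


def scale_flags(enc: Dict[str, Any]) -> Tuple[bool, bool]:
    """Return (zero, log) for one encoding, reading the nested `scale` object."""
    scale = enc.get("scale", {})
    zero = scale.get("zero") is True
    log = scale.get("type") == "log"
    return zero, log


def scale_facts(enc_id: str, enc: Dict[str, Any]) -> List[str]:
    """Render explicit scale flags as zero/log facts for one encoding id."""
    zero, log = scale_flags(enc)
    facts = []
    if zero:
        facts.append(f"zero({enc_id}).")
    if log:
        facts.append(f"log({enc_id}).")
    return facts


y = {"aggregate": "count", "scale": {"zero": True}, "type": "quantitative"}
print(scale_facts("e1", y))  # ['zero(e1).']
```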

Skew

  • Add a soft constraint not to add zero when the difference between min and max is less than the distance to 0.
  • Discourage line or area without aggregation when the number of data rows is larger than the number of distinct values on the ordinal axis.
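Both conditions could be checked in Python before they are written as soft constraints. The function names in this sketch are hypothetical. For a data range that spans 0, the distance to zero is taken to be 0.

```python
def zero_is_misleading(lo: float, hi: float) -> bool:
    """True when max - min is smaller than the data's distance to 0."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    # Distance from the interval [lo, hi] to 0 (0 if the interval contains 0).
    distance = 0.0 if lo <= 0 <= hi else min(abs(lo), abs(hi))
    return (hi - lo) < distance


def line_area_needs_aggregate(
    num_rows: int, ordinal_cardinality: int, aggregated: bool
) -> bool:
    """True when an unaggregated line/area has more rows than ordinal positions."""
    return not aggregated and num_rows > ordinal_cardinality


print(zero_is_misleading(100, 110))  # True: range 10, distance to 0 is 100
```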

Hard constraints

  • Don't use bar with only discrete (d) fields
  • Don't use rule with only x or y
  • rule and tick need a quantitative (q) field
  • do not use negative values for size
  • size should have a high preference for zero, even with bin
  • don't allow row without y, or column without x
  • stack can only have one continuous field, and it has to be on x or y
  • only allow dates to be used for temporal (no string or number)
  • bars overlap when there is no ordinal field and only q fields, so disallow that
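These rules will eventually be ASP integrity constraints. Until then, a small checker over generated Vega-Lite specs could catch violations in tests. This sketch covers only three of the rules above: row without y, column without x, and rule/tick without q. `violations` and the rule labels are hypothetical.

```python
from typing import Any, Dict, List


def violations(spec: Dict[str, Any]) -> List[str]:
    """Check a few of the listed hard constraints on a Vega-Lite spec."""
    enc = spec.get("encoding", {})
    mark = spec.get("mark")
    found = []
    # Don't allow row without y, or column without x.
    if "row" in enc and "y" not in enc:
        found.append("row_without_y")
    if "column" in enc and "x" not in enc:
        found.append("column_without_x")
    # rule and tick need at least one quantitative (q) field.
    if mark in ("rule", "tick") and not any(
        e.get("type") == "quantitative" for e in enc.values()
    ):
        found.append("rule_or_tick_without_q")
    return found


bad = {
    "mark": "tick",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "row": {"field": "b", "type": "nominal"},
    },
}
print(violations(bad))  # ['row_without_y', 'rule_or_tick_without_q']
```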

Data Gen Issues

  • Better way to generate 'stack' (maybe hard constraints will help)
  • D(x) x D(y) x Q(other) should have examples with aggregation on the other (Q) channel
  • special case for type when generating
  • too many channel-channel
  • support count

Data gen issues

(I will keep adding to this list)

  • We don't support square mark types (I just added a constraint for that)
  • Prefer to generate aggregation together with lines and areas unless both encodings are q (I'm adding soft constraints to learn this)
