Issues from the uwdata/draco repository

Don't stack and bin at the same time

Use proper temp dir

https://docs.python.org/3.6/library/tempfile.html
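As a rough sketch of what this could look like with the standard `tempfile` module: the helper name `roundtrip_via_temp_dir` and the file name `query.lp` are made up for illustration and are not taken from the draco code.

```python
import os
import tempfile
from typing import Tuple


def roundtrip_via_temp_dir(program: str) -> Tuple[str, str]:
    """Write `program` into a fresh temporary directory and read it back.

    Hypothetical helper. The directory and everything in it are removed
    automatically when the `with` block exits, so nothing is left behind
    in the working directory.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "query.lp")  # assumed file name
        with open(path, "w") as f:
            f.write(program)
        with open(path) as f:
            contents = f.read()
    # At this point `path` no longer exists on disk.
    return contents, path
```

The returned path is only useful for confirming the cleanup; real code would do its work inside the `with` block.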

View visualizations by feature

A simple viewer where we can look at visualizations and see what soft constraints they violate.

We should aim not to rely too much on files in our APIs; it makes it really hard to write in-memory algorithms. Instead, we should move around in-memory objects and handle IO separately. I added a few TODOs in the code to show where I see problems.
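One way to picture the split, as a sketch only: the function names `parse_query` and `load_query` and the shape of the returned dict are assumptions for illustration, not the existing draco API.

```python
import io
import json
from typing import Any, Dict, TextIO


def parse_query(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Pure in-memory step: takes a dict, returns a dict, touches no files."""
    return {
        "mark": spec.get("mark"),
        "channels": sorted(spec.get("encoding", {})),
    }


def load_query(stream: TextIO) -> Dict[str, Any]:
    """Thin IO wrapper: accepts any text stream (open file, StringIO, stdin)."""
    return parse_query(json.load(stream))
```

With this shape, algorithms and tests can call `parse_query` on plain dicts, and only the outermost layer deals with files, for example `load_query(io.StringIO('{"mark": "bar"}'))`.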

Generate feature vectors in parallel

Speed up development

UI to ask the user for labels

Running 'run_pipeline.sh examples/ab.json' produces a wrong result

Running run_pipeline.sh examples/ab.json produces an incorrect spec:

The field of encoding e0 is "e1" in the generated file, but "e1" is an encoding ID, not a field.

The field should be either a or b.

FYI, soft constraints generated:

% ====== Data definitions ======
fieldtype(a,string).
cardinality(a,3).

fieldtype(b,number).
cardinality(b,6).

% ====== Query constraints ======
mark(bar).

encoding(e0).
channel(e0,x).
:- not field(e0,_).
:- not type(e0,_).
%0 { log(e0) } 1.
%0 { zero(e0) } 1.

encoding(e1).
channel(e1,y).
:- not field(e1,_).
type(e1,quantitative).
aggregate(e1,max).
%0 { log(e1) } 1.
%0 { zero(e1) } 1.

Generated full spec:

{
    "$schema": "https://vega.github.io/schema/vega-lite/v2.0.json",
    "data": {
        "url": "examples/data/ab.csv"
    },
    "encoding": {
        "x": {
            "field": "e1",
            "type": "ordinal"
        },
        "y": {
            "aggregate": "max",
            "field": "b",
            "scale": {
                "zero": true
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
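A small check along these lines could catch this class of bug in a test. It is only a sketch: `find_invalid_fields` is a hypothetical helper, and it assumes the list of data fields (here a and b) is available separately from the spec.

```python
from typing import Any, Dict, Iterable, List


def find_invalid_fields(spec: Dict[str, Any], data_fields: Iterable[str]) -> List[str]:
    """Return the channels whose "field" is not one of the known data fields."""
    known = set(data_fields)
    return [
        channel
        for channel, enc in sorted(spec.get("encoding", {}).items())
        if "field" in enc and enc["field"] not in known
    ]


# The generated spec from above, reduced to the parts the check looks at.
generated = {
    "encoding": {
        "x": {"field": "e1", "type": "ordinal"},
        "y": {"aggregate": "max", "field": "b", "type": "quantitative"},
    },
    "mark": "bar",
}
bad_channels = find_invalid_fields(generated, ["a", "b"])
```

Here `bad_channels` contains only the x channel, whose field is the encoding ID "e1".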

Test with schema

Get bugs

Get a list of wrong predictions. Quickly view them.

Fix outputs

For example, bin: 3 is not valid.
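In Vega-Lite, `bin` is either a boolean or a bin-parameters object, so one possible fix is to rewrite a bare number as `{"maxbins": n}`. The sketch below assumes that the number in our output is meant as a maximum bin count; `normalize_bin` is a hypothetical helper.

```python
from typing import Any, Dict, Union


def normalize_bin(value: Any) -> Union[bool, Dict[str, Any]]:
    """Convert a bare bin count into a Vega-Lite bin-parameters object.

    Booleans and dicts are passed through unchanged. The bool check must
    come first because bool is a subclass of int in Python.
    """
    if isinstance(value, (bool, dict)):
        return value
    if isinstance(value, int):
        return {"maxbins": value}  # assumption: the number is a max bin count
    raise TypeError("unsupported bin value: %r" % (value,))
```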

Write tests for python code

Simplify view

We could use https://vega.github.io/vega/usage/#view instead of creating an SVG and inlining that.

Write up the algorithms

Use clyngor

Write features with data size

Capture interaction between x/y channels and row/col

Write more soft constraints

Create pairs

We need pairs of visualizations that we can ask the user to label. These pairs should cover interesting correlations.

Remove scale

Active learning

Figure out which pairs of visualizations we should ask an expert to label.

Don't ship with binary

Rename methods

For example, load_from_vl_json should be load_query_from_json

Encoding is an array in queries

Get clingo in homebrew

MLN with ranking

Explain the algorithm more

How do we do online learning?
How does our algorithm compare to existing ones?

Implement a way to get the number of violating soft constraints

Write learning algorithm

Add importance to data fields

Users should be able to express whether a field is considered important or not.

Options for adding this are

A flag to indicate whether a field is important
A score
A total order
Preferences (foo > bar)
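The options above differ in how much the user has to specify. Pairwise preferences are the loosest form, and they can be reduced to a total order when one is needed. The sketch below shows one way to do that; the function name and the tie-breaking rule (input order) are assumptions, not a decided design.

```python
from typing import Dict, List, Sequence, Set, Tuple


def order_from_preferences(
    fields: Sequence[str], preferences: Sequence[Tuple[str, str]]
) -> List[str]:
    """Turn (more_important, less_important) pairs into one total order.

    The result lists the most important field first and respects every
    preference. Ties are broken by the position of the field in `fields`.
    Raises ValueError if the preferences contradict each other.
    """
    # For each field, the set of fields that must come before it.
    blockers: Dict[str, Set[str]] = {f: set() for f in fields}
    for better, worse in preferences:
        blockers[worse].add(better)

    order: List[str] = []
    remaining = list(fields)
    while remaining:
        ready = [f for f in remaining if not blockers[f] - set(order)]
        if not ready:
            raise ValueError("cyclic preferences")
        order.append(ready[0])
        remaining.remove(ready[0])
    return order
```

A score or an importance flag could then be derived from the position in the returned list if a coarser representation is enough.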

Add data, task and spec to ui

Create good input/output examples

Use CompassQL.

@domoritz starts with 3-5
@Mestway adds more afterwards.

Evaluation

Can we learn all good visualizations? We may miss important soft constraints so we cannot recommend certain charts.
Can we recommend the top 10 visualizations from http://viziometrics.org/?
Do we beat CompassQL in human ratings?

Inspecting how to sample solutions from partial spec

Given a partial spec and a set of hard constraints, sample from the set of all possible full specs that satisfy these hard constraints.
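To make the problem statement concrete, here is a brute-force sketch in plain Python. It enumerates every completion, which only works for tiny search spaces; in practice the solver would have to produce the candidates. The flat `Spec` representation, the `domains` argument and the function name are all assumptions for illustration.

```python
import itertools
import random
from typing import Callable, Dict, List, Optional, Sequence

Spec = Dict[str, str]


def sample_full_specs(
    partial: Spec,
    domains: Dict[str, Sequence[str]],
    hard_constraints: Sequence[Callable[[Spec], bool]],
    k: int,
    seed: Optional[int] = None,
) -> List[Spec]:
    """Sample up to k full specs that extend `partial` and pass all constraints.

    Every property in `domains` that `partial` leaves open is filled with
    each of its possible values; completions that fail any hard constraint
    are dropped, and the sample is drawn uniformly without replacement.
    """
    free = [key for key in sorted(domains) if key not in partial]
    valid: List[Spec] = []
    for values in itertools.product(*(domains[key] for key in free)):
        spec = dict(partial, **dict(zip(free, values)))
        if all(check(spec) for check in hard_constraints):
            valid.append(spec)
    rng = random.Random(seed)
    return rng.sample(valid, min(k, len(valid)))
```

Each hard constraint is a predicate that returns True when the spec is acceptable, so `sample_full_specs({"mark": "bar"}, domains, [no_ordinal_x], k=5)` returns at most five valid completions.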

Fix scale.zero and scale.type: log

We should be parsing this spec correctly. Right now, we expect zero and log not to be nested under scale.

{
    "encoding": {
        "x": {
            "bin": 10,
            "field": "horsepower",
            "type": "quantitative"
        },
        "y": {
            "aggregate": "count",
            "scale": {
                "zero": true
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
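Reading the two flags from the nested form could look roughly like this. It is a sketch, not the current parser: `scale_flags` is a hypothetical helper that works on one encoding after the JSON has been loaded into a Python dict.

```python
from typing import Any, Dict


def scale_flags(encoding: Dict[str, Any]) -> Dict[str, bool]:
    """Read `zero` and `log` for one encoding from its nested `scale` object.

    A missing `scale` object is treated as "neither flag set".
    """
    scale = encoding.get("scale") or {}
    return {
        "zero": scale.get("zero") is True,
        "log": scale.get("type") == "log",
    }
```

For the y encoding of the spec above, this yields zero set and log unset.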

Add a soft constraint not to add zero when the difference between min and max is less than the distance to 0.
Discourage line or area without aggregation when the size of the data is larger than the size of the ordinal axis
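The condition in the first of these two constraints can be written down as a small predicate. This is one possible reading: "distance to 0" is taken to mean the distance from the nearer end of the data range to zero, and the function name is made up.

```python
def zero_baseline_is_wasteful(lo: float, hi: float) -> bool:
    """Return True when the data range is narrower than its distance to 0.

    `lo` and `hi` are the minimum and maximum of the field. If the range
    already contains 0, the distance is 0 and the result is always False.
    """
    if lo > hi:
        raise ValueError("lo must not be greater than hi")
    if lo <= 0 <= hi:
        return False
    distance_to_zero = min(abs(lo), abs(hi))
    return (hi - lo) < distance_to_zero
```

For a field ranging from 100 to 110 the predicate holds, so including zero would be discouraged; for a field ranging from 1 to 10 it does not.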

Hard constraints

Don't use bar with just d
Don't use rule with only x or y
Rule and tick need q
Do not use negative values for size
Size should have high preference for zero, even with bin
Don't allow row without y, column without x
Stack can only have one continuous and it has to be on x or y
Only allow dates to be used for temporal (no string or number)
We overlap bars when we don't have an ordinal and only q
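For writing tests against these rules, it can help to have a plain Python statement of a rule next to the ASP version. As an example, the row/column rule from the list above might be checked like this; the function name is hypothetical, and the input is assumed to be the set of channels a spec uses.

```python
from typing import Iterable


def violates_facet_rule(channels: Iterable[str]) -> bool:
    """Return True if row is used without y, or column is used without x."""
    used = set(channels)
    row_without_y = "row" in used and "y" not in used
    column_without_x = "column" in used and "x" not in used
    return row_without_y or column_without_x
```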

Use different levels of constraints

Some constraints may not be strict but should be above the soft constraints. I don't have time to add this anytime soon, though.

Data Gen Issues

Better way to generate 'stack' (maybe hard constraints will help)
D(x) x D(y) x Q(other) should have examples with agg on other
special case for type when generating
too many channel-channel
support count

Support sort

Sort scales as feature

Data gen issues

(I will keep adding to this list)

We don't support square mark types (I just added a constraint for that)
Prefer to generate aggregation together with lines and area unless it's q q (I'm adding soft constraints to learn this)

Test CLI

http://dustinrcollins.com/testing-python-command-line-apps
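A common pattern for this is to run the command in a subprocess and assert on its exit code and output. The sketch below uses only the standard library; `run_cli` is a hypothetical helper, and the command in the usage line is a stand-in, not our actual CLI.

```python
import subprocess
import sys
from typing import List, Tuple


def run_cli(args: List[str]) -> Tuple[int, str, str]:
    """Run a command and return (exit code, stdout, stderr) as text."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (works on Python 3.6)
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in command; a real test would invoke the CLI entry point instead.
code, out, err = run_cli([sys.executable, "-c", "print('ok')"])
```

A test can then assert that `code` is 0 and that `out` contains the expected spec.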
