The orco from spirali

Proposal: Add versioning to builders

Idea: We can have a version=1 parameter for @builder and Builder. The version is just a number that is manually incremented on every change to the builder, independently for every builder. (Any comparable type could be used, but I would stick to numbers.)

When the number is incremented and the builder is added to a runtime, all entries with previous versions of the builder are pruned from the DB along with their children (probably automatically).
On the one hand, this risks losing some data on a reckless version bump; on the other hand, it is safer for data consistency.

Alternatively, old entries could just be ignored, but then their children would somehow need to be ignored too, which seems too complex (unless you have an excellent idea ;) ).
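The pruning variant can be sketched as follows. This is a minimal in-memory model, not orco's actual DB code: it assumes each stored entry records its builder name, the builder version it was computed with, and the keys of the entries it depends on. Entry, Store and register_builder are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    builder: str   # name of the builder that produced the entry
    key: str       # unique key of the entry
    version: int   # builder version the entry was computed with
    deps: List[str] = field(default_factory=list)  # keys of dependency entries


class Store:
    """Hypothetical in-memory stand-in for the entry DB."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        self.entries[entry.key] = entry

    def register_builder(self, name: str, version: int) -> Set[str]:
        """Prune entries of `name` built with an older version, plus all their descendants."""
        stale = {k for k, e in self.entries.items()
                 if e.builder == name and e.version < version}
        # Keep marking entries that depend on something stale until nothing changes
        changed = True
        while changed:
            changed = False
            for k, e in self.entries.items():
                if k not in stale and any(d in stale for d in e.deps):
                    stale.add(k)
                    changed = True
        for k in stale:
            del self.entries[k]
        return stale
```

Registering "data" at version 2 over a chain data/1 → train/1 → eval/1 removes all three and leaves unrelated entries alone. The fixpoint loop is quadratic in the worst case; a real DB would more likely follow dependency edges with a recursive query.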

@spirali What do you think?

Config into key conversion does not support booleans

As the title says. JSON supports booleans, so the key conversion should too. Will fix it.
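For illustration, a minimal sketch of a config-to-key conversion that accepts booleans, assuming keys are produced with json.dumps(..., sort_keys=True) over a restricted set of value types (config_to_key and _check are hypothetical names, not necessarily how orco builds keys). json.dumps already encodes True/False as true/false. Note that in Python bool is a subclass of int, so isinstance(True, int) is true, while an exact type(value) comparison against int would reject booleans.

```python
import json

# Scalar types allowed in configs; bool is listed explicitly for clarity
# (it is a subclass of int, so isinstance() would accept it anyway).
_SCALARS = (bool, int, float, str, type(None))


def _check(value):
    """Raise TypeError for values that have no stable JSON encoding."""
    if isinstance(value, _SCALARS):
        return
    if isinstance(value, (list, tuple)):
        for item in value:
            _check(item)
    elif isinstance(value, dict):
        for k, v in value.items():
            if not isinstance(k, str):
                raise TypeError(f"config keys must be strings, got {k!r}")
            _check(v)
    else:
        raise TypeError(f"unsupported config value: {value!r}")


def config_to_key(config):
    """Turn a config into a deterministic string key (dict keys sorted)."""
    _check(config)
    return json.dumps(config, sort_keys=True)
```

For example, config_to_key({"b": 1, "a": True}) returns '{"a": true, "b": 1}'.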

@spirali Alternatively, we could serialize the config with pickle and skip the whole JSON thing, using something like JSON only for key generation.

This would a) ensure more consistency (and even preserve the order of named parameters in an OrderedDict) and b) make it easier to support other types in key generation.

Possible drawback: JSON, for example, unifies lists and tuples into lists, which is another kind of consistency that pickle would not give us.
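The trade-off can be seen with the standard json and pickle modules alone (the config below is made up): a JSON round trip turns the tuple into a list, while a pickled copy comes back unchanged, yet both shapes of the config still map to the same JSON-derived key.

```python
import json
import pickle

config = {"layers": (64, 32), "use_bias": True}

# JSON round trip: tuples come back as lists
via_json = json.loads(json.dumps(config))

# Pickle for storage, JSON only to derive the key
via_pickle = pickle.loads(pickle.dumps(config))
key = json.dumps(config, sort_keys=True)

# A list-valued variant of the same config produces an identical key
list_key = json.dumps({"layers": [64, 32], "use_bias": True}, sort_keys=True)

print(via_json)         # {'layers': [64, 32], 'use_bias': True}
print(via_pickle)       # {'layers': (64, 32), 'use_bias': True}
print(key == list_key)  # True
```

So with pickle, two configs that share a key could be stored with different container types, depending on which one was computed first.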

TODO

Proposal: Add context object accessible to builder fns

Builder functions have little access to anything beyond their input config and dependency entries. I can imagine the following also being useful (not proposing to add all of this now, though):

- the runtime object
- the entry being created (in a wider sense), e.g. to:
  - record additional outputs (generic files, images, TF logs, ...) [I want to make this]
  - add custom tags or information (logging?)
  - measure the performance of certain blocks of the computation (e.g. via context managers)
  - report progress online (so it is visible in the DB and dashboard)
- the set of all requested dependencies
- a way to get a builder by name (from the runtime)

This would likely be merged into the _CONTEXT functionality.
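A minimal sketch of what such a context object might hold for the entry being created, covering tags, timed blocks and progress. BuilderContext, add_tag, timer and report_progress are hypothetical names; a real version would persist these to the DB instead of keeping them in memory.

```python
import time
from contextlib import contextmanager


class BuilderContext:
    """Hypothetical per-entry context available to a running builder function."""

    def __init__(self, entry_key):
        self.entry_key = entry_key
        self.tags = {}
        self.timings = {}  # block name -> accumulated seconds
        self.progress = 0.0

    def add_tag(self, name, value):
        self.tags[name] = value

    @contextmanager
    def timer(self, block_name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so the same block can be timed several times
            elapsed = time.perf_counter() - start
            self.timings[block_name] = self.timings.get(block_name, 0.0) + elapsed

    def report_progress(self, fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("progress must be between 0 and 1")
        self.progress = fraction  # a real version would also push this to the DB
```

Inside a builder this would read as ctx.add_tag("dataset", "mnist"), with ctx.timer("epoch"): ..., and ctx.report_progress(0.5).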

Open question: how should a builder fn get the context?

- First argument - IMO ugly and cumbersome for simpler functions, and needs some more argument magic
- Function of orco, e.g. orco.ctx() - an explicit call (people will want to cache it), but semantically clean
- Module "attribute" like orco.ctx - even nicer, but modules can't have properties with getters.
  - We could make orco.ctx a global proxy that redirects all attribute access to thread-local objects ... but that may not be worth the extra fanciness
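The explicit-call and proxy options can share one mechanism. The sketch below assumes the runtime stores the context of the currently running builder in a threading.local; get_ctx() is the explicit-call variant and ctx is the proxy variant, forwarding attribute access through __getattr__. All names are hypothetical. (Since Python 3.7, PEP 562 also allows a module-level __getattr__, which is called on each access to a name not defined in the module; from orco import ctx would then bind a single value at import time, which the proxy avoids.)

```python
import threading

_local = threading.local()


class BuilderContext:
    """Minimal stand-in for the per-entry context."""

    def __init__(self, name):
        self.name = name


def get_ctx():
    """Explicit variant: return the context of the builder running in this thread."""
    current = getattr(_local, "context", None)
    if current is None:
        raise RuntimeError("no builder is running in this thread")
    return current


class _ContextProxy:
    """Proxy variant: forward attribute lookups to the current context."""

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself
        return getattr(get_ctx(), attr)


ctx = _ContextProxy()


def run_builder(fn, context, *args):
    """What the runtime would do around each builder call; the previous
    context is restored so nested builder calls keep working."""
    previous = getattr(_local, "context", None)
    _local.context = context
    try:
        return fn(*args)
    finally:
        _local.context = previous
```

Because the context lives in thread-local storage, builders running in parallel threads each see their own ctx.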

Feature: Storing associated blobs and data with Entries

Some computations have multiple outputs, and some of those are naturally files. For example, training a neural net produces: the model parameters (data or file), resulting stats (data), TF SummaryWriter logs (file), sometimes graphs or images (files), and stdout/stderr captures (data). It would be great if some of those types could also be displayed in the browser (images, text files, logs, ...).

Table / properties

Add a table for storing blobs; every currently valid Entry has its associated blobs there. It would make sense to also store the serialized output value there (for consistency, or e.g. to allow external blob storage).

Every blob has:

- id
- entry - reference to the entry, M:1 (TODO: update to match the current schema)
- data (blob)
- name - filename (relative to the workdir), empty (for the pickled returned value), or any name without a slash (just a blob, which may still be instantiated as a file)
- some notion of kind/type/intent - which blobs should be displayed in the browser, which are images, which are (viewable) text files, how to highlight the text, etc. Plugins may define more (e.g. TensorBoard).
  - A MIME type seems to be both too much and insufficient (e.g. for TF logs)? (But it is good for browser open/download.)
  - We can just have a tag field mixing role (full, thumbnail, ...) and type (text, json, png, jpg).
  - Or we can have both a MIME type for the type and tags for role/intent/plugin (for distinguishing e.g. TF logs, ...).
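One possible shape of the table, sketched with Python's built-in sqlite3 module. The entries table is only a placeholder (the real schema differs, per the TODO above), tags are stored as a space-separated string for brevity, and ON DELETE CASCADE is one way to make pruning an entry drop its blobs too. All table and column names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entries (
    id  INTEGER PRIMARY KEY,
    key TEXT NOT NULL UNIQUE
);
CREATE TABLE blobs (
    id       INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES entries(id) ON DELETE CASCADE,
    name     TEXT NOT NULL,             -- '' for the pickled return value
    data     BLOB NOT NULL,
    mime     TEXT,                      -- optional MIME type
    tags     TEXT NOT NULL DEFAULT '',  -- space-separated role/plugin tags
    UNIQUE (entry_id, name)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # SQLite enforces foreign keys (and so the cascade) only when enabled per connection
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The UNIQUE (entry_id, name) constraint makes Entry.files a well-defined name-to-blob mapping.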

API

Blobs are created through the context (see #22):

- ctx.add_blob(data, name, mimetype, tags=()) - add a data blob
- ctx.add_file(path, name=None, mimetype=None, tags=()) - add an existing file
- some type-specific functions (more for text/logs, etc.):
  - ctx.add_figure(fig, name, tags=('thumbnail', )) - render and insert a Matplotlib/plotly/bokeh/... image
  - ctx.add_pickled(obj, name, tags=('pickled', )) - pickle and add an object
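A minimal in-memory sketch of the creation side, assuming mimetype is optional for add_file and guessed from the file name with the standard mimetypes module, and that blobs are collected per entry before being written to the table. BlobContext and its storage layout are hypothetical. add_figure is omitted; for Matplotlib it could render with fig.savefig into an io.BytesIO using format='png'.

```python
import mimetypes
import pickle
from pathlib import Path


class BlobContext:
    """Hypothetical collector for the blobs created while one builder runs."""

    def __init__(self):
        self.blobs = {}  # name -> (data, mimetype, tags)

    def _store(self, name, data, mimetype, tags):
        if name in self.blobs:
            raise ValueError(f"duplicate blob name {name!r}")
        self.blobs[name] = (bytes(data), mimetype, tuple(tags))

    def add_blob(self, data, name, mimetype, tags=()):
        # Plain blobs use names without a slash; '' is reserved for the return value
        if not name or "/" in name:
            raise ValueError("blob name must be non-empty and contain no '/'")
        self._store(name, data, mimetype, tags)

    def add_file(self, path, name=None, mimetype=None, tags=()):
        path = Path(path)
        if name is None:
            name = path.name  # assumption: default to the bare file name
        if mimetype is None:
            mimetype, _ = mimetypes.guess_type(path.name)
        self._store(name, path.read_bytes(), mimetype, tags)

    def add_pickled(self, obj, name, tags=("pickled",)):
        self.add_blob(pickle.dumps(obj), name, "application/octet-stream", tags)
```

Names given to add_file may contain slashes (paths relative to the workdir), while add_blob enforces the "no slash" rule from the table description.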

Properties and methods on Entry:

Entry.files - dictionary name: EntryFile

EntryFile (bikesheddable) has similar properties to the table above. In addition, it has methods:

- EntryFile.write_file(filename=None) - write the data out as a real file and return a Path object
- EntryFile.as_file() - return a readable file-like object (SQLite supports this)
- EntryFile.data() - return the binary data
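A sketch of the read side, with the blob already loaded into memory. Here as_file wraps the bytes in io.BytesIO; with SQLite storage, Connection.blobopen (Python 3.11+) could return a file-like handle over the stored blob instead. The constructor arguments are assumptions.

```python
import io
from pathlib import Path


class EntryFile:
    """Hypothetical read-only view of one blob attached to an Entry."""

    def __init__(self, name, data, mimetype=None, tags=()):
        self.name = name
        self.mimetype = mimetype
        self.tags = tuple(tags)
        self._data = data

    def data(self):
        """Return the raw bytes."""
        return self._data

    def as_file(self):
        """Return a readable binary file-like object over the data."""
        return io.BytesIO(self._data)

    def write_file(self, filename=None):
        """Write the data to filename (default: the blob name) and return its Path."""
        if filename is None:
            if not self.name:
                raise ValueError("unnamed blob: pass an explicit filename")
            filename = self.name
        path = Path(filename)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(self._data)
        return path
```

Entry.files could then simply be a dict built as {f.name: f for f in entry_files}.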
