orco's Introduction

About me

  • I am a developer and research engineer at ACS Research Group and IT4Innovations.
  • I am interested in AI Safety, HPC, and formal verification.
  • I have a PhD in Computer Science.
  • I work mostly in Rust and Python, sometimes in TypeScript. In the past I also used C, C++, and Haskell.
  • I love digital painting and playing the cello.

Selected projects where I am main author/co-author

  • HyperQueue - user-friendly and scalable job scheduler for supercomputers
  • Nelsie - slide making software
  • Rain - framework for large distributed pipelines
  • Interlab - toolkit for multi-agent interactions
  • RSDS - Dask server reimplemented in Rust
  • Nedoc - non-evaluating documentation generator for Python
  • Haydi - Python framework for generating discrete structures
  • Aislinn - dynamic verifier for MPI programs
  • Estee - simulator for task-based workflow schedulers
  • ORCO - Python package for defining, executing, and persisting computations
  • Replay-cache - replay cache for LangChain
  • RMahjong - Riichi Mahjong
  • LabLab - A simple image annotation tool

orco's People

Contributors

gavento, kobzol, spirali

Forkers

kobzol gavento

orco's Issues

Proposal: Add versioning to builders

Idea: We can have a version=1 parameter for @builder and Builder. The version is just a number that is manually incremented on every change, independently for every builder. (Any comparable could be used but I would stick to numbers).

When the number is incremented and the builder is added to a runtime, all entries with previous versions of the builder are pruned from the DB along with their children (probably automatically).
On one hand, this risks losing some data on a reckless version bump, but it is safer for data consistency.

Alternatively, old entries can be just ignored, but their children would need to be somehow ignored too, which seems too complex (unless you have an excellent idea ;) )
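
A minimal sketch of the pruning variant, assuming a hypothetical in-memory store. The names Entry, EntryStore, register_builder, and the parent/child bookkeeping are illustrative assumptions, not ORCO's actual API or DB schema:

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    builder: str
    version: int
    key: str
    # Identifiers of entries that were computed from this one.
    children: list = field(default_factory=list)


class EntryStore:
    """Hypothetical stand-in for the DB; not ORCO's real storage layer."""

    def __init__(self):
        self.entries = {}  # (builder, key) -> Entry

    def add(self, entry, parents=()):
        ident = (entry.builder, entry.key)
        self.entries[ident] = entry
        for parent in parents:
            parent.children.append(ident)

    def prune(self, ident):
        """Remove an entry and, recursively, everything derived from it."""
        entry = self.entries.pop(ident, None)
        if entry is None:
            return  # already removed via another parent
        for child in entry.children:
            self.prune(child)

    def register_builder(self, name, version=1):
        """Drop all entries of `name` that were built with an older version."""
        stale = [ident for ident, e in self.entries.items()
                 if e.builder == name and e.version < version]
        for ident in stale:
            self.prune(ident)
```

Registering a builder with a bumped version removes its old entries and anything that depended on them, while entries of unrelated builders stay untouched.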

@spirali What do you think?

Config into key conversion does not support booleans

As the title says. JSON supports them. Will fix it.

@spirali Alternatively, we can serialize the config with pickle and skip the whole JSON thing, just using something like JSON for key generation.

This would a) ensure more consistency (and even preserve named parameter order in OrderedDict) and b) allow supporting other types in key generation.

Possible drawback: JSON e.g. unifies lists and tuples into lists, adding another type of consistency.
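
To illustrate the JSON behaviour discussed here, a small sketch using only the standard json module. The helper name make_key is an assumption for illustration, not the function ORCO uses:

```python
import json


def make_key(config):
    """Build a deterministic string key from a JSON-compatible config."""
    # sort_keys makes the key independent of dict insertion order;
    # compact separators avoid whitespace differences between keys.
    return json.dumps(config, sort_keys=True, separators=(",", ":"))


# Booleans are serialized as true/false, so they stay distinct from 1/0.
bool_key = make_key({"flag": True, "n": 1})

# Tuples and lists both become JSON arrays, so they map to the same key.
same = make_key({"xs": (1, 2)}) == make_key({"xs": [1, 2]})
```

Note that sorting keys deliberately discards parameter order, which is the opposite of the order-preserving behaviour pickle would give.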

TODO

  • Finish collection API (.compute, .remove, ..., .get)
  • collection.clean()
  • command line API
  • Obj API (or just use dict?)
  • Generating cartesian product for Obj (or dict?) Builder(graph=[..], model=[..]) = [Obj<graph=.. model=..>, .... ]
  • Simple LocalExecutor that respects dependencies and shows some progress (tqdm)
  • Queries API for getting results from DB
  • Export a collection into Pandas frame
  • Dask Executor
  • "repeat" API
  • Iterative collection API
  • Saving information about computation (computation duration, etc)
  • Prediction of a computation
  • fail-safe mode for executor
  • Possibility to return JSON like object with refs from dep_fn
  • Runtime context manager
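
The cartesian product item above could look roughly like the following sketch, which uses plain dicts rather than an Obj type. The function name expand_configs is hypothetical:

```python
from itertools import product


def expand_configs(**axes):
    """Return one dict per combination of the given parameter lists."""
    names = list(axes)
    return [dict(zip(names, values)) for values in product(*axes.values())]


configs = expand_configs(graph=["g1", "g2"], model=["m1", "m2", "m3"])
```

Here configs holds six dicts, one for each (graph, model) pair, with the first axis varying slowest.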

Proposal: Add context object accessible to builder fns

Builder functions have limited access to anything but their input config and dependency entries. I can imagine the following may be also useful (not proposing to add all of this now, though):

  • the runtime object
  • the entry being created (in a wider sense)
    • record additional outputs (generic files, images, TF logs, ...) [I want to make this]
    • adding custom tags or information (logging?)
    • performance of certain blocks in the computation (e.g. via context managers)
    • online updates to progress (so it is seen in the DB and dashboard)
  • the set of all requested dependencies
  • get builder by name (from runtime)

This would likely be merged into _CONTEXT functionality.

Open question: how should builder fn get the context?

  • First argument - IMO ugly and cumbersome for simpler functions, needs some more argument magic
  • Function of orco, e.g. orco.ctx() - an explicit call (people will want to cache it) but semantically clean
  • Module "attribute" like orco.ctx - even nicer, but modules can't have properties with getters.
    • We can have orco.ctx be a global proxy that redirects all attr access to thread-local objects ... but it may not be worth the extra fanciness
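
A rough sketch of the global proxy option, assuming the context is installed per thread before a builder fn runs. The names _ContextProxy and active_context are hypothetical, and only attribute reads are forwarded here:

```python
import threading
from contextlib import contextmanager

_local = threading.local()


class _ContextProxy:
    """Forwards attribute reads to the context active in the current thread."""

    def __getattr__(self, name):
        current = getattr(_local, "ctx", None)
        if current is None:
            raise RuntimeError("no builder context is active in this thread")
        return getattr(current, name)


# Would be exposed as a module attribute, e.g. orco.ctx (hypothetical).
ctx = _ContextProxy()


@contextmanager
def active_context(context):
    """Install `context` for the current thread while a builder fn runs."""
    previous = getattr(_local, "ctx", None)
    _local.ctx = context
    try:
        yield context
    finally:
        # Restore whatever was active before, so nesting works.
        _local.ctx = previous
```

The runtime would wrap each builder call in active_context(...), so builder code can read ctx.something without receiving the context as an argument. Accessing ctx outside a builder raises an error instead of silently returning stale data.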

Feature: Storing associated blobs and data with Entries

Some computations have multiple outputs, and some of those are naturally files. E.g. training a neural net outputs: the model parameters (data or file), resulting stats (data), TF SummaryWriter logs (file), sometimes graphs or images (files), stdout/stderr captures (data). It would be great if some of those types could also be displayed in the browser (images, text files, logs, ...)

Table / properties

Add a table for storing blobs, every currently valid Entry has associated blobs. It would make sense to include the serialized output value (for consistency or e.g. external blob storage).

Every blob has:

  • id
  • entry - reference to entry, M:1 (TODO: update to match the current schema)
  • data (blob)
  • name - filename (relative to the workdir) or empty (for pickled returned value) or any name without a slash (just a blob, may still be instantiated as a file)
  • Some notion of kind/type/intent - which should be displayed in browser, which are images, which are (viewable) text files, how to highlight the text, etc. Plugins may define more (e.g. tensorboard).
    • MIME seems to be both too much and insufficient (e.g. TF logs)? (But good for browser open/download)
    • We can just have tag field mixing role (full, thumbnail, ...) and type (text, json, png, jpg)
    • Or we can have both mime for type and tags for role/intent/plugin (for distinguishing e.g. TF logs ..).

API

Managed through context for creation (see #22):

  • ctx.add_blob(data, name, mimetype, tags=()) - add data blob
  • ctx.add_file(path, name=None, mimetype=None, tags=()) - add an existing file
    And some type-specific functions (more for text/logs, etc.)
  • ctx.add_figure(fig, name, tags=('thumbnail', )) - render and insert Matplotlib/plotly/bokeh/... image
  • ctx.add_pickled(obj, name, tags=('pickled', )) - pickle and add object

Properties and methods on Entry:

  • Entry.files - dictionary name: EntryFile

EntryFile (bikesheddable) has similar properties to the table above. In addition, it has methods:

  • EntryFile.write_file(filename=None) - write as real file, returns Path object
  • EntryFile.as_file() - return a readable file-like object (SQLite supports this)
  • EntryFile.data() - return binary data
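
A minimal in-memory sketch of what EntryFile could look like. The field names are assumptions (the payload is stored as content because data is a method), and a real version would read from the DB rather than hold the bytes directly:

```python
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass
class EntryFile:
    name: str
    content: bytes
    mimetype: str = "application/octet-stream"
    tags: tuple = ()

    def data(self):
        """Return the raw binary data."""
        return self.content

    def as_file(self):
        """Return a readable file-like object over the data."""
        return io.BytesIO(self.content)

    def write_file(self, filename=None):
        """Write the blob to disk and return its Path (defaults to `name`)."""
        target = filename if filename is not None else self.name
        if not target:
            # Unnamed blobs (e.g. the pickled return value) have no default path.
            raise ValueError("a filename is required for unnamed blobs")
        path = Path(target)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(self.content)
        return path
```

Entry.files would then map each blob name to one such object.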
