machinable-org / machinable

A modular system for machinable research code

Home Page: https://machinable.org

License: MIT License

Python 100.00%
convention-over-configuration data-science framework-agnostic machine-learning python-3 research-and-development

machinable's People

Contributors

dependabot[bot], frthjf

Forkers

ivansche skim0119

machinable's Issues

Re-run relationships

machinable should keep track of the relationships between Experiments, e.g. re-runs of the same experiment etc.

Improve Storage indexing

Various improvements to indexing

  • improve performance by pruning the tree
  • remove ComponentStorage if not found to enable reindexing after moving files

Documentation improvements

  • Add a 'How-to' section with in-depth tutorials like 'how to continue execution', 'how to reproduce results' etc.
  • Resource inheritance documentation
  • Slurm engine documentation
  • Schema-validation

Add from_code option for Execution from code backups

It is often useful to run from a code backup rather than the current code state in order to reproduce an execution exactly. The Execution or Project interface should provide a way to run directly from a code backup.

The Slurm engine may use this by default to ensure that the code does not change between queue and execution time.

Registration API

Projects should be able to expose custom elements like Drivers, host_info etc. from a registration class to make them easy to share.

Handle non-jsonable types in records

Currently, we just convert anything to a string, which is not ideal as the user might not be aware of it. To address this:

  • Record should allow for registration of a custom JSON serialiser to convert complex types to strings
  • if no serialiser is registered, an exception should be raised if the type is not JSONable
  • the exception should explain alternatives like storage.save_data()
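
The three points above could be sketched roughly as follows. This is only an illustration built on the standard json module; register_serializer, encode_record and RecordSerializationError are hypothetical names and not part of machinable.

```python
import json


class RecordSerializationError(TypeError):
    """Raised when a record value cannot be encoded as JSON (hypothetical)."""


# Hypothetical registry that maps a type to a converter function
_serializers = {}


def register_serializer(type_, converter):
    _serializers[type_] = converter


def _default(value):
    # json.dumps calls this only for values it cannot encode on its own
    for type_, converter in _serializers.items():
        if isinstance(value, type_):
            return converter(value)
    raise RecordSerializationError(
        f"{type(value).__name__} is not JSON-serializable; register a "
        "serializer or store the value with storage.save_data() instead"
    )


def encode_record(row):
    return json.dumps(row, default=_default)
```

With a serializer registered for a type, values of that type are converted; anything else fails loudly instead of being turned into a string silently.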

Commandline interface

The task API could be easily exposed to the command line, for example:

machinable --component test --version learning_rate=1

There could also be an interactive execution feature

$ machinable execute
What component do you want to execute? test
Do you want to specify a version [N/y]? N
...

There could also be a way to inspect the config, get dry runs and execution plans etc.

I don't think this should evolve too far because most of these use cases will be covered by the app but it could be a useful tool in server environments.
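
As a rough illustration of the non-interactive form, the flags from the example above can be parsed with argparse. This is a sketch only: the key=value handling and the function names are assumptions, and values are kept as strings.

```python
import argparse


def parse_version(pairs):
    """Turn ['learning_rate=1', ...] into a dict; values stay strings."""
    version = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        version[key] = value
    return version


def build_parser():
    parser = argparse.ArgumentParser(prog="machinable")
    parser.add_argument("--component", required=True)
    # zero or more key=value pairs
    parser.add_argument("--version", nargs="*", default=[])
    return parser


def parse_args(argv):
    args = build_parser().parse_args(argv)
    return args.component, parse_version(args.version)
```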

Extended GraphQL server filesystem APIs

The GraphQL server currently only provides a basic GET endpoint to read plain text files. To serve advanced use cases, the API should provide

  • A subscription-based streaming interface that allows reading files partially
  • A file watcher subscription based on watchgod that allows reacting to updates (e.g. changes in the machinable.yaml)

Revisit events API

The events API could be improved and documented. Right now it does not have clear use cases and it may not work well in distributed settings.

ConfigMap should track the full key path

This would enable better KeyError messages as well as smart 'reflective' transformations, for example

test.example.toDict(patch=true) => {"test": {"example": test.example.toDict()}}
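
A minimal sketch of the idea, assuming a plain dict-backed wrapper. PathTrackingMap and to_dict are illustrative names and not the actual ConfigMap implementation.

```python
class PathTrackingMap:
    """Attribute-style access to nested dicts that remembers the key path."""

    def __init__(self, data, path=()):
        self._data = data
        self._path = tuple(path)

    def __getattr__(self, key):
        # only called when normal attribute lookup fails
        if key.startswith("_"):
            raise AttributeError(key)
        path = self._path + (key,)
        try:
            value = self._data[key]
        except KeyError:
            # report the full dotted path, not just the last key
            raise KeyError("'" + ".".join(path) + "' not found") from None
        if isinstance(value, dict):
            return PathTrackingMap(value, path)
        return value

    def to_dict(self, patch=False):
        result = dict(self._data)  # shallow copy
        if patch:
            # re-wrap the result in its parents, innermost key first
            for key in reversed(self._path):
                result = {key: result}
        return result
```

Since every child knows its path, a failed lookup such as config.test.missing can name 'test.missing' in the error message.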

Allow dependency specifications between executions

This would be useful to define component executions that need others to finish first etc. The implementation would be left to the respective engines (Slurm already supports dependency pipelines via --dependency).

API proposal (WIP)

Execution('example').dependency([Execution('next')], condition="any_finished") \
                    .dependency([Execution('failure')], condition="failure") \
                    .submit()

def dependency(self, executions: List[Union[Execution, Experiment]], condition=None):
    # record the dependency and return self so that calls can be chained
    self._dependencies.append({"executions": executions, "condition": condition})
    return self
# self-dependency to 're-run' if failed?

Engines:

  • the engine of the top-level execution propagates to all non-executed dependencies; raise an exception if non-executed dependencies declare their own engines
  • if a dependency has already been executed, check what type of engine was used and, if it is the same type, go ahead to allow for dependencies on already running jobs
  • engines should expose 'supports_dependencies' flag similar to 'supports_resources'

At execution,

  • check if engine supports_dependencies, if not, raise error if execution has dependency
  • in future, if an engine does not support dependencies, we can fall back on synchronous logic to handle them
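
The synchronous fallback could, for instance, order executions topologically and run them one after another. The sketch below uses graphlib from the standard library (Python >= 3.9) and plain names in place of Execution objects; it ignores conditions such as "failure" and is not tied to any machinable API.

```python
from graphlib import CycleError, TopologicalSorter


def synchronous_order(dependencies):
    """Return a run order in which every execution comes after its dependencies.

    `dependencies` maps an execution name to the set of names it waits for.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())
```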

Useful example: Slurm dependency reference

Simplify checkpoint functionality

Currently, users have to manage the filenames manually. There should be an interface to select checkpoints more easily directly from the Execution interface, e.g. 'from_last_checkpoint' etc.

Handle structured version updates correctly

Currently, version updates fail if they don't match the structure, for instance

components:
 - demo:
     test: 1
Experiment().component('demo', {'test': 2 }) # fine
Experiment().component('demo', {'test': {'nested': 'structure'}}) # fails

I'm not sure whether that's a bug or a feature. In any case, the exception should be handled more gracefully.
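
If the mismatch is treated as an error, one way to fail more gracefully would be a merge that names the offending key. This is a sketch on plain dicts; apply_version and VersionStructureError are hypothetical and do not reflect how machinable resolves versions internally.

```python
class VersionStructureError(ValueError):
    """Raised when an update replaces a value with a differently shaped one."""


def apply_version(config, update, path=""):
    result = dict(config)
    for key, value in update.items():
        where = f"{path}.{key}" if path else key
        existing = result.get(key)
        if key in result and isinstance(existing, dict) != isinstance(value, dict):
            raise VersionStructureError(
                f"'{where}': cannot replace {type(existing).__name__} "
                f"with {type(value).__name__}"
            )
        if isinstance(value, dict) and isinstance(existing, dict):
            # both sides are nested, so merge recursively
            result[key] = apply_version(existing, value, where)
        else:
            result[key] = value
    return result
```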

Native engine should allow multiprocessing with 1 process

Currently, we do not use multiprocessing if processes is set to 1. Since there are good use cases for process isolation (e.g. catching SEGV stack traces), we should allow the use of 1 process and use it to capture process-level failures gracefully.
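
To illustrate why a single isolated process is still useful, the sketch below runs code in a child interpreter and inspects how it ended. It uses subprocess as a stand-in for the engine's worker process, which is an assumption; run_isolated is a hypothetical helper.

```python
import subprocess
import sys


def run_isolated(code):
    """Run Python source in a child interpreter and report how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return {
        # on POSIX, a negative return code means the child was killed by that signal
        "returncode": result.returncode,
        "stderr": result.stderr,
        "ok": result.returncode == 0,
    }
```

A crash in the child (for example an abort or a segmentation fault) then shows up as a non-zero return code in the parent instead of taking the parent down with it.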

Automatic dependency management

Currently, the user is responsible for downloading dependency repos etc. It might be nice if machinable took care of this automatically. This could also include some conflict management. However, I guess we could just rely on some other library that has already solved this type of problem rather than reinventing a custom solution.

machinable init command

To scaffold new projects from a starter-template. Probably as simple as cloning a Git repo

Provide options for 'unit' testing

The structure of a machinable project allows for automated execution of all available components in a project. It would thus be possible to introduce an execution mode in which every component is executed like a unit test. The component class could expose basic 'assert' APIs that only apply when executed in this new 'test mode'.

Ideally, this could be implemented by using an existing testing framework like pytest.

  • disable output capturing to not interfere with pytest's output capturing

Improve Ray Tune integration

Ray 0.8.7 improved the Trainable interface that is used in the machinable tune integration. This will allow us to improve the integration and get rid of some of the current limitations like missing checkpoint support.

Ignore schema-validation for mixins

Using Experiment.component('example', ('_test_')) will fail unless _test_ is specified in the example. To enable dynamic mixins, the schema-validation should be relaxed when it comes to mixin versions.

Local mode execution should report Exception right away

Currently, exceptions are caught and reported at the very end of the jobs, as parallel execution is assumed. However, that is not useful behaviour in local mode because you will only learn about an exception after all iterative executions have finished.
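
The difference between the two behaviours can be sketched with a small loop. run_all and the fail_fast flag are made-up names for illustration only.

```python
def run_all(jobs, fail_fast):
    """Run callables in order; either raise immediately or collect errors."""
    results, errors = [], []
    for job in jobs:
        try:
            results.append(job())
        except Exception as error:
            if fail_fast:
                # local mode: surface the problem right away
                raise
            # parallel-style behaviour: remember the error and continue
            errors.append(error)
    return results, errors
```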

Add Task() reflection/serialization

It should be possible to recover a Task object using some representation of the Task specification. That would enable easy reconstruction from the observations etc.

Mixins should be able to expose config methods

Currently, users have to manually 'forward' config methods into mixins, e.g.

def config_the_method(self):
      self._the_mixin_.config_the_method()

It would be useful to invoke such forward calls automatically.
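
One possible approach is to forward unknown config_ methods through __getattr__. The classes below are simplified stand-ins and not machinable's Component or mixin classes.

```python
class Mixin:
    def config_the_method(self):
        return "from mixin"


class Component:
    def __init__(self, mixins):
        self._mixins = list(mixins)

    def __getattr__(self, name):
        # reached only if the component itself does not define `name`
        if name.startswith("config_"):
            for mixin in self._mixins:
                method = getattr(mixin, name, None)
                if callable(method):
                    return method
        raise AttributeError(name)
```

A method defined on the component itself still takes precedence, because __getattr__ is only consulted after normal lookup fails.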

Improve mixin overrides

It is currently not possible to override the _mixin_ list of parent components in a fine-grained way (e.g. remove a particular mixin etc) because they are expanded in the parent before the inheritance occurs. To allow for more flexible use, the _mixin_ elements should not be resolved until after inheritance is applied.

Infrastructure backlog

  • Deprecate Python 3.7
  • Enable mypy
  • Switch to ruff
  • Switch to pydantic v2
  • Remove commandlib dependency

Make engine and index imports lazy

The available engines and indexes are either imported eagerly if the overhead is small or not imported at all if they require heavy dependencies. Ideally, the imports in these modules should be handled lazily using the module-level __getattr__ hook (PEP 562, available in Python >= 3.7).
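
A small sketch of the mechanism. In a real package the hook would simply be a function named __getattr__ at the top of the package's __init__.py; here a module object is built by hand so the example is self-contained, and standard-library modules stand in for engines with heavy dependencies. The names in _LAZY are hypothetical.

```python
import importlib
import types

# Hypothetical mapping from public attribute name to the module that provides it
_LAZY = {"json_engine": "json", "csv_engine": "csv"}

engines = types.ModuleType("engines")


def _lazy_getattr(name):
    # called only when `name` is not yet present in the module namespace
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        setattr(engines, name, module)  # cache so the hook is not hit again
        return module
    raise AttributeError(f"module 'engines' has no attribute {name!r}")


engines.__getattr__ = _lazy_getattr
```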

Allow engines to write meta data

Engines often have useful metadata; for instance, the Slurm engine could infer the Slurm submission ID and save it to the storage. Engine() should provide an interface to easily store metadata.

Smart reloading of observations

It should be possible to implement some smart automatic reload of observations, for example:

  • destroy observations cache every 5 minutes unless the job is finished/died
  • option to automatically re-index the current storages at some interval
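
The first point could look roughly like the cache below, which reloads after a time-to-live unless the job has reached a final state. ObservationCache, the "status" key and its values are assumptions made for this sketch.

```python
import time


class ObservationCache:
    def __init__(self, load, ttl=300.0, clock=time.monotonic):
        self._load = load      # callable that returns a dict with a "status" key
        self._ttl = ttl        # seconds before cached data is considered stale
        self._clock = clock    # injectable for testing
        self._value = None
        self._loaded_at = None
        self._final = False

    def get(self):
        expired = (
            self._loaded_at is None
            or self._clock() - self._loaded_at >= self._ttl
        )
        if expired and not self._final:
            self._value = self._load()
            self._loaded_at = self._clock()
            # stop reloading once the job reports a terminal state
            self._final = self._value.get("status") in ("finished", "died")
        return self._value
```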

get_component improvements

  • introduce get_component(or_fail=False) option to raise an exception if the component cannot be found
  • rename index argument to avoid confusion with Indexes
  • the method should act as identity when being passed a Component, i.e. use StorageComponent.create() rather than constructor

Make store operations resumable

Currently, the Store() interface makes the assumption that execution is continuous from start to finish. However, in many cases workers are actually interrupted and resumed from checkpoints and using self.store may lead to inconsistent results. machinable should automatically 'resume' from existing store/ data to enable seamless spot execution.
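
As a sketch of what 'resuming' could mean, the store below loads whatever a previous run left behind before accepting new writes. It uses a single JSON file and is not the actual Store() implementation; ResumableStore is a hypothetical name.

```python
import json
import os
import tempfile


class ResumableStore:
    def __init__(self, path):
        self._path = path
        self._data = {}
        if os.path.exists(path):
            # pick up the state written before the interruption
            with open(path) as f:
                self._data = json.load(f)

    def write(self, key, value):
        self._data[key] = value
        # write to a temporary file and rename it, so an interruption
        # never leaves a half-written store behind
        directory = os.path.dirname(self._path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self._data, f)
        os.replace(tmp, self._path)

    def get(self, key, default=None):
        return self._data.get(key, default)
```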

Introduce sync for non-pyfilesystem locations

Observer should have a sync(directory, target, frequency) method that enables automatic syncing of a local file directory to the non-local pyfilesystem under 'storage/$target' of the observer storage. This is useful if you want to sync custom things like tf checkpoints etc. to the storage. When the method is called multiple times, multiple syncs are set up. If the filesystem is already local, nothing should be done. We can automatically sync at a regular frequency using the heartbeat event, self.events.on('heartbeat', syncer.sync_if_needed)

Ray has sync features that we could potentially reuse.
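
A minimal sketch of a syncer whose sync_if_needed could be hooked to the heartbeat event. It copies between two local directories with shutil (dirs_exist_ok needs Python >= 3.8) as a stand-in for a pyfilesystem target; DirectorySyncer and demo are hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path


class DirectorySyncer:
    def __init__(self, directory, target, frequency, clock=time.monotonic):
        self.directory = directory
        self.target = target
        self.frequency = frequency  # minimum number of seconds between syncs
        self._clock = clock
        self._last = None

    def sync_if_needed(self):
        now = self._clock()
        if self._last is not None and now - self._last < self.frequency:
            return False  # synced recently enough
        shutil.copytree(self.directory, self.target, dirs_exist_ok=True)
        self._last = now
        return True


def demo():
    """Trigger two heartbeats 30 'seconds' apart with a 60-second frequency."""
    now = [0.0]
    with tempfile.TemporaryDirectory() as root:
        source = Path(root, "checkpoints")
        source.mkdir()
        (source / "model.ckpt").write_text("weights")
        target = Path(root, "storage", "checkpoints")
        syncer = DirectorySyncer(source, target, 60, clock=lambda: now[0])
        first = syncer.sync_if_needed()
        now[0] = 30.0
        second = syncer.sync_if_needed()
        return first, second, (target / "model.ckpt").read_text()
```

In demo(), the first heartbeat copies the directory and the second one is skipped because less than the configured frequency has passed.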

Replace sh with commandlib

Follow-up to #18. We currently rely on sh in an inconsistent way throughout the code base. Moving to commandlib should simplify the setup and may also allow dropping dependencies like GitPython etc.

Allow to register global config methods

Some config methods are fairly generic and not bound to a particular component (e.g. parsing dtypes etc.). It should thus be possible to register them globally, with support for imports etc., so that it would be easy to provide sensible defaults.
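
A registry with a decorator would be one way to do this. In the sketch below, config_method, resolve and the example dtype method are hypothetical names, and the lookup order (component first, then global) is an assumption.

```python
_global_config_methods = {}


def config_method(name=None):
    """Register a function as a globally available config method."""
    def decorator(fn):
        _global_config_methods[name or fn.__name__] = fn
        return fn
    return decorator


@config_method()
def dtype(value):
    """Example of a generic method: map a dtype name to a Python type."""
    return {"int": int, "float": float, "str": str}[value]


def resolve(name, component=None):
    # a method defined on the component takes precedence over the global one
    local = getattr(component, "config_" + name, None)
    if callable(local):
        return local
    return _global_config_methods[name]
```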
