machinable-org / machinable
A modular system for machinable research code
Home Page: https://machinable.org
License: MIT License
Once Python 3.11 becomes available, use the stdlib tomllib parser to make settings more readable.
machinable should keep track of the relationships between Experiments, e.g. re-runs of the same experiment etc.
Various improvements to indexing
It is often useful to run from a code backup rather than the current code state to exactly reproduce an execution. The Execution or Project interface should provide an option to run directly from a code backup.
The Slurm engine may use this by default to ensure that the code does not change between queue and execution time.
Projects should be able to expose custom elements like Drivers, host_info etc. from a registration class to make it easy to share them.
The signature is sometimes cumbersome to use as a lot of tuples have to be wrapped in each other, affecting readability.
Currently, we just convert anything to a string, which is not ideal as the user might not be aware of it.
The current functional API has a number of limitations and inconsistencies with the recommended Component API. This leads to poor user experience and unnecessary code duplication. To resolve this, the functional API should use standard Component classes.
There should be options to disable this behaviour in settings.yaml and via the storage argument.
There should also be a progress bar if the processing takes longer
This feature was previously removed due to conflicts with the pytest output capturing mechanism.
The task API could be easily exposed to the command line, for example:
machinable --component test --version learning_rate=1
There could also be an interactive execution feature:
$ machinable execute
What component do you want to execute? test
Do you want to specify a version [N/y]? N
...
There could also be a way to inspect the config, get dry runs and execution plans etc.
I don't think this should evolve too far because most of these use cases will be covered by the app but it could be a useful tool in server environments.
It can be hard to relate the running processes to machinable Experiments. We should use setproctitle to name every process after its UID/experiment_id.
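As a sketch, assuming the third-party setproctitle package (the helper name below is made up here, and the code falls back to a no-op when the package is not installed):

```python
def name_process(experiment_id: str) -> str:
    """Rename the current process so it shows up as machinable:<id> in ps/top."""
    title = f"machinable:{experiment_id}"
    try:
        from setproctitle import setproctitle
        setproctitle(title)
    except ImportError:
        pass  # keep the default process title if setproctitle is unavailable
    return title
```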
Note that on_execute_start is written in execute, after the create events are triggered.
The GraphQL server currently only provides a basic GET endpoint to read plain text files. To serve advanced use cases, the API should provide watchgod-based file watching that allows reacting to updates (e.g. changes in the machinable.yaml).
machinable should detect if the project is not a repository, like in a Jupyter environment, and disable the code backup.
The events API could be improved and documented. Right now it does not have a clear use case, and it may not work well in distributed settings.
This would enable better KeyError messages as well as smart 'reflective' transformations, for example:
test.example.toDict(patch=true) => {"test": {"example": test.example.toDict()}}
GitPython is currently only used to infer commit information, which could be done with sh instead.
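A minimal sketch of how the commit information could be read without GitPython, here using the stdlib subprocess module rather than sh so the snippet is self-contained (the helper name commit_hash is an assumption):

```python
import subprocess

def commit_hash(path: str = "."):
    """Return the current commit hash, or None outside a git repository."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            cwd=path,
            stderr=subprocess.DEVNULL,
            text=True,
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # not a repository, or git is not installed
        return None
```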
This would be useful to define component executions that need others to finish first, etc. The implementation would be left to the respective engines (Slurm already supports dependency pipelines via --dependency).
API proposal (WIP):

(
    Execution('example')
    .dependency([Execution('next')], condition="any_finished")
    .dependency(Execution('failure'), condition="failure")
    .submit()
)

def dependency(self, executions: List[Union[Execution, Experiment]], condition=None):
    self._dependencies.append(locals())
    # self-dependency to 're-run' if failed?
Engines: at execution time, each engine would translate the declared dependencies into its own mechanism. Useful example: Slurm's dependency reference.
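The proposal above can be exercised as a self-contained mock; the class names come from the proposal, while the condition values and the chaining behaviour are assumptions, not an existing API:

```python
from typing import List, Optional, Union

class Execution:
    """Mock of the proposed dependency-aware Execution interface."""

    def __init__(self, component: str):
        self.component = component
        self._dependencies: List[dict] = []

    def dependency(
        self,
        executions: Union["Execution", List["Execution"]],
        condition: Optional[str] = None,
    ) -> "Execution":
        if not isinstance(executions, list):
            executions = [executions]
        self._dependencies.append(
            {"executions": executions, "condition": condition}
        )
        return self  # return self to allow chaining

    def submit(self) -> "Execution":
        # An engine would translate self._dependencies here, e.g. into
        # Slurm's --dependency=afterok:<jobid> flags.
        return self

execution = (
    Execution("example")
    .dependency([Execution("next")], condition="any_finished")
    .dependency(Execution("failure"), condition="failure")
    .submit()
)
```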
Currently, users have to manage the filenames manually; there should be an interface to select checkpoints more easily, directly from the Execution interface, e.g. from_last_checkpoint etc.
Currently, version updates fail if they don't match the structure, for instance
components:
- demo:
test: 1
Experiment().component('demo', {'test': 2})  # fine
Experiment().component('demo', {'test': {'nested': 'structure'}})  # fails
I'm not sure whether that's a bug or a feature. In any case, the exception should be handled more gracefully.
Currently, we do not use multiprocessing if processes is set to 1. Since there are good use cases for process isolation (e.g. catching SEGV stack traces etc.), we should allow the use of 1 process and use it to capture process-level failures gracefully.
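A minimal POSIX-only sketch of that idea: even with a single process, running the component in a child process lets the parent detect hard crashes via the negative exit code instead of dying with them (the function names are illustrative):

```python
import multiprocessing
import os
import signal

def _component(crash: bool) -> None:
    if crash:
        os.kill(os.getpid(), signal.SIGSEGV)  # simulate a segfault

def run_isolated(crash: bool) -> int:
    ctx = multiprocessing.get_context("fork")  # POSIX-only sketch
    process = ctx.Process(target=_component, args=(crash,))
    process.start()
    process.join()
    # exitcode is 0 on success and -signal.SIGSEGV if the child segfaulted
    return process.exitcode
```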
The Store() interface duplicates a lot of the APIs of the more general StorageFileSystemModel and should thus be replaced.
components:nested.module:
- example
currently fails in validation.
Currently, the user is responsible for downloading dependency repos etc. It might be nice if machinable would take care of this automatically. This could also include some conflict management. However, we could probably just rely on some other library that has already solved this type of problem rather than reinventing a custom solution.
Currently, Index.find_latest(since) requires a DateTime to be passed that has to be constructed by the user. It would be more convenient if users could also specify a relative time as a string argument, for example since="1d" etc.
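A possible parser for such relative-time strings; the s/m/h/d/w suffix set and the parse_since name are assumptions of this sketch:

```python
import re
from datetime import datetime, timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours",
          "d": "days", "w": "weeks"}

def parse_since(since) -> datetime:
    """Turn '1d', '30m', etc. into a DateTime cutoff; pass DateTimes through."""
    if isinstance(since, datetime):
        return since
    match = re.fullmatch(r"(\d+)([smhdw])", since)
    if match is None:
        raise ValueError(f"Invalid relative time: {since!r}")
    delta = timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})
    return datetime.now() - delta
```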
To scaffold new projects from a starter template. Probably as simple as cloning a Git repo.
The structure of a machinable project allows for automated execution of all available components in a project. It would thus be possible to introduce an execution mode in which every component is being executed like a unit test. The component class could expose basic 'assert' APIs that only apply when executed in this new 'test mode'.
Ideally, this could be implemented by using an existing testing framework like PyTest.
Ray 0.8.7 improved the Trainable interface that is used in the machinable tune integration. This will allow us to improve the integration and get rid of some of the current limitations like missing checkpoint support.
Using Experiment.component('example', ('_test_')) will fail unless _test_ is specified in the example component. To enable dynamic mixins, the schema-validation should be relaxed when it comes to mixin versions.
Currently, exceptions are being caught and reported at the very end of the jobs, as parallel execution is assumed. However, that is not useful behaviour in local mode because you will only learn about an exception after all iterative executions have finished.
It should be possible to recover a Task object using some representation of the Task specification. That would enable easy reconstruction from the observations etc.
Should be managed in a similar way as in jupyter notebooks.
Currently, users have to manually 'forward' config methods into mixins, e.g.

def config_the_method(self):
    return self._the_mixin_.config_the_method()

It would be useful to invoke such forward calls automatically.
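One way this could work is a __getattr__ fallback that forwards unknown config_* lookups to the mixin; a minimal sketch with made-up class names (the auto-forwarding itself is the proposal, not existing machinable behaviour):

```python
class TheMixin:
    def config_the_method(self):
        return "from mixin"

class Component:
    def __init__(self):
        self._the_mixin_ = TheMixin()

    def __getattr__(self, name):
        # Only reached for attributes not found on the component itself,
        # so explicitly defined methods still take precedence.
        if name.startswith("config_"):
            return getattr(self._the_mixin_, name)
        raise AttributeError(name)
```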
It is currently not possible to override the _mixin_ list of parent components in a fine-grained way (e.g. remove a particular mixin etc.) because they are expanded in the parent before the inheritance occurs. To allow for more flexible use, the _mixin_ elements should not be resolved until after inheritance is applied.
The available engines and indexes are either imported eagerly if the overhead is small or not imported at all if they require heavy dependencies. Ideally, these imports should be handled lazily using module-level __getattr__ (PEP 562, available in Python >= 3.7).
Engines often have useful metadata; for instance, the Slurm engine could infer the Slurm submission ID and save it to the storage. Engine() should provide an interface to easily store such metadata.
It should be possible to implement some smart automatic reload of observations.
Maintaining a hash of the machinable.yaml files for caching would probably provide some performance benefits.
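For instance, a content hash of the file could serve as the cache key, so the parsed config is only recomputed when machinable.yaml actually changes (a sketch; the helper name is made up):

```python
import hashlib
from pathlib import Path

def config_hash(path: str) -> str:
    """Fingerprint of the file contents, usable as a parse-cache key."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```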
get_component(or_fail=False) option to raise an exception if the component cannot be found.
Support auto-completion by statically analyzing the project.
Currently, the Store() interface makes the assumption that execution is continuous from start to finish. However, in many cases workers are actually interrupted and resumed from checkpoints, and using self.store may lead to inconsistent results. machinable should automatically 'resume' from existing store/ data to enable seamless spot execution.
Observer should have a sync(directory, target, frequency) method that enables automatic syncing of a local file directory to the non-local pyfilesystem under 'storage/$target' of the observer storage. This is useful if you want to sync custom artifacts like TF checkpoints to the storage. When the method is called multiple times, multiple syncs are set up. If the filesystem is already local, nothing should be done. We can automatically sync at a regular frequency using the heartbeat event: self.events.on('heartbeat', syncer.sync_if_needed)
Ray has sync features that we could potentially reuse.
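A minimal sketch of the proposed throttled syncing, with the actual upload left as a comment (the names follow the proposal above and are not an existing API):

```python
import time

class Syncer:
    """Sync a local directory to observer storage at most once per `frequency` seconds."""

    def __init__(self, directory: str, target: str, frequency: float = 60.0):
        self.directory = directory
        self.target = target
        self.frequency = frequency
        self._last_sync = None
        self.sync_count = 0

    def sync_if_needed(self) -> bool:
        """Intended to be driven by the heartbeat event; returns True if a sync ran."""
        now = time.monotonic()
        if self._last_sync is not None and now - self._last_sync < self.frequency:
            return False  # last sync is recent enough
        self._last_sync = now
        self.sync_count += 1
        # here: upload self.directory to 'storage/' + self.target on the
        # observer's (non-local) pyfilesystem; skip if it is already local
        return True
```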
Follow-up to #18. We currently rely on sh in an inconsistent way throughout the code base. Moving to commandlib should simplify the setup and may also allow dropping dependencies like GitPython.
Some config methods are fairly generic and not bound to a particular component (e.g. parsing dtypes etc.). It should thus be possible to register them globally, with support for imports etc., so it would be easy to provide sensible defaults.