aechpro / distrib-rl
Distributable Reinforcement Learning Platform
License: Apache License 2.0
To cut back on dead code we will remove the MinAtarWrapper. I think we can bring it back at some stage, but ideally in a way where we don't need to take a direct dependency on it from the core distrib-rl Configurator module.
In order to properly resume a terminated run we need to be able to capture any and all state necessary to "revive" the run. For much of this we can probably rely on pytorch's built-in checkpointing features, but we'll need to augment the checkpoint with any additional state of which pytorch is not aware, such as the learning rate controller's state.
Note: this almost certainly requires that we change the checkpoint format away from the current .npy format (assuming that #42 doesn't change it first). In doing so, we should also make sure that our checkpoints include a format version specifier, so that as the checkpoint data changes over time we can either maintain backward compatibility or fail with a clear error on checkpoint versions we no longer understand.
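As a minimal sketch of the versioning idea, the checkpoint could be a single payload carrying a format version alongside the pytorch state and the extra state (function names, field names, and the pickle serialization here are all assumptions, not the project's actual format):

```python
import pickle

CHECKPOINT_FORMAT_VERSION = 1  # bump whenever the checkpoint layout changes


def save_checkpoint(path, model_state, extra_state):
    """Write a versioned checkpoint. `extra_state` carries anything pytorch
    is not aware of, e.g. the learning rate controller's state."""
    payload = {
        "format_version": CHECKPOINT_FORMAT_VERSION,
        "model_state": model_state,
        "extra_state": extra_state,
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_checkpoint(path):
    with open(path, "rb") as f:
        payload = pickle.load(f)
    version = payload.get("format_version")
    # Either migrate old versions here, or refuse ones we don't understand.
    if version is None or version > CHECKPOINT_FORMAT_VERSION:
        raise ValueError(f"unsupported checkpoint format version: {version}")
    return payload
```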
Should still be capable of parsing JSON as well. Since this only involves parsing a few files at a time, we can check the file extension to choose a parser, and if the extension is inconclusive, fall back to a simple try/except.
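The extension check plus try/except fallback could look something like the following sketch; since the issue doesn't name the new primary format, its parser is passed in as a callable and the extensions are placeholders:

```python
import json
from pathlib import Path


def load_config(path, primary_parser, primary_exts=(".yaml", ".yml")):
    """Parse a config file, preferring a hypothetical new primary format but
    still accepting JSON. The extension decides when it can; otherwise we
    fall back on a simple try/except."""
    text = Path(path).read_text()
    ext = Path(path).suffix.lower()
    if ext == ".json":
        return json.loads(text)
    if ext in primary_exts:
        return primary_parser(text)
    # Unknown extension: try the primary format first, then JSON.
    try:
        return primary_parser(text)
    except Exception:
        return json.loads(text)
```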
A requirement for this will be to annotate checkpoints after resuming in a way that indicates they weren't produced in a single run.
Perhaps this annotation could be accomplished by incrementing a "run id" number, along with an optional (or required?) field that captures whether the previous run terminated cleanly. If we include that additional field, it will need to be regarded as best-effort, however, as there is no tamper-proof mechanism for determining whether there was a clean exit between runs.
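A tiny sketch of that annotation step, applied to checkpoint metadata on resume (all field names hypothetical):

```python
def annotate_resume(metadata, previous_exit_clean=None):
    """Mark a checkpoint lineage as spanning multiple runs.
    `previous_exit_clean` is best-effort: pass None when we can't tell.
    Field names are hypothetical, not the actual checkpoint schema."""
    metadata = dict(metadata)  # don't mutate the caller's copy
    metadata["run_id"] = metadata.get("run_id", 0) + 1
    metadata["previous_exit_clean"] = previous_exit_clean
    return metadata
```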
I gather from conversation with @AechPro and @lucas-emery that there's a strong contingent of researchers out there who depend on OpenAI gym v0.24.0, and may continue to do so for quite some time.
To accommodate those researchers, I think it would be great if distrib-rl had some sort of compatibility layer that supported numerous versions of the gym API spec and ultimately allowed consumers of distrib-rl to bring their preferred version of the gym API into distrib-rl at runtime.
This compatibility layer should document the minimum gym API version supported, and include tests that exercise the bounds of the various API revisions to ensure that we maintain compatibility going forward.
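One concrete API difference such a layer would need to paper over: older gym versions return a 4-tuple from step() while newer ones return a 5-tuple with termination and truncation split apart. A minimal sketch of normalizing both to the newer form (the wrapper name is an assumption):

```python
class GymStepCompat:
    """Normalize env.step() to the 5-tuple form
    (obs, reward, terminated, truncated, info) regardless of which
    gym version the consumer brings in at runtime."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        result = self.env.step(action)
        if len(result) == 5:
            return result
        obs, reward, done, info = result
        # The older API folds truncation into `done`; recover it when the
        # TimeLimit wrapper recorded it in info, else treat as termination.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info
```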
I've noticed while training bots on rocket-learn that tracking the mean episode length is handy for knowing when the bot has reached certain early learning stages. For example, the minimum following the initial peak in mean episode length tends to correspond to the bot learning an efficient kickoff, and the growth in episode length thereafter tends to correspond with the bot becoming more competitive post-kickoff (rather than just relying on random chance for kickoff goal rewards).
See this chart, as an example.
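The metric itself is cheap to compute; a minimal sketch of a rolling mean over recent episodes, suitable for logging alongside reward curves (the class name and window size are assumptions):

```python
from collections import deque


class MeanEpisodeLength:
    """Rolling mean of episode lengths over the last `window` episodes."""

    def __init__(self, window=100):
        self._lengths = deque(maxlen=window)
        self._current = 0

    def on_step(self, done):
        """Call once per environment step with the episode-done flag."""
        self._current += 1
        if done:
            self._lengths.append(self._current)
            self._current = 0

    @property
    def mean(self):
        if not self._lengths:
            return 0.0
        return sum(self._lengths) / len(self._lengths)
```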
Right now most directories containing submodules in this project use PascalCase. I believe it would be more idiomatic (and more aligned with PEP-8) if we used a snake_case naming style.
Since this change is very cross-cutting and liable to touch a lot of files, we should also take the opportunity to ensure that any other aspects of this project's file structure align with PEP-8 or other similar accepted PEPs.
In order to maintain reproducibility, checkpoints should capture the version of distrib-rl that produced them.
Similarly, when distrib-rl is executed either from its own git repo or as part of a project contained within a git repo, the checkpoint should include details about the state of the repo, such as the origin, branch name, commit hash, whether the working directory is clean, and the working directory diff when it is not clean.
Note: this will likely involve changing the checkpoint format over to something other than .npy exports.
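Capturing that repo state is a handful of git invocations; a best-effort sketch (the dict keys are assumptions, and the whole thing degrades to None when git or a repo is absent):

```python
import subprocess


def _git(args, cwd="."):
    return subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    ).stdout.strip()


def capture_git_state(cwd="."):
    """Best-effort snapshot of the enclosing repo's state for inclusion
    in a checkpoint. Returns None when not inside a git repo (or when
    git itself is unavailable)."""
    try:
        diff = _git(["diff"], cwd)
        return {
            "origin": _git(["remote", "get-url", "origin"], cwd),
            "branch": _git(["rev-parse", "--abbrev-ref", "HEAD"], cwd),
            "commit": _git(["rev-parse", "HEAD"], cwd),
            "clean": diff == "" and _git(["status", "--porcelain"], cwd) == "",
            "diff": diff or None,
        }
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
```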
Right now, to run training we need to make some adjustment to the experiment file, even if we want to keep our hyperparameters fixed.
We can work around this easily enough, but experiment files for training would be more readable without that.
It would be nice if I didn't have to fork this project to use it in my own project. Apart from setup.py (addressed separately), the following items are blockers:
The main goal of this is to separate the main distributed learning functions of distrib-rl from the environment-specific initialization logic that is currently found in the distrib_rl/Environments/Custom directory.
To do this we should move the RocketLeague env to its own repo (likely in the nexus-rl org), and move the other two envs into their own packages that live alongside the distrib_rl package directory in this repo.
To take a first step toward an IoC architecture, and to make it easier to wrangle our project state, we should introduce a single top-level component that serves as an object/component registry for long-lived objects that are used in cross-cutting ways, such as the active config, the network client (e.g. RedisClient or RedisServer), and an eventual state manager.
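A registry like that can start very small; a minimal sketch under the assumption that components are registered and resolved by string name (the class and method names are hypothetical):

```python
class ComponentRegistry:
    """Top-level registry for long-lived, cross-cutting components
    (active config, network client, eventual state manager)."""

    def __init__(self):
        self._components = {}

    def register(self, name, component):
        if name in self._components:
            raise KeyError(f"component already registered: {name}")
        self._components[name] = component

    def resolve(self, name):
        try:
            return self._components[name]
        except KeyError:
            raise KeyError(f"no component registered under: {name}") from None
```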
Ideally it'd support 2s and 3s as well :-)
At the moment distrib-rl does not have any mechanism that allows one to resume training from a checkpoint with config modifications.
It's often the case that practitioners wish to modify environment details and/or hyperparameters during learning. Sometimes these details are understood at the outset and scheduled in advance, and sometimes they're discovered/decided only after training has begun.
In order to not sacrifice reproducibility, it's critical that it be possible to produce a single configuration that, if run unattended from start to finish, would reproduce the full set of checkpoints, including the ones prior to a mid-stream config change.
As a result, I'd propose a change to the config format that allows for defining arbitrary config "scheduling," with any mid-stream config modifications being required to reference the original configuration over the range for which that config was valid.
For example, assume you are training a model with a given config that we'll call config-v1. In order to resume training from checkpoint 5 that was produced by config-v1, a hypothetical config-v2 must reference (in some TBD way) the original config-v1 as having been valid for checkpoints 0-5, inclusive. This makes it relatively trivial to compose a series of modified configs into a single snapshotted config on each checkpoint, with that snapshotted config being executable as-is to reproduce the checkpoint that contains it.
Having that composite config stored as part of each snapshot is a requirement for this feature.
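One hypothetical shape the "TBD way" of referencing could take, with every field name here being a placeholder rather than a settled format:

```json
{
  "config_version": 2,
  "hyperparameters": { "learning_rate": 1e-4 },
  "history": [
    {
      "config": "config-v1.json",
      "valid_for_checkpoints": [0, 5]
    }
  ]
}
```

Snapshotting would then mean inlining each referenced history entry so the composite document runs unattended from checkpoint 0.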
To make it easier to move to another protocol for comms between the workers and learner, it would be good if all communication between the workers and the learner was encapsulated into one or more abstract service objects (objects that express access to a remote resource in terms of the needs of the application itself, not the capabilities of the remote service).
The existing binding to Redis should also be encapsulated in an implementation of the abstraction introduced by this change, with minimal leakage of transport details to consumers of the communication service code.
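A sketch of what such a service abstraction might look like for one concern, policy weight transfer; the service name, method names, and Redis key layout are all assumptions for illustration, not distrib-rl's actual code:

```python
from abc import ABC, abstractmethod


class PolicyTransferService(ABC):
    """Expressed in terms of what the application needs (publish/fetch
    policy weights), not the capabilities of any particular transport."""

    @abstractmethod
    def publish_policy(self, version: int, weights: bytes) -> None: ...

    @abstractmethod
    def latest_policy(self) -> tuple:
        """Return (version, weights)."""


class RedisPolicyTransferService(PolicyTransferService):
    """Redis-backed implementation; key names are hypothetical."""

    def __init__(self, redis_client):
        self._redis = redis_client

    def publish_policy(self, version, weights):
        self._redis.set("policy:version", version)
        self._redis.set("policy:weights", weights)

    def latest_policy(self):
        return (
            int(self._redis.get("policy:version")),
            self._redis.get("policy:weights"),
        )
```

Workers and the learner would depend only on PolicyTransferService, so swapping Redis for another protocol means adding one new implementation.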
Rather than parsing json files into dicts, they should be parsed into a domain model (objects) with validation.
Ideally that domain model would be capable of generating a JSON schema so that tools like VSCode can validate and provide hints while those files are being edited.
This issue should not be considered complete until there are tests that guard the ability for the config data model to read and validate JSON config.
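A library such as pydantic is the obvious candidate here (validation plus JSON schema generation for free); as a dependency-free sketch of the shape, with a hypothetical slice of the config:

```python
from dataclasses import dataclass


@dataclass
class PolicyConfig:
    """Hypothetical slice of the config as a validated domain object."""
    learning_rate: float
    gamma: float

    def __post_init__(self):
        # Validation runs at construction, so a bad config fails loudly
        # at parse time instead of deep inside training.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must be in [0, 1]")

    @staticmethod
    def json_schema():
        # Enough for editors like VSCode to validate and hint.
        return {
            "type": "object",
            "properties": {
                "learning_rate": {"type": "number", "exclusiveMinimum": 0},
                "gamma": {"type": "number", "minimum": 0, "maximum": 1},
            },
            "required": ["learning_rate", "gamma"],
        }
```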
Setuptools is quite useful, especially if we ever want to release to PyPI for install via pip.
Assuming that the functionality someone needs is already implemented in this tool, it should ideally be possible to use it by editing config only.