aechpro / distrib-rl
Distributable Reinforcement Learning Platform
License: Apache License 2.0
To cut back on dead code we will remove the MinAtarWrapper. I think we can bring it back at some stage, but ideally in a way where we don't need to take a direct dependency on it from the core distrib-rl Configurator module.
In order to properly resume a terminated run we need to be able to capture any and all state necessary to "revive" the run. For much of this we can probably rely on pytorch's built-in checkpointing features, but we'll need to augment the checkpoint with any additional state of which pytorch is not aware, such as the learning rate controller's state.
Note: this almost certainly requires that we change the checkpoint format away from the current .npy format (assuming that #42 doesn't change it first). In doing so, we should also make sure that our checkpoints include a format version specifier, so that as the checkpoint data changes over time we can either maintain backward compatibility or fail with a clear error on checkpoint versions we no longer understand.
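As a minimal sketch of the versioning idea, the checkpoint could be a single payload carrying a format version alongside the pytorch state and the extra state (function names, field names, and the pickle serialization here are all assumptions, not the project's actual format):

```python
import pickle

CHECKPOINT_FORMAT_VERSION = 1  # bump whenever the checkpoint layout changes


def save_checkpoint(path, model_state, extra_state):
    """Write a versioned checkpoint. `extra_state` carries anything pytorch
    is not aware of, e.g. the learning rate controller's state."""
    payload = {
        "format_version": CHECKPOINT_FORMAT_VERSION,
        "model_state": model_state,
        "extra_state": extra_state,
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_checkpoint(path):
    with open(path, "rb") as f:
        payload = pickle.load(f)
    version = payload.get("format_version")
    # Either migrate old versions here, or refuse ones we don't understand.
    if version is None or version > CHECKPOINT_FORMAT_VERSION:
        raise ValueError(f"unsupported checkpoint format version: {version}")
    return payload
```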
Should still be capable of parsing JSON as well. Since this only involves parsing a few files at a time, we can check the file extension to choose a parser, and if the extension is inconclusive, fall back to a simple try/except.
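The extension check plus try/except fallback could look something like the following sketch; since the issue doesn't name the new primary format, its parser is passed in as a callable and the extensions are placeholders:

```python
import json
from pathlib import Path


def load_config(path, primary_parser, primary_exts=(".yaml", ".yml")):
    """Parse a config file, preferring a hypothetical new primary format but
    still accepting JSON. The extension decides when it can; otherwise we
    fall back on a simple try/except."""
    text = Path(path).read_text()
    ext = Path(path).suffix.lower()
    if ext == ".json":
        return json.loads(text)
    if ext in primary_exts:
        return primary_parser(text)
    # Unknown extension: try the primary format first, then JSON.
    try:
        return primary_parser(text)
    except Exception:
        return json.loads(text)
```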
A requirement for this will be to annotate checkpoints after resuming in a way that indicates they weren't produced in a single run.
Perhaps this annotation could be accomplished by incrementing a "run id" number, along with an optional (or required?) field that captures whether the previous run terminated cleanly. If we include that additional field, it will need to be regarded as best-effort, however, as there is no tamper-proof mechanism for determining whether there was a clean exit between runs.
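A tiny sketch of that annotation step, applied to checkpoint metadata on resume (all field names hypothetical):

```python
def annotate_resume(metadata, previous_exit_clean=None):
    """Mark a checkpoint lineage as spanning multiple runs.
    `previous_exit_clean` is best-effort: pass None when we can't tell.
    Field names are hypothetical, not the actual checkpoint schema."""
    metadata = dict(metadata)  # don't mutate the caller's copy
    metadata["run_id"] = metadata.get("run_id", 0) + 1
    metadata["previous_exit_clean"] = previous_exit_clean
    return metadata
```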
I gather from conversation with @AechPro and @lucas-emery that there's a strong contingent of researchers out there who depend on OpenAI gym v0.24.0, and may continue to do so for quite some time.
To accommodate those researchers, I think it would be great if distrib-rl had some sort of compatibility layer that supported numerous versions of the gym API spec and ultimately allowed consumers of distrib-rl to bring their preferred version of the gym API into distrib-rl at runtime.
This compatibility layer should document the minimum gym API version supported, and include tests that exercise the bounds of the various API revisions to ensure that we maintain compatibility going forward.
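One concrete API difference such a layer would need to paper over: older gym versions return a 4-tuple from step() while newer ones return a 5-tuple with termination and truncation split apart. A minimal sketch of normalizing both to the newer form (the wrapper name is an assumption):

```python
class GymStepCompat:
    """Normalize env.step() to the 5-tuple form
    (obs, reward, terminated, truncated, info) regardless of which
    gym version the consumer brings in at runtime."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        result = self.env.step(action)
        if len(result) == 5:
            return result
        obs, reward, done, info = result
        # The older API folds truncation into `done`; recover it when the
        # TimeLimit wrapper recorded it in info, else treat as termination.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info
```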
I've noticed while training bots on rocket-learn that tracking the mean episode length is handy for knowing when the bot has reached certain early learning stages. For example, the minimum following the initial peak in mean episode length tends to correspond to the bot learning an efficient kickoff, and the growth in episode length thereafter tends to correspond with the bot becoming more competitive post-kickoff (rather than just relying on random chance for kickoff goal rewards).
See this chart, as an example.
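The metric itself is cheap to compute; a minimal sketch of a rolling mean over recent episodes, suitable for logging alongside reward curves (the class name and window size are assumptions):

```python
from collections import deque


class MeanEpisodeLength:
    """Rolling mean of episode lengths over the last `window` episodes."""

    def __init__(self, window=100):
        self._lengths = deque(maxlen=window)
        self._current = 0

    def on_step(self, done):
        """Call once per environment step with the episode-done flag."""
        self._current += 1
        if done:
            self._lengths.append(self._current)
            self._current = 0

    @property
    def mean(self):
        if not self._lengths:
            return 0.0
        return sum(self._lengths) / len(self._lengths)
```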
Right now most directories containing submodules in this project use PascalCase. I believe it would be more idiomatic (and more aligned with PEP-8) if we used a snake_case naming style.
Since this change is very cross-cutting and liable to touch a lot of files, we should also take the opportunity to ensure that any other aspects of this project's file structure align with PEP-8 or other similar accepted PEPs.
In order to maintain reproducibility, checkpoints should capture the version of distrib-rl that produced them.
Similarly, when distrib-rl is executed either from its own git repo or as part of a project contained within a git repo, the checkpoint should include details about the state of the repo, such as the origin, branch name, commit hash, whether the working directory is clean, and the working directory diff when it is not clean.
Note: this will likely involve changing the checkpoint format over to something other than .npy exports.
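Capturing that repo state is a handful of git invocations; a best-effort sketch (the dict keys are assumptions, and the whole thing degrades to None when git or a repo is absent):

```python
import subprocess


def _git(args, cwd="."):
    return subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    ).stdout.strip()


def capture_git_state(cwd="."):
    """Best-effort snapshot of the enclosing repo's state for inclusion
    in a checkpoint. Returns None when not inside a git repo (or when
    git itself is unavailable)."""
    try:
        diff = _git(["diff"], cwd)
        return {
            "origin": _git(["remote", "get-url", "origin"], cwd),
            "branch": _git(["rev-parse", "--abbrev-ref", "HEAD"], cwd),
            "commit": _git(["rev-parse", "HEAD"], cwd),
            "clean": diff == "" and _git(["status", "--porcelain"], cwd) == "",
            "diff": diff or None,
        }
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
```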
Right now, to run training we need to make some adjustment to the experiment file, even if we want to keep our hyperparameters fixed.
We can work around this easily enough, but experiment files for training would be more readable without that.
It would be nice if I didn't have to fork this project to use it in my own project. Apart from setup.py (addressed separately), the following items are blockers:
The main goal of this is to separate the main distributed learning functions of distrib-rl from the environment-specific initialization logic that is currently found in the distrib_rl/Environments/Custom directory.
To do this we should move the RocketLeague env to its own repo (likely in the nexus-rl org), and move the other two envs into their own packages that live alongside the distrib_rl package directory in this repo.
To take a first step toward an IoC architecture, and to make it easier to wrangle our project state, we should introduce a single top-level component that serves as an object/component registry for long-lived objects that are used in cross-cutting ways, such as the active config, the network client (e.g. RedisClient or RedisServer), and an eventual state manager.
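A registry like that can start very small; a minimal sketch under the assumption that components are registered and resolved by string name (the class and method names are hypothetical):

```python
class ComponentRegistry:
    """Top-level registry for long-lived, cross-cutting components
    (active config, network client, eventual state manager)."""

    def __init__(self):
        self._components = {}

    def register(self, name, component):
        if name in self._components:
            raise KeyError(f"component already registered: {name}")
        self._components[name] = component

    def resolve(self, name):
        try:
            return self._components[name]
        except KeyError:
            raise KeyError(f"no component registered under: {name}") from None
```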
Ideally it'd support 2s and 3s as well :-)
At the moment distrib-rl does not have any mechanism that allows one to resume training from a checkpoint with config modifications.
It's often the case that practitioners wish to modify environment details and/or hyperparameters during learning. Sometimes these details are understood at the outset and scheduled in advance, and sometimes they're discovered/decided only after training has begun.
In order to not sacrifice reproducibility, it's critical that it be possible to produce a single configuration that, if run unattended from start to finish, would reproduce the full set of checkpoints, including the ones prior to a mid-stream config change.
As a result, I'd propose a change to the config format that allows for defining arbitrary config "scheduling," with any mid-stream config modifications being required to reference the original configuration over the range for which that config was valid.
For example, assume you are training a model with a given config that we'll call config-v1. In order to resume training from checkpoint 5 that was produced by config-v1, a hypothetical config-v2 must reference (in some TBD way) the original config-v1 as having been valid for checkpoints 0-5, inclusive. This makes it relatively trivial to compose a series of modified configs into a single snapshotted config on each checkpoint, with that snapshotted config being executable as-is to reproduce the checkpoint that contains it.
Having that composite config stored as part of each snapshot is a requirement for this feature.
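One hypothetical shape the "TBD way" of referencing could take, with every field name here being a placeholder rather than a settled format:

```json
{
  "config_version": 2,
  "hyperparameters": { "learning_rate": 1e-4 },
  "history": [
    {
      "config": "config-v1.json",
      "valid_for_checkpoints": [0, 5]
    }
  ]
}
```

Snapshotting would then mean inlining each referenced history entry so the composite document runs unattended from checkpoint 0.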
To make it easier to move to another protocol for comms between the workers and learner, it would be good if all communication between the workers and the learner was encapsulated into one or more abstract service objects (objects that express access to a remote resource in terms of the needs of the application itself, not the capabilities of the remote service).
The existing binding to Redis should also be encapsulated in an implementation of the abstraction introduced by this change, with minimal leakage of transport details to consumers of the communication service code.
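A sketch of what such a service abstraction might look like for one concern, policy weight transfer; the service name, method names, and Redis key layout are all assumptions for illustration, not distrib-rl's actual code:

```python
from abc import ABC, abstractmethod


class PolicyTransferService(ABC):
    """Expressed in terms of what the application needs (publish/fetch
    policy weights), not the capabilities of any particular transport."""

    @abstractmethod
    def publish_policy(self, version: int, weights: bytes) -> None: ...

    @abstractmethod
    def latest_policy(self) -> tuple:
        """Return (version, weights)."""


class RedisPolicyTransferService(PolicyTransferService):
    """Redis-backed implementation; key names are hypothetical."""

    def __init__(self, redis_client):
        self._redis = redis_client

    def publish_policy(self, version, weights):
        self._redis.set("policy:version", version)
        self._redis.set("policy:weights", weights)

    def latest_policy(self):
        return (
            int(self._redis.get("policy:version")),
            self._redis.get("policy:weights"),
        )
```

Workers and the learner would depend only on PolicyTransferService, so swapping Redis for another protocol means adding one new implementation.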
Rather than parsing json files into dicts, they should be parsed into a domain model (objects) with validation.
Ideally that domain model would be capable of generating a JSON schema so that tools like VSCode can validate and provide hints while those files are being edited.
This issue should not be considered complete until there are tests that guard the ability for the config data model to read and validate JSON config.
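A library such as pydantic is the obvious candidate here (validation plus JSON schema generation for free); as a dependency-free sketch of the shape, with a hypothetical slice of the config:

```python
from dataclasses import dataclass


@dataclass
class PolicyConfig:
    """Hypothetical slice of the config as a validated domain object."""
    learning_rate: float
    gamma: float

    def __post_init__(self):
        # Validation runs at construction, so a bad config fails loudly
        # at parse time instead of deep inside training.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must be in [0, 1]")

    @staticmethod
    def json_schema():
        # Enough for editors like VSCode to validate and hint.
        return {
            "type": "object",
            "properties": {
                "learning_rate": {"type": "number", "exclusiveMinimum": 0},
                "gamma": {"type": "number", "minimum": 0, "maximum": 1},
            },
            "required": ["learning_rate", "gamma"],
        }
```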
Setuptools is quite useful, especially if we ever want to release to PyPI for install via pip.
Assuming that the functionality someone needs is already implemented in this tool, it should ideally be possible to use it by editing config only.