distrib-rl's People

Contributors

aechpro, benjamincburns, icunnyngham, nevercast, some-rando-rl

distrib-rl's Issues

Revive MinAtarWrapper

To cut back on dead code we will remove the MinAtarWrapper.

I think we can bring it back at some stage, but ideally in a way where we don't need to take a direct dependency on it from the core distrib-rl Configurator module.
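
If/when it does come back, a lazy, optional-import pattern is roughly the shape I have in mind. A minimal sketch, assuming the MinAtar package's `Environment` class and a hypothetical builder function (not existing distrib-rl code):

```python
def build_minatar_env(config):
    """Only import MinAtar when a config actually asks for it, so the core
    Configurator never takes a hard dependency on the package."""
    try:
        from minatar import Environment  # optional dependency
    except ImportError as err:
        raise ImportError(
            "MinAtar support requires the optional 'minatar' package"
        ) from err
    return Environment(config["env_name"])
```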

Enhancement: Capture the full learner state details in the checkpoint artifact

To properly resume a terminated run, we need to capture all the state necessary to "revive" it. For much of this we can probably rely on PyTorch's built-in checkpointing features, but we'll need to augment the checkpoint with any additional state that PyTorch is not aware of, such as the learning rate controller's state.

Note: this almost certainly requires that we change the checkpoint format away from the current .npy format (assuming that #42 doesn't change it first). In doing so, we should also make sure that our checkpoints include a format version specifier, so that as the checkpoint data changes over time we can either maintain backward compatibility or fail clearly on checkpoint versions that we no longer understand.
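
As a rough sketch of what a versioned checkpoint could look like once we move to something like `torch.save` (the field names and version handling here are hypothetical, not the current format):

```python
import torch

CHECKPOINT_FORMAT_VERSION = 1  # hypothetical format version specifier


def save_checkpoint(path, policy, optimizer, lr_controller_state, epoch):
    """Bundle all learner state plus a format version into one artifact."""
    torch.save(
        {
            "format_version": CHECKPOINT_FORMAT_VERSION,
            "epoch": epoch,
            "policy_state_dict": policy.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            # state PyTorch is not aware of, e.g. the LR controller
            "lr_controller_state": lr_controller_state,
        },
        path,
    )


def load_checkpoint(path):
    checkpoint = torch.load(path)
    if checkpoint.get("format_version") != CHECKPOINT_FORMAT_VERSION:
        # either migrate older versions here, or error on ones we no longer understand
        raise ValueError(f"Unsupported checkpoint format: {checkpoint.get('format_version')}")
    return checkpoint
```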

Enhancement: Make it possible to resume experiments if the server crashes

A requirement for this will be annotating resumed checkpoints in a way that makes clear they weren't produced in a single run.

Perhaps this annotation could be accomplished by incrementing a "run id" number, along with an optional (or required?) field that captures whether the previous run terminated cleanly. If we do include that additional field, it will need to be regarded as best-effort, however, as there's no tamperproof mechanism for determining whether there was a clean exit between runs.
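
A minimal sketch of that best-effort annotation (the metadata keys are hypothetical):

```python
def annotate_resumed_checkpoint(metadata, previous_exit_was_clean):
    """Bump the run id and record, best-effort, whether the previous run
    exited cleanly before this resume."""
    annotated = dict(metadata)  # don't mutate the caller's copy
    annotated["run_id"] = annotated.get("run_id", 0) + 1
    annotated["previous_run_exited_cleanly"] = previous_exit_was_clean
    return annotated
```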

Enhancement: Make it possible for users to specify the version of the OpenAI gym lib they want to use

I gather from conversation with @AechPro and @lucas-emery that there's a strong contingent of researchers out there who depend on OpenAI gym v0.24.0, and may continue to do so for quite some time.

To accommodate those researchers, I think it would be great if distrib-rl had some sort of compatibility layer that supported numerous versions of the gym API spec and ultimately allowed consumers of distrib-rl to bring their preferred version of the gym API into distrib-rl at runtime.

This compatibility layer should document the minimum gym API version supported, and include tests that exercise the bounds of the various API revisions to ensure that we maintain compatibility going forward.
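
As one concrete example of what the layer would have to paper over: newer gym releases return a 5-tuple from `env.step()` with separate `terminated`/`truncated` flags, while older releases return a 4-tuple. A minimal sketch of a normalizing helper (hypothetical, not existing distrib-rl code):

```python
def step_compat(env, action):
    """Normalize env.step() across old (4-tuple) and new (5-tuple) gym API
    revisions to (obs, reward, terminated, truncated, info)."""
    result = env.step(action)
    if len(result) == 5:
        return result
    # old API: "done" conflates termination and truncation
    obs, reward, done, info = result
    truncated = info.get("TimeLimit.truncated", False)
    return obs, reward, done, truncated, info
```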

Add logging for rocket league mean episode length

While training bots on rocket-learn, I've noticed that tracking the mean episode length is handy for knowing when the bot has reached certain early learning stages. For example, the minimum following the initial peak in mean episode length tends to correspond to the bot learning an efficient kickoff, and the growth in episode length thereafter tends to correspond with the bot becoming more competitive post-kickoff (rather than just relying on random chance for kickoff goal rewards).

See this chart, as an example.
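
A minimal sketch of how the metric could be accumulated for logging (the class and method names are hypothetical):

```python
from collections import deque


class EpisodeLengthTracker:
    """Rolling mean of recent episode lengths, suitable for periodic logging."""

    def __init__(self, window=100):
        self.lengths = deque(maxlen=window)
        self.current = 0

    def on_step(self, done):
        self.current += 1
        if done:
            self.lengths.append(self.current)
            self.current = 0

    def mean_episode_length(self):
        return sum(self.lengths) / len(self.lengths) if self.lengths else 0.0
```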

Chore: Use `snake_case` directory naming style and conform to other filesystem-level aspects of PEP-8

Right now most directories containing submodules in this project use PascalCase. I believe it would be more idiomatic (and more aligned with PEP-8) if we used a snake_case naming style.

Since this change is very cross-cutting and liable to touch a lot of files, we should also take the opportunity to ensure that any other aspects of this project's file structure align with PEP-8 or other similar accepted PEPs.

Enhancement: Capture the `distrib-rl` version and project git details in the checkpoint

In order to maintain reproducibility, checkpoints should capture the version of distrib-rl that produced them.

Similarly, when distrib-rl is executed either from its own git repo or as part of a project contained within a git repo, the checkpoint should include details about the state of the repo, such as the origin, branch name, commit hash, whether the working directory is clean, and the working directory diff when it is not clean.

Note: this will likely involve changing the checkpoint format over to something other than .npy exports.
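
A rough sketch of collecting that repo state (this assumes git is on the PATH and the process runs inside the repo; the returned keys are placeholders):

```python
import subprocess


def _git(*args):
    return subprocess.check_output(["git", *args], text=True).strip()


def collect_git_metadata():
    """Capture repo details to embed in a checkpoint."""
    status = _git("status", "--porcelain")
    clean = status == ""
    return {
        "origin": _git("remote", "get-url", "origin"),
        "branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
        "commit": _git("rev-parse", "HEAD"),
        "working_dir_clean": clean,
        "working_dir_diff": None if clean else _git("diff"),
    }
```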

Enhancements that make distrib-rl easier to consume as a library

It would be nice if I didn't have to fork this project to use it in my own project. Apart from setup.py (addressed separately), the following items are blockers:

  • Make it possible to specify different directories (or full file paths) for experiments, configs, and data output
  • Make it possible to register additional builders under config keys with the various factories (see the sketch after this list)
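
For the second item, the kind of registration hook I have in mind might look roughly like this (the registry and function names are hypothetical, not the current factory API):

```python
# Hypothetical registry; in practice each factory would hold its own.
_BUILDERS = {}


def register_builder(config_key, builder):
    """Let library consumers map a config key to their own builder."""
    _BUILDERS[config_key] = builder


def build_from_config(config_key, config):
    try:
        return _BUILDERS[config_key](config)
    except KeyError:
        raise ValueError(f"No builder registered for config key '{config_key}'") from None


# A consuming project could then do something like:
# register_builder("my_custom_env", MyCustomEnvBuilder)
```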

Chore: move environment-specific code to its own package(s)

The main goal here is to separate the core distributed learning functionality of distrib-rl from the environment-specific initialization logic that currently lives in the distrib_rl/Environments/Custom directory.

To do this we should move the RocketLeague env to its own repo (likely in the nexus-rl org), and move the other two envs into their own packages that live alongside the distrib_rl package directory in this repo.

Enhancement: Make it possible to rollback to a specific checkpoint and resume with modified parameters

At the moment distrib-rl does not have any mechanism that allows one to resume training from a checkpoint with config modifications.

It's often the case that practitioners wish to modify environment details and/or hyperparameters during learning. Sometimes these details are understood at the outset and scheduled in advance, and sometimes they're discovered/decided only after training has begun.

In order to not sacrifice reproducibility, it's critical that it be possible to produce a single configuration that, if run unattended from start to finish, would reproduce the full set of checkpoints, including the ones prior to a mid-stream config change.

As a result, I'd propose a change to the config format that allows for defining arbitrary config "scheduling," with any mid-stream config modifications being required to reference the original configuration over the range for which that config was valid.

For example, assuming that you are training a model with a given config that we'll call config-v1, in order to resume training from checkpoint 5 that was produced by config-v1, a hypothetical config-v2 must reference (in some TBD way) the original config-v1 as having been valid for checkpoints 0-5, inclusive. This makes it relatively trivial to compose a series of modified configs into a single snapshotted config on each checkpoint, with that snapshotted config being executable as-is to reproduce the checkpoint that contains it.

Having that composite config stored as part of each snapshot is a requirement for this feature.
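
To make the proposal concrete, here is a rough sketch (written as a Python dict, since the actual schema is TBD) of how a hypothetical config-v2 could reference config-v1 for the checkpoints it produced:

```python
# Illustrative only; field names and structure are placeholders.
config_v2 = {
    "config_version": "config-v2",
    "policy": {"learning_rate": 1e-4},  # hyperparameter modified mid-stream
    "history": [
        {
            "config_version": "config-v1",
            "valid_for_checkpoints": {"first": 0, "last": 5},  # inclusive
            "policy": {"learning_rate": 3e-4},  # original hyperparameter
        }
    ],
}
```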

Chore: Encapsulate communication between the learner and workers into a clean abstraction

To make it easier to move to another protocol for comms between the workers and learner, it would be good if all communication between the workers and the learner was encapsulated into one or more abstract service objects (objects that express access to a remote resource in terms of the needs of the application itself, not the capabilities of the remote service).

The existing binding to Redis should also be encapsulated in an implementation of the abstraction introduced by this change, with minimal leakage of transport details to consumers of the communication service code.
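
A rough sketch of the shape such a service object could take (the class, method, and key names are all hypothetical):

```python
from abc import ABC, abstractmethod


class LearnerWorkerChannel(ABC):
    """Communication expressed in terms of what the application needs."""

    @abstractmethod
    def publish_policy(self, policy_bytes: bytes) -> None: ...

    @abstractmethod
    def fetch_trajectories(self, max_batches: int) -> list: ...


class RedisLearnerWorkerChannel(LearnerWorkerChannel):
    """Redis-backed implementation; transport details stay inside this class."""

    def __init__(self, redis_client):
        self._redis = redis_client

    def publish_policy(self, policy_bytes: bytes) -> None:
        self._redis.set("policy", policy_bytes)

    def fetch_trajectories(self, max_batches: int) -> list:
        batches = []
        for _ in range(max_batches):
            item = self._redis.lpop("trajectories")
            if item is None:
                break
            batches.append(item)
        return batches
```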

Enhancement: Make a proper domain model for the various JSON files, with validation

Rather than parsing JSON files into dicts, they should be parsed into a domain model (objects) with validation.

Ideally that domain model would be capable of generating a JSON schema so that tools like VSCode can validate and provide hints while those files are being edited.

This issue should not be considered complete until there are tests that guard the ability for the config data model to read and validate JSON config.
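
A minimal sketch of what this could look like with pydantic (v2 API shown, which can also emit a JSON schema); the field names are placeholders, not the current config layout:

```python
from pydantic import BaseModel, Field, ValidationError


class PolicyConfig(BaseModel):
    learning_rate: float = Field(gt=0)
    hidden_sizes: list[int]


class ExperimentConfig(BaseModel):
    name: str
    policy: PolicyConfig


def load_config(json_text: str) -> ExperimentConfig:
    """Parse and validate in one step instead of returning a raw dict."""
    try:
        return ExperimentConfig.model_validate_json(json_text)
    except ValidationError as err:
        raise ValueError(f"Invalid experiment config: {err}") from err


# JSON schema for editor tooling (VSCode etc.):
# schema = ExperimentConfig.model_json_schema()
```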

Add a setup.py

Setuptools is quite useful, especially if we ever want to release to PyPI for installation via pip.
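
A minimal sketch of what the file could contain (the package metadata and dependency list are placeholders):

```python
from setuptools import find_packages, setup

setup(
    name="distrib-rl",
    version="0.0.1",  # placeholder
    packages=find_packages(include=["distrib_rl", "distrib_rl.*"]),
    python_requires=">=3.8",
    install_requires=[
        # placeholder pins; the real dependency list is TBD
        "numpy",
        "torch",
        "redis",
    ],
)
```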
