onetimepad / advantage

2.0 1.0 0.0 383 KB

A framework for making RL easy!

Python 99.02% Shell 0.98%

reinforcement-learning deep-reinforcement-learning deep-learning ai python3 tensorflow machine-learning openai-gym framework advantage

advantage's Issues

DQN train_iteration fix

The train_iteration method of DeepQModel has two if statements, one for improve_policy_modulo and one for improve_target_modulo. If policy improvement happens on an iteration, maybe target improvement shouldn't happen right after it on that same iteration? (This comes up, for example, when one modulo is a multiple of the other.)
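To make the overlap concrete, here is a minimal sketch that lists the iterations on which both conditions would fire. The helper name `colliding_iterations` and the assumption that iterations are counted from 1 are illustrative only and are not taken from the repository.

```python
def colliding_iterations(num_iterations, improve_policy_modulo, improve_target_modulo):
    """List iterations on which both the policy and the target would be improved.

    Assumes iterations are counted from 1 (hypothetical; the real counter
    in DeepQModel may start elsewhere).
    """
    return [
        iteration
        for iteration in range(1, num_iterations + 1)
        if iteration % improve_policy_modulo == 0
        and iteration % improve_target_modulo == 0
    ]


# With a policy update every 2 iterations and a target update every 4,
# every target update coincides with a policy update.
print(colliding_iterations(12, 2, 4))
```

Note that simply turning the second `if` into an `elif` would not be a safe fix in that case: when improve_target_modulo is a multiple of improve_policy_modulo, the target branch would never run.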

Approximators: ability to select variables to train

Needed both for training and for restoring (for transfer learning).
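One possible way to express the selection is by variable scope. The sketch below filters plain variable names by scope prefix; the function name `select_by_scope` and the example names are hypothetical, and the same filter could presumably be applied to variable objects through their `.name` attribute before handing them to an optimizer or a saver.

```python
def select_by_scope(variable_names, scopes):
    """Keep only the variable names that live under one of the given scopes.

    Hypothetical helper: a scope "encoder" matches "encoder/..." but not
    "encoder_2/...".
    """
    prefixes = tuple(scope.rstrip("/") + "/" for scope in scopes)
    return [name for name in variable_names if name.startswith(prefixes)]


names = [
    "encoder/dense/kernel:0",
    "encoder/dense/bias:0",
    "head/dense/kernel:0",
]
# Train (or restore) only the encoder, e.g. for transfer learning.
trainable = select_by_scope(names, ["encoder"])
```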

Learning Model Runner

Tests and complete learning model interface

Model/Agent/Approximator saving params and restoration

Model, Agent and Approximator need to have functionality for saving and restoring parameters.
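A shared interface could look roughly like the sketch below. Everything here is an assumption for illustration: the `Checkpointable` mixin, the `get_params`/`set_params` hooks, and JSON as the storage format are not part of the repository, and a TensorFlow-backed approximator would more likely delegate to its own checkpointing.

```python
import json


class Checkpointable:
    """Hypothetical mixin giving Model, Agent and Approximator the same save/restore API."""

    def get_params(self):
        raise NotImplementedError

    def set_params(self, params):
        raise NotImplementedError

    def save(self, path):
        # Serialize whatever the subclass exposes as its parameters.
        with open(path, "w") as handle:
            json.dump(self.get_params(), handle)

    def restore(self, path):
        with open(path) as handle:
            self.set_params(json.load(handle))


class ToyApproximator(Checkpointable):
    """Stand-in for a real approximator; holds a flat list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)

    def get_params(self):
        return {"weights": self.weights}

    def set_params(self, params):
        self.weights = list(params["weights"])
```

An agent or model could implement the same two hooks and forward to the objects it owns, so that saving a model saves its agent and approximators as well.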

Fix tests for approximators

Fix the "has no attribute optimizer" error.

OOP paradigm: separate uses of approximators_builder and base_approximators

The "config" parameter in base_approximators seems to bind approximators_builder and base_approximators together, when the config should only be handled by approximators_builder. The proper pattern is already used in agents_builder and base_agents. There might need to be a separate "builder" for each approximator, similar to agents. Basically, the config param shouldn't be passed around.
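The separation described above can be sketched as follows. The class and function names, the constructor arguments, and the use of a plain dict in place of the real config object are all assumptions made for illustration.

```python
class Approximator:
    """Hypothetical base approximator: takes explicit arguments, never a config."""

    def __init__(self, learning_rate, hidden_units):
        self.learning_rate = learning_rate
        self.hidden_units = tuple(hidden_units)


def build_approximator(config):
    """Hypothetical builder: the only place that knows the config layout."""
    return Approximator(
        learning_rate=config["learning_rate"],
        hidden_units=config["hidden_units"],
    )


# A dict stands in for the real config object here.
approximator = build_approximator({"learning_rate": 1e-3, "hidden_units": [64, 64]})
```

With this split, the approximator can be constructed and tested without any config at all, and changes to the config format only touch the builder.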

Approximators inference

There can be multiple feed_dict elements for an approximator but only one concatenated input tensor. There should be a way to keep track of all of them. Change the way inference() works.
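One way to keep track of several inputs is a small registry keyed by name, from which the feed dictionary is built. This is only a sketch: `InputRegistry` is a hypothetical class, and plain strings stand in for the placeholder tensors.

```python
class InputRegistry:
    """Hypothetical registry mapping input names to placeholders."""

    def __init__(self):
        self._placeholders = {}

    def register(self, name, placeholder):
        if name in self._placeholders:
            raise ValueError("input %r is already registered" % name)
        self._placeholders[name] = placeholder

    def feed_dict(self, **values):
        # Require exactly the registered inputs so mistakes surface early.
        expected, given = set(self._placeholders), set(values)
        if expected != given:
            raise KeyError(
                "missing=%s unexpected=%s"
                % (sorted(expected - given), sorted(given - expected))
            )
        return {self._placeholders[name]: value for name, value in values.items()}


registry = InputRegistry()
registry.register("state", "state_placeholder")    # strings stand in for tensors
registry.register("action", "action_placeholder")
feeds = registry.feed_dict(state=[0.1, 0.2], action=[1])
```

An inference() method could then accept the named values and build the feed dictionary itself, rather than assuming a single input.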

Scripts for creating templates for adding new models

Pretty much a script that generates the model, agent, and protobuf files. It should also allow users to add elements, buffers, utils, etc. Basically a CLI.
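The generation step could be as simple as rendering a template. The sketch below uses the standard library's `string.Template`; the template contents and the `render_model_stub` function are hypothetical and do not reflect the repository's actual base classes.

```python
from string import Template

# Hypothetical stub layout; the real base class and methods may differ.
MODEL_TEMPLATE = Template('''\
class ${class_name}Model:
    """Generated stub for the ${class_name} model."""

    def train_iteration(self):
        raise NotImplementedError
''')


def render_model_stub(name):
    """Turn a snake_case name such as "deep_q" into a model class stub."""
    class_name = "".join(part.capitalize() for part in name.split("_"))
    return MODEL_TEMPLATE.substitute(class_name=class_name)


stub_source = render_model_stub("deep_q")
```

A CLI wrapper would then only need to parse the name and write `stub_source` to the right file.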

Wrapper in DiscreteActionSpaceAgent awkward

There is a wrapper (_action_wrapper) in DiscreteActionSpaceAgent. This wrapper is used to unwrap the one-element np.array returned by the DQNAgent or any other Discrete and ActionValue agent. However, placing it in DiscreteActionSpaceAgent is a bit awkward, since technically an ActionValue agent could be continuous, and a DiscreteAgent doesn't necessarily return an action that requires such a wrapper.

There should be a better fix for this... maybe a DiscreteActionValueAgent?

See base_agents
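The DiscreteActionValueAgent idea could be sketched like this. The base classes are reduced to empty stand-ins, the method names `sample_action` and `act` are hypothetical, and a one-element list is used in place of the np.array.

```python
class DiscreteActionSpaceAgent:
    """Stand-in for the real base class; no unwrapping logic lives here."""


class ActionValueAgent:
    """Stand-in for the real base class; could be discrete or continuous."""


class DiscreteActionValueAgent(DiscreteActionSpaceAgent, ActionValueAgent):
    """Hypothetical home for the wrapper: only agents that are both need it."""

    def sample_action(self, state):
        raise NotImplementedError

    def act(self, state):
        return self._action_wrapper(self.sample_action(state))

    @staticmethod
    def _action_wrapper(action):
        # Extract the single element of a one-element array-like.
        return int(action[0])


class ToyDQNAgent(DiscreteActionValueAgent):
    def sample_action(self, state):
        return [3]  # a one-element list stands in for the np.array


action = ToyDQNAgent().act(state=None)
```

This keeps the unwrapping out of both parent classes, so a continuous ActionValue agent or a discrete agent with a different return type is unaffected.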
