wessle / costaware
Repository for cost-aware project code.
License: MIT License
Need some good cost-aware versions of Gym environments. Making one or two with intuitive cost functions based on Gym environments with positive rewards would be ideal. Cost functions for our current MountainCar and Acrobot CostAwareEnvs look pretty arbitrary.
In this issue, I want to start a conversation about how we can make descriptive plots when the data series we are making are quite noisy.
I am currently using synthetic data, but this can be revisited once we finish the remaining experiment scripts. The synthetic data has the form
y = signal(x) + noise(x)
where
signal(x) := L / (1 + exp(-k(x-x0)))
is the generalized logistic function (which is—roughly—what our models give us) and
noise(x) ~ N(mu, sigma)
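For concreteness, the synthetic series can be generated with a short sketch like this (the parameters L, k, x0, mu, sigma are the ones defined above; the particular values and the number of realizations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def signal(x, L=1.0, k=1.0, x0=0.0):
    """Generalized logistic signal: L / (1 + exp(-k * (x - x0)))."""
    return L / (1.0 + np.exp(-k * (x - x0)))

def realization(x, L=1.0, k=1.0, x0=0.0, mu=0.0, sigma=0.1):
    """One noisy realization: y = signal(x) + N(mu, sigma) noise."""
    return signal(x, L, k, x0) + rng.normal(mu, sigma, size=x.shape)

x = np.linspace(-5.0, 5.0, 50)
ys = np.stack([realization(x) for _ in range(20)])  # 20 realizations
mean = ys.mean(axis=0)                              # mean realization
```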
Suppose I have two sets of signal/noise parameters, with a fixed number of realizations of each. Below, I plot the mean realizations for both sets, as well as the 95% (student) confidence intervals for each.
Obviously there are stylistic questions to be addressed, but the plot actually looks pretty good. Note, though, the number of steps in each series. Let me show what happens when we go from 50 to 500 steps.
It's starting to get hard to read this figure. And we are actually working at the scale of 50,000 steps, or even 500,000. Let's see those too.
These figures are basically junk.
One option on these large series is direct downsampling. Set a ds threshold, and then only plot the series like series[::ds]. This gives us something like (50,000 steps, ds=500):
Alternatively, we could do moving averages. As an example, here is a simple moving average (window of 500):
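Both options are one-liners; here is a sketch for a 50,000-step series:

```python
import numpy as np

def downsample(series, ds):
    """Direct downsampling: keep only every ds-th point."""
    return series[::ds]

def moving_average(series, window):
    """Simple moving average over a fixed window."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode='valid')

series = np.arange(50_000, dtype=float)
sparse = downsample(series, 500)       # 100 points to plot
smooth = moving_average(series, 500)   # 49,501 denoised points
```

Note the trade-off: downsampling keeps raw values but can alias spiky series, while the moving average denoises but lags the signal and shortens the series by window - 1 points.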
But it's unclear to me which of these is preferable, or if we should take an entirely different approach.
We need to be able to seed RandomMDPEnv so that, whenever identical seeds are provided, identical MDPEnvs are produced.
Is this already possible with the current class definition?
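For reference, one common pattern (a sketch only, not the actual class definition) is to route all randomness through a generator seeded in __init__:

```python
import numpy as np

class RandomMDPEnv:
    """Sketch: draw the transition kernel and rewards from a generator
    seeded in __init__, so identical seeds yield identical MDPs."""

    def __init__(self, n_states, n_actions, seed=None):
        rng = np.random.default_rng(seed)
        # P[s, a] is a distribution over next states; R[s, a] a reward.
        self.P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        self.R = rng.uniform(size=(n_states, n_actions))
```

With this pattern, two instances constructed with equal seeds have identical P and R arrays by construction.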
We should create a Gaussian policy for use with the DeepACAgent.
TrialRunner needs to be able to handle episodic environments to make it compatible with Gym-style environments. Continuing settings can then be handled by specifying a single, long episode.
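The episode/step structure could look like the following sketch (the agent methods sample_action and update are assumed names, and the step signature is the classic Gym 4-tuple):

```python
def run_trial(env, agent, n_episodes, max_steps):
    """Sketch: episodic loop compatible with Gym-style environments."""
    for _ in range(n_episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.sample_action(state)
            state, reward, done, info = env.step(action)
            agent.update(reward, state)
            if done:
                break
```

A continuing environment is then the special case n_episodes=1 with a large max_steps and an env that never returns done=True.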
Need to decide on a standard way to run scripts in the scripts directory, then implement it. One of the main issues is how to refer to the data directory from within the script. Use

import data
data_dir = data.__path__[0]

to get the local absolute path to data. Is this reasonable? Is there a better way, like doing import costaware and using costaware.data.__path__ instead?
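One alternative that avoids touching __path__ altogether is importlib.resources (Python 3.9+); a sketch, with the stdlib json package standing in for the repo's data package:

```python
from importlib import resources

# Resolve a package's on-disk location without touching __path__.
# (`json` stands in for the repo's `data` package here.)
data_dir = resources.files('json')
config_path = data_dir / 'tool.py'  # Traversable supports '/' joining
```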
The references to the configs directory in scripts/run_experiment.py appear to assume the script is being run from the top-level costaware directory. It would be nice if these references were absolute paths to configs on the local machine.
Need to do the following:
- Move main.utils.experiment to main.core, and make corresponding changes elsewhere in the code.
- Turn main.experimental.util into main.experimental.experimental_envs and move the environments defined therein into main.core.envs, then make corresponding changes elsewhere.

Need to make plotting utilities for experiment data.
Need a Ray Actor version of this class.
Each instantiation agent = AgentClassName() has an attribute agent.title = 'AgentClassName'. This is used in scripts/run_experiment.py to record the agent's class type when logging. But type(agent).__name__ == agent.title, so removing agent.title and instead using type(agent).__name__ avoids duplication.
@DavidNKraemer Is there another reason to keep agent.title? Is agent.title used anywhere other than scripts/run_experiment.py?
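A tiny illustration of the redundancy (the class name is just an example):

```python
class DeepACAgent:  # any agent class; name illustrative
    def __init__(self):
        self.title = 'DeepACAgent'  # duplicates the class name

agent = DeepACAgent()
# The same string is always recoverable from the type itself:
assert type(agent).__name__ == agent.title
```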
We need to make a proper Dockerfile or build procedure to make sure our Python versions and other dependencies are all the same.
Need to do the following:
- Rename main to costaware and make corresponding changes throughout the code.

We're creating an ExperimentRunner object that reads in an experiment_config file and launches a corresponding experiment. I just took a first crack at the class diagram, which can be found in the notes directory on the experiments branch.
@DavidNKraemer Suggestions? Comments? My UML usage may need correcting.
Now that we know the Q-learning algorithm works reasonably well on cost-aware gym environments, we should try to get it working on our portfolio management environment.
Get it working on some reasonable examples.
@DavidNKraemer Plotter appears to require a working LaTeX installation by default. I agree we need to keep the ability to plot LaTeX, but is there a way to do this without requiring a working installation? I just use Overleaf for my LaTeX needs and want to avoid installing it locally, if possible. One possible workaround is to finally get around to #32.
We need to merge experiments into master and get everything into publishable shape.
We should try to get the DeepACAgent working on our cost-aware gym environments. First step would be to create Gaussian policy to use with the agent.
We currently initialize agents in a hodgepodge of different ways depending on the dimension of the state and action spaces, the actions themselves, and the specific environment expected. The result is that the arguments we pass in to various agents are too diverse to make a more uniform interface for agent initialization.
Passing the created environment into the agent itself can help sidestep this, since the agent can inspect the environment and collect the required information internally.
- Agents on the experiments branch need to be redefined to accept envs on initialization and collect the appropriate information.
- Scripts on experiments depending on agents need to be altered to reflect the new agent definitions.
- … experiments once the experiments branch becomes master.

There are two lines of ongoing development that will be based on the experiments branch:
- the Experiment object
- plotting utilities

@DavidNKraemer Should we make a sub-branch of experiments for development of plotting utilities?
Need to debug ExperimentRunner once we have working versions of the Env-, Agent-, and IOManagerConstructors.
Ratios are currently being computed by taking averages over two fixed-length buffers at each timestep. For small buffers this is probably okay, but for larger ones this is an awful lot of additional computation. Is there a better way we could be doing this?
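One standard alternative, sketched under the assumption that the two buffers are fixed-length reward and cost windows: keep running sums alongside the buffers and update them in O(1) per step, instead of re-averaging the full buffers every timestep.

```python
from collections import deque

class RunningRatio:
    """Ratio of two fixed-length running averages, updated in O(1)
    per step by tracking sums instead of re-averaging the buffers."""

    def __init__(self, maxlen):
        self.rewards = deque(maxlen=maxlen)
        self.costs = deque(maxlen=maxlen)
        self.reward_sum = 0.0
        self.cost_sum = 0.0

    def update(self, reward, cost):
        # Subtract the values about to fall out of the full buffers.
        if len(self.rewards) == self.rewards.maxlen:
            self.reward_sum -= self.rewards[0]
            self.cost_sum -= self.costs[0]
        self.rewards.append(reward)
        self.costs.append(cost)
        self.reward_sum += reward
        self.cost_sum += cost

    @property
    def ratio(self):
        # Equal buffer lengths, so the ratio of averages is the ratio of sums.
        return self.reward_sum / self.cost_sum
```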
It would be nice to have a working version of this to test with ExperimentRunner once the latter is almost done being debugged.
We sorely need to add documentation throughout the repo.
We need to come up with standard config file formats for specifying both trials and experiments (which are just collections of trials). These are important because we need to have an easy way of saving the (hyper-)parameters we used to generate data alongside the data itself. This will make it easy to identify and replicate experiments, if necessary.
We should come up with code style guidelines for the repository and write them down somewhere. I suggest following PEP8 and emulating @DavidNKraemer's style otherwise.
Plotter appears to group data together by the name of the agent used. This makes it difficult to run an experiment that tests multiple hyperparameter configurations for a single type of agent, for example. This seems to be because

sns.lineplot(data=data, x='step', y='ratio', hue='agent',
             ci=self.confidence)

in Plotter.plot() forms groups using hue='agent'.
One flexible solution might be to group according to the names of subdirectories (e.g. AC_trials, Q_trials) instead of agents.
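One way to realize that, as a sketch: tag each row with its trial's subdirectory name at load time, then pass that column to hue (the column names and CSV layout here are assumptions):

```python
import pandas as pd
from pathlib import Path

def load_trials(root):
    """Load all trial CSVs under `root`, tagging each row with the
    name of its top-level subdirectory (e.g. 'AC_trials', 'Q_trials')."""
    root = Path(root)
    frames = []
    for csv in sorted(root.rglob('*.csv')):
        df = pd.read_csv(csv)
        df['group'] = csv.relative_to(root).parts[0]
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# then: sns.lineplot(data=data, x='step', y='ratio', hue='group', ...)
```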
Get DeepRVIQLearningAgent working on the cost-aware gym environments.
When running experiment_runner.py for longer periods, it seems like trials take an increasing amount of time to complete the same number of steps.
The SoftmaxPolicy currently in use in the LinearACAgent uses atypical feature vectors. We need to refactor so that LinearACAgent uses a classic, standard feature vector mapping in SoftmaxPolicy. One standard approach to try is a polynomial mapping: each state-action pair (s, a) gets mapped to the vector [s, a, s * a, 1] (well, this vector will actually be appropriately normalized, but you get the idea).
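The proposed mapping is small enough to sketch directly (unit-norm normalization is one plausible choice for the "appropriately normalized" part):

```python
import numpy as np

def poly_features(s, a):
    """Map a scalar state-action pair (s, a) to [s, a, s*a, 1],
    normalized to unit length."""
    phi = np.array([s, a, s * a, 1.0])
    return phi / np.linalg.norm(phi)
```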
I tried running examples/experiment_runner_example.py and encountered the following runtime error:

Traceback (most recent call last):
  File "examples/experiment_runner_example.py", line 18, in <module>
    default=f'{data.__path__[0]}/experiment_runner_example',
TypeError: '_NamespacePath' object is not subscriptable

I looked through this error and came to this interpretation: the data module has a dunder attribute __path__ which is a _NamespacePath object. Since the _NamespacePath smells like a list, we subscript it to get a hard path (a good idea in theory, given local machine compatibility issues). The problem is that _NamespacePath isn't actually a list and doesn't support subscripting.
My guess is that this is an error on my end, but I'm unsure where it might be coming from. It seems like just a Python problem. I worry that we may be having a conflict between Python 3.* versioning, in which case we need to pin down exactly what we are using.
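If that interpretation holds, one workaround is to materialize the path before indexing. (The _NamespacePath also hints that data/ may be missing an __init__.py, which would make it a namespace package rather than a regular one.) A sketch, with the stdlib json package standing in for data:

```python
import json  # stands in for the repo's `data` package

# `__path__` may be a plain list (regular package) or a
# `_NamespacePath` (namespace package, i.e. no __init__.py).
# Materializing it with list() supports indexing in both cases:
data_dir = list(json.__path__)[0]
```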