mauicv / gerel


evolutionary algorithms for reinforcement learning

License: MIT License

Python 100.00%

gerel's Introduction

GeReL

GeReL is a simple library for genetic algorithms applied to reinforcement learning.

NOTE: GeReL is in development.

Example:

The following uses REINFORCE-ES to solve the OpenAI Gym CartPole environment:

from gerel.genome.factories import dense
from gerel.algorithms.RES.population import RESPopulation
from gerel.algorithms.RES.mutator import RESMutator
from gerel.model.model import Model
import gym
import numpy as np
from gerel.populations.genome_seeders import curry_genome_seeder
from string import Template


def compute_fitness(genome):
    model = Model(genome)
    env = gym.make("CartPole-v0")
    state = env.reset()
    fitness = 0
    action_map = lambda a: 0 if a[0] <= 0 else 1  # noqa
    for _ in range(1000):
        action = model(state)
        action = action_map(action)
        state, reward, done, _ = env.step(action)
        fitness += reward
        if done:
            break

    return fitness


if __name__ == '__main__':
    genome = dense(
        input_size=4,
        output_size=1,
        layer_dims=[2, 2, 2]
    )

    weights_len = len(genome.edges) + len(genome.nodes)
    init_mu = np.random.uniform(-1, 1, weights_len)

    mutator = RESMutator(
        initial_mu=init_mu,
        std_dev=0.1,
        alpha=0.05
    )

    seeder = curry_genome_seeder(
        mutator=mutator,
        seed_genomes=[genome]
    )

    population = RESPopulation(
        population_size=50,
        genome_seeder=seeder
    )

    report_temp = Template('generation: $generation, mean: $mean, best: $best')
    for generation in range(100):
        for genome in population.genomes:
            genome.fitness = compute_fitness(genome.to_reduced_repr)
        population.speciate()
        data = population.to_dict()
        mutator(population)
        report = report_temp.substitute(
            generation=generation,
            mean=data['mean_fitness'],
            best=data['best_fitness'])
        print(report)
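
For readers unfamiliar with REINFORCE-ES, the role of `std_dev` and `alpha` in the example can be illustrated with a generic evolution-strategies step. This is a minimal sketch under stated assumptions, not gerel's implementation: `es_step` and `toy_fitness` are hypothetical names, and the mean-fitness baseline is one common choice among several.

```python
import random


def es_step(mu, fitness_fn, population_size=50, std_dev=0.1, alpha=0.05, rng=random):
    """One generic evolution-strategies update of the mean parameter vector mu."""
    # Sample Gaussian perturbations and score each perturbed candidate.
    noises = [[rng.gauss(0.0, 1.0) for _ in mu] for _ in range(population_size)]
    scores = [
        fitness_fn([m + std_dev * e for m, e in zip(mu, eps)])
        for eps in noises
    ]

    # Subtract the mean score as a baseline to reduce the variance of the estimate.
    baseline = sum(scores) / len(scores)

    # Estimate the fitness gradient as the score-weighted average of the noise.
    grad = [
        sum((s - baseline) * eps[i] for s, eps in zip(scores, noises))
        / (population_size * std_dev)
        for i in range(len(mu))
    ]

    # Gradient ascent: move mu a fraction alpha along the estimated gradient.
    return [m + alpha * g for m, g in zip(mu, grad)]


def toy_fitness(weights):
    """Toy objective (assumption, for illustration only): maximal at the origin."""
    return -sum(w * w for w in weights)
```

Calling `es_step` repeatedly on `toy_fitness` moves `mu` toward the origin. In the CartPole example the same idea applies, with the episode return in place of the toy objective.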

Tests:

To run all unittests:

nosetests

gerel's Issues

NEATMutator doesn't cull species

The NEAT mutator is supposed to kill off species if they haven't improved in some number of generations.
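
One way this culling could work is sketched below. It is not the library's code: the dict keys `best_fitness`, `current_fitness` and `stagnation`, the function name, and the default threshold are all assumptions made for illustration.

```python
def cull_stagnant_species(species, max_stagnation=15):
    """Return the species that improved within the last max_stagnation generations.

    Each species is assumed to be a dict with the hypothetical keys
    'best_fitness' (best value seen so far), 'current_fitness' (best value
    this generation) and 'stagnation' (generations since last improvement).
    """
    survivors = []
    for group in species:
        if group["current_fitness"] > group["best_fitness"]:
            # Improvement: record the new best and reset the counter.
            group["best_fitness"] = group["current_fitness"]
            group["stagnation"] = 0
        else:
            group["stagnation"] += 1
        if group["stagnation"] < max_stagnation:
            survivors.append(group)

    # Never remove every species: keep the best one if all have stagnated.
    if not survivors and species:
        survivors = [max(species, key=lambda g: g["best_fitness"])]
    return survivors
```

Keeping at least one species is a design choice here so that the population can always be refilled.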

Improve datastore implementation

Add meta data to datastore implementation.

Add examples

Instead of having integration tests, replace them with an examples folder and include documentation for each case.

Running populations of around 300 with hidden layer sizes around [100, 100] results in significant slowdowns. These sizes should not be an issue. Figure out what causes this. Is it just memory?

get_addmissable_edges returns empty list?

This may be an error that occurs. It is very intermittent, so this is mostly left here to collect more notes in case it crops up. Do not pursue, as it may not exist!

Make metric a property of the population class rather than passing it as a parameter.

Test running times for parallel and series batches of environment runs

If running batches of environments in series is more efficient, then we don't need to store the entirety of the population data in memory. This depends on what we're using to simulate the environment.

tests for pybullet
tests for box2d
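
A timing comparison along these lines could start from the sketch below. `run_episode` is a hypothetical stand-in for a real rollout, and the worker count is an arbitrary assumption. Threads are used only to keep the sketch self-contained.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_episode(seed):
    """Stand-in for one environment rollout (assumption: swap in a real simulation)."""
    total = 0
    for step in range(20000):
        total += (seed * step) % 7
    return total


def time_series(seeds):
    """Run the episodes one after another; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [run_episode(seed) for seed in seeds]
    return results, time.perf_counter() - start


def time_parallel(seeds, workers=4):
    """Run the episodes on a worker pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_episode, seeds))
    return results, time.perf_counter() - start
```

For CPU-bound pure-Python simulations, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and is likely the fairer comparison. Which variant wins will depend on the simulator, as noted above.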

Project aims

Currently pyg will successfully run NEAT even with some minor errors #5. The aim is to build pyg so that:

NEAT is easy
- See this
Reinforce-ES is implementable
- See this and this
Simple to serve as a backend to a realtime mixed human selection system.
- The idea is that, using a very simple baseline reward function in a given environment, we have a human select the best strategies. We both use the human data as a fitness function and train a critic network that adjusts the baseline reward function.
- As well as this, it would be useful to be able to use policy gradient methods to fine-tune evolved solutions. To do this we need to be able to convert between our Model and a model from another ML framework such as Keras.

Add momentum to learning

see this blog for available options
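
As a point of reference, classical momentum is one of the simplest such options. The sketch below is an illustration only: the function name and default values are assumptions, and it operates on plain lists rather than on any gerel object.

```python
def momentum_step(weights, gradient, velocity, learning_rate=0.05, momentum=0.9):
    """Gradient-ascent step with classical momentum.

    Returns (new_weights, new_velocity); the caller carries the velocity
    from one update to the next.
    """
    # Blend the previous velocity with the current gradient.
    new_velocity = [
        momentum * v + learning_rate * g for v, g in zip(velocity, gradient)
    ]
    # Move the weights along the accumulated velocity.
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

Starting from a zero velocity, the first call is an ordinary gradient step. Later calls take larger steps while the gradient keeps pointing the same way.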

add genome_seeder_from_ds

Add function that creates a genomes_seeder or population from DataStore and specified generation.

Make SIMPLEMutator

The simple mutator just selects the top n genomes and mutates each some number of times to refill the population.
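
The selection-and-refill step might look like the sketch below. It is written against plain Python values rather than gerel classes. `simple_mutate`, `fitness_of` and `mutate` are hypothetical names, and keeping the parents unchanged in the next generation is an assumption.

```python
def simple_mutate(population, fitness_of, mutate, top_n=5):
    """Keep the top_n fittest genomes and refill with their mutated copies."""
    size = len(population)
    # Rank genomes from fittest to least fit and keep the best top_n.
    parents = sorted(population, key=fitness_of, reverse=True)[:top_n]
    # Cycle through the parents so each is mutated roughly equally often.
    children = [
        mutate(parents[i % len(parents)]) for i in range(size - len(parents))
    ]
    return parents + children
```

With `top_n=2` and a population of six, each of the two parents is mutated twice, which restores the population to its original size.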

Make into package

see

Add Interface for Mutator Class

Mutator Class should a) be an interface

from src import Population
from src import Mutator
from src import generate_neat_metric
from src import Model


class CustomMutator(Mutator):
    def mutate(self, genome):
        # do something to genome
        pass

    def crossover(self, genome):
        # crossover genome
        pass


custom_mutator = CustomMutator()
# ...
custom_mutator(population)
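
One possible shape for such an interface uses Python's `abc` module. The sketch is hypothetical and not the library's actual base class: the method signatures mirror the snippet above, and `NegateMutator` is a toy subclass that treats genomes as lists of floats.

```python
from abc import ABC, abstractmethod


class Mutator(ABC):
    """Hypothetical interface: subclasses must supply mutate and crossover."""

    @abstractmethod
    def mutate(self, genome):
        """Return a mutated version of genome."""

    @abstractmethod
    def crossover(self, genome):
        """Return a genome produced by crossover."""

    @abstractmethod
    def __call__(self, population):
        """Apply the mutator to a whole population."""


class NegateMutator(Mutator):
    """Toy subclass for illustration: genomes are lists of floats."""

    def mutate(self, genome):
        return [-weight for weight in genome]

    def crossover(self, genome):
        return list(genome)

    def __call__(self, population):
        return [self.mutate(genome) for genome in population]
```

Because the methods are abstract, instantiating `Mutator` directly, or a subclass that omits one of them, raises `TypeError`.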

Questions:

How do we consolidate the NEAT method of selecting weights and the REINFORCE-ES method?

Add different weight initialization schemes

Improve from_genes implementation

Currently the datastore saves the genome structure as all nodes and edges, including input and output nodes, but the from_genes hydrator factory expects num_inputs and num_outputs and expects the node and edge lists passed to contain only hidden nodes. Hence, to use from_genes with the datastore we have to do:

last_gen = ds.load(generation_in)
nodes, edges = last_gen['best_genome']
input_num = len([n for n in nodes if n[4] == 'input'])
output_num = len([n for n in nodes if n[4] == 'output'])
nodes = [n for n in nodes if n[4] == 'hidden']
genome = from_genes(
    nodes, edges,
    input_size=input_num,
    output_size=output_num,
    weight_low=-2,
    weight_high=2,
    depth=len(LAYER_DIMS))

Similarly, computing the number of layers is not trivial and should just be a value stored in some metadata, either on the generation or on the DataStore class itself.

change to_reduced_repr

should be a method not a property
change name

Weight and Bias Initialization in Population Class

Looks something like:

p = Population(weight_init_fn=Uniform(-2,2), 
               bias_init_fn=Uniform(0, 1))

related to #13
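
An initializer such as `Uniform` could be a small callable object. The sketch below is hypothetical: the class body, the `count` argument and the optional `rng` parameter are assumptions, not part of the existing Population API.

```python
import random


class Uniform:
    """Callable initializer drawing values uniformly from [low, high]."""

    def __init__(self, low, high, rng=None):
        if low > high:
            raise ValueError("low must not exceed high")
        self.low = low
        self.high = high
        # A dedicated Random instance makes runs reproducible when seeded.
        self.rng = rng or random.Random()

    def __call__(self, count=1):
        """Return a list of count freshly sampled values."""
        return [self.rng.uniform(self.low, self.high) for _ in range(count)]
```

A population could then call `weight_init_fn(n)` to draw `n` weights, and other schemes would only need to expose the same call signature.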

Add Adaptive REINFORCE-ES

Same as REINFORCE-ES #19, but we select the highest-performing solution from the population that lies along the derivative vector rather than taking a fixed step size each time.
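
The step-selection idea can be sketched as a search over a few candidate step sizes along the estimated gradient direction. The function name, the list-of-floats parameter vector and the candidate step sizes are assumptions made for illustration.

```python
def adaptive_step(mu, direction, fitness_fn, step_sizes=(0.01, 0.05, 0.1, 0.5, 1.0)):
    """Move mu along direction by whichever candidate step size scores best.

    Returns (best_mu, best_fitness); mu itself is returned if no candidate
    improves on it.
    """
    best_mu, best_fitness = mu, fitness_fn(mu)
    for step in step_sizes:
        # Candidate solution lying on the line through mu along direction.
        candidate = [m + step * d for m, d in zip(mu, direction)]
        fitness = fitness_fn(candidate)
        if fitness > best_fitness:
            best_mu, best_fitness = candidate, fitness
    return best_mu, best_fitness
```

Each candidate costs one extra fitness evaluation, so in an RL setting the number of step sizes trades rollout cost against step quality.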

Add REINFORCE-ES algorithm objects

from here, here and here

ES Weights and biases updates

How do we consolidate the NEAT method of selecting weights and the REINFORCE-ES method?

Documentation

see

Mutator class should act on Population objects

Mutator Class should act on populations. This would look like:

from src import Population
from src.Neat import NEATMutator
from src import generate_neat_metric
from src import Model

def compute_fitness(genome):
    model = Model(genome)
    fitness = 0  # compute fitness of model here
    return fitness

mutator = NEATMutator()
metric = generate_neat_metric()
population = Population(metric=metric)

for i in range(10):
    for genome in population.genomes:
        genome.fitness = compute_fitness(genome.to_reduced_repr)
    population.speciate()
    mutator(population)

Pretrain Network

Add some functionality to pretrain the network to fit certain initial conditions. This is to solve the problem where all the networks within a population end up exhibiting similar behaviours.

Add species object

Currently we're using a dict to store the data on each species. This should be abstracted into a class because, at the very least, it'll enable code completion.
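
A dataclass is one lightweight way to do this. The field names below are guesses at what the dict might hold, not the library's actual keys, and `record_fitness` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Species:
    """Hypothetical container for the per-species data currently held in a dict."""

    key: int
    representative: Any = None
    members: List[Any] = field(default_factory=list)
    best_fitness: float = float("-inf")
    stagnation: int = 0

    def add(self, genome):
        """Assign a genome to this species."""
        self.members.append(genome)

    def record_fitness(self, fitness):
        """Track the best fitness seen and count generations without improvement."""
        if fitness > self.best_fitness:
            self.best_fitness = fitness
            self.stagnation = 0
        else:
            self.stagnation += 1
```

Beyond code completion, a class like this gives stagnation tracking a natural home, which the species-culling issue above would also need.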

Integrate Multiprocessing docs and tool

The aspect of evolutionary algorithms applied to RL that benefits from multiprocessing is really the environment simulation step that has to be performed for each genome in the population. This is outside the scope of what this library tries to do, but...

Should add examples in the docs.
Should have a simple tool that makes it easy out of the box.

class CustomPopulation(Population):
    def speciate(self):
        for genome in self.genomes:
            # do something here
            pass


p = CustomPopulation(metric=NEAT_metric())