
License: MIT License


Parameter-exploring Policy Gradients

Python Implementation of Parameter-exploring Policy Gradients [3] Evolution Strategy

[Bipedal demo GIF, reward: 189.16]

Requirements

  • Python >= 3.6
  • Numpy

Optional

  • gym
  • mpi4py

Install

  • From PyPI
pip3 install pepg-es
  • From Source
git clone https://github.com/goktug97/PEPG-ES
cd PEPG-ES
python3 setup.py install --user

About Implementation

I implemented several things differently from the original paper:

  • Applied rank transformation [1] to the fitness scores.
  • Used Adam [2] optimizer to update the mean.
  • Weight decay is applied to the mean, similar to [4].
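
The rank transformation can be sketched as a centered-rank mapping of the fitness scores, in the spirit of [1] and [4]. This is an illustrative version, not necessarily the library's exact code:

```python
import numpy as np

def centered_ranks(rewards):
    """Map raw rewards to centered ranks in [-0.5, 0.5].

    Only the ordering of the rewards matters, which makes the update
    robust to outlier fitness values.
    """
    rewards = np.asarray(rewards)
    ranks = np.empty(len(rewards), dtype=np.float64)
    ranks[rewards.argsort()] = np.arange(len(rewards))
    return ranks / (len(rewards) - 1) - 0.5

centered_ranks([10.0, -3.0, 7.0, 0.0])  # best reward maps to 0.5, worst to -0.5
```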

Usage

Refer to the PEPG-ES/examples folder for more complete examples.

XOR Example

  • Find neural network parameters for an XOR gate.
  • Black-box optimization algorithms like PEPG are competitive in reinforcement learning because they don't require backpropagation to compute gradients. In supervised learning, backpropagation is faster and more reliable, so it would solve the XOR problem more quickly; XOR is used here simply because it is small and easy to follow.
from pepg import PEPG, NeuralNetwork, Adam, sigmoid

import numpy as np


network = NeuralNetwork(input_size = 2, output_size = 1, hidden_sizes = [2],
                        hidden_activation = sigmoid,
                        output_activation = sigmoid)

# Adam is the default optimizer; its parameters are passed explicitly here for the example
optimizer_kwargs = {'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08} # Adam Parameters

es = PEPG(population_size = 100, theta_size = network.number_of_parameters,
          mu_init = 0, sigma_init = 2.0,
          mu_lr = 0.3, sigma_lr = 0.2, optimizer = Adam,
          optimizer_kwargs = optimizer_kwargs)

truth_table = [[0, 1],[1, 0]]
solution_found = False

while True:
    print(f'Step: {es.step}')
    solutions = es.get_parameters()
    rewards = []
    for solution in solutions:
        network.weights = solution
        error = 0
        for input_1 in range(len(truth_table)):
            for input_2 in range(len(truth_table[0])):
                output = int(round(network([input_1, input_2])[0]))
                error += abs(truth_table[input_1][input_2] - output)
        reward = (4 - error) ** 2
        rewards.append(reward)
    es.update(rewards)
    if es.best_fitness == 16:
        print('Solution Found')
        print(f'Parameters: {es.best_theta}')
        break
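
The stopping condition works because the loop shapes the reward as the squared number of correct outputs: zero error over the four input pairs gives the maximum value of 16.

```python
# Reward shaping used in the loop above: total_error counts wrong
# outputs over the four XOR input pairs, so 0 error means a perfect gate.
def shaped_reward(total_error):
    return (4 - total_error) ** 2

shaped_reward(0)  # perfect XOR -> 16
shaped_reward(4)  # always wrong -> 0
```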
  • Output:
Step: 0
Step: 1
Step: 2
Step: 3
Step: 4
Step: 5
Step: 6
Step: 7
Step: 8
Step: 9
Step: 10
Step: 11
Step: 12
Step: 13
Step: 14
Step: 15
Step: 16
Step: 17
Step: 18
Step: 19
Step: 20
Solution Found
Parameters: [ 2.69265669 -2.80113868  2.95878579 -4.21097193 -4.62368205  0.72005261
 -0.66755995 -2.50694535  0.39457738]

Documentation

PEPG Class

es = PEPG(population_size, theta_size,
          mu_init, sigma_init, mu_lr,
          sigma_lr, l2_coeff = 0.005,
          optimizer = Adam, optimizer_kwargs = {})
  • Parameters:
    • population_size: int: Population size of the evolution strategy.
    • theta_size: int: Number of parameters to optimize.
    • mu_init: float: Initial mean.
    • sigma_init: float: Initial sigma.
    • mu_lr: float: Learning rate for the mean.
    • sigma_lr: float: Learning rate for the sigma.
    • l2_coeff: float: Weight decay coefficient.
    • optimizer: Optimizer: Optimizer class to use.
    • optimizer_kwargs: Dict[str, Any]: Parameters for the optimizer, excluding the learning rate.

solutions = es.get_parameters()
  • Creates symmetric samples around the mean and returns a numpy array of shape (population_size, theta_size).
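
Symmetric (mirrored) sampling can be sketched as follows. This is a hypothetical helper, assuming mu and sigma are numpy arrays of size theta_size, not the library's actual implementation:

```python
import numpy as np

def symmetric_samples(mu, sigma, population_size):
    """Draw population_size/2 perturbations and mirror them around mu.

    Each noise vector epsilon_i produces the pair (mu + epsilon_i,
    mu - epsilon_i), which reduces the variance of the gradient estimate.
    """
    half = population_size // 2
    epsilon = np.random.randn(half, mu.size) * sigma  # per-parameter noise
    return np.concatenate([mu + epsilon, mu - epsilon]), epsilon
```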

es.update(rewards)
  • Parameters:
    • rewards: List[float]: Rewards for the given solutions.
  • Updates the mean and the sigma.
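
With symmetric sampling, PEPG estimates separate gradients for the mean and sigma from each mirrored reward pair [3]. A minimal sketch under that assumption (the baseline is typically a moving average of rewards; all names here are illustrative, not the library's code):

```python
import numpy as np

def pepg_gradients(epsilon, rewards, sigma, baseline):
    """Estimate mu and sigma gradients from mirrored reward pairs.

    epsilon: (half, theta_size) noise used for the mirrored samples.
    rewards: first half for mu + epsilon, second half for mu - epsilon.
    """
    half = epsilon.shape[0]
    r_plus = np.asarray(rewards[:half])          # rewards of mu + epsilon
    r_minus = np.asarray(rewards[half:])         # rewards of mu - epsilon
    r_t = (r_plus - r_minus) / 2.0               # drives the mean update
    r_s = (r_plus + r_minus) / 2.0 - baseline    # drives the sigma update
    grad_mu = r_t @ epsilon / half
    grad_sigma = r_s @ ((epsilon ** 2 - sigma ** 2) / sigma) / half
    return grad_mu, grad_sigma
```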

es.save_checkpoint()
  • Creates a checkpoint and saves it to a file named <time.time()>.checkpoint.

es = PEPG.load_checkpoint(filename)
  • Creates a new PEPG instance and loads the checkpoint.

es.save_best(filename)
  • Saves the best theta, along with the mu and sigma that produced it.

theta, mu, sigma = PEPG.load_best(filename)
  • Loads the theta, mu, and sigma arrays from the given file.

NeuralNetwork Class

NeuralNetwork(input_size, output_size, hidden_sizes = [],
              hidden_activation = lambda x: x,
              output_activation = lambda x: x,
              bias = True)
  • Parameters:
    • input_size: int: Input size of the network.
    • output_size: int: Output size of the network.
    • hidden_sizes: List[int]: Sizes for the hidden layers.
    • hidden_activation: Callable[[float], float]: Activation function used in hidden layers.
    • output_activation: Callable[[float], float]: Activation function used at the output.
    • bias: bool: Add bias node.

network.save_network(filename)
  • Saves the network to a file.

network = NeuralNetwork.load_network(filename)
  • Creates a new NeuralNetwork instance and loads the given network file.
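
The ES side sees the network as one flat theta vector of length network.number_of_parameters, which is sliced back into per-layer weight matrices when assigned. A hypothetical sketch of that unflattening, not the library's actual code:

```python
import numpy as np

def unflatten(theta, shapes):
    """Slice a flat parameter vector into weight matrices of the given shapes."""
    weights, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        weights.append(theta[offset:offset + size].reshape(shape))
        offset += size
    return weights

# A 2-2-1 network with bias (as in the XOR example) needs
# (2 + 1) * 2 + (2 + 1) * 1 = 9 parameters.
shapes = [(3, 2), (3, 1)]
w1, w2 = unflatten(np.arange(9.0), shapes)
```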

Custom Optimizer Example

from pepg import PEPG, Optimizer, NeuralNetwork

class CustomOptimizer(Optimizer):
    def __init__(self, alpha, parameter, another_parameter):
        self.alpha = alpha
        self.parameter = parameter
        self.another_parameter = another_parameter

    def __call__(self, gradients):
        gradients = (gradients + self.parameter) * self.another_parameter
        return -self.alpha * gradients

network = NeuralNetwork(input_size = 2, output_size = 1)

optimizer_kwargs = {'parameter': 0.3, 'another_parameter': 0.2}
es = PEPG(population_size = 100, theta_size = network.number_of_parameters,
          mu_init = 0.0, sigma_init = 2.0,
          mu_lr = 0.3, sigma_lr = 0.2, optimizer = CustomOptimizer,
          optimizer_kwargs = optimizer_kwargs)

References

  1. Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters and Jurgen Schmidhuber. Natural Evolution Strategies. 2014
  2. Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. 2014
  3. F. Sehnke, C. Osendorfer, T. Ruckstiess, A. Graves, J. Peters and J. Schmidhuber. Parameter-exploring policy gradients. 2010
  4. Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor and Ilya Sutskever. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. 2017
