Giter Site home page Giter Site logo

alwaysbyx / optimization-and-search Goto Github PK

View Code? Open in Web Editor NEW
18.0 1.0 2.0 81.04 MB

Implementation and visualization (some demos) of search and optimization algorithms.

License: MIT License

Python 100.00%
optimization newton-method conjugate-gradient-descent simulated-annealing-algorithm cross-entropy-method search-gradient a-star-algorithm value-iteration policy-iteration

optimization-and-search's Introduction

Optimization-and-Search

Implementation and visualization of optimization algorithms.
please add MAthJax Plugin for Github to your browser.

Table of Contents

1. Numerical Optimization

Here we are trying to solve simple quadratic problem.
$$\arg \text{min } \frac{1}{2}x^TQx + b^Tx$$ $$Q \geq 0, x \in R^n$$

The animation(left) is tested on N=10, (right) for n=2;

Gradient Descent

using first-order gradient and learning rate

Conjugate Descent

$x^{k+1} := x^k + a_kd^k$
using line search to compute the step size $\alpha$
$$a_k = \frac{\nabla f(x^k)^Td^k}{(d^k)^TQd^k}$$
find conjugate direction
$$d^{k+1} = -\nabla f(x^{k+1}) + \beta_kd^k$$
$$ \beta_k = \frac{\nabla f(x^{k+1})^T\nabla f(x^{k+1})}{\nabla f(x^k)^T\nabla f(x^k)} \text{ (FR)}$$

Newton Method

using second-order gradient
$x^{k+1} := x^k + d^k$
$d^k = -[\nabla^2 f(x^k)]^{-1}\nabla f(x^k)$

2. Stochastic Search

Here we try to use stochastic search to solve TSP problems.

Simulated Annealing

Cross-Entropy Methods

cross entropy using less steps to get converged

Search Gradient

Here trying to find lowest position. Since for TSP, I cannot find $\nabla_\theta \log (p_\theta (z_i))$. If you have any idea please be free to comment. Thanks!

3. Classical Search

A* search algorithm (A star search algorithm)

The code is writen according to the slide in CSE257, UCSD.

minimax search

4. Dynamic Programming

Value iteration

The implementation here, the reward for all the tile is -0.04, and credit tile is 4, obstable tile is -5. You can just press space key to change the [credit] or [obstable] choice and click left to add credit/obstable tile and remove them.

We can see we get the optimal policy before our utility function converges.

Policy iteration

5.

Monte Carlo Policy Evaluation

  • Policy: $\pi (s_i) = a_i$
  • Trajectories: $s_0, s_1, s_2, .., s_T$
  • Rewards: $R(s_0) + \gamma R(s_1) + .. + \gamma^TR(s_T)$

Follow policy and get a lot of samples of trajectories, and estimate the expectation $$V^{\pi}(s_0) = \mathbb{E}\pi [\sum{i=0}^T\gamma^iR(s_i)]$$

Temporal Difference Policy Evaluation

The difference between TD and MC method is that TD could update in every step and do not necessarily wait for a whole trajectory to generate.

Note that the learning rate $\alpha$ should decrease with the number of visits to a state, in this case the method is guaranteed to converge to the true action values by the law of large numbers.

Tabular Q learning

Off policy method $$V(s) = \max_a (\mathbb{E}[R(s)+\gamma V(s'|s,a)])$$

6.

Deep Q learning

In this case I used Deep Q network to train a self-driving car. The environment is created by NeuralNine (github), thought he used NEAT package to train the car.
The demo is below: you can modify the network and train your own car!
Random case:

DQL case:

7.

Monte Carlo Tree Search

8.

DPLL

optimization-and-search's People

Contributors

alwaysbyx avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.