Basic reinforcement learning (RL) algorithms, including dynamic programming (DP), Monte Carlo (MC), temporal-difference (TD) learning, SARSA, Q-Learning, DQN, and A3C.
- DP (Policy Evaluation & Policy Iteration & Value Iteration)
- MC & TD (First-visit & Every-visit MC, TD(0))
- Model-Free Control (SARSA & SARSA(lambda), Q-Learning)
- DQN (basic DQN, double DQN, dueling DQN, PER DQN)
- A3C (RMSprop, Adam, SharedRMSprop, SharedAdam)
Requirements:
- Python 3.6
- NumPy
- PyTorch 1.1.0
- Gym
- Matplotlib
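
As an illustration of the model-free control entry above, here is a minimal sketch of tabular Q-Learning. It is not taken from this repository: the function name `q_learning_chain`, the 1-D chain environment, and all hyperparameter defaults are hypothetical choices made for the example. It uses only NumPy and assumes a version that provides `np.random.default_rng` (1.17 or later).

```python
import numpy as np


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-Learning on a hypothetical 1-D chain (illustration only).

    States are 0..n_states-1. Action 0 moves left (staying put at state 0),
    action 1 moves right. Reaching the rightmost state gives reward 1 and
    ends the episode; every other transition gives reward 0.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))  # Q-table: one row per state, one column per action
    terminal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != terminal:
            # Epsilon-greedy behaviour policy, breaking ties at random
            if rng.random() < epsilon:
                action = int(rng.integers(2))
            else:
                best = np.flatnonzero(q[state] == q[state].max())
                action = int(rng.choice(best))

            # Deterministic chain dynamics (an assumption of this sketch)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            done = next_state == terminal
            reward = 1.0 if done else 0.0

            # Off-policy target: bootstrap from the greedy value of the next state
            target = reward if done else reward + gamma * q[next_state].max()
            q[state, action] += alpha * (target - q[state, action])
            state = next_state

    return q


if __name__ == "__main__":
    q_table = q_learning_chain()
    print(q_table)
    print("Greedy actions per state:", q_table.argmax(axis=1))
```

The only line that distinguishes this from SARSA is the target: Q-Learning bootstraps from `q[next_state].max()`, whereas SARSA would bootstrap from the value of the action actually selected in the next state.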