Double Deep Q Network from Deep Mind's "Playing Atari with Deep Reinforcement Learning" dec 2013 with latest improvements from 2016 (target network periodical update and priorized experience replay). The algorithm is first validated on pendulum-v0 and then applied to the CarRacing-v0.
CarRacing-v0:
Actions: Steering: real value in [-1, 1] - Gas: real value in [0, 1] - Brake: real value in [0, 1]
Observations: STATE_W = 96 * STATE_H = 96 * 3 : RGB Image
Reward: +1000/N for every N tile on track else -0,1
- CarConfig.py: Global configuration
- CarRacing_RL_DDQN_linearAct.py: main file
- brain_model.png: Keras plot of brain's CNN
- SumTree.py: Class for priorized memory replay
As the training time revealed excessive, a simpler approach was adopted: the steering policiy is learned by imitation learning; throttle and braking by reinforcement learning.
RLImitation folder:
- CarRacing_ImitationPolicy.py: steering policy by imitation learning (see also https://github.com/hchkaiban/CarRacingImitationLearning)
- CarRacing_RL_Imitation.py: DDQN applied to throttle and brake
- RL_IM_cp500.webm: Video demo
Scatter plot of discretized actions function of the steering angle (note that the car accelerates mostly around 0°)
Three challenging discreet actions were chosen in order to prove that the algorithm learns a meaningful way to synchronize them: strong braking, fast acceleration or free wheeling (no action). As shown by the scatter plot and demo video, it manages to synchronize them with the steering policy: after some training time the car drives at decent speed, both on training and random tracks, does not immobilize anymore and is able to recover most of the time from off-road situations. Longer training, more data and smoother discrete actions would keep improving the performances.
Thanks to Jaara for the useful repo: https://github.com/jaara.