qfettes / deeprl-tutorials

Contains high-quality implementations of Deep Reinforcement Learning algorithms written in PyTorch.

Jupyter Notebook 97.67% Python 2.33%

python3 pytorch reinforcement-learning deep-reinforcement-learning deep-q-network double-dqn multi-step-learning dueling-dqn noisy-networks prioritized-experience-replay


deeprl-tutorials's Issues

01.DQN.ipynb all_rewards, losses not defined?

In PPO.ipynb, the positions of action loss epoch and value loss epoch need to be swapped.

In PPO.ipynb, the positions of the action loss epoch and the value loss epoch need to be swapped. I also suggest using RMSprop as the optimizer and reducing the learning rate to make these RL models converge more easily.

Quantile-Rainbow: is gamma discounting for n-step rewards included?

This is a great repo for us PyTorch users trying to learn RL, and I really appreciate the cleanliness of the code.

I noticed that the Quantile-Rainbow notebook does not seem to apply n-step gamma discounting the way the Rainbow notebook does.
Is there a particular reason to do
quantiles_next = batch_reward + (self.gamma*quantiles_next)
in the Agent cell in 10.Quantile-Rainbow.ipynb?

I was expecting something like
quantiles_next = batch_reward + ((self.gamma**self.nsteps)*quantiles_next)

Thanks in advance!
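To make the difference between the two expressions concrete, here is a minimal sketch in plain Python. It assumes the reward term already holds the discounted sum of rewards over the n-step window (whether the notebook's batch_reward does is part of what this issue asks). The function names and sample numbers are hypothetical and are not taken from the repo.

```python
def n_step_return(rewards, gamma):
    """Discounted sum of the rewards collected over an n-step window."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def n_step_target(rewards, gamma, bootstrap_values):
    """Target for each next-state value (e.g. each quantile) after n = len(rewards) steps."""
    n = len(rewards)
    g = n_step_return(rewards, gamma)
    # The bootstrap term sits n steps in the future, so it is scaled by gamma**n.
    return [g + (gamma ** n) * v for v in bootstrap_values]


# Hypothetical 3-step window with two next-state quantile values.
targets = n_step_target(rewards=[1.0, 0.0, 2.0], gamma=0.5, bootstrap_values=[4.0, 8.0])
print(targets)  # [2.0, 2.5]
```

With a single discount factor of gamma instead of gamma**n, the same inputs would give [3.5, 5.5], so the bootstrap term would be weighted four times more heavily in this example.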

DQN not learning on stacked frame inputs

Hello! I am trying to train the DQN model (01.DQN) on the Pong task. I changed the frame_stack arg in the wrap_deepmind function to True; however, the model does not learn anything. Do you have any advice for this? I was also wondering why your default script uses frame_stack = False. All of the papers appear to recommend feeding 4x84x84 inputs so the network can infer temporal components of the environment, such as ball velocity.

Thanks for the nice readable repo!
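For reference, the 4x84x84 input mentioned above can be produced by keeping the most recent frames in a fixed-size buffer. The sketch below is a generic illustration using NumPy, not the repo's wrap_deepmind implementation; the FrameStacker class and its methods are hypothetical names.

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Keeps the k most recent frames and returns them as one (k, H, W) array."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the buffer with the first frame so the shape is fixed from step 0.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Oldest frame first, newest frame last.
        return np.stack(list(self.frames), axis=0)


stacker = FrameStacker(k=4)
obs = stacker.reset(np.zeros((84, 84), dtype=np.uint8))
obs = stacker.step(np.ones((84, 84), dtype=np.uint8))
print(obs.shape)  # (4, 84, 84)
```

One thing that may be worth checking when switching to stacked inputs is that the network's first convolutional layer expects 4 input channels rather than 1, since the observation shape changes from a single frame to a stack.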

Ask a few questions.

Hi,
I would like to ask a few questions about 'compute_loss()' in DRQN.ipynb:

First, regarding diff = (expected_q_values - current_q_values): why is the error calculated at every step of the GRU rather than only at the last step?

Second, why apply 'loss = self.huber(diff)'?

Third, why mask the first half of the losses?

Thanks,
Ni
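To make the three steps in question concrete, the following plain-Python sketch computes a per-step TD error over a sequence, applies a Huber loss to each step, and averages only the second half of the sequence. It illustrates the pattern the questions describe, not the notebook's actual code; the function names, the delta value, and the sample numbers are assumptions. One commonly cited motivation for masking early steps is that the recurrent hidden state has seen little context there, but whether that is the author's intent is not confirmed here.

```python
def huber(diff, delta=1.0):
    """Quadratic for small errors, linear for large ones (less sensitive to outliers)."""
    a = abs(diff)
    if a <= delta:
        return 0.5 * diff * diff
    return delta * (a - 0.5 * delta)


def masked_sequence_loss(expected_q, current_q):
    """Huber loss at every time step, averaged over the second half of the sequence only."""
    diffs = [e - c for e, c in zip(expected_q, current_q)]
    losses = [huber(d) for d in diffs]
    burn_in = len(losses) // 2   # steps whose loss is masked out
    kept = losses[burn_in:]      # steps that contribute to the update
    return sum(kept) / len(kept)


# Hypothetical 4-step sequence: the two large early errors are ignored.
loss = masked_sequence_loss(expected_q=[5.0, 5.0, 1.0, 3.0], current_q=[0.0, 0.0, 0.5, 0.0])
print(loss)  # 1.3125
```

In this example the last step has an error of 3.0, which the Huber loss maps to 2.5 rather than the 4.5 a squared-error loss (0.5 * 3.0**2) would give.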

Running error in 03.Double_DQN.ipynb

IndexError                                Traceback (most recent call last)
 in 
     43         try:
     44             clear_output(True)
---> 45             plot_all_data(log_dir, env_id, 'DoubleDQN', config.MAX_FRAMES, bin_size=(10, 100, 100, 1), smooth=1, time=timedelta(seconds=int(timer()-start)), ipynb=True)
     46         except IOError:
     47             pass

c:\Users\Hene\Documents\GitHub\DeepRL-Tutorials\utils\plot.py in plot_all_data(folder, game, name, num_steps, bin_size, smooth, time, save_filename, ipynb)
    211     plt.rcParams.update(params)
    212 
--> 213     tx, ty = load_reward_data(folder, smooth, bin_size[0])
    214 
    215     if tx is None or ty is None:

c:\Users\Hene\Documents\GitHub\DeepRL-Tutorials\utils\plot.py in load_reward_data(indir, smooth, bin_size)
     54             for line in f:
     55                 tmp = line.split(',')
---> 56                 t_time = float(tmp[2])
     57                 tmp = [t_time, int(tmp[1]), float(tmp[0])]
     58                 datas.append(tmp)

IndexError: list index out of range
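The traceback shows tmp[2] failing, which means some line in the log file split into fewer than three comma-separated fields. The cause is not stated in the issue; a header, blank, or partially written line are possibilities. The sketch below shows one defensive way to parse such lines. It is a hypothetical helper, not the repo's load_reward_data, and the sample lines are made up.

```python
def parse_monitor_line(line):
    """Return [time, length, reward], or None if the line is not a complete record."""
    fields = line.strip().split(',')
    if len(fields) < 3:
        # Blank, comment, or truncated line: skip instead of raising IndexError.
        return None
    try:
        return [float(fields[2]), int(fields[1]), float(fields[0])]
    except ValueError:
        # Non-numeric content such as a column-name header.
        return None


# Hypothetical log content mixing valid records with lines that would break a naive parser.
lines = ["21.0,812,14.5\n", "#comment\n", "", "r,l,t\n", "-3.0,100,20.25"]
rows = [row for row in map(parse_monitor_line, lines) if row is not None]
print(rows)  # [[14.5, 812, 21.0], [20.25, 100, -3.0]]
```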

Model has no finish_nstep

Hi Quintin,

Class Model has no attribute finish_nstep or reset_hx. Would you please add them?

Thanks.
Sean

About code

Reward does not increase in Colab

Flickering Pong POMDP