
qfettes / deeprl-tutorials

1.0K stars · 30 watchers · 326 forks · 155.41 MB

Contains high-quality implementations of Deep Reinforcement Learning algorithms written in PyTorch.

Languages: Jupyter Notebook 97.67%, Python 2.33%
Topics: python3, pytorch, reinforcement-learning, deep-reinforcement-learning, deep-q-network, double-dqn, multi-step-learning, dueling-dqn, noisy-networks, prioritized-experience-replay

deeprl-tutorials's People

Contributors

qfettes


deeprl-tutorials's Issues

Quantile-Rainbow: is gamma discounting for n-step rewards included?

This is a great repo for us PyTorch users trying to learn RL, and I really appreciate the cleanness of the code.

I noticed that, unlike the Rainbow notebook, the Quantile-Rainbow notebook does not seem to apply n-step gamma discounting.
Is there a particular reason to do
quantiles_next = batch_reward + (self.gamma*quantiles_next)
in the Agent cell of 10.Quantile-Rainbow.ipynb?

I was expecting something like
quantiles_next = batch_reward + ((self.gamma**self.nsteps)*quantiles_next)

Thanks in advance!
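
For reference, a minimal sketch of the target the question expects, assuming the replay buffer already sums the discounted rewards over the n intermediate steps (the function name and tensor shapes below are illustrative, not the notebook's exact code):

    def nstep_quantile_target(batch_reward, quantiles_next, gamma=0.99, nsteps=3):
        # batch_reward:   (batch, 1) n-step return already summed by the replay buffer,
        #                 i.e. r_t + gamma*r_{t+1} + ... + gamma**(nsteps-1)*r_{t+nsteps-1}
        # quantiles_next: (batch, num_quantiles) target quantiles of the state nsteps ahead,
        #                 assumed zeroed for terminal transitions
        # Bootstrapping over nsteps means the next-state quantiles are discounted by gamma**nsteps.
        return batch_reward + (gamma ** nsteps) * quantiles_next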

DQN not learning on stacked frame inputs

Hello! I am trying to train the DQN model (01.DQN) on the Pong task. I changed the frame_stack arg in the wrap_deepmind function to True; however, the model does not learn anything. I was curious whether you have any advice for this. Also, I was wondering why your default script uses frame_stack = False? All of the papers appear to recommend feeding 4x84x84 inputs so the network can infer temporal components of the environment, such as ball velocity.

Thanks for the nice readable repo!
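
One possible pitfall worth checking (an assumption, not a confirmed diagnosis): with frame_stack=True the observation carries 4 channels, so the first convolution must take 4 input channels, and stacked LazyFrames observations typically arrive channel-last and need a transpose to channel-first before the forward pass. A minimal sketch of a 4-channel Nature-DQN body:

    import torch
    import torch.nn as nn

    class DQNBody(nn.Module):
        # Standard Nature-DQN convolutional body; in_channels=4 matches a stack
        # of four grayscale 84x84 frames.
        def __init__(self, in_channels=4, num_actions=6):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                nn.Linear(512, num_actions),
            )

        def forward(self, x):
            # x: (batch, 4, 84, 84), raw pixel values scaled to [0, 1] here
            return self.head(self.features(x / 255.0))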

Ask a few questions.

Hi,
I would like to ask a few questions:

In 'compute_loss()' in DRQN.ipynb:

First, regarding diff = (expected_q_values - current_q_values):
why is the error computed at every GRU timestep rather than only at the last one?

Second, why is loss = self.huber(diff) used?

Third, why is the first half of the losses masked out?

Thanks,
Ni
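
On the third point, a minimal sketch of what such masking typically looks like, assuming the rationale from the DRQN paper (update only on the second half of each sampled sequence, so early timesteps with an unsettled hidden state do not contribute). The function below is illustrative, not the notebook's code:

    import torch
    import torch.nn.functional as F

    def masked_sequence_loss(current_q, expected_q):
        # current_q, expected_q: (batch, seq_len) Q-values for the actions taken
        seq_len = current_q.shape[1]
        # per-timestep Huber (smooth L1) error, kept unreduced
        per_step = F.smooth_l1_loss(current_q, expected_q, reduction='none')
        # zero out the first half of each sequence; only later steps, where the
        # recurrent state has "warmed up", drive the gradient
        mask = torch.zeros_like(per_step)
        mask[:, seq_len // 2:] = 1.0
        return (per_step * mask).sum() / mask.sum()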

Runtime error in 03.Double_DQN.ipynb

IndexError                                Traceback (most recent call last)
 in 
     43         try:
     44             clear_output(True)
---> 45             plot_all_data(log_dir, env_id, 'DoubleDQN', config.MAX_FRAMES, bin_size=(10, 100, 100, 1), smooth=1, time=timedelta(seconds=int(timer()-start)), ipynb=True)
     46         except IOError:
     47             pass

c:\Users\Hene\Documents\GitHub\DeepRL-Tutorials\utils\plot.py in plot_all_data(folder, game, name, num_steps, bin_size, smooth, time, save_filename, ipynb)
    211     plt.rcParams.update(params)
    212 
--> 213     tx, ty = load_reward_data(folder, smooth, bin_size[0])
    214 
    215     if tx is None or ty is None:

c:\Users\Hene\Documents\GitHub\DeepRL-Tutorials\utils\plot.py in load_reward_data(indir, smooth, bin_size)
     54             for line in f:
     55                 tmp = line.split(',')
---> 56                 t_time = float(tmp[2])
     57                 tmp = [t_time, int(tmp[1]), float(tmp[0])]
     58                 datas.append(tmp)

IndexError: list index out of range
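
A hedged guess at the cause, plus a defensive sketch: the assumption is that some lines in the monitor log (headers or partially written lines) do not split into three numeric comma-separated fields, so indexing tmp[2] fails. An illustrative variant of the parsing loop in utils/plot.py that skips such lines instead of crashing:

    def load_reward_data_safe(path):
        # Defensive version of the loop shown in the traceback above (illustrative only).
        datas = []
        with open(path) as f:
            for line in f:
                tmp = line.split(',')
                try:
                    # columns assumed to be reward, episode length, time
                    datas.append([float(tmp[2]), int(tmp[1]), float(tmp[0])])
                except (IndexError, ValueError):
                    continue  # header or malformed line
        return datas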

Model has no finish_nstep

Hi Quintin,

The Model class has no finish_nstep or reset_hx attribute. Would you please add them?

Thanks.
Sean
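
If a notebook's Model class is simply missing these hooks, one workaround sketch is to add no-op stubs so the shared training loop keeps running; whether that matches the intended behaviour for that notebook is an assumption:

    class ModelStubsMixin:
        # Hypothetical mixin; the method names match what the training loop calls.
        def finish_nstep(self):
            # flush or discard any partially built n-step transition;
            # a no-op when the agent does not use n-step returns
            pass

        def reset_hx(self):
            # reset the recurrent hidden state at episode boundaries;
            # a no-op for feed-forward agents
            pass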

About code

In DRQN.ipynb, if config.NSTEP is equal to 1, is the step 'non_final_next_states = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)' redundant?

Flickering Pong POMDP

Hi,

Do you know how to convert 'Pong' into the 'Flickering Pong' POMDP from the original paper?

Thanks,
Ni
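
For what it's worth, a minimal sketch of one common way to do this, assuming the setup from the DRQN paper in which each frame is fully obscured with probability p = 0.5 (the wrapper name is illustrative):

    import numpy as np
    import gym

    class FlickerWrapper(gym.ObservationWrapper):
        # With probability p, replace the whole observation with a blank frame,
        # turning fully observed Pong into a POMDP.
        def __init__(self, env, p=0.5):
            super().__init__(env)
            self.p = p

        def observation(self, obs):
            if np.random.rand() < self.p:
                return np.zeros_like(obs)
            return obs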
