qfettes / deeprl-tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
01.DQN.ipynb: all_rewards and losses not defined?
In PPO.ipynb, the positions of action loss epoch and value loss epoch need to be swapped. I also suggest using RMSprop as the optimizer and reducing the learning rate so that these RL models converge more easily.
This is a great repo for us PyTorch users trying to learn RL, and I really appreciate the cleanliness of the code.
I spotted that in the Quantile-Rainbow notebook, there does not seem to be n-step gamma discounting as in the Rainbow notebook.
Is there a particular reason to do
quantiles_next = batch_reward + (self.gamma*quantiles_next)
in the Agent cell in 10.Quantile-Rainbow.ipynb ?
Cuz I was expecting something like
quantiles_next = batch_reward + ((self.gamma**self.nsteps)*quantiles_next)
Thanks in advance!
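For readers following this question, here is a minimal pure-Python sketch of the n-step target the poster describes. It assumes the reward term is the discounted sum of the n rewards collected along the way, and that the bootstrapped quantiles are then discounted by gamma**n. The function name and arguments are hypothetical and are not taken from the notebook; whether batch_reward in 10.Quantile-Rainbow.ipynb already holds such an accumulated reward is not confirmed here.

```python
def nstep_quantile_targets(rewards, gamma, next_quantiles):
    """Build n-step targets for each quantile of the next-state distribution.

    rewards        -- the n rewards observed between the state and the
                      bootstrap state (hypothetical input, oldest first)
    gamma          -- one-step discount factor
    next_quantiles -- quantile values estimated at the bootstrap state
    """
    n = len(rewards)
    # Discounted sum of the n intermediate rewards: sum_k gamma^k * r_k
    discounted_rewards = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # The bootstrap term is n steps away, so it is discounted by gamma^n
    return [discounted_rewards + (gamma ** n) * q for q in next_quantiles]


# With a single reward (n = 1) this reduces to reward + gamma * quantile
one_step = nstep_quantile_targets([2.0], 0.5, [4.0])
three_step = nstep_quantile_targets([1.0, 1.0, 1.0], 0.5, [0.0, 8.0])
```

With n = 1 the two expressions in the question coincide, so the difference only matters when the agent is configured with more than one step.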
Hello! I am trying to train the DQN model (01.DQN) on the Pong task. I changed the frame_stack arg in the wrap_deepmind function to True; however, the model does not learn anything. I was curious if you had any advice for this. Also, I was wondering why your default script uses frame_stack = False? All of the papers appear to recommend feeding 4x84x84 inputs to infer temporal components of the environment such as ball velocity.
Thanks for the nice readable repo!
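As a reference for what frame stacking does, here is a small sketch that keeps the last four frames in a deque. It is not the wrap_deepmind implementation; the class name and methods are hypothetical, and frames are treated as opaque objects so the example runs without any dependencies.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `num_frames` frames (hypothetical helper)."""

    def __init__(self, num_frames=4):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        # At episode start there is no history, so repeat the first frame
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically
        self.frames.append(new_frame)
        return list(self.frames)


stack = FrameStack(num_frames=4)
obs = stack.reset("f0")
obs = stack.step("f1")
```

One thing that may be worth checking when switching frame_stack to True: if the stacked frames are placed along the channel axis, the first layer of the network has to accept four input channels instead of one.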
Hi,
I would like to ask some questions, as follows:
In 'compute_loss()' in DRQN.ipynb:
First, diff = (expected_q_values - current_q_values):
Why does the error need to be calculated at every step of the GRU rather than only at the last step?
Second, why do 'loss = self.huber(diff)'?
Third, why mask the first half of the losses?
Thanks,
Ni
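To make the three questions concrete, here is a sketch of a per-step Huber loss over a sequence with the first half masked out. It is written in plain Python rather than PyTorch, and the function names are hypothetical; it is not the notebook's compute_loss(). The Huber loss is quadratic for small errors and linear for large ones. One common rationale for masking early steps (an assumption about intent, not confirmed by the author) is that the recurrent hidden state starts from zeros and has not yet accumulated enough history for its early predictions to be reliable.

```python
def huber(diff, delta=1.0):
    """Huber loss for one error: quadratic near zero, linear in the tails."""
    abs_diff = abs(diff)
    if abs_diff <= delta:
        return 0.5 * diff * diff
    return delta * (abs_diff - 0.5 * delta)


def masked_sequence_loss(expected_q, current_q, delta=1.0):
    """Average the per-step Huber loss over the second half of a sequence.

    Assumes both inputs are non-empty and have the same length.
    """
    assert len(expected_q) == len(current_q)
    seq_len = len(expected_q)
    burn_in = seq_len // 2  # number of early steps excluded from the loss
    per_step = [huber(e - c, delta) for e, c in zip(expected_q, current_q)]
    kept = per_step[burn_in:]
    return sum(kept) / len(kept)


# The two large early errors are ignored; only the last two steps contribute
loss = masked_sequence_loss([10.0, 10.0, 1.0, 3.0], [0.0, 0.0, 0.5, 0.0])
```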
IndexError Traceback (most recent call last)
in
43 try:
44 clear_output(True)
---> 45 plot_all_data(log_dir, env_id, 'DoubleDQN', config.MAX_FRAMES, bin_size=(10, 100, 100, 1), smooth=1, time=timedelta(seconds=int(timer()-start)), ipynb=True)
46 except IOError:
47 pass
c:\Users\Hene\Documents\GitHub\DeepRL-Tutorials\utils\plot.py in plot_all_data(folder, game, name, num_steps, bin_size, smooth, time, save_filename, ipynb)
211 plt.rcParams.update(params)
212
--> 213 tx, ty = load_reward_data(folder, smooth, bin_size[0])
214
215 if tx is None or ty is None:
c:\Users\Hene\Documents\GitHub\DeepRL-Tutorials\utils\plot.py in load_reward_data(indir, smooth, bin_size)
54 for line in f:
55 tmp = line.split(',')
---> 56 t_time = float(tmp[2])
57 tmp = [t_time, int(tmp[1]), float(tmp[0])]
58 datas.append(tmp)
IndexError: list index out of range
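The traceback shows tmp[2] failing, which means some line in the log had fewer than three comma-separated fields (for example a blank, header, or partially written line; which one is an assumption). A defensive version of the parsing loop could look like the sketch below. The function name is hypothetical and this is not the repository's fix; it only keeps the same [time, length, reward] ordering that the traceback shows.

```python
def parse_reward_lines(lines):
    """Parse 'reward,length,time' lines, skipping anything malformed."""
    rows = []
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) < 3:
            # Blank, truncated, or comment-style line: nothing to parse
            continue
        try:
            reward = float(fields[0])
            length = int(fields[1])
            t_time = float(fields[2])
        except ValueError:
            # Non-numeric content such as a column-name header
            continue
        rows.append([t_time, length, reward])
    return rows


sample = ['#{"t_start": 0.0}', 'r,l,t', '1.0,100,2.5', '', '3.0,50']
parsed = parse_reward_lines(sample)
```

Skipping malformed lines hides the symptom rather than explaining it, so it may still be worth checking why the log file contains such a line in the first place.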
Hi Quintin,
Class Model has no attribute finish_nstep or reset_hx. Would you please add them?
Thanks.
Sean
In DRQN.ipynb, if config.NSTEP is equal to 1, is the step 'non_final_next_states = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)' redundant?
I've made a Google Colab to train 01.DQN.ipynb, but the reward is not increasing.
Hi,
Do you know how to convert 'Pong' into the 'Flickering Pong' POMDP from the original paper?
Thanks,
Ni
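As far as I understand the DRQN paper, Flickering Pong obscures the whole screen with probability 0.5 at each timestep, so the agent sees either the full frame or a blank one. A minimal sketch of that idea follows. The class name and parameters are hypothetical, and frames are represented as nested lists so the example runs without Gym or NumPy; in practice the same logic would likely sit inside an observation wrapper around the environment.

```python
import random


class FlickerFrames:
    """Blank out an entire frame with a fixed probability (hypothetical)."""

    def __init__(self, obscure_prob=0.5, blank_value=0, seed=None):
        self.obscure_prob = obscure_prob
        self.blank_value = blank_value
        self.rng = random.Random(seed)

    def __call__(self, frame):
        # random() is in [0, 1), so prob 1.0 always blanks and 0.0 never does
        if self.rng.random() < self.obscure_prob:
            return [[self.blank_value for _ in row] for row in frame]
        return frame


frame = [[1, 2], [3, 4]]
always_blank = FlickerFrames(obscure_prob=1.0)(frame)
never_blank = FlickerFrames(obscure_prob=0.0)(frame)
```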