
wujingda / human-in-the-loop-deep-reinforcement-learning

44 stars · 2 watchers · 10 forks · 2.25 MB

(Engineering) Toward human-in-the-loop AI: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving

License: GNU General Public License v3.0

Python 100.00%
Topics: human-in-the-loop, reinforcement-learning

human-in-the-loop-deep-reinforcement-learning's Issues

a bug about critic update

Hello,

I find your work really helpful and I really appreciate it. However, I found a bug in the critic-update stage that affects the final performance.

It is in TD3HUG.py at L80-L81:

noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(0, 1)
a_ = (self.actor_target(bs_).detach() + noise1).clamp(0, 1)

I think the clamp on noise1 should use the NOISE_CLIP hyperparameter as a symmetric bound, i.e. [-noise_clip, noise_clip], rather than [0, 1], which discards all negative noise.

I changed these two lines to:

noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(-self.noise_clip, self.noise_clip)  # self.noise_clip refers to the NOISE_CLIP hyperparameter
a_ = (self.actor_target(bs_).detach() + noise1).clamp(-1, 1)
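
For comparison, standard TD3 target-policy smoothing (Fujimoto et al., 2018) clamps the noise symmetrically and then clamps the smoothed action to the environment's action bounds. A minimal sketch, assuming a symmetric action range [-max_action, max_action] (max_action is a placeholder, not a variable from this repository):

import torch

def smoothed_target_action(actor_target, bs_, policy_noise, noise_clip, max_action):
    # Standard TD3 target-policy smoothing: perturb the target actor's output
    # with clipped Gaussian noise, then clamp to the action bounds.
    a = actor_target(bs_).detach()
    noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
    return (a + noise).clamp(-max_action, max_action)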

There might also be similar clamp bugs elsewhere, but I didn't check.

I would appreciate it if you could look into this.

Question about the paper's code

Hello Mr. Wu,

Your paper "Toward human-in-the-loop AI: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving" is excellent, and thank you very much for generously sharing its code. While studying your algorithm, I may have found a small problem in the code: when an episode of training ends and the replay memory buffer needs to be updated, the variable errors is never defined in the script. I have attached the corresponding error message.

File "...\TD3_based_DRL\TD3.py", line 115, in learn
self.memory.batch_update(tree_idx, abs(errors.detach().cpu().numpy()) )
NameError: name 'errors' is not defined
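
A minimal sketch of one possible fix, assuming the prioritized-replay priorities are meant to be the absolute TD errors of the critic (the helper and variable names below are hypothetical, not from the repository):

import torch

def td_error_priorities(current_q, target_q):
    # Per-sample TD errors, converted to the NumPy array of priorities that a
    # SumTree-based prioritized replay buffer expects (hypothetical helper).
    errors = target_q.detach() - current_q.detach()
    return errors.abs().cpu().numpy()

# e.g. in learn(): self.memory.batch_update(tree_idx, td_error_priorities(current_q1, q_target))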

Could you take a look at this part of the code when you have time? Thank you very much for your help.

question about actor loss

Hi,
In the paper, the actor loss is
[screenshot of the actor-loss equation from the paper]
but the code that calculates the actor loss for human-intervention steps doesn't include the first term (see https://github.com/wujingda/Human-in-the-loop-Deep-Reinforcement-Learning/blob/main/TD3_based_DRL/TD3HUG.py#L148).
Also, in the code the human-intervention weight in the actor loss is applied with a soft-update coefficient that isn't in the paper, and I don't understand what this coefficient is for (see https://github.com/wujingda/Human-in-the-loop-Deep-Reinforcement-Learning/blob/main/TD3_based_DRL/TD3HUG.py#L144).
Are these bugs in the code, or tricks that I have misunderstood?
Looking forward to your help.
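
For concreteness, here is my reading of the paper's two-term actor loss as rough code (a sketch only; intervention_weight, bs, and ba are placeholder names, and this is not the repository's implementation):

import torch
import torch.nn.functional as F

def actor_loss(actor, critic_1, bs, ba, intervention_weight):
    # Sketch of the two-term loss described above: a Q-maximization term plus
    # a weighted behavior-cloning term on human-intervention steps.
    a_pred = actor(bs)                     # actions proposed by the policy
    q_term = -critic_1(bs, a_pred).mean()  # first term: maximize the critic's Q
    bc_term = F.mse_loss(a_pred, ba)       # second term: imitate human actions
    return q_term + intervention_weight * bc_term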
