deepx-inc / machina
Control section: Deep Reinforcement Learning framework
License: MIT License
iterate_rnn in the Traj class creates an iterator over batches. The tail of each episode in a batch is zero-padded so that all episodes have the same length. For this reason, we cannot control the number of steps in a batch.
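To illustrate why padding makes the number of real steps per batch hard to control, here is a minimal sketch of zero-padding variable-length episodes. It uses NumPy for brevity, and `pad_episodes` is a hypothetical helper, not machina's `iterate_rnn`.

```python
import numpy as np

def pad_episodes(episodes):
    """Zero-pad variable-length episodes to a common length.

    Returns (padded, mask): padded has shape (max_len, batch, obs_dim);
    mask has shape (max_len, batch), 1.0 on real steps and 0.0 on padding.
    """
    max_len = max(len(ep) for ep in episodes)
    obs_dim = episodes[0].shape[1]
    padded = np.zeros((max_len, len(episodes), obs_dim), dtype=np.float32)
    mask = np.zeros((max_len, len(episodes)), dtype=np.float32)
    for i, ep in enumerate(episodes):
        padded[:len(ep), i] = ep   # copy the real steps
        mask[:len(ep), i] = 1.0    # mark them as valid
    return padded, mask

# Two episodes of length 3 and 5 with 2-dimensional observations.
episodes = [np.ones((3, 2), dtype=np.float32), np.ones((5, 2), dtype=np.float32)]
padded, mask = pad_episodes(episodes)
real_steps = int(mask.sum())  # depends on episode lengths, not on a fixed batch size
```

The batch always occupies `max_len * batch` slots, but the count of real steps is whatever the episode lengths happen to add up to.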
Adding N-distill according to https://arxiv.org/abs/1902.02186
In machina's current implementation, hs must be a tuple of length 2, which is compatible with LSTM. However, we need to support more general memory architectures, such as memory-augmented networks or GRU (whose hidden state has length 1).
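One possible direction, sketched below under assumptions, is to treat the hidden state as a tuple of arbitrary length and apply every operation element-wise. The helpers `as_state_tuple` and `scale_states` are hypothetical names and are not part of machina; plain floats stand in for tensors.

```python
def as_state_tuple(hs):
    """Wrap a recurrent state so callers can treat it as a tuple of any length."""
    if isinstance(hs, (tuple, list)):
        return tuple(hs)
    return (hs,)  # e.g. a GRU-style single hidden state

def scale_states(hs, keep):
    """Multiply every state element by keep (1.0 keeps the state, 0.0 resets it)."""
    return tuple(h * keep for h in as_state_tuple(hs))

lstm_like = scale_states((2.0, 4.0), 0.0)  # two-element state, reset
gru_like = scale_states(5.0, 1.0)          # one-element state, kept
```

With this shape, code that resets or masks hidden states does not need to know whether the cell carries one, two, or more state tensors.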
The environment is wrapped by many wrapper envs, so it is difficult to access the original environment.
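A minimal sketch of one way to reach the innermost environment, assuming each wrapper stores the wrapped object in an `env` attribute (the classes and the `innermost_env` helper below are hypothetical stand-ins). For wrappers derived from `gym.Wrapper`, the `unwrapped` property serves the same purpose.

```python
class BaseEnv:
    """Stand-in for the original environment."""
    name = "base"

class Wrapper:
    """Stand-in for a wrapper env that keeps the wrapped env in `.env`."""
    def __init__(self, env):
        self.env = env

def innermost_env(env):
    """Follow `.env` links until an object without that attribute is reached."""
    while hasattr(env, "env"):
        env = env.env
    return env

wrapped = Wrapper(Wrapper(BaseEnv()))
base = innermost_env(wrapped)
```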
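Run the plotting mechanism in a separate process (multiprocessing) from the main program, so that it is asynchronous with training.
Alternatively, move plotting into a separate script.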
For now, the output of a network must represent the parameters of a probability distribution, such as the mean and std of a Gaussian. However, if something like a flow is used for the policy, it cannot be implemented without modifying the loss functional, because a flow's lld (log likelihood) is computed through the network together with the determinant of the Jacobian.
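For reference, the standard change-of-variables formula behind this point is written out below. The symbols are assumptions introduced for illustration: $p_z$ is a base density, and $f_s$ is an invertible map conditioned on the state $s$ that turns a base sample into an action $a$.

```latex
\log \pi(a \mid s)
  = \log p_z\!\left(f_s^{-1}(a)\right)
  + \log \left| \det \frac{\partial f_s^{-1}(a)}{\partial a} \right|
```

Both terms require evaluating the network, so the log likelihood cannot be obtained from a fixed set of distribution parameters alone.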
Branch
airl
Data parallel is not working on CEMDeteminisiticSAVfunc.
Is it valid?
Calling deterministic_ac_real with an RNN policy raises an error because log_std cannot be referenced.
machina/machina/pols/gaussian_pol.py
Line 93 in a471ead
Regarding adamw:
machina/machina/optims/adamw.py
Line 69 in 8e31ddd
Currently, indices and similar data inside traj are also placed on the GPU; depending on performance, consider moving them to the CPU.
Currently, Off Policy and On Policy use different Data types. Unify them into a single class, and change it so that methods add the advantage function, return, and so on to Data.
When running python example/run_trpo.py, using the preprocess method of GAE_Data causes a Segmentation Fault (core dump).
The segfault occurs at the following point in the preprocess method; it appears to happen when running inference with vf.
all_path_vs = [vf(torch.tensor(path['obs'], dtype=torch.float,
device=get_device())).cpu().numpy() for path in self.paths]
The following code also segfaults, so it seems safe to conclude that the segfault occurs during vf inference.
vf(torch.tensor(self.paths[0]['obs'], dtype=torch.float,
device=get_device()))
Note that this worked on my local laptop, but the error occurs when running on the server.
When deploying in a real environment, the desired output is one without Gaussian noise and scaled to the action space.
Add mean_real to agent_info and output the scaled mean.
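A minimal sketch of what such a scaled, noise-free output could look like. The scaling rule here (clip the mean to [-1, 1], then map it linearly onto the action bounds) is an assumption for illustration and may differ from how machina scales actions; `scale_to_action_space` is a hypothetical helper.

```python
import numpy as np

def scale_to_action_space(mean, low, high):
    """Map a mean in [-1, 1] linearly onto [low, high] (assumed scaling rule)."""
    squashed = np.clip(mean, -1.0, 1.0)
    return low + 0.5 * (squashed + 1.0) * (high - low)

mean = np.array([0.0, 2.0, -1.0])   # policy mean, no Gaussian noise added
low = np.array([-2.0, -2.0, 0.0])   # hypothetical action-space lower bounds
high = np.array([2.0, 2.0, 10.0])   # hypothetical action-space upper bounds

mean_real = scale_to_action_space(mean, low, high)
agent_info = {"mean": mean, "mean_real": mean_real}
```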
cd example
python run_ppo.py --env_name CartPole-v0 --rnn --cuda -1
And then, the error below occurs
Traceback (most recent call last):
File "run_ppo.py", line 153, in <module>
kl_beta = result_dict['new_kl_beta']
File "/home/rarilurelo/.pythons/Python-3.5.2/entity/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/raid/work/machina/machina/utils.py", line 47, in measure
yield
File "run_ppo.py", line 149, in <module>
optim_pol=optim_pol, optim_vf=optim_vf, epoch=args.epoch_per_iter, batch_size=args.batch_size, max_grad_norm=args.max_grad_norm)
File "/raid/work/machina/machina/algos/ppo_clip.py", line 58, in train
pol_loss = update_pol(pol, optim_pol, batch, clip_param, ent_beta, max_grad_norm)
File "/raid/work/machina/machina/algos/ppo_clip.py", line 30, in update_pol
pol_loss = lf.pg_clip(pol, batch, clip_param, ent_beta)
File "/raid/work/machina/machina/loss_functional.py", line 44, in pg_clip
_, _, pd_params = pol(obs, h_masks=h_masks)
File "/home/rarilurelo/.virtuals/py3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/raid/work/machina/machina/pols/categorical_pol.py", line 54, in forward
ac = self.pd.sample(dict(pi=pi))
File "/raid/work/machina/machina/pds/categorical_pd.py", line 30, in sample
pi_sampled = Categorical(probs=pi).sample(sample_shape)
File "/home/rarilurelo/.virtuals/py3/lib/python3.5/site-packages/torch/distributions/categorical.py", line 110, in sample
sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:298
This is caused by None being passed via pi.
pybullet is not included in the dependencies, so run_trpo.py does not run as-is.
There are no tests for MPC, behavior cloning, GAIL, or AIRL.
GPU number, etc.
none, element_wise_mean, sum
@takerfume
I tried to run nosetests -x tests, but it does not seem to work right now. Or did I do something wrong?
E
======================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'tests')
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/failure.py", line 39, in runTest
raise self.exc_val.with_traceback(self.tb)
File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/loader.py", line 406, in loadTestsFromName
module = resolve_name(addr.module)
File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/util.py", line 312, in resolve_name
module = __import__('.'.join(parts_copy))
ModuleNotFoundError: No module named 'tests'
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (errors=1)
The implementation currently ends up passing a shape for ac_space.
Traj's tensors are now allocated on the GPU for fast computation. However, it is difficult to allocate all tensors of an off-policy traj on the GPU.
Solution
Document the meaning of each argument in the code of example/run_*.py.
Contributors should write comments in the code they wrote themselves.
When an RNN is used, the loss is averaged over (timestep, batchsize). However, episodes must be arranged to the same length to use an RNN, so steps after termination are masked by out_masks and still count toward the average.
pol_loss = torch.mean(pol_loss * out_masks)
Instead, we have to calculate it as follows.
timestep = torch.sum(out_masks, dim=0)
pol_loss = torch.sum(pol_loss * out_masks) / (timestep * batchsize)
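The difference can be seen in a small numeric sketch. It uses NumPy instead of torch to stay self-contained, and it normalizes by the total number of unmasked steps, which is one reading of the proposal above rather than machina's actual implementation; `masked_mean` is a hypothetical helper.

```python
import numpy as np

def masked_mean(loss, out_masks):
    """Average the loss over valid (unmasked) steps only."""
    valid_steps = out_masks.sum()
    return (loss * out_masks).sum() / valid_steps

# Shape (timestep, batchsize) = (4, 2); the second episode ends after 2 steps.
loss = np.array([[1.0, 3.0],
                 [1.0, 3.0],
                 [1.0, 0.0],
                 [1.0, 0.0]])
out_masks = np.array([[1.0, 1.0],
                      [1.0, 1.0],
                      [1.0, 0.0],
                      [1.0, 0.0]])

naive = (loss * out_masks).mean()      # divides by all 8 slots, padding included
masked = masked_mean(loss, out_masks)  # divides by the 6 valid steps
```

The naive mean shrinks as more padding is added, while the masked mean depends only on the real steps.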
Explain the steps for making expert trajectories.
Where should I write this?
The contents would be like this:
Download the expert model from here (link).
Store the expert model in data/expert_pols/.
Run python expert_epis_make.py.