Discussions about machina from deepx-inc

iterate_rnn in the Traj class creates an iterator of batches. The tail of each batch is zero-padded to align episode lengths. For this reason we cannot control the number of steps in a batch.
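
To illustrate why the step count per batch is hard to control, here is a minimal sketch in plain Python of zero-padding episodes to a common length. `pad_episodes` is a hypothetical helper, not machina's `iterate_rnn`; it only shows that the number of slots in a padded batch depends on the longest episode rather than on the number of real steps.

```python
def pad_episodes(episodes, pad_value=0.0):
    """Zero-pad variable-length episodes to the longest one and build masks."""
    max_len = max(len(ep) for ep in episodes)
    padded, masks = [], []
    for ep in episodes:
        n_pad = max_len - len(ep)
        padded.append(list(ep) + [pad_value] * n_pad)
        masks.append([1.0] * len(ep) + [0.0] * n_pad)  # 1 marks a real step
    return padded, masks


episodes = [[0.5, 0.1, 0.3], [0.2], [0.4, 0.9]]
padded, masks = pad_episodes(episodes)
real_steps = int(sum(sum(m) for m in masks))   # steps that carry data
total_slots = len(padded) * len(padded[0])     # steps the batch occupies
print(real_steps, total_slots)
```

The gap between `real_steps` and `total_slots` grows with the spread of episode lengths in the batch.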

Add N-distill

Adding N-distill according to https://arxiv.org/abs/1902.02186

Add next observation to the trajectory data structure
Directly compute the gradient using the given update rule (this is the difference from teacher distill, on-policy distill, and entropy-regularised distillation)
Update NN parameters accordingly
Test whether policy distillation works using an available teacher policy

More general hs (hidden state)

In machina's current implementation, hs must be a tuple of length 2, which is compatible with LSTM. However, we need to support more general memory architectures, such as Memory Augmented Networks or GRU (whose hidden state has length 1).
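
One possible way to avoid assuming a length of 2 is to treat hs as a tuple of any length. The sketch below uses plain floats as stand-ins for tensors; `as_hs_tuple` and `apply_mask` are hypothetical helpers, not part of machina.

```python
def as_hs_tuple(hs):
    """Wrap a single hidden state in a 1-tuple; pass tuples through."""
    if hs is None:
        return None
    if isinstance(hs, (tuple, list)):
        return tuple(hs)
    return (hs,)


def apply_mask(hs, mask):
    """Multiply every component by a mask, whatever the tuple length."""
    return tuple(h * mask for h in as_hs_tuple(hs))


lstm_like = apply_mask((2.0, 3.0), 0.0)  # (h, c): two components
gru_like = apply_mask(5.0, 1.0)          # single hidden state
print(lstm_like, gru_like)
```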

Add names of implemented algorithms to readme

Would adding a sleep while EpiSampler is waiting reduce CPU usage?
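
A minimal sketch of the idea, using a thread and `queue.Queue` from the standard library as stand-ins for EpiSampler's workers (an assumption; the real sampler is not shown here). `wait_for_result` and `poll_interval` are hypothetical names. The waiting loop sleeps between checks instead of spinning.

```python
import queue
import threading
import time


def wait_for_result(results, poll_interval=0.01, timeout=5.0):
    """Poll a queue, sleeping between checks instead of busy-waiting."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            return results.get_nowait()
        except queue.Empty:
            time.sleep(poll_interval)  # yields the CPU while waiting
    raise TimeoutError("no result arrived before the timeout")


def fake_worker(results):
    time.sleep(0.05)        # pretend to sample episodes
    results.put("epis")


results = queue.Queue()
worker = threading.Thread(target=fake_worker, args=(results,))
worker.start()
value = wait_for_result(results)
worker.join()
print(value)
```

A longer `poll_interval` lowers CPU usage further at the cost of added latency before a finished result is noticed.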

Transparent environment

The environment is wrapped by many wrapper envs, so it is difficult to access the original environment.

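In logger's plot, reading the CSV takes a long time as training time grows

Run the plot mechanism in a separate process (multiprocessing) from the main body, asynchronously with training
Alternatively, make plot a feature of a separate script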

Wrapper environment in which observation includes action

Remove the pds (probabilistic distributions) classes and incorporate them into the pol (policy) classes.

For now, the output of a network must represent the parameters of a probability distribution, such as the mean and std of a Gaussian. However, if something like a flow is used for the policy, it cannot be implemented without changing the loss functional, because a flow's lld (log likelihood) is computed through the network together with the determinant of the Jacobian.
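
For reference, the change-of-variables rule gives log p(a) = log p_z(f^{-1}(a)) + log |det d f^{-1}/d a|. The sketch below works this out for the simplest one-dimensional case, an affine map a = mean + exp(log_std) * z with z ~ N(0, 1), and checks it against the Gaussian log density. The function names are hypothetical and this is not machina code.

```python
import math


def std_normal_logpdf(z):
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_lld(ac, mean, log_std):
    """Log likelihood of ac under ac = mean + exp(log_std) * z, z ~ N(0, 1)."""
    z = (ac - mean) * math.exp(-log_std)  # inverse of the flow
    log_det_inv = -log_std                # log |dz / d ac|
    return std_normal_logpdf(z) + log_det_inv


def gaussian_logpdf(x, mean, log_std):
    var = math.exp(2.0 * log_std)
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi)) - log_std


diff = affine_flow_lld(0.7, 0.2, -0.5) - gaussian_logpdf(0.7, 0.2, -0.5)
print(abs(diff) < 1e-12)
```

For a general flow the inverse and the log-determinant both come from the network itself, which is why a fixed set of distribution parameters such as mean and std is not enough.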

Diversity is All You Need: Learning Skills without a Reward Function

Paper

https://arxiv.org/abs/1802.06070

Branch

diayn

Adversarial Inverse Reinforcement Learning

Paper
https://arxiv.org/abs/1812.07252
Branch
airl

Learning Self-Imitating Diverse Policies

Paper
https://arxiv.org/pdf/1805.10309.pdf

Create sample code that reflects machina's advantages

Training agents that run in different environments
Training that combines different algorithms
Training while tuning hyperparameters

Data parallel on CEMDeteminisiticSAVfunc

Data parallel is not working on CEMDeteminisiticSAVfunc.

`lf.likelihood` seems to be log-likelihood

Is it valid?

log_std referenced before assignment

With an RNN policy, calling deterministic_ac_real raises an error because log_std cannot be referenced.

machina/machina/pols/gaussian_pol.py

Line 93 in a471ead

return mean_real, mean, dict(mean=mean, log_std=log_std, hs=hs)

Request to check the adamw implementation

Regarding adamw:

machina/machina/optims/adamw.py

Line 69 in 8e31ddd

p.data.add_(-group['weight_decay'], p.data)

It seems to me that the -weight_decay term is not multiplied by η (= step_size).
What do you think?
I believe the original paper has something like
-η*weight_decay * x
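
To make the difference concrete, here is a scalar sketch in plain Python of the two decay variants under discussion. `decay_unscaled` mirrors the arithmetic of the quoted line, and `decay_scaled` multiplies the decay by the step size as this issue suggests. Both function names are hypothetical, and whether the scaled form is the intended one is exactly the open question here.

```python
def decay_unscaled(p, weight_decay):
    """Decay as in the quoted line: p <- p - weight_decay * p."""
    return p - weight_decay * p


def decay_scaled(p, weight_decay, step_size):
    """Decay scaled by the step size: p <- p - step_size * weight_decay * p."""
    return p - step_size * weight_decay * p


p, weight_decay, step_size = 1.0, 0.01, 1e-3
print(decay_unscaled(p, weight_decay))
print(decay_scaled(p, weight_decay, step_size))
```

With a small step size the unscaled variant shrinks the parameter far more per update, so the two forms need very different weight_decay values to behave comparably.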

performance check

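Currently, even things such as the indices inside traj are placed on the GPU; check the performance and consider moving them to the CPU

Use torch.no_grad in the places where detach is used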

maximum a posteriori policy optimization

https://openreview.net/forum?id=S1ANxQW0b

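Unify the Data types

Currently, off-policy and on-policy use different Data types. Unify them into a single class, and change it so that methods add things such as the advantage function and returns to the Data

Explanation of quick start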
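
A rough sketch of what such a unified class could look like, in plain Python. The class name `Data`, the keys `rews`, `vs`, `rets`, `advs`, and both method names are assumptions made for illustration. The advantage here is simply return minus value estimate, not GAE.

```python
class Data:
    """Holds episodes as dicts; methods add derived fields in place."""

    def __init__(self, epis):
        self.epis = [dict(epi) for epi in epis]

    def add_returns(self, gamma):
        """Add discounted returns under the key 'rets'."""
        for epi in self.epis:
            rets, running = [], 0.0
            for rew in reversed(epi["rews"]):
                running = rew + gamma * running
                rets.append(running)
            epi["rets"] = rets[::-1]
        return self

    def add_advs(self, vs_key="vs"):
        """Add advantages as returns minus value estimates."""
        for epi in self.epis:
            epi["advs"] = [r - v for r, v in zip(epi["rets"], epi[vs_key])]
        return self


data = Data([{"rews": [1.0, 1.0, 1.0], "vs": [2.0, 1.5, 1.0]}])
data.add_returns(gamma=0.5).add_advs()
print(data.epis[0]["rets"], data.epis[0]["advs"])
```

An off-policy algorithm would skip `add_advs`, while an on-policy one would chain both calls, so the same container serves both cases.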

Using the preprocess method of GAE_Data causes Segmentation Fault (core dumped)

When running python example/run_trpo.py,
using the preprocess method of GAE_Data causes a Segmentation Fault (core dumped).
The segfault occurs at the following place in the preprocess method; apparently it happens when running inference with vf.

all_path_vs = [vf(torch.tensor(path['obs'], dtype=torch.float,
                                           device=get_device())).cpu().numpy() for path in self.paths]

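The code below also segfaults, so it seems safe to conclude that the segfault occurs during vf inference.

vf(torch.tensor(self.paths[0]['obs'], dtype=torch.float,
                device=get_device()))

Note that this worked on my local laptop, but the error occurs when running on the server.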

Multi-node Sampler

About scaling of agent_info in Gaussian Pol

When deploying in a real environment, the desired output is one without Gaussian noise that is scaled to the action space
Add mean_real to agent_info and output the scaled mean
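
A minimal sketch of the scaling, assuming the policy's mean is normalized to [-1, 1] and the action space is a box with bounds `low` and `high`. Both are assumptions, and `scale_to_ac_space` is a hypothetical helper rather than machina's implementation.

```python
import math


def scale_to_ac_space(mean, low, high):
    """Map a mean in [-1, 1] linearly onto [low, high], then clip."""
    real = low + (mean + 1.0) * 0.5 * (high - low)
    return min(max(real, low), high)


mean = math.tanh(0.3)  # noise-free policy output in [-1, 1]
agent_info = dict(mean=mean, mean_real=scale_to_ac_space(mean, -2.0, 2.0))
print(agent_info)
```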

None Error in Categorical and rnn policy with cpu

cd example
python run_ppo.py --env_name CartPole-v0 --rnn --cuda -1

Then the following error occurs:

Traceback (most recent call last):
  File "run_ppo.py", line 153, in <module>
    kl_beta = result_dict['new_kl_beta']
  File "/home/rarilurelo/.pythons/Python-3.5.2/entity/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/raid/work/machina/machina/utils.py", line 47, in measure
    yield
  File "run_ppo.py", line 149, in <module>
    optim_pol=optim_pol, optim_vf=optim_vf, epoch=args.epoch_per_iter, batch_size=args.batch_size, max_grad_norm=args.max_grad_norm)
  File "/raid/work/machina/machina/algos/ppo_clip.py", line 58, in train
    pol_loss = update_pol(pol, optim_pol, batch, clip_param, ent_beta, max_grad_norm)
  File "/raid/work/machina/machina/algos/ppo_clip.py", line 30, in update_pol
    pol_loss = lf.pg_clip(pol, batch, clip_param, ent_beta)
  File "/raid/work/machina/machina/loss_functional.py", line 44, in pg_clip
    _, _, pd_params = pol(obs, h_masks=h_masks)
  File "/home/rarilurelo/.virtuals/py3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/work/machina/machina/pols/categorical_pol.py", line 54, in forward
    ac = self.pd.sample(dict(pi=pi))
  File "/raid/work/machina/machina/pds/categorical_pd.py", line 30, in sample
    pi_sampled = Categorical(probs=pi).sample(sample_shape)
  File "/home/rarilurelo/.virtuals/py3/lib/python3.5/site-packages/torch/distributions/categorical.py", line 110, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:298

This happens because None is passed via pi.

Script for taking movies of learned policy

pybullet is not included in the dependencies, so run_trpo.py does not run as is.

Test for new algorithm

There are no tests for MPC, behavior cloning, GAIL, and AIRL.

pytorch v0.4

Saving method in Traj class

GPU number, etc.

argument of reduction in loss_functional

none, element_wise_mean, sum

Testing policy distillation

@takerfume
I tried to run nosetests -x tests, but it does not seem to work right now. Or did I do something wrong?

E
======================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'tests')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/loader.py", line 406, in loadTestsFromName
    module = resolve_name(addr.module)
  File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/util.py", line 312, in resolve_name
    module = __import__('.'.join(parts_copy))
ModuleNotFoundError: No module named 'tests'

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

About the arguments of OUActionNoise

The implementation ends up passing a shape for ac_space

Temporal difference model, Hindsight experience replay

TDM (https://arxiv.org/abs/1802.09081)
HER (https://arxiv.org/abs/1707.01495)

Allocate Traj's tensor to cpu

Traj's tensors are currently allocated on the GPU for fast computation. However, it is difficult to fit all tensors of an off-policy traj on the GPU.

Solution

allocating traj's tensor to cpu
setting max_length of traj's tensor
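
As a rough illustration of the max_length idea, the sketch below keeps only the most recent steps using `collections.deque`. `BoundedTraj` is a hypothetical class holding plain Python values on the CPU. It is not machina's Traj, and dropping the oldest steps first is an assumption about the desired policy.

```python
from collections import deque


class BoundedTraj:
    """Keeps at most max_length steps; the oldest steps are dropped first."""

    def __init__(self, max_length):
        self.steps = deque(maxlen=max_length)

    def add_epi(self, epi):
        self.steps.extend(epi)

    def __len__(self):
        return len(self.steps)


traj = BoundedTraj(max_length=4)
traj.add_epi([0, 1, 2])
traj.add_epi([3, 4, 5])
print(list(traj.steps))
```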

Write meanings of args

Write the meaning of each arg in the code of example/run_*.py.
Contributors should write comments in the code they themselves wrote.

Branch
write_meaning_of_args

pol_loss = torch.mean(pol_loss * out_masks)

This should instead be calculated as follows.

timestep = torch.sum(out_masks, dim=0)
pol_loss = torch.sum(pol_loss * out_masks) / (timestep * batchsize)
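
The effect of averaging over padded steps can be seen with a small worked example in plain Python lists. `masked_mean` divides by the number of unmasked steps, which is one reading of the formula above and not necessarily the intended one. Both function names are hypothetical.

```python
def plain_mean(losses, masks):
    """Mean over every slot, padded ones included."""
    flat = [l * m for row_l, row_m in zip(losses, masks)
            for l, m in zip(row_l, row_m)]
    return sum(flat) / len(flat)


def masked_mean(losses, masks):
    """Mean over valid steps only: sum(loss * mask) / sum(mask)."""
    total = sum(l * m for row_l, row_m in zip(losses, masks)
                for l, m in zip(row_l, row_m))
    n_valid = sum(sum(row) for row in masks)
    return total / n_valid


losses = [[2.0, 4.0], [6.0, 0.0]]
masks = [[1.0, 1.0], [1.0, 0.0]]  # the last slot is padding
print(plain_mean(losses, masks), masked_mean(losses, masks))
```

The plain mean is pulled toward zero by the padded slot, so its value depends on how much padding a batch happens to contain.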

Add Explanation about Imitation Learning

Explain the steps for making expert trajectories.
Where should I write this?

The contents would be like this:

Download the expert model from here (link).
Store the expert model in data/expert_pols/
Run python expert_epis_make.py
