
machina's Issues

Managing number of steps in a batch

iterate_rnn in the Traj class returns an iterator of batches. The tail of each episode in a batch is zero-padded so that all episodes have the same length. Because of this padding, we cannot control the number of steps in a batch.
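To make the problem concrete, here is a minimal sketch in plain Python (lists stand in for tensors). The helper `pad_episodes` and the episode lengths are hypothetical and are not part of machina; the sketch only shows how the padded batch size and the number of real steps drift apart.

```python
def pad_episodes(lengths):
    """Return one 0/1 mask per episode, padded to the longest episode."""
    max_len = max(lengths)
    return [[1] * n + [0] * (max_len - n) for n in lengths]


lengths = [3, 5, 2]  # hypothetical episode lengths in one batch
masks = pad_episodes(lengths)

# Slots the batch occupies after padding versus steps that carry real data.
padded_slots = sum(len(m) for m in masks)
real_steps = sum(sum(m) for m in masks)
print(padded_slots, real_steps)
```

The number of real steps depends on which episodes happen to be drawn, so fixing the number of episodes per batch does not fix the number of steps.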

Add N-distill

Adding N-distill according to https://arxiv.org/abs/1902.02186

  • Add next observation to trajectory data structure
  • Directly compute the gradient using the given update rule (this is the difference compared to teacher distill, on-policy distill and entropy-regularised distillation)
  • Update nn parameters accordingly
  • Test if policy distillation works using an available teacher policy

More general hs (hidden state)

In machina's current implementation, hs must be a tuple of length 2, which is compatible with LSTM. However, we need to support more general memory architectures, such as Memory Augmented Networks or GRU (whose hidden state has length 1).
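One possible direction, sketched in plain Python with floats standing in for tensors: treat hs as a tuple of any length and apply per-step operations (such as the reset by h_masks) to every component. The helpers `as_tuple` and `mask_hidden` are hypothetical names, not machina functions.

```python
def as_tuple(hs):
    """Wrap a single hidden state so that it looks like a 1-tuple."""
    return hs if isinstance(hs, tuple) else (hs,)


def mask_hidden(hs, mask):
    """Multiply every component of the hidden state by the mask."""
    return tuple(h * mask for h in as_tuple(hs))


lstm_hs = (0.5, -1.0)  # LSTM-like: (hidden, cell)
gru_hs = 0.25          # GRU-like: a single hidden state

lstm_reset = mask_hidden(lstm_hs, 0.0)  # episode boundary: reset both parts
gru_kept = mask_hidden(gru_hs, 1.0)     # mid-episode: state is kept
```

With this shape of interface, code that consumes hs never needs to know whether the cell has one, two, or more state components.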

Transparent environment

The environment is wrapped by many wrapper envs, so it is difficult to access the original environment.
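A minimal sketch of one way to reach the innermost environment, assuming each wrapper stores the wrapped object in an `env` attribute (Gym wrappers follow this convention and also expose an `unwrapped` attribute). `BaseEnv`, `Wrapper`, and `unwrap` below are toy stand-ins, not machina classes.

```python
class BaseEnv:
    """Toy stand-in for the original environment."""
    name = "base"


class Wrapper:
    """Toy stand-in for a wrapper env; keeps the wrapped env in `.env`."""
    def __init__(self, env):
        self.env = env


def unwrap(env):
    """Follow `.env` links until an object without that attribute is found."""
    while hasattr(env, "env"):
        env = env.env
    return env


wrapped = Wrapper(Wrapper(Wrapper(BaseEnv())))
original = unwrap(wrapped)
print(original.name)
```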

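Request to check the AdamW implementation

Regarding AdamW, in this line:

p.data.add_(-group['weight_decay'], p.data)

it looks to me as though the -weight_decay term is not multiplied by η (= step_size).
What do you think?
I believe the original paper has something like
-η * weight_decay * x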
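The two variants under discussion can be compared on a single scalar parameter. This is only a sketch of the decay step in isolation (the Adam gradient update is omitted), and the function names and numeric values are made up for illustration.

```python
def decay_unscaled(p, weight_decay):
    """Decay as in the quoted line: p <- p - weight_decay * p."""
    return p - weight_decay * p


def decay_scaled(p, weight_decay, step_size):
    """Decay scaled by the step size: p <- p - step_size * weight_decay * p."""
    return p - step_size * weight_decay * p


p, weight_decay, step_size = 1.0, 0.01, 0.001  # illustrative values only

unscaled = decay_unscaled(p, weight_decay)
scaled = decay_scaled(p, weight_decay, step_size)
print(unscaled, scaled)
```

With a small step size the unscaled form shrinks the parameter far more per update, so the two forms need very different weight_decay values to behave alike.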

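Performance check

Currently, even things such as the indices inside traj are placed on the GPU. Check the performance and consider moving them to the CPU if that turns out to be better.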

Unify the Data types

Off-policy and on-policy currently use different Data types. Unify them into a single class, and change it so that methods add quantities such as the advantage function and the return to the Data.

Using GAE_Data's preprocess method causes Segmentation Fault (core dumped)

When running python example/run_trpo.py,
using the preprocess method of GAE_Data causes a Segmentation Fault (core dumped).
The segfault occurs at the following point in the preprocess method; it appears to happen when running inference with vf.

all_path_vs = [vf(torch.tensor(path['obs'], dtype=torch.float,
                               device=get_device())).cpu().numpy() for path in self.paths]

The following code also segfaults, so it seems safe to conclude that the segfault occurs during vf inference.

vf(torch.tensor(self.paths[0]['obs'], dtype=torch.float,
                device=get_device()))

Note that it works on my laptop, but the error occurs when running on the server.

Scaling of agent_info in Gaussian Pol

When deploying in a real environment, the desirable output is one without Gaussian noise and scaled to the action space.
Add mean_real to agent_info and output the scaled mean.
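A sketch of what such a field could look like, under a strong assumption that the issue does not state: the policy mean lies in [-1, 1] and is mapped linearly onto the action-space bounds. The helper `scale_to_action_space` and the bounds are hypothetical; the scaling machina actually uses may differ.

```python
def scale_to_action_space(mean, low, high):
    """Linearly map each mean component from [-1, 1] to [low, high] (assumed scaling)."""
    return [lo + (m + 1.0) * 0.5 * (hi - lo) for m, lo, hi in zip(mean, low, high)]


low, high = [-2.0, 0.0], [2.0, 10.0]  # hypothetical action-space bounds
agent_info = {"mean": [0.0, 1.0]}     # noise-free policy mean

# Store the scaled, noise-free action alongside the raw mean.
agent_info["mean_real"] = scale_to_action_space(agent_info["mean"], low, high)
print(agent_info["mean_real"])
```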

None Error in Categorical and rnn policy with cpu

cd example
python run_ppo.py --env_name CartPole-v0 --rnn --cuda -1

And then, the error below occurs

Traceback (most recent call last):
  File "run_ppo.py", line 153, in <module>
    kl_beta = result_dict['new_kl_beta']
  File "/home/rarilurelo/.pythons/Python-3.5.2/entity/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/raid/work/machina/machina/utils.py", line 47, in measure
    yield
  File "run_ppo.py", line 149, in <module>
    optim_pol=optim_pol, optim_vf=optim_vf, epoch=args.epoch_per_iter, batch_size=args.batch_size, max_grad_norm=args.max_grad_norm)
  File "/raid/work/machina/machina/algos/ppo_clip.py", line 58, in train
    pol_loss = update_pol(pol, optim_pol, batch, clip_param, ent_beta, max_grad_norm)
  File "/raid/work/machina/machina/algos/ppo_clip.py", line 30, in update_pol
    pol_loss = lf.pg_clip(pol, batch, clip_param, ent_beta)
  File "/raid/work/machina/machina/loss_functional.py", line 44, in pg_clip
    _, _, pd_params = pol(obs, h_masks=h_masks)
  File "/home/rarilurelo/.virtuals/py3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/work/machina/machina/pols/categorical_pol.py", line 54, in forward
    ac = self.pd.sample(dict(pi=pi))
  File "/raid/work/machina/machina/pds/categorical_pd.py", line 30, in sample
    pi_sampled = Categorical(probs=pi).sample(sample_shape)
  File "/home/rarilurelo/.virtuals/py3/lib/python3.5/site-packages/torch/distributions/categorical.py", line 110, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:298

This happens because None is passed via pi.

Testing policy distillation

@takerfume
I tried to run nosetests -x tests, but it does not seem to work right now. Or did I do something wrong?

E
======================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'tests')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/loader.py", line 406, in loadTestsFromName
    module = resolve_name(addr.module)
  File "/home/pierre/anaconda3/lib/python3.7/site-packages/nose/util.py", line 312, in resolve_name
    module = __import__('.'.join(parts_copy))
ModuleNotFoundError: No module named 'tests'

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

Allocate Traj's tensor to cpu

Traj's tensors are currently allocated on the GPU for fast computation. However, it is difficult to fit all the tensors of an off-policy traj on the GPU.

Solution

  1. allocating traj's tensor to cpu
  2. setting max_length of traj's tensor
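The second option can be sketched with a bounded buffer that drops the oldest steps once max_length is reached. `BoundedTraj` is a hypothetical class, not the machina Traj, and plain Python values stand in for CPU tensors.

```python
from collections import deque


class BoundedTraj:
    """Toy trajectory store that keeps at most max_length steps in host memory."""

    def __init__(self, max_length):
        # deque discards items from the left once maxlen is exceeded.
        self.steps = deque(maxlen=max_length)

    def add_epi(self, epi):
        """Append all steps of one episode, evicting the oldest if needed."""
        self.steps.extend(epi)

    def __len__(self):
        return len(self.steps)


traj = BoundedTraj(max_length=4)
traj.add_epi([1, 2, 3])
traj.add_epi([4, 5, 6])
print(list(traj.steps))
```

In a real implementation, only the sampled batch would presumably be moved to the GPU, while the store itself stays on the CPU.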

Write meanings of args

Write the meanings of the args in the code of example/run_*.py.
Contributors should write comments in the code they wrote themselves.

  • Branch
    write_meaning_of_args

Inappropriate mean in loss_functional with rnn

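When an RNN is used, the loss is averaged over (timestep, batchsize). However, episodes must be padded to the same length to use an RNN, and the steps after termination are masked by the output masks (out_masks). The plain mean below therefore also counts the masked steps in its denominator.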

pol_loss = torch.mean(pol_loss * out_masks)

Instead, we have to calculate it as follows.

timestep = torch.sum(out_masks, dim=0)
pol_loss = torch.sum(pol_loss * out_masks) / (timestep * batchsize)
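The effect of the denominator can be checked on a tiny example in plain Python, with nested lists of shape (timestep, batchsize) standing in for tensors. The function names are hypothetical. `per_episode_mean` is one reading of the proposal above (normalise each episode by its own length, then average over the batch); `masked_mean` is an alternative that divides by the total number of unmasked steps.

```python
def plain_mean(loss, mask):
    """Mean over all (timestep, batch) slots, padded ones included."""
    T, B = len(loss), len(loss[0])
    total = sum(loss[t][b] * mask[t][b] for t in range(T) for b in range(B))
    return total / (T * B)


def masked_mean(loss, mask):
    """Sum of unmasked losses divided by the number of unmasked steps."""
    T, B = len(loss), len(loss[0])
    total = sum(loss[t][b] * mask[t][b] for t in range(T) for b in range(B))
    valid = sum(mask[t][b] for t in range(T) for b in range(B))
    return total / valid


def per_episode_mean(loss, mask):
    """Average each episode over its own length, then average over the batch."""
    T, B = len(loss), len(loss[0])
    means = []
    for b in range(B):
        length = sum(mask[t][b] for t in range(T))
        means.append(sum(loss[t][b] * mask[t][b] for t in range(T)) / length)
    return sum(means) / B


# Two episodes padded to 3 steps: lengths 3 and 1, loss 1.0 at every step.
loss = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
mask = [[1, 1], [1, 0], [1, 0]]

biased = plain_mean(loss, mask)
by_steps = masked_mean(loss, mask)
by_episode = per_episode_mean(loss, mask)
print(biased, by_steps, by_episode)
```

Every real step has loss 1.0 here, so an unbiased average should be 1.0; the plain mean comes out lower because the two padded slots are counted in the denominator.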

Add Explanation about Imitation Learning

Explain the steps for making expert trajectories.
Where should I write this?

The contents would be something like this:

  1. Download the expert model from here (link).
  2. Store the expert model in data/expert_pols/
  3. Run python expert_epis_make.py
