Hi, apologies if the question is a little dumb, but I can't figure out what's g

question about test (pytorch-maml-rl, closed)

tristandeleu commented on August 23, 2024

question about test

from pytorch-maml-rl.

Comments (5)

tristandeleu commented on August 23, 2024

MultiTaskSampler, which is responsible for sampling the trajectories, performs the adaptation locally in each worker.

pytorch-maml-rl/maml_rl/samplers/multi_task_sampler.py

Lines 251 to 275 in 0c2c7dd

    
    # Sample the training trajectories with the initial policy and adapt the
    # policy to the task, based on the REINFORCE loss computed on the
    # training trajectories. The gradient update in the fast adaptation uses
    # `first_order=True` no matter if the second order version of MAML is
    # applied since this is only used for sampling trajectories, and not
    # for optimization.
    params = None
    for step in range(num_steps):
        train_episodes = self.create_episodes(params=params,
                                              gamma=gamma,
                                              gae_lambda=gae_lambda,
                                              device=device)
        train_episodes.log('_enqueueAt', datetime.now(timezone.utc))
        # QKFIX: Deep copy the episodes before sending them to their
        # respective queues, to avoid a race condition. This issue would
        # cause the policy pi = policy(observations) to be miscomputed for
        # some timesteps, which in turns makes the loss explode.
        self.train_queue.put((index, step, deepcopy(train_episodes)))

        with self.policy_lock:
            loss = reinforce_loss(self.policy, train_episodes, params=params)
            params = self.policy.update_params(loss,
                                               params=params,
                                               step_size=fast_lr,
                                               first_order=True)

So in test.py, a single call to MultiTaskSampler already gives you the trajectories both before and after adaptation. With a few changes to test.py, you can also use a different number of gradient steps for adaptation by changing num_steps in your call to sampler.sample().
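
To make the structure of that loop concrete, here is a minimal, self-contained sketch. It is not the repository's code: the function `sample_with_adaptation`, the scalar parameter `theta`, and the quadratic loss are hypothetical stand-ins for the policy, the trajectories, and the REINFORCE loss. It only illustrates that data is gathered before every gradient step and once more after the last one, so `num_steps` controls how many adaptation steps separate the first and last batch.

```python
def sample_with_adaptation(theta, target, num_steps=1, fast_lr=0.25):
    """Toy stand-in for the per-task loop (hypothetical, not the repo's API).

    Records the parameter that would be used to sample each training batch,
    then takes one plain gradient step on the loss (params - target) ** 2.
    """
    train_params = []   # parameters used for each batch of training data
    params = theta      # the real loop starts from the initial policy (params=None)
    for _ in range(num_steps):
        train_params.append(params)        # "sample" with the current parameters
        grad = 2.0 * (params - target)     # gradient of the toy quadratic loss
        params = params - fast_lr * grad   # first-order update, no graph kept
    valid_params = params                  # parameters after the final update
    return train_params, valid_params


train, valid = sample_with_adaptation(0.0, 1.0, num_steps=3, fast_lr=0.25)
print(train)  # [0.0, 0.5, 0.75]
print(valid)  # 0.875
```

With `num_steps=3`, the first entry corresponds to the unadapted parameters and the returned `valid` value to the parameters after three updates, which mirrors the "before and after adaptation" split described above.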

Maryamr314 commented on August 23, 2024

Thanks, That was really helpful.

Maryamr314 commented on August 23, 2024

Sorry for reopening this issue, but after changing num_steps I didn't get better results.

(The number next to MAML shows num-batches.)

tristandeleu commented on August 23, 2024

What is the environment? Making sure you get better performance with a larger number of gradient steps at test time is not something I tested.

Maryamr314 commented on August 23, 2024

Sorry for bothering you. It was my mistake. I found out that if I lower the learning rate at both test and train time, I get better performance. (My environment is half_cheetah_vel.)
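
One possible reason a lower learning rate helps when taking more gradient steps can be seen in a toy example. This is only a sketch under strong assumptions: the loss is a noiseless one-dimensional quadratic, and the names `adapt`, `curvature`, and `lr` are hypothetical. Nothing in this thread confirms that the same mechanism is at work on half_cheetah_vel.

```python
def adapt(theta, target, curvature, lr, num_steps):
    """Run num_steps of gradient descent on 0.5 * curvature * (theta - target) ** 2
    and return the loss at the final parameter."""
    for _ in range(num_steps):
        theta = theta - lr * curvature * (theta - target)
    return 0.5 * curvature * (theta - target) ** 2


# Loss after 0, 1, 2, and 3 adaptation steps for two step sizes.
high_lr_losses = [adapt(0.0, 1.0, curvature=4.0, lr=0.6, num_steps=n) for n in range(4)]
low_lr_losses = [adapt(0.0, 1.0, curvature=4.0, lr=0.125, num_steps=n) for n in range(4)]

print("lr=0.6:  ", high_lr_losses)   # grows with every extra step
print("lr=0.125:", low_lr_losses)    # shrinks with every extra step
```

In this toy setting each update multiplies the error by `1 - lr * curvature`, so a step size above `2 / curvature` makes every additional step worse, while a smaller one lets additional steps help.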
