
What's the meaning of AverageTrainReturn_all_train_tasks, AverageReturn_all_train_tasks, AverageReturn_all_test_tasks?

tianyma commented on July 26, 2024
What do AverageTrainReturn_all_train_tasks, AverageReturn_all_train_tasks, and AverageReturn_all_test_tasks mean?

from oyster.

Comments (1)

katerakelly commented on July 26, 2024

Hello, thanks for your interest in our work!

AverageTrainReturn_all_train_tasks - average return achieved by the agent on a sample of training tasks, using context sampled from the replay buffer (implemented here:

### eval train tasks with posterior sampled from the training replay buffer
train_returns = []
for idx in indices:
    self.task_idx = idx
    self.env.reset_task(idx)
    paths = []
    for _ in range(self.num_steps_per_eval // self.max_path_length):
        context = self.sample_context(idx)
        self.agent.infer_posterior(context)
        p, _ = self.sampler.obtain_samples(deterministic=self.eval_deterministic,
                                           max_samples=self.max_path_length,
                                           accum_context=False,
                                           max_trajs=1,
                                           resample=np.inf)
        paths += p
    if self.sparse_rewards:
        for p in paths:
            # pass a list: np.stack does not accept a generator in recent NumPy
            sparse_rewards = np.stack([e['sparse_reward'] for e in p['env_infos']]).reshape(-1, 1)
            p['rewards'] = sparse_rewards
    train_returns.append(eval_util.get_average_returns(paths))
train_returns = np.mean(train_returns)
)
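
The averaging in the snippet above can be sketched in isolation. This is only a minimal illustration: it assumes eval_util.get_average_returns sums the rewards of each path and then averages those sums, and the helper average_returns and the toy reward values are hypothetical stand-ins rather than oyster code.

```python
from statistics import mean


def average_returns(paths):
    """Mean undiscounted return over paths (assumed behavior of eval_util.get_average_returns)."""
    return mean(sum(path["rewards"]) for path in paths)


# Hypothetical toy data: two training tasks, two evaluation paths each.
paths_per_task = {
    0: [{"rewards": [1.0, 2.0]}, {"rewards": [3.0, 0.0]}],
    1: [{"rewards": [0.0, 1.0]}, {"rewards": [2.0, 1.0]}],
}

# One average per task, then the mean over tasks (the role of train_returns above).
train_returns = [average_returns(paths) for paths in paths_per_task.values()]
average_train_return = mean(train_returns)
```

Here every task contributes equally to the logged value, regardless of how many paths were collected for it.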
AverageReturn_all_train_tasks - average return achieved by the agent on a sample of training tasks, using context sampled by the current policy
AverageReturn_all_test_tasks - average return achieved by the agent on a sample of test tasks, using context sampled by the current policy
(these last two are implemented via this function:
def _do_eval(self, indices, epoch):
    final_returns = []
    online_returns = []
    for idx in indices:
        all_rets = []
        for r in range(self.num_evals):
            paths = self.collect_paths(idx, epoch, r)
            all_rets.append([eval_util.get_average_returns([p]) for p in paths])
        final_returns.append(np.mean([a[-1] for a in all_rets]))
        # record online returns for the first n trajectories
        n = min([len(a) for a in all_rets])
        all_rets = [a[:n] for a in all_rets]
        all_rets = np.mean(np.stack(all_rets), axis=0)  # avg return per nth rollout
        online_returns.append(all_rets)
    n = min([len(t) for t in online_returns])
    online_returns = [t[:n] for t in online_returns]
    return final_returns, online_returns
)
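
To make the two outputs of _do_eval concrete, here is a minimal per-task sketch using only the standard library. The function summarize_task and the toy return values are hypothetical; the logic mirrors the inner loop above but is not the oyster implementation itself.

```python
from statistics import mean


def summarize_task(all_rets):
    """all_rets[r][k] is the return of the k-th rollout in evaluation run r."""
    # Final return: last rollout of each run, averaged over runs.
    final_return = mean(run[-1] for run in all_rets)
    # Online returns: truncate runs to a common length,
    # then average the k-th rollout over runs.
    n = min(len(run) for run in all_rets)
    online = [mean(run[k] for run in all_rets) for k in range(n)]
    return final_return, online


# Hypothetical toy returns for one task: two evaluation runs of different length.
runs = [[1.0, 4.0, 6.0], [3.0, 8.0]]
final_return, online = summarize_task(runs)
```

Note that the final return uses the last rollout of each run even when runs differ in length, while the online returns are cut to the shortest run before averaging.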

The final metric (AverageReturn_all_test_tasks) is the one reported in our paper.
