
make-an-agent's Introduction

Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

[Paper][Project Website]

This repository is the official PyTorch implementation of Make-An-Agent, a policy parameter generator that leverages conditional diffusion models for behavior-to-policy generation. It demonstrates strong versatility and scalability across multiple tasks and generalizes to unseen tasks, producing well-performing policies from only few-shot demonstrations as inputs.



💻 Installation

  1. Create a virtual environment and install all required packages:
conda env create -f environment.yml 
conda activate makeagent
  2. Install Metaworld and mujoco_py for evaluations, following the instructions in DrM.

🛠️ Code Usage

To train the autoencoder and the behavior embedding, you can download the training dataset from Hugging Face: train_data/training_dataset.pt, or task-specific data, e.g. train_data/door-open.pt.
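If you prefer to fetch the files programmatically, here is a minimal sketch using huggingface_hub; the repository id below is an assumption, so replace it with the actual Hugging Face repo of the release.

from huggingface_hub import hf_hub_download
import torch

# Download one task-specific training file (the repo id is hypothetical).
data_path = hf_hub_download(
    repo_id="cheryyunl/make-an-agent",
    filename="train_data/door-open.pt",
)
data = torch.load(data_path, map_location="cpu")
print(data.keys())  # expected to contain parameter / trajectory / task tensors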

Training the parameter autoencoder to encode and decode policy network parameters: Change data_root in autoencoder/config.yaml.

cd autoencoder
python train.py

Training behavior embeddings to process trajectory data: Change data_root in behavior_embedding/config_embed.yaml.

cd behavior_embedding
python train.py

Training the policy generator with conditional diffusion models:

Data processing:

Make-An-Agent uses a latent diffusion model, so the data should be processed using the autoencoder and behavior embedding.

You can directly use the pretrained models on Hugging Face, or use the already processed training data in train_data/process_data.pt to train the policy generator.

If you want to process your own data, change the data paths and the pretrained model roots in dataset/config.yaml.

cd dataset
python process_data.py
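For intuition, the self-contained sketch below mimics what this processing step produces: policy parameters are compressed into latents, and trajectory/task data into behavior embeddings. The linear layers are stand-ins for the trained autoencoder and behavior embedding, and the latent sizes are assumptions; only the input shapes (22664, 1020, 117) and the keys follow the released data described in the issues below.

import torch
import torch.nn as nn

param_encoder = nn.Linear(22664, 256)           # stand-in for the parameter autoencoder's encoder
behavior_encoder = nn.Linear(1020 + 117, 128)   # stand-in for the behavior embedding

raw = {                                          # same keys as the released data.pt
    "param": torch.randn(8, 22664),              # flattened policy parameters
    "traj": torch.randn(8, 1020),                # recorded trajectories
    "task": torch.randn(8, 117),                 # task / post-success information (see the data.pt issue below)
}

with torch.no_grad():
    latent = param_encoder(raw["param"])                                    # latent policy parameters
    behavior = behavior_encoder(torch.cat([raw["traj"], raw["task"]], -1))  # behavior condition
torch.save({"latent": latent, "behavior": behavior}, "processed_example.pt")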

Ensure that your processed data matches the latent representation dimensions, then point data_root in PolicyGenerator/config.yaml to your processed data.

cd PolicyGenerator
python train.py

Tip: we save both the best model and the last model during training.
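For readers new to latent diffusion, the toy example below illustrates the kind of objective the policy generator optimizes: denoising parameter latents conditioned on behavior embeddings. The denoiser, noise schedule, and all sizes are illustrative assumptions, not the actual implementation in PolicyGenerator/.

import torch
import torch.nn as nn

latent_dim, cond_dim, timesteps = 256, 128, 1000
denoiser = nn.Sequential(nn.Linear(latent_dim + cond_dim + 1, 512), nn.SiLU(),
                         nn.Linear(512, latent_dim))
betas = torch.linspace(1e-4, 2e-2, timesteps)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(latent, behavior):
    """One denoising step: predict the noise that was added to the parameter latent."""
    t = torch.randint(0, timesteps, (latent.shape[0],))
    noise = torch.randn_like(latent)
    a = alphas_bar[t].unsqueeze(-1)
    noisy = a.sqrt() * latent + (1 - a).sqrt() * noise            # forward diffusion
    t_emb = (t.float() / timesteps).unsqueeze(-1)                 # crude timestep embedding
    pred = denoiser(torch.cat([noisy, behavior, t_emb], dim=-1))  # behavior-conditioned denoiser
    return nn.functional.mse_loss(pred, noise)

loss = training_step(torch.randn(16, latent_dim), torch.randn(16, cond_dim))
loss.backward()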

Evaluating the synthesized parameters:

First, decode the latent parameters into policy parameters; then deploy the synthesized policy networks in the simulator.

We provide processed test data in test_data/processed/. You can also process your own test data using dataset/process_data.py.

Change ckpt_dir, encoder_dir and data_dir in PolicyGenerator/config.yaml.

cd PolicyGenerator
python eval.py
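Conceptually, evaluation decodes a generated latent into a flat parameter vector and loads it into a small policy MLP before rolling it out in the simulator. The sketch below is a stand-in: the decoder and latent size are hypothetical, and the policy layout (39 -> 128 -> 128 -> 8) is only an assumption that happens to total the 22,664 parameters mentioned in the issues below; eval.py handles the real decoding and rollout.

import torch
import torch.nn as nn
from torch.nn.utils import vector_to_parameters

# Stand-in policy network: 39 -> 128 -> 128 -> 8 totals exactly 22,664 parameters.
policy = nn.Sequential(
    nn.Linear(39, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 8),
)

decoder = nn.Linear(256, 22664)   # stand-in for the autoencoder's decoder
latent = torch.randn(256)         # one sample from the policy generator (assumed size)

with torch.no_grad():
    flat_params = decoder(latent)                            # decode latent -> flat parameter vector
    vector_to_parameters(flat_params, policy.parameters())   # load the weights into the policy network

out = policy(torch.randn(39))     # forward pass; eval.py instead rolls the policy out in MetaWorld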

📗 Dataset and Pretrained Models

We release all pretrained models and data on Hugging Face.

Pretrained models: autoencoder.pt, behavior_embedding.pt, and model-best.pt.

Dataset: Training data is in train_data/ and test data is in test_data/.

📝 Citation

If you find our work or code useful, please consider citing as follows:

@article{liang2024make,
title={Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion},
author={Liang, Yongyuan and Xu, Tingqiang and Hu, Kaizhe and Jiang, Guangqi and Huang, Furong and Xu, Huazhe},
journal={arXiv preprint arXiv:2407.10973},
year={2024}
}

🌷 Acknowledgement

Our work is primarily based on the following projects: Diffusion Policy, pytorch-lightning, Metaworld, Robosuite, walk-these-ways. We thank these authors for their contributions to the open-source community.

For any questions or suggestions, please contact Yongyuan Liang.


make-an-agent's Issues

Some notes about the HF artifacts

Hi there,

Congrats on releasing the models and dataset! Niels here from the open-source team at Hugging Face. I was wondering whether you would be up for releasing your dataset as a dedicated dataset repository, rather than adding it as part of the model repo. This way, people can do:

from datasets import load_dataset

dataset = load_dataset("cheryyunl/make-an-agent")

This would also enable the dataset viewer, which allows people to easily discover and read the first few rows of the dataset in the browser. See here for a guide: https://huggingface.co/docs/datasets/loading.

Regarding the models, we usually prefer to have a single checkpoint per model repository, each with a dedicated model card. This way, download stats can work out-of-the-box as explained here: https://huggingface.co/docs/hub/models-uploading#upload-a-pytorch-model-using-huggingfacehub. This will also automatically link your models to the Space.

Let me know if you need any help!

Information about policy params in data.pt and their corresponding MetaWorld tasks

Hi,

I have a query about the policy params. In your data.pt file, I can see 17286 policies (SAC) {self.data['param'].shape : torch.Size([17286, 22664])} and the corresponding traj and task embeddings. But there is no information about which task each policy corresponds to (I mean which one of the 10 MetaWorld tasks). Could you please provide additional information on the corresponding tasks?

Also, in the paper it's mentioned that the policy network architecture used for MetaWorld is a 4-layer MLP with 128 hidden units, containing a total of 22,664 parameters. But you also mention in the Dataset section that you collect 1500 policy networks for each task in MetaWorld and Robosuite, and that these networks are sourced from policy checkpoints during SAC training. So, is the SAC policy network the same as the 4-layer MLP?

Contents of the training dataset file data.pt

Hi,
Thanks for this excellent work. I am trying to understand and use this framework on another dataset, but I have a few doubts about the training data, i.e. data.pt, which is used to train the autoencoder and the behavior embedding. There seem to be three types of data in the file, with keys 'params', 'traj', and 'task', each of length 17286 along dim 0, and with 'traj' and 'task' having dim-1 lengths of 1020 and 117 respectively.
I understood the 'traj' tensor to be related to the long trajectory recorded for each checkpoint, i.e. s_0, a_0, s_1, a_1, ..., s_n, a_n, but I am not sure.
I am also unsure about the data stored in the 'task' tensor. Am I correct in understanding that this is where the post-success states s_K to s_K+m are stored, or is there another tensor storing s_K to s_K+m?
Can you please help me understand the contents of the training data file data.pt, especially the 'traj' tensor, since I am unable to understand the shapes of each state and action and the format in which they are stored? This will help me implement the framework on my own dataset.
TIA!

Issue during evaluation

Hi,

Congrats on the great work and thanks for releasing the codebase. I was able to train all three phases (encoder, behavior embedding, and diffusion model), but I am not able to evaluate those checkpoints. I want to evaluate them on an unseen task (coffee-button).

Command run: python eval.py in PolicyGenerator folder.
Error: env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_name]
KeyError: '-v2-goal-observable'

Eval configs in config.yaml of PolicyGenerator:
eval:
env_name: ''
ckpt_dir: '/home/jayaram/Make-An-Agent/ckpts/model-best.torch'
encoder_dir: '/home/jayaram/Make-An-Agent/ckpts/autoencoder.ckpt'
data_dir: '/home/jayaram/Make-An-Agent/Make-An-Agent_dataset/test_data/unseen/processed/coffee-button.pt'

Am I providing the inputs correctly? If so, how do I resolve this issue?

Regards
Jayaram

Format of param tensor in data.pt

Hey @cheryyunl, I would like to know about the param tensor too, which stores the parameters of the policy network for every checkpoint. I am trying to adapt this framework to my own dataset, which I acquired by training with PPO instead of SAC. In my training dataset, I tried storing the parameters for every checkpoint in a list, as given by the agent.state_dict() function, which returns a dictionary of the agent's parameters. This format doesn't allow me to train on my dataset, so I would like to ask in what format the parameters are stored in the tensor for every checkpoint. Thanks!

Originally posted by @AmoghJuloori in #3 (comment)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.