Public repository for Mixed-Initiative Multi-Agent Apprenticeship Learning (MixTURE) for Human Training of Multi-Robot Teams

License: GNU General Public License v3.0

Mixed-Initiative Multiagent Apprenticeship Learning for Human Training of Robot Teams

Paper Information

Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS) 2023

Authors: Esmaeil Seraj, Jerry Xiong, Mariah Schrum, Matthew Gombolay

Experiment: heuristic demonstrations

  1. Decompress demonstration files: cd demos; python decompress.py
  2. Change to baselines directory: cd ../baselines
  3. Run easy (5x5) experiments with all 4 baselines: PYTHONPATH=.. python run_easy_baselines.py
  4. Repeat for the medium and hard experiments

Experiment: human study

  1. Augment demonstration files: cd demos; python augment_demos.py
  2. Change to user_study_lfd directory: cd ../user_study_lfd
  3. Run the experiments: PYTHONPATH=.. python run.py

File Structure

The most relevant scripts are:

  1. marl.py: defines the PPO training loop, instantiated with an environment, an AgentGroup, and a RewardSignal
  2. agents/*.py: define agent architectures, including recurrent policies, fully-connected communication, attention communication, and MIM
    • choose MIM / no MIM, attention / no attention, etc. by selecting the appropriate class to instantiate
  3. reward_signals.py: defines the discriminator architectures
  4. envs/*.py: define environment observation/action spaces and dynamics
    • envs/comm_wrapper.py: defines a wrapper that adds discrete (one-hot) communication observations and actions to an existing environment
  5. ablations/simultaneous.py: defines a combined PPO+BC trainer that adds a BC term to the loss during online updates
  6. expert_heuristics/*.py: used to create heuristic demonstration datasets
  7. ez_tuning.py: defines the hyperparameter tuning framework and statistics (IQM, bootstrapped confidence intervals); a standalone sketch of these statistics follows this list
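
For reference, here is a minimal standalone sketch of those statistics (interquartile mean and a percentile-bootstrap confidence interval). It only illustrates the computation and is not the ez_tuning.py API; the example scores are made up.

import numpy as np

def iqm(scores):
    # interquartile mean: mean of the scores lying within the middle 50%
    q1, q3 = np.percentile(scores, [25, 75])
    middle = scores[(scores >= q1) & (scores <= q3)]
    return float(middle.mean())

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    # percentile bootstrap confidence interval for the IQM
    rng = np.random.default_rng(seed)
    stats = [iqm(rng.choice(scores, size=len(scores), replace=True)) for _ in range(n_boot)]
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

scores = np.array([212.0, 198.0, 240.0, 185.0, 230.0, 205.0])  # e.g., final episode lengths across seeds
print(iqm(scores), bootstrap_ci(scores))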

Major tunable hyperparameters:

  • lr: the learning rate for the policy and critic; specified when instantiating an AgentGroup
    • the learning rate for the discriminator is specified when instantiating the appropriate RewardSignal
    • note that the CombinedTrainer constructor overrides the AgentGroup-specified learning rates
  • fc_dim: the hidden dimensionality (width) of the policy and critic architectures; specified when instantiating an AgentGroup
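
The tuning framework itself lives in ez_tuning.py. Purely as an illustrative sketch (not the ez_tuning.py API), a manual sweep over lr and fc_dim could reuse the classes from the Examples section below; the two positional arguments to FullAttentionMIMAgents are assumed to be the policy and critic learning rates, as in those examples.

from agents import FullAttentionMIMAgents
from envs import FireCommander5x5
from marl import PPOTrainer
from reward_signals import MixedGAILReward

# illustrative manual sweep; assumes the constructor signatures used in the Examples below
for lr in (1e-3, 3e-4):
    for fc_dim in (64, 128):
        agents = FullAttentionMIMAgents(lr, lr, mim_coeff=0.01, fc_dim=fc_dim)
        reward = MixedGAILReward("demos/firecommander_5x5.pickle", lr=1e-5)
        trainer = PPOTrainer(FireCommander5x5(), agents, reward.normalized(), minibatch_size=32)
        for _ in range(10):
            trainer.run()
        trainer.evaluate()
        print(lr, fc_dim, trainer.logger.data["episode_len"][-1])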

Examples

Running MixTURE without behavioral cloning

from agents import FullAttentionMIMAgents
from envs import FireCommander5x5
from reward_signals import MixedGAILReward
from marl import PPOTrainer

# policy and critic learning rates, MIM coefficient, and hidden width (fc_dim)
agents = FullAttentionMIMAgents(1e-3, 1e-3, mim_coeff=0.01, fc_dim=64)
# GAIL-based reward learned from the heuristic demonstrations; lr is the discriminator learning rate
reward = MixedGAILReward("demos/firecommander_5x5.pickle", lr=1e-5)

trainer = PPOTrainer(
    FireCommander5x5(),
    agents,
    reward.normalized(),
    gae_lambda=0.5,
    minibatch_size=32,
)

for _ in range(100):
    trainer.run()

trainer.evaluate()
print(trainer.logger.data["episode_len"][-1])
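
To see the full training curve rather than only the final value, the logged episode lengths can be plotted directly; this is a sketch assuming trainer.logger.data["episode_len"] stores a history of scalar values, as the indexing above suggests.

import matplotlib.pyplot as plt

plt.plot(trainer.logger.data["episode_len"])
plt.xlabel("logging step")
plt.ylabel("episode length")
plt.show()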

Running MixTURE with behavioral cloning

from ablations.simultaneous import CombinedTrainer, ExposedBCTrainer, ExposedPPOTrainer
from agents import FullAttentionMIMAgents
from envs import FireCommander10x10
from reward_signals import MixedGAILReward

demo_filename = "demos/firecommander_10x10.pickle"
# the learning rates passed here are overridden by the CombinedTrainer below, hence 0
agents = FullAttentionMIMAgents(0, 0, mim_coeff=0.01, fc_dim=64)
env = FireCommander10x10(n_fires=1)
reward = MixedGAILReward(demo_filename, lr=1e-5)

trainer = CombinedTrainer(
    lr=1e-3,  # overrides the AgentGroup-specified learning rates
    bc_trainer=ExposedBCTrainer(
        env,
        agents,
        demo_filename=demo_filename,
        minibatch_size=32,
    ),
    ppo_trainer=ExposedPPOTrainer(
        env,
        agents,
        reward.normalized(),
        gae_lambda=0.5,
        minibatch_size=32,
    ),
    bc_weight=0.1,  # weight of the BC term added to the PPO loss
)

for _ in range(100):
    trainer.run()

trainer.evaluate()
print(trainer.logger.data["episode_len"][-1])

Questions

In case of any questions, please reach out directly to Esmaeil Seraj at [email protected]

Citation

@inproceedings{seraj2023mixed,
  title={Mixed-Initiative Multiagent Apprenticeship Learning for Human Training of Robot Teams},
  author={Seraj, Esmaeil and Xiong, Jerry Yuyang and Schrum, Mariah L and Gombolay, Matthew},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
