- Full paper (OpenReview): https://openreview.net/forum?id=VCOZaczCHg
- Short presentation (YouTube): https://youtu.be/COGGl3lFH94?si=Z3CugC5PDTSST8gA
- Decompress the demonstration files: `cd demos; python decompress.py`
- Change to the baselines directory: `cd ../baselines`
- Run the easy (5x5) experiments with all 4 baselines: `PYTHONPATH=.. python run_easy_baselines.py`
- Repeat with medium and hard (see the sketch after this list)
- Augment the demonstration files: `cd demos; python augment_demos.py`
- Change to the user_study_lfd directory and run: `cd ../user_study_lfd; PYTHONPATH=.. python run.py`
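The medium and hard runner scripts are not named above; as a hedged sketch, assuming they mirror the easy runner (`run_medium_baselines.py` and `run_hard_baselines.py` are assumed names, not verified):

```
PYTHONPATH=.. python run_medium_baselines.py  # assumed name, mirroring run_easy_baselines.py
PYTHONPATH=.. python run_hard_baselines.py    # assumed name, mirroring run_easy_baselines.py
```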
The most relevant scripts are:

- `marl.py`: defines the PPO training loop; a trainer is instantiated with an environment, an AgentGroup, and a RewardSignal
- `agents/*.py`: define agent architectures, including recurrent policies, fully-connected communication, attention communication, and MIM; choose MIM / no MIM, attention / no attention, etc. by instantiating the appropriate class
- `reward_signals.py`: defines discriminator architectures
- `envs/*.py`: define environment observation/action spaces and dynamics
- `envs/comm_wrapper.py`: defines a wrapper that adds discrete (one-hot) communication observations and actions to an existing environment
- `ablations/simultaneous.py`: defines a combined PPO+BC trainer that adds a BC term to the loss during online updates
- `expert_heuristics/*.py`: used to create heuristic demonstration datasets
- `ez_tuning.py`: defines the hyperparameter tuning framework and statistics (IQM, bootstrapped confidence intervals); a standalone sketch of these statistics follows this list
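For reference, here is a minimal NumPy sketch of the two statistics `ez_tuning.py` reports. It illustrates IQM and percentile-bootstrap confidence intervals in general; the function names are hypothetical and do not reflect the module's actual API:

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of scores."""
    scores = np.sort(np.asarray(scores))
    n = len(scores)
    return scores[n // 4 : n - n // 4].mean()

def bootstrap_ci(scores, stat=iqm, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of the scores."""
    rng = np.random.default_rng(seed)
    stats = [
        stat(rng.choice(scores, size=len(scores), replace=True))
        for _ in range(n_resamples)
    ]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

scores = np.array([0.62, 0.71, 0.55, 0.68, 0.74, 0.59, 0.66, 0.70])
print(iqm(scores), bootstrap_ci(scores))
```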
Key hyperparameters:

- `lr`: the learning rates for the policy and critic can be specified when instantiating an AgentGroup; the learning rate for the discriminator can be specified when instantiating the appropriate RewardSignal. Note that the CombinedTrainer constructor overrides AgentGroup-specified learning rates.
- `fc_dim`: the hidden dimensionality (width) of the policy and critic architectures; can be specified when instantiating an AgentGroup.

Both appear in the examples below.
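For example, the following trains FullAttentionMIMAgents on FireCommander 5x5 with a mixed GAIL reward learned from the bundled demonstrations: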
```python
from agents import FullAttentionMIMAgents
from envs import FireCommander5x5
from reward_signals import MixedGAILReward
from marl import PPOTrainer

# Policy and critic learning rates (1e-3 each; see the `lr` note above),
# MIM loss coefficient, and hidden width 64.
agents = FullAttentionMIMAgents(1e-3, 1e-3, mim_coeff=0.01, fc_dim=64)
# GAIL-style discriminator reward learned from the bundled demonstrations.
reward = MixedGAILReward("demos/firecommander_5x5.pickle", lr=1e-5)
trainer = PPOTrainer(
    FireCommander5x5(),
    agents,
    reward.normalized(),
    gae_lambda=0.5,
    minibatch_size=32,
)
# 100 training iterations; evaluate and report the latest episode length after each.
for _ in range(100):
    trainer.run()
    trainer.evaluate()
    print(trainer.logger.data["episode_len"][-1])
```
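The simultaneous PPO+BC ablation wraps the same pieces in a CombinedTrainer, which mixes a behavior-cloning loss into the online PPO updates: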
```python
from ablations.simultaneous import CombinedTrainer, ExposedBCTrainer, ExposedPPOTrainer
from agents import FullAttentionMIMAgents
from envs import FireCommander10x10
from reward_signals import MixedGAILReward

demo_filename = "demos/firecommander_10x10.pickle"
# AgentGroup learning rates are zeroed out here: CombinedTrainer overrides
# them with its own lr (see the `lr` note above).
agents = FullAttentionMIMAgents(0, 0, mim_coeff=0.01, fc_dim=64)
env = FireCommander10x10(n_fires=1)
reward = MixedGAILReward(demo_filename, lr=1e-5)
trainer = CombinedTrainer(
    lr=1e-3,
    bc_trainer=ExposedBCTrainer(
        env,
        agents,
        demo_filename=demo_filename,
        minibatch_size=32,
    ),
    ppo_trainer=ExposedPPOTrainer(
        env,
        agents,
        reward.normalized(),
        gae_lambda=0.5,
        minibatch_size=32,
    ),
    bc_weight=0.1,  # weight of the BC term added to the PPO loss
)
for _ in range(100):
    trainer.run()
    trainer.evaluate()
    print(trainer.logger.data["episode_len"][-1])
```
For any questions, please reach out directly to Esmaeil Seraj at [email protected].
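If you use this code in your work, please cite: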
```bibtex
@inproceedings{seraj2023mixed,
  title={Mixed-Initiative Multiagent Apprenticeship Learning for Human Training of Robot Teams},
  author={Seraj, Esmaeil and Xiong, Jerry Yuyang and Schrum, Mariah L and Gombolay, Matthew},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
```