facebookresearch / diplomacy_searchbot


Code to accompany "Human-Level Performance in No-Press Diplomacy via Equilibrium Search", published at ICLR 2021

License: MIT License

Dockerfile 0.05% Makefile 0.04% Shell 0.14% Python 66.04% CMake 0.21% C++ 21.31% HTML 12.22%

diplomacy_searchbot's Introduction

Diplomacy SearchBot and DORA

This repo contains checkpoints and training code for the following papers:

"Human-Level Performance in No-Press Diplomacy via Equilibrium Search" from ICLR'21. Exact code version is at iclr21 branch.
"No-press Diplomacy From Scratch" from NeurIPS'21.

Code

A very brief orientation:

The implementation for SearchBot lives here
The supervised learning model architecture lives here, and the bulk of the training logic lives here
RL training - both for policy gradient and deep nash value iteration - lives here

Models

The pretrained models are available for download under this repo's "Releases". The easiest way to download them is via the bin/download_dora_models.sh script. Configurations of agents, i.e., combinations of checkpoints and parameters, are stored in conf/common/agents. Below is a list of the most important ones:
- model_sampled is the Blueprint from the ICLR'21 paper.
- searchbot_02_fastbot is SearchBot from the ICLR'21 paper.
- searchbot_neurips21_fva_dora is the FvA DORA agent that played with humans, from the NeurIPS'21 paper.
- searchbot_neurips21_human_dnvi_npu is HumanDNVI-NPU from the NeurIPS'21 paper.
- searchbot_neurips21_dora is DORA from the NeurIPS'21 paper.
- searchbot_neurips21_supervised is SearchBot-Transf from the NeurIPS'21 paper.

Evals

The command to play a couple of agents against each other:

python run.py --adhoc --cfg conf/c01_ag_cmp/cmp.prototxt \
    I.agent_one=agents/CFG_NAME1.prototxt \
    I.agent_six=agents/CFG_NAME2.prototxt

Run an FvA game:

python run.py --adhoc --cfg conf/c01_ag_cmp/cmp.prototxt \
    I.agent_one=agents/CFG_NAME1.prototxt \
    I.agent_six=agents/CFG_NAME2.prototxt \
    start_game=./fva_starting_position.json

There is also a script that sequentially runs multiple games between all pairings of a set of agents:

python conf/exps/h2h_example.py

It will run a few 1vs6 games of DipNet vs Human-DNVI-NPU and print the average score:

--> square_score
agent_six       dipnet
agent_one
human_dnvi_npu   0.287
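
Assuming square_score refers to the sum-of-squares scoring commonly used for Diplomacy games that end without a solo winner, a power's score is its squared supply-center count divided by the sum of squared counts over all powers. The sketch below only illustrates that formula; the function name and the example counts are made up, it is not taken from this codebase, and solo-victory handling is omitted.

```python
from typing import Dict


def sum_of_squares_scores(sc_counts: Dict[str, int]) -> Dict[str, float]:
    """Each power's share of the squared supply-center counts (hypothetical helper)."""
    total = sum(n * n for n in sc_counts.values())
    if total == 0:
        raise ValueError("at least one power must own a supply center")
    return {power: (n * n) / total for power, n in sc_counts.items()}


# Made-up end-of-game supply-center counts, for illustration only.
example = {"AUSTRIA": 10, "ENGLAND": 6, "FRANCE": 6, "GERMANY": 4,
           "ITALY": 4, "RUSSIA": 2, "TURKEY": 2}
scores = sum_of_squares_scores(example)
print(round(scores["AUSTRIA"], 3))  # 0.472, i.e. 100 / 212
```

The scores sum to 1 across the seven powers, so an agent of average strength would be expected to score about 1/7 (roughly 0.143) under this formula.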

Note that it may take a long time to run many games - if you have multiple GPUs you may prefer your own infrastructure to parallelize things.
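
If you do parallelize on your own infrastructure, one possible starting point is to generate one run.py command per ordered pairing and hand each command to your scheduler. The sketch below only builds the argument lists shown earlier in this section; the helper name is hypothetical and the choice of agents is arbitrary.

```python
import itertools
from typing import List, Optional


def build_compare_command(agent_one: str, agent_six: str,
                          start_game: Optional[str] = None) -> List[str]:
    """Build the run.py argument list for one 1-vs-6 comparison (hypothetical helper)."""
    cmd = [
        "python", "run.py", "--adhoc",
        "--cfg", "conf/c01_ag_cmp/cmp.prototxt",
        f"I.agent_one=agents/{agent_one}.prototxt",
        f"I.agent_six=agents/{agent_six}.prototxt",
    ]
    if start_game is not None:
        cmd.append(f"start_game={start_game}")
    return cmd


# Agent config names taken from the Models section; any subset would do.
agents = ["model_sampled", "searchbot_02_fastbot", "searchbot_neurips21_dora"]

# Ordered pairs: each agent plays as agent_one against six copies of every other agent.
commands = [build_compare_command(a, b) for a, b in itertools.permutations(agents, 2)]
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line is an independent job, so the commands can be launched on separate GPUs or machines.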

For more info about the configuration and command-line args, see the Running Tasks section below.

Training a supervised model

See iclr21 branch.

Game info

Diplomacy is a strategic board game set in 1914 Europe. The board is divided into fifty-six land regions and nineteen sea regions. Forty-two of the land regions are divided among the seven Great Powers of the game: Austria-Hungary, England, France, Germany, Italy, Russia, and Turkey. The remaining fourteen land regions are neutral at the start of the game.

Each power controls some regions and some units. The number of the units controlled depends on the number of the controlled key regions called Supply Centers (SCs). Simply put, more SCs means more units. The goal of the game is to control more than half of all SCs by moving units into these regions and convincing other players to support you.
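
The win condition reduces to a simple comparison. The sketch below assumes the standard map's 34 supply centers, which means 18 are needed for a solo win; the constant and function names are illustrative and are not taken from this codebase.

```python
TOTAL_SUPPLY_CENTERS = 34  # standard Diplomacy map (assumption, see the text above)


def solo_win_threshold(total_scs: int = TOTAL_SUPPLY_CENTERS) -> int:
    """Smallest supply-center count that is strictly more than half of the total."""
    return total_scs // 2 + 1


def has_solo_win(sc_count: int, total_scs: int = TOTAL_SUPPLY_CENTERS) -> bool:
    """True if a power holding sc_count centers controls more than half of them."""
    return sc_count >= solo_win_threshold(total_scs)


print(solo_win_threshold())                # 18
print(has_solo_win(17), has_solo_win(18))  # False True
```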

You can find the full rules here. To get the game's spirit, watch some games with comments. What's more, you can play the game online on WebDiplomacy either against bots or humans.

Installation

# Clone the repo with submodules:

git clone --recursive git@github.com:facebookresearch/diplomacy_searchbot.git
cd diplomacy_searchbot

# Apt installs
apt-get install -y wget bzip2 ca-certificates curl git build-essential clang-format-8 cmake autoconf libtool pkg-config libgoogle-glog-dev

# Install conda
wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-4.7.10-Linux-x86_64.sh -O ~/miniconda.sh
/bin/bash ~/miniconda.sh -b

# Create conda env
conda create --yes -n diplomacy python=3.7
source activate diplomacy

# Install pytorch, pybind11
conda install --yes pytorch=1.7.1 torchvision cudatoolkit=11.0 -c pytorch
conda install --yes pybind11

# Install go for boringssl in grpc
# We have some hacky patching code for protobuf that is not guaranteed
# to work on versions other than this.
conda install go protobuf=3.19.1 --yes

# Install python requirements
pip install -r requirements.txt

# Local pip installs
pip install -e ./thirdparty/github/fairinternal/postman/nest/
pip install -e ./thirdparty/github/fairinternal/postman/postman/
pip install -e . -vv

# Make
make

# Run unit tests
make test_fast

After each pull it's recommended to run make to re-compile internal C++ and protobuf code.

Running tasks

The code has a single entry point, run.py, that can be used to train a model, compare agents, profile them, etc. We refer to each such activity as a task. To specify which task to run and what parameters to use, we use configs. Below is an example of a config that is used to train an agent with imitation learning on human data:

train {
    dataset_params: {
        data_dir: "/path/to/games/"
        value_decay_alpha: 0.9;
    }
    batch_size: 2500;
    lr: 0.001;
    lr_decay: 0.99;
    clip_grad_norm: 0.5
    checkpoint: "./checkpoint.pth";
    lstm_dropout: 0.1;
    encoder_dropout: 0.2;
    num_encoder_blocks: 8;
}

We use text protobuf format to specify the configs. Each task has a schema, a formal description of what parameters are allowed in each config, e.g., here's the definition for the train task above.

Protobufs can be confusing, but the good news is that you don't have to understand them to run tasks. Instead, find the config for your task and run it. All tasks are described in the next section. Here is an example of how to launch training on human data:

python run.py --adhoc --cfg conf/c02_sup_train/sl_20200901.prototxt

You can override any config parameter on the command line using argparse-like syntax:

python run.py --adhoc --cfg conf/c02_sup_train/sl_20200901.prototxt batch_size=200 --dataset_params.value_decay_alpha=1.0

Note that it's optional to use "--" in front of overrides.
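
To make the override syntax concrete, here is a minimal sketch of how key=value overrides with dotted paths could be folded into a nested dictionary. It is not HeyHi's implementation: the function name is hypothetical, values are kept as strings rather than converted to the types in the schema, and include-style overrides such as I.agent_one=... get no special treatment.

```python
from typing import Any, Dict, List


def parse_overrides(args: List[str]) -> Dict[str, Any]:
    """Turn ['a=1', '--b.c=2'] into {'a': '1', 'b': {'c': '2'}} (illustrative only)."""
    result: Dict[str, Any] = {}
    for arg in args:
        # The leading "--" is optional, so strip it when present.
        if arg.startswith("--"):
            arg = arg[2:]
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        # Walk (and create) the nested dictionaries named by the dotted path.
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


print(parse_overrides(["batch_size=200", "--dataset_params.value_decay_alpha=1.0"]))
# {'batch_size': '200', 'dataset_params': {'value_decay_alpha': '1.0'}}
```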

Check documentation for HeyHi, the configuration library, for more details.

Tasks overview

In general, all configs are stored in conf/ folder and grouped by tasks. You can find all possible arguments for all tasks in conf/conf.proto file. Below are the most important tasks:

Supervised training task. Configs in c02_sup_train, docs here.
Making 2 agents play against each other for evaluation. Configs in c01_ag_cmp, docs here.
Training RL agent. Configs in c04_exploit, docs here.

Going deeper

We use an in-house fast C++ implementation of the diplomacy environment. See here for how to interact with it.

The games could be serialized as JSON files, e.g., our human data and test situations use this format.

Code structure:

fairdiplomacy - datasets, agents, and trainers
conf - all the configs for fairdiplomacy/ part.

External links:

"No Press Diplomacy: Modeling Multi-Agent Gameplay" (Paquette et al, 2019), on which our supervised learning is based.
"Learning to Play No-Press Diplomacy with Best Response Policy Iteration" (Anthony et al, 2020). We use some model improvement from the paper.

Pre-commit hooks

Run pre-commit install to install pre-commit hooks that will auto-format Python code before committing it.

Or you can do this manually. Use the black auto-formatter to format all Python code. For protobufs use clang-format-8 conf/*.proto -i. Circle CI tests check for that.

Tests

To run tests locally, run make test. Or you can wait for Circle CI to run the tests once you push your changes to any branch.

We have two levels of tests: fast unit tests (run with make test_fast) and slow integration tests (run with make test_integration). The latter aim to use the same entry point as users do, i.e., run.py.

There are some differences between running the tests locally vs on CI.

Most integration tests use small fake data in the repo, but some use real data to check that the latest models are working. These tests are skipped on CI, so local tests have better coverage.
CI uses CPUs for everything.

We use nose to discover tests. It searches for *test*.py files and extracts all unittests from them. So usually your tests will be automatically included into CircleCI.
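
As an illustration, a file like the hypothetical my_feature_test.py below matches the *test*.py pattern, so its unittest cases should be picked up without further registration. The file name, the class, and the toy function under test are all made up for this sketch.

```python
# my_feature_test.py (hypothetical file name matching *test*.py)
import unittest


def clip(value: float, low: float, high: float) -> float:
    """Toy function under test; stands in for real project code."""
    return max(low, min(high, value))


class ClipTest(unittest.TestCase):
    def test_inside_range(self):
        # Values already inside [low, high] are returned unchanged.
        self.assertEqual(clip(0.3, 0.0, 1.0), 0.3)

    def test_outside_range(self):
        # Values outside the range are clamped to the nearest bound.
        self.assertEqual(clip(2.0, 0.0, 1.0), 1.0)
        self.assertEqual(clip(-1.0, 0.0, 1.0), 0.0)
```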

Some useful commands. Integration tests are notoriously slow, so sometimes one wants to execute only one particular test. Here's how to do this. First, list all the tests:

$ nosetests integration_tests/integration_test.py --collect-only -v --with-id
#2 integration_test.test_build_cache ... ok
#3 integration_test.test_rl_configs('exploit_06.prototxt', {}) ... ok
#3 integration_test.test_rl_configs('selfplay_01.prototxt', {}) ... ok
#4 integration_test.test_rl_configs_real_data('exploit_06.prototxt', {}) ... ok
#4 integration_test.test_rl_configs_real_data('selfplay_01.prototxt', {}) ... ok
#5 integration_test.test_train_configs('sl.prototxt', {}) ... ok
...

And then pass the id of the test to nose:

nosetests integration_tests/integration_test.py  -v --with-id 3

Dataset

The diplomacy model was trained via supervised learning on a dataset of 46,148 games provided by webdiplomacy.net.

The dataset can be downloaded from this repo's "Releases" page. The file has been compressed (via gzip) and encrypted (via gpg) with a password. To obtain the password, please email [email protected]

Once you have downloaded the data_cache.pt.gz.gpg file, you can run:

gpg data_cache.pt.gz.gpg

and enter the password to decrypt the file, and then:

gunzip data_cache.pt.gz

to uncompress the file. Then you can train a model by running the training command shown in the Running tasks section of this README.
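
If gunzip is not available, the decompression step can also be done with Python's standard library. This is a sketch under stated assumptions: the function name is hypothetical, and it only covers the gzip step, so the file must already have been decrypted with gpg.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src: str) -> Path:
    """Decompress src (ending in .gz) next to itself and return the output path."""
    src_path = Path(src)
    if src_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src_path.name}")
    dst_path = src_path.with_suffix("")  # data_cache.pt.gz -> data_cache.pt
    with gzip.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine
    return dst_path


# Usage, after gpg has produced data_cache.pt.gz:
# gunzip_file("data_cache.pt.gz")
```

Unlike the gunzip command, this helper leaves the original .gz file in place.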

Note that re-training the blueprint model is not necessary to run an agent comparison -- a pre-trained model can be downloaded from this repo's "Releases" page.

License

MIT License

Copyright (c) Facebook, Inc. and its affiliates.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


diplomacy_searchbot's Issues

make clean: No rule to make target 'clean'

make clean command does not work, even though it is in the Makefile:

#12 83.31 make -C dipcc clean
#12 83.31 make[1]: *** No rule to make target 'clean'.  Stop.
#12 83.31 make[1]: Entering directory '/diplomacy_searchbot/dipcc'
#12 83.31 make[1]: Leaving directory '/diplomacy_searchbot/dipcc'
#12 83.31 make: *** [Makefile:28: clean] Error 2

forward() got an unexpected keyword

Greetings

I have been trying to retrain your diplomacy_searchbot. However, I could not run this command, which compares agents 1 vs 6:

python run.py --adhoc --cfg conf/c01_ag_cmp/cmp.prototxt \
    I.agent_one=agents/dipnet_20200827_iclr_v_humans.prototxt \
    I.agent_six=agents/searchbot_neurips21_human_dnvi_npu.prototxt

The full error log is here

I0826 18:48:35 [run:147] Config: conf/c01_ag_cmp/cmp.prototxt
I0826 18:48:35 [run:148] Overrides: ['I.agent_one=agents/dipnet_20200827_iclr_v_humans.prototxt', 'I.agent_six=agents/searchbot_neurips21_human_dnvi_npu.prototxt']
I0826 18:48:35 [run:156] Exp dir: /home/wwongkam/results/diplomacy/adhoc/2022-08-26T184835.381498/c01_ag_cmp/cmp/I.age@[email protected]@agents_searchbot_neurips21_human_dnvi_npu._56704896
I0826 18:48:35 [run:157] Job status [before run]: Status.NOT_STARTED
I0826 18:48:35 [run:83] Cwd: /home/wwongkam/diplomacy_searchbot
I0826 18:48:35 [run:84] Task: compare_agents
I0826 18:48:35 [run:85] Cfg:
agent_one {
  model_sampled {
    model_path: "blueprint.pt"
    temperature: 0.5
  }
}
agent_six {
  searchbot {
    model_path: "models/neurips21_human_dnvi_npu_epoch000500.ckpt"
    n_rollouts: 512
    value_model_path: "models/neurips21_human_dnvi_npu_value_epoch000500.ckpt"
    cache_rollout_results: true
    loser_bp_value: 0.019999999552965164
    loser_bp_iter: 64.0
    rollouts_cfg {
      n_threads: 56
      temperature: 0.75
      top_p: 0.949999988079071
      max_rollout_length: 0
      average_n_rollouts: 1
    }
    plausible_orders_cfg {
      n_plausible_orders: 30
      max_actions_units_ratio: 3.5
      req_size: 1024
    }
  }
}
power_one: AUSTRIA
out: "output.json"
seed: 0
draw_on_stalemate_years: 5
launcher {
  local {
    use_local: true
  }
}

I0826 18:48:35 [util:222] Git revision: ed2966d285ae828720b7b817bb42c617da194b9f
I0826 18:48:35 [util:238] Found unsubmitted diff. Saving to /home/wwongkam/diplomacy_searchbot/workdir.diff
I0826 18:48:35 [run:88] Is on slurm: False
I0826 18:48:35 [run:89] Job env: JobEnvironment(job_id=1, hostname=MSI, local_rank=0(1), node=0(1), global_rank=0(1))
I0826 18:48:35 [run:92] Is master: True
I0826 18:48:35 [run:34] Set seed to 0
I0826 18:48:40 [searchbot_agent:212] Initialized SearchBotAgent: {'max_batch_size': 700, 'model': <fairdiplomacy.agents.model_wrapper.ModelWrapper object at 0x7fe1f7286810>, 'model_rollouts': <fairdiplomacy.agents.model_rollouts.ModelRollouts object at 0x7fe1f7286ad0>, 'n_rollouts': 512, 'cache_rollout_results': True, 'precompute_cache': False, 'enable_compute_nash_conv': False, 'n_plausible_orders': 30, 'use_optimistic_cfr': True, 'use_final_iter': True, 'use_pruning': False, 'bp_iters': 0, 'bp_prob': 0.0, 'loser_bp_iter': 64.0, 'loser_bp_value': 0.019999999552965164, 'share_strategy': False, 'reset_seed_on_rollout': False, 'max_seconds': 0, 'order_sampler': <fairdiplomacy.agents.plausible_order_sampling.PlausibleOrderSampler object at 0x7fe1f7269ad0>, 'order_aug_cfg': }
I0826 18:48:40 [env:100] Starting order prediction for turn SPRING 1901 MOVEMENT
Traceback (most recent call last):
  File "run.py", line 102, in <module>
    heyhi.parse_args_and_maybe_launch(main)
  File "/home/wwongkam/diplomacy_searchbot/heyhi/run.py", line 114, in parse_args_and_maybe_launch
    maybe_launch(main, exp_root=get_exp_dir(PROJECT_NAME), overrides=overrides, **kwargs)
  File "/home/wwongkam/diplomacy_searchbot/heyhi/run.py", line 166, in maybe_launch
    util.run_with_config(main, exp_handle, cfg, overrides, ckpt_dir, log_level)
  File "/home/wwongkam/diplomacy_searchbot/heyhi/util.py", line 588, in run_with_config
    callable()
  File "/home/wwongkam/diplomacy_searchbot/heyhi/util.py", line 371, in wrapped
    result = f(*args, **kwargs)
  File "run.py", line 98, in main
    return TASKS[task](cfg)
  File "run.py", line 53, in compare_agents
    result = run_1v6_trial(agent_one, agent_six, power_string, cfg, cf_agent=cf_agent)
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/compare_agents.py", line 63, in run_1v6_trial
    scores = env.process_all_turns(max_turns=cfg.max_turns)
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/env.py", line 135, in process_all_turns
    self.process_turn()
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/env.py", line 101, in process_turn
    power_orders = self.policy_profile.get_all_power_orders(self.game)
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/env.py", line 49, in get_all_power_orders
    game, self._agent_one_power
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/agents/model_sampled_agent.py", line 32, in get_orders
    return self.get_orders_many_powers(game, [power], **kwargs)[power]
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/agents/model_sampled_agent.py", line 45, in get_orders_many_powers
    actions, _, _ = self.model.do_model_request(inputs, temperature=temperature, top_p=top_p)
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/agents/model_wrapper.py", line 99, in do_model_request
    device=self.device,
  File "/home/wwongkam/miniconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/utils/batching.py", line 21, in batched_forward
    batch_results = callable(batch_data)
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/agents/model_wrapper.py", line 95, in <lambda>
    batch_repeat_interleave=batch_repeat_interleave,
  File "/home/wwongkam/diplomacy_searchbot/fairdiplomacy/agents/model_wrapper.py", line 157, in _forward_policy
    y = self.model(**x, need_value=False, pad_to_max=True)
  File "/home/wwongkam/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'x_scoring_system'

I am having trouble finding the root cause of this error. May I ask for your time to look into this issue, please?

Thank you so much

`heyhi` version / FROZEN_SYM_BD issue

Hi --

I'm following the installation instructions in the README, and make test_fast fails with a bunch of errors like:

======================================================================
ERROR: testLoadSimple (test_conf.TestRootWithIncludesConf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/test_conf.py", line 85, in testLoadSimple
    cfg = self._load(overrides=[])
  File "/home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/test_conf.py", line 80, in _load
    task, meta_cfg = heyhi.conf.load_cfg(root_cfg, overrides)
AttributeError: module 'heyhi.conf' has no attribute 'load_cfg'

I guessed that maybe those calls should be changed from heyhi.conf.load_cfg to heyhi.conf.load_config -- though if I do that, I get errors like

======================================================================
ERROR: testLoadSimple (test_conf.TestRootWithIncludesAndRedefinesConf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/test_conf.py", line 106, in testLoadSimple
    cfg = self._load(overrides=[])
  File "/home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/test_conf.py", line 101, in _load
    task, meta_cfg = heyhi.conf.load_cfg(root_cfg, overrides)
  File "/home/ubuntu/projects/diplomacy_searchbot/heyhi/conf.py", line 389, in load_config
    return cfg.to_frozen()
  File "/home/ubuntu/projects/diplomacy_searchbot/conf/conf_pb2.py", line 63, in to_frozen
    value = maybe_to_dict(value)
  File "/home/ubuntu/projects/diplomacy_searchbot/conf/conf_pb2.py", line 47, in maybe_to_dict
    return msg.to_frozen()
  File "/home/ubuntu/projects/diplomacy_searchbot/conf/conf_pb2.py", line 63, in to_frozen
    value = maybe_to_dict(value)
  File "/home/ubuntu/projects/diplomacy_searchbot/conf/conf_pb2.py", line 47, in maybe_to_dict
    return msg.to_frozen()
  File "/home/ubuntu/projects/diplomacy_searchbot/conf/conf_pb2.py", line 83, in to_frozen
    return FROZEN_SYM_BD[descriptor.full_name](ret, self)
KeyError: 'fairdiplomacy.TestTask.SubMessage'
-------------------- >> begin captured logging << --------------------
root: INFO: Going to guess message type by trying all of them
root: INFO: Guessed type: <class 'conf.conf_pb2.MetaCfg'>
root: DEBUG: Constructing MetaCfg from /home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/data/root_with_includes_redefined.prototxt with include overrides {} and scalar overrides {}
root: DEBUG: <class 'conf.conf_pb2.MetaCfg'> defaults {'sub': 'redefine_subscalar_22', 'sub2': 'redefine_subscalar_22'}
root: DEBUG: Constructing MetaCfg. Applying include: mount='test.sub' include=PosixPath('/home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/data/redefine_subscalar_22.prototxt') subcfg=SubMessage
root: DEBUG: Constructing SubMessage from /home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/data/redefine_subscalar_22.prototxt with include overrides {} and scalar overrides {}
root: DEBUG: <class 'conf.conf_pb2.SubMessage'> defaults {}
root: DEBUG: Constructing MetaCfg. Applying include: mount='test.sub2' include=PosixPath('/home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/data/redefine_subscalar_22.prototxt') subcfg=SubMessage
root: DEBUG: Constructing SubMessage from /home/ubuntu/projects/diplomacy_searchbot/heyhi/tests/data/redefine_subscalar_22.prototxt with include overrides {} and scalar overrides {}
root: DEBUG: <class 'conf.conf_pb2.SubMessage'> defaults {}
--------------------- >> end captured logging << ---------------------

Any thoughts? Thanks!

make: bin/install_deps.sh: No such file or directory

Hi,
I have tried to use $ make deps, which points to: bin/install_deps.sh - but the file is not in the repo.

Also, thank you for your great contribution!

Example of running a Transformer based model?

Hello --

I'm interested in running one of your Transformer-based models -- is there a run.py command that runs one of these models? All the ones I've tested so far use the GNN-based models ...

Thanks!

Any installation instructions to avoid anaconda?

We are trying to avoid using anaconda and pip in the same installation.

Has anyone successfully installed this with pip only? Our make step fails because the pybind11 cmake files are missing. Is there a trick to getting pybind11 installed? Our sysadmin does not want to do the global install of pybind11, which has been suggested.

Some questions about the code

I wonder what the C and S in fairdiplomacy/models/diplomacy_model/diplomacy_model.py mean.
Best regards.