
google-research / seed_rl

SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture.

License: Apache License 2.0

Python 80.77% Shell 4.11% C++ 5.41% Starlark 0.12% Jupyter Notebook 9.59%
rl impala r2d2 atari deepmind-lab google-research-football tf2 gcp

seed_rl's Introduction

SEED (archived)

This repository contains an implementation of a distributed reinforcement learning agent where both training and inference are performed on the learner.

This is a research project that has now been archived. There will be no further updates.

Architecture

Four agents are implemented:

  • V-trace (IMPALA)
  • R2D2
  • SAC
  • PPO

The code is already interfaced with the following environments:

  • ATARI games
  • DeepMind Lab
  • Google Research Football
  • Mujoco

However, any reinforcement learning environment using the gym API can be used.
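
For illustration, here is a minimal sketch of how a custom Gym-API environment could be wired in. It mirrors the env-construction callback that the bundled *_main.py scripts pass to the actor/learner loops (see the sac_main.py excerpt in the issues below); the exact create_environment signature and CartPole-v0 are placeholders of mine, not part of this repository.

import gym

def create_environment(*unused_args, **unused_kwargs):
  """Builds one environment instance for an actor (hypothetical example)."""
  # Any environment exposing the standard gym reset()/step() API will do;
  # CartPole-v0 is only a stand-in here.
  return gym.make('CartPole-v0')

# A main script would then hand this callback to the loops, e.g.:
#   actor.actor_loop(create_environment)
#   learner.learner_loop(create_environment, create_agent, create_optimizer)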

For a detailed description of the architecture please read our paper. Please cite the paper if you use the code from this repository in your work.

Bibtex

@article{espeholt2019seed,
    title={SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference},
    author={Lasse Espeholt and Rapha{\"e}l Marinier and Piotr Stanczyk and Ke Wang and Marcin Michalski},
    year={2019},
    eprint={1910.06591},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Pull Requests

At this time, we do not accept pull requests. We are happy to link to forks that add interesting functionality.

Prerequisites

There are a few steps you need to take before playing with SEED. The instructions below assume you are running the Ubuntu distribution.

  • Install git:
apt-get install git
  • Clone the SEED git repository:
git clone https://github.com/google-research/seed_rl.git
cd seed_rl

Local Machine Training on a Single Level

To make it easy to get started with SEED, we provide a way of running it on a local machine. You just need to run one of the following commands (adjusting the number of actors and the number of environments per actor, i.e. the environment batch size, to your machine):

./run_local.sh [Game] [Agent] [number of actors] [number of envs. per actor]
./run_local.sh atari r2d2 4 4
./run_local.sh football vtrace 4 1
./run_local.sh dmlab vtrace 4 4
./run_local.sh mujoco ppo 4 32 --gin_config=/seed_rl/mujoco/gin/ppo.gin

This will build a Docker image from the SEED source code and start training inside the Docker image. Note that hyperparameters are not tuned in the runs above. TensorBoard is started as part of the training and can be viewed at http://localhost:6006 by default.

We also provide a sample script for running training with tuned parameters for HalfCheetah-v2. This setup runs training with 8x32=256 parallel environments to make training faster. The sample complexity can be improved at the cost of slower training by running fewer environments and increasing the unroll_length parameter.

./mujoco/local_baseline_HalfCheetah-v2.sh

Distributed Training using AI Platform

Note that training with AI Platform results in charges for using compute resources.

The first step is to configure GCP and a Cloud project you will use for training:

gcloud auth login
gcloud config set project [YOUR_PROJECT]

Then you just need to execute one of the provided scenarios:

gcp/train_[scenario_name].sh

This will build the Docker image, push it to a repository that AI Platform can access, and start the training process on the Cloud. Follow the output of the command for progress. You can also view the running training jobs at https://console.cloud.google.com/ml/jobs.

DeepMind Lab Level Cache

By default, the majority of DeepMind Lab's CPU usage is generated by creating new scenarios. This cost can be eliminated by enabling the level cache. To enable it, set the level_cache_dir flag in dmlab/config.py. As there are many unique episodes, it is a good idea to share the same cache across multiple experiments. For AI Platform you can add --level_cache_dir=gs://${BUCKET_NAME}/dmlab_cache to the list of parameters passed to the experiment in gcp/submit.sh.

Baseline data on ATARI-57

We provide baseline training data for SEED's R2D2 trained on ATARI games in the form of training curves (checkpoints and Tensorboard event files coming soon). We provide data for 4 independent seeds run up to 40e9 environment frames.

The hyperparameters and evaluation procedure are the same as in section A.3.1 in the paper.

Training curves

Training curves are available on this page.

Checkpoints and Tensorboard event files

Checkpoints and Tensorboard event files can be downloaded individually here or as a single (70 GB) zip file.

Additional links

SEED was used as a core infrastructure piece for the What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study paper. A colab that reproduces plots from the paper can be found here.

seed_rl's People

Contributors

cstankonrad, lespeholt, qstanczyk, raphaelmarinier

seed_rl's Issues

unexpected error when running r2d2

After running roughly 300M frames, I run into this error in my learner:

[Derived] indices[0] = 100000 is not in [0, 100000)
[[{{node StatefulPartitionedCall/PrioritizedReplay/Gather_3}}]]
[[StatefulPartitionedCall]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNext]]
[[add_14/_26]] [Op:__inference_minimize_22442]
Function call stack:
minimize -> minimize -> minimize -> minimize

Any ideas on how to fix this?

Note: I am running 2 Docker instances on 1 machine (i.e. running run_local.sh twice). I'm not sure if this is the cause, but since it's Docker I suspect it should not affect things.

Full error message: (screenshot attached: Screenshot from 2020-05-16 13-32-27)

Unable to reproduce Pong results with a local single-GPU run and paper hyper-params

Hi,
I have made minimal changes to run a Pong experiment on a local machine with one visible V100 GPU and 80 CPUs:
master...Antymon:exp/original_seed_gcp_like
Hyper-parameter defaults were overridden with the ones from gcp/train_atari.sh, except for the number of actors (256, with 10 for evaluation). I ran the experiment for 0.54e9 frames with the intent of just witnessing an evident improvement of the episode reward over the minimal one, i.e., -21. Unfortunately, that didn't happen within the computational budget (as reported in the logs on my branch):
(image attached)
whereas your csv file suggests that some improvement should be noticeable:

...
Pong,SEED_R2D2,0,259025600.0,-20.17
Pong,SEED_R2D2,0,278963200.0,-19.541999093381687
Pong,SEED_R2D2,0,280024000.0,-19.508585675430645
Pong,SEED_R2D2,0,289027200.0,-19.225
Pong,SEED_R2D2,0,293569600.0,-19.029020588235294
Pong,SEED_R2D2,0,311820800.0,-18.241582352941176
Pong,SEED_R2D2,0,324985600.0,-17.67359411764706
Pong,SEED_R2D2,0,335267200.0,-17.23
Pong,SEED_R2D2,0,341740800.0,-17.270653732602277
Pong,SEED_R2D2,0,381289600.0,-17.519017292281738
Pong,SEED_R2D2,0,386675200.0,-17.552838464782795
Pong,SEED_R2D2,0,399758400.0,-17.635
Pong,SEED_R2D2,0,432996800.0,-17.430631034482758
Pong,SEED_R2D2,0,472219200.0,-17.18946896551724
Pong,SEED_R2D2,0,478638400.0,-17.15
Pong,SEED_R2D2,0,482174400.0,-17.080805186972256
Pong,SEED_R2D2,0,496182400.0,-16.806687273823883
Pong,SEED_R2D2,0,568833600.0,-15.385
....

Therefore I decided to create a reproducibility issue.

My questions would be:

  1. Can you spot any obvious mistake that might have caused the discrepancy?
  2. Have you ever run your local, single-GPU setup for anything but startup demo?

Thanks

EDIT: I spun up another run with 1 billion frames; still no obvious learning curve.

Re-initialize agent in the middle of learner

I am trying to do hyperparameter tuning and I need to compare different agents with different networks. However, when I try to re-initialize an agent with a different network in the learner loop, I always get an error when the minimize function is executed. Is re-initializing the agent allowed? Or is there a way to change the agent somewhere?

Thanks

Definition of batched changed?

The latest version of the bind function doesn't accept batched any more. According to the documentation, only the input_spec shape matters:

# This function is batched meaning it will be called once there are, in this
# case, 5 incoming calls.
@tf.function(input_signature=[tf.TensorSpec([5], tf.int32)])
def foo(x):
  return x + 1

server.bind(foo)

But it is not clear whether each call should send a batch of 5 elements or only 1 element.

This example fails if you call it with 5 clients sending 1 element each.

How to decouple grpc folder for other projects?

Hi all! Thanks for the repo, it helps me learn scalable RL more efficiently!
I now want to use only the gRPC part in my own projects, i.e. I want to compile the gRPC ops for transferring TF Tensors. How can I decouple the grpc folder and compile only it in my project?
Thanks a lot~

Training on Standalone Machine Fails

Hi,
I am trying to run training on a standalone machine. The install/training scripts are taken verbatim from the Kaggle Competition Seed RL notebook.

./train.sh football vtrace 16 '--total_environment_frames=600000 --game=11_vs_11_kaggle --reward_experiment=scoring,checkpoints --logdir=/kaggle_simulations/agent/'

Fails with:
...
    '../grpc_cc.so'))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/ranko_mosic_gmail_com/football/seed_rl/grpc/python/../grpc_cc.so: undefined symbol: _ZN4absl14lts_2020_02_2518container_internal18global_next_sampleE

How to run an agent locally

After following the instructions and running the command ./run_local.sh dmlab vtrace 2, I encounter a console with the following on it

root@985411f3a892:/seed_rl/docker# cat /tmp/seed_rl/instructions
Welcome to the SEED local training of dmlab with vtrace.
SEED uses tmux for easy navigation between different tasks involved
in the training process. To switch to a specific task, press CTRL+b, [tab id].
You can stop training at any time by executing '../stop_local.sh'
root@985411f3a892:/seed_rl/docker# python3 check_gpu.py 2> /dev/null
../stop_local.sh
root@985411f3a892:/seed_rl/docker# ../stop_local.sh

What do I do next? How do I start the training and how do I monitor and evaluate the performance?
Please help.

How to load a saved_model.pb file and continue training on it?

Hi everyone,
I realize that this is not an issue, per se, but I couldn't find any documentation on this problem.

Let's say I use the following code to train a model and create a checkpoint.

!bash train.sh football vtrace 4 '--total_environment_frames=10000 --game=11_vs_11_kaggle --reward_experiment=scoring,checkpoints --logdir=/agent/'

Now, I have three files in the agent folder.

How can I continue training by loading these files and training for more frames?
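
Not a seed_rl-specific answer, just a general TF2 sketch of how such files are usually reloaded; the paths and object names below are placeholders.

import tensorflow as tf

# A directory containing saved_model.pb can be reloaded for inference:
loaded = tf.saved_model.load('/agent')

# To continue training, the usual pattern is to restore a tf.train.Checkpoint
# that tracks the same objects (agent, optimizer) that were used when the
# checkpoint was written, e.g.:
#   ckpt = tf.train.Checkpoint(agent=agent, optimizer=optimizer)
#   ckpt.restore(tf.train.latest_checkpoint('/agent'))
# and then continue running training steps.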

CUDA and driver requirements

I'm trying to get the run_local.sh script to run on my computer, but I keep getting cuInit errors when trying to create the Conv2D layers (the exact error seems to change with the CUDA/driver version). My frantic googling suggests that this is due to a CUDA and/or cuDNN version mismatch with TensorFlow 2.1. However, I'm confused because I'm not sure how much of the CUDA versioning is handled by Docker, and how much needs to be set up on my host machine.

Could you please provide a list of the required versions of CUDA, cuDNN, Docker, the NVIDIA Container Toolkit and the graphics driver to get the code to run properly?
I'm currently using an RTX 2070 SUPER graphics card with driver 440.64, CUDA 10.2, cuDNN 7.6.5, Docker 19.03.8 and the most recent release of the NVIDIA Container Toolkit.

Dockerfile examples for custom env?

Hi seed_rl experts,
I'm adding a custom env via the Gym API. This repo briefly mentions that the Gym API is supported, but there is no guidance on how to do so. In the meantime I'm looking for example Dockerfiles for my new Gym env. Any pointers are welcome! Thanks

Local cluster

Hi,
Is it possible to run this on a local cluster environment? I wasn't sure if the local run was limited to a single node.
What would be the hardware requirements? (e.g. would a GPU on the head node and CPUs on compute nodes work ok?)

weight sync during rollout creation

Hi,

Based on your article you have an example with "Off-policy in IMPALA. For the entire trajectory the policy stays the same. By the time the trajectory is sent to the queue for optimization, the policy has changed twice."

Nevertheless, looking at your code, I could not find the place where you save the weights for a concrete sampler_id & run_id. Could you please point to this code and explain how you sync the correct weights for each trajectory?

Change number of environments

Hi,

I have recently been doing experiments related to SEED RL, and I want to change the number of environments.

Based on your article, I guess there should be a way to change the number of environments.

However, looking at your code, I could not find the place where the number of environments can be changed. Could you please point to this code and explain how to change it?

Thank you so much!

missing grpc_cc.so file

Hi, I have re-cloned this repo but couldn't run it, as I noticed that the grpc_cc.so file is missing from the grpc/ folder. I have tried to run build.sh in the grpc/ folder, but the grpc_cc.so file is still not generated. I have also tried to manually run Dockerfile.grpc, with no success. Below are the results when I run build.sh under the grpc/ folder.

Sending build context to Docker daemon  62.98MB
Step 1/13 : FROM tensorflow/tensorflow:2.2.0-custom-op-gpu-ubuntu16 as grpc_compile
 ---> 4f52a8af55d1
Step 2/13 : RUN git clone https://github.com/tensorflow/custom-op.git
 ---> Using cache
 ---> ad802b5e9897
Step 3/13 : WORKDIR custom-op
 ---> Using cache
 ---> 4967ee6c88c8
Step 4/13 : RUN ./configure.sh
 ---> Using cache
 ---> f8fac4c15f9b
Step 5/13 : RUN echo '\nload("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")\n\nhttp_archive(\n    name = "com_github_grpc_grpc",\n    urls = [\n        "https://github.com/grpc/grpc/archive/ac1c5de1b36da4a1e3d72ca40b0e43f24266121a.tar.gz",\n    ],\n    strip_prefix = "grpc-ac1c5de1b36da4a1e3d72ca40b0e43f24266121a",\n)\n\nload("@com_github_grpc_grpc//bazel:grpc_deps.bzl", "grpc_deps")\ngrpc_deps()\nload("@com_github_grpc_grpc//bazel:grpc_extra_deps.bzl", "grpc_extra_deps")\ngrpc_extra_deps()' >> WORKSPACE
 ---> Using cache
 ---> 394cf60db842
Step 6/13 : ADD grpc/ grpc/
 ---> Using cache
 ---> fd6b4b6cc598
Step 7/13 : RUN bazel build grpc:ops/grpc.so grpc:service_py_proto --incompatible_remove_legacy_whole_archive=0
 ---> Using cache
 ---> a3adc0030562
Step 8/13 : ADD . /seed_rl
 ---> aa87974eefdc
Step 9/13 : RUN cp bazel-bin/grpc/ops/grpc.so /seed_rl/grpc/grpc_cc.so
 ---> Running in d12fd318a443
Removing intermediate container d12fd318a443
 ---> 0c4b868f40ba
Step 10/13 : RUN cp bazel-bin/grpc/service_pb2.py /seed_rl/grpc/service_pb2.py
 ---> Running in 83f66ec4890b
Removing intermediate container 83f66ec4890b
 ---> 4e7b53239638
Step 11/13 : WORKDIR /seed_rl/
 ---> Running in f5302466583e
Removing intermediate container f5302466583e
 ---> 7483220588ba
Step 12/13 : RUN pip3 install tensorflow-gpu==2.2.0
 ---> Running in 947cf84d0a86
Requirement already satisfied: tensorflow-gpu==2.2.0 in /usr/local/lib/python3.6/dist-packages (2.2.0)
Requirement already satisfied: keras-preprocessing>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.1.0)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (0.31.1)
Requirement already satisfied: numpy<2.0,>=1.16.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.18.5)
Requirement already satisfied: scipy==1.4.1; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.4.1)
Requirement already satisfied: tensorboard<2.3.0,>=2.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (2.2.2)
Requirement already satisfied: google-pasta>=0.1.8 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (3.2.1)
Requirement already satisfied: astunparse==1.6.3 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.6.3)
Requirement already satisfied: protobuf>=3.8.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (3.12.2)
Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (0.9.0)
Requirement already satisfied: h5py<2.11.0,>=2.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (2.10.0)
Requirement already satisfied: tensorflow-estimator<2.3.0,>=2.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (2.2.0)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.1.0)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.29.0)
Requirement already satisfied: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.12.1)
Requirement already satisfied: gast==0.3.3 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (0.3.3)
Requirement already satisfied: six>=1.12.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.2.0) (1.12.0)
Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (2.23.0)
Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (47.1.1)
Requirement already satisfied: google-auth<2,>=1.6.3 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (1.16.1)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (2.6.8)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (1.6.0.post3)
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (1.0.1)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (0.4.1)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (2020.4.5.2)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (1.25.9)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (2.9)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (4.1.0)
Requirement already satisfied: rsa<4.1,>=3.1.4 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (4.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (0.2.8)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (1.3.0)
Requirement already satisfied: pyasn1>=0.1.3 in /usr/local/lib/python3.6/dist-packages (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.6/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.3.0,>=2.2.0->tensorflow-gpu==2.2.0) (3.1.0)
Removing intermediate container 947cf84d0a86
 ---> 9d8c39b44b09
Step 13/13 : RUN PYTHONPATH=/ python3 grpc/python/ops_test.py
 ---> Running in 6d6ca3fd49ac
Running tests under Python 3.6.10: /usr/bin/python3
[ RUN      ] OpsTest.test_batched_at_least_one_input
2020-07-12 04:33:10.707013: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-07-12 04:33:10.707035: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-07-12 04:33:10.707060: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2020-07-12 04:33:10.707468: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-12 04:33:10.737592: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2599990000 Hz
2020-07-12 04:33:10.738067: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3ea02a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-12 04:33:10.738127: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
[       OK ] OpsTest.test_batched_at_least_one_input
[ RUN      ] OpsTest.test_batched_first_dimension_must_match
[       OK ] OpsTest.test_batched_first_dimension_must_match
[ RUN      ] OpsTest.test_batched_inputs_at_least_rank1
[       OK ] OpsTest.test_batched_inputs_at_least_rank1
[ RUN      ] OpsTest.test_batched_output_is_batched
[       OK ] OpsTest.test_batched_output_is_batched
[ RUN      ] OpsTest.test_batched_outputs_at_least_rank1
[       OK ] OpsTest.test_batched_outputs_at_least_rank1
[ RUN      ] OpsTest.test_bind_multiple_functions([], False)
[       OK ] OpsTest.test_bind_multiple_functions([], False)
[ RUN      ] OpsTest.test_bind_multiple_functions([1], True)
[       OK ] OpsTest.test_bind_multiple_functions([1], True)
[ RUN      ] OpsTest.test_binding_function_twice
2020-07-12 04:33:11.447666: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:910 : Invalid argument: Function 'foo' was bound twice.
[       OK ] OpsTest.test_binding_function_twice
[ RUN      ] OpsTest.test_call_after_shutdown([], False)
2020-07-12 04:33:11.562585: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
[       OK ] OpsTest.test_call_after_shutdown([], False)
[ RUN      ] OpsTest.test_call_after_shutdown([1], True)
2020-07-12 04:33:11.607132: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
[       OK ] OpsTest.test_call_after_shutdown([1], True)
[ RUN      ] OpsTest.test_call_after_shutdown_and_start([], False)
[       OK ] OpsTest.test_call_after_shutdown_and_start([], False)
[ RUN      ] OpsTest.test_call_after_shutdown_and_start([1], True)
[       OK ] OpsTest.test_call_after_shutdown_and_start([1], True)
[ RUN      ] OpsTest.test_client_non_scalar_server_address
[       OK ] OpsTest.test_client_non_scalar_server_address
[ RUN      ] OpsTest.test_create_variable([], False)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0712 04:33:12.043102 140415275980544 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
[       OK ] OpsTest.test_create_variable([], False)
[ RUN      ] OpsTest.test_create_variable([1], True)
[       OK ] OpsTest.test_create_variable([1], True)
[ RUN      ] OpsTest.test_deletion_while_in_blocking_call([], False)
2020-07-12 04:33:12.344345: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
2020-07-12 04:33:12.374135: W tensorflow/core/kernels/queue_base.cc:277] 283: Skipping cancelled enqueue attempt with queue not closed
[       OK ] OpsTest.test_deletion_while_in_blocking_call([], False)
[ RUN      ] OpsTest.test_deletion_while_in_blocking_call([1], True)
2020-07-12 04:33:12.418763: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
2020-07-12 04:33:12.447515: W tensorflow/core/kernels/queue_base.cc:277] 299: Skipping cancelled enqueue attempt with queue not closed
[       OK ] OpsTest.test_deletion_while_in_blocking_call([1], True)
[ RUN      ] OpsTest.test_empty_input
[       OK ] OpsTest.test_empty_input
[ RUN      ] OpsTest.test_empty_output([], False)
[       OK ] OpsTest.test_empty_output([], False)
[ RUN      ] OpsTest.test_empty_output([1], True)
[       OK ] OpsTest.test_empty_output([1], True)
[ RUN      ] OpsTest.test_failing_function([], False)
[       OK ] OpsTest.test_failing_function([], False)
[ RUN      ] OpsTest.test_failing_function([1], True)
[       OK ] OpsTest.test_failing_function([1], True)
[ RUN      ] OpsTest.test_invalid_number_of_arguments([], False)
[       OK ] OpsTest.test_invalid_number_of_arguments([], False)
[ RUN      ] OpsTest.test_invalid_number_of_arguments([1], True)
[       OK ] OpsTest.test_invalid_number_of_arguments([1], True)
[ RUN      ] OpsTest.test_invalid_shape([], False)
[       OK ] OpsTest.test_invalid_shape([], False)
[ RUN      ] OpsTest.test_invalid_shape([1], True)
[       OK ] OpsTest.test_invalid_shape([1], True)
[ RUN      ] OpsTest.test_invalid_type([], False)
[       OK ] OpsTest.test_invalid_type([], False)
[ RUN      ] OpsTest.test_invalid_type([1], True)
[       OK ] OpsTest.test_invalid_type([1], True)
[ RUN      ] OpsTest.test_large_tensor([], False)
[       OK ] OpsTest.test_large_tensor([], False)
[ RUN      ] OpsTest.test_large_tensor([1], True)
[       OK ] OpsTest.test_large_tensor([1], True)
[ RUN      ] OpsTest.test_nests([], False)
[       OK ] OpsTest.test_nests([], False)
[ RUN      ] OpsTest.test_nests([1], True)
[       OK ] OpsTest.test_nests([1], True)
[ RUN      ] OpsTest.test_no_output([], False)
[       OK ] OpsTest.test_no_output([], False)
[ RUN      ] OpsTest.test_no_output([1], True)
[       OK ] OpsTest.test_no_output([1], True)
[ RUN      ] OpsTest.test_not_bound
2020-07-12 04:33:14.916092: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:936 : Unavailable: No function was bound
[       OK ] OpsTest.test_not_bound
[ RUN      ] OpsTest.test_not_fully_specified_outputs
[       OK ] OpsTest.test_not_fully_specified_outputs
[ RUN      ] OpsTest.test_not_fully_specified_outputs2
[       OK ] OpsTest.test_not_fully_specified_outputs2
[ RUN      ] OpsTest.test_queue([], False)
[       OK ] OpsTest.test_queue([], False)
[ RUN      ] OpsTest.test_queue([1], True)
[       OK ] OpsTest.test_queue([1], True)
[ RUN      ] OpsTest.test_server_non_vector_server_addresses
[       OK ] OpsTest.test_server_non_vector_server_addresses
[ RUN      ] OpsTest.test_session
[  SKIPPED ] OpsTest.test_session
[ RUN      ] OpsTest.test_shutdown_waiting_for_full_batch
2020-07-12 04:33:16.670570: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
[       OK ] OpsTest.test_shutdown_waiting_for_full_batch
[ RUN      ] OpsTest.test_shutdown_while_in_blocking_call([], False)
2020-07-12 04:33:16.833345: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
2020-07-12 04:33:16.860401: W tensorflow/core/kernels/queue_base.cc:277] 734: Skipping cancelled enqueue attempt with queue not closed
[       OK ] OpsTest.test_shutdown_while_in_blocking_call([], False)
[ RUN      ] OpsTest.test_shutdown_while_in_blocking_call([1], True)
2020-07-12 04:33:16.890666: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
2020-07-12 04:33:16.918619: W tensorflow/core/kernels/queue_base.cc:277] 750: Skipping cancelled enqueue attempt with queue not closed
[       OK ] OpsTest.test_shutdown_while_in_blocking_call([1], True)
[ RUN      ] OpsTest.test_shutdown_while_in_call([], False)
2020-07-12 04:33:17.063580: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
[       OK ] OpsTest.test_shutdown_while_in_call([], False)
[ RUN      ] OpsTest.test_shutdown_while_in_call([1], True)
2020-07-12 04:33:18.112495: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
[       OK ] OpsTest.test_shutdown_while_in_call([1], True)
[ RUN      ] OpsTest.test_simple([], False)
[       OK ] OpsTest.test_simple([], False)
[ RUN      ] OpsTest.test_simple([1], True)
[       OK ] OpsTest.test_simple([1], True)
[ RUN      ] OpsTest.test_simple_two_calls([], False)
[       OK ] OpsTest.test_simple_two_calls([], False)
[ RUN      ] OpsTest.test_simple_two_calls([1], True)
[       OK ] OpsTest.test_simple_two_calls([1], True)
[ RUN      ] OpsTest.test_starting_twice
2020-07-12 04:33:19.578631: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:936 : Invalid argument: Server is already started
[       OK ] OpsTest.test_starting_twice
[ RUN      ] OpsTest.test_stress_test
2020-07-12 04:34:02.588852: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
2020-07-12 04:34:02.588852: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
2020-07-12 04:34:02.589077: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
2020-07-12 04:34:02.589125: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at grpc.cc:1091 : Unavailable: Read failed, is the server closed?
[       OK ] OpsTest.test_stress_test
[ RUN      ] OpsTest.test_string([], False)
[       OK ] OpsTest.test_string([], False)
[ RUN      ] OpsTest.test_string([1], True)
[       OK ] OpsTest.test_string([1], True)
[ RUN      ] OpsTest.test_tpu([], False)
[       OK ] OpsTest.test_tpu([], False)
[ RUN      ] OpsTest.test_tpu([1], True)
[       OK ] OpsTest.test_tpu([1], True)
[ RUN      ] OpsTest.test_tpu_tf_function_same_device([], False)
[       OK ] OpsTest.test_tpu_tf_function_same_device([], False)
[ RUN      ] OpsTest.test_tpu_tf_function_same_device([1], True)
[       OK ] OpsTest.test_tpu_tf_function_same_device([1], True)
[ RUN      ] OpsTest.test_two_clients([], False)
[       OK ] OpsTest.test_two_clients([], False)
[ RUN      ] OpsTest.test_two_clients([1], True)
[       OK ] OpsTest.test_two_clients([1], True)
[ RUN      ] OpsTest.test_two_clients([2], True)
[       OK ] OpsTest.test_two_clients([2], True)
[ RUN      ] OpsTest.test_upvalue([], False)
[       OK ] OpsTest.test_upvalue([], False)
[ RUN      ] OpsTest.test_upvalue([1], True)
[       OK ] OpsTest.test_upvalue([1], True)
[ RUN      ] OpsTest.test_variable_out_of_scope
[       OK ] OpsTest.test_variable_out_of_scope
[ RUN      ] OpsTest.test_wait_for_server([], False)
[       OK ] OpsTest.test_wait_for_server([], False)
[ RUN      ] OpsTest.test_wait_for_server([1], True)
[       OK ] OpsTest.test_wait_for_server([1], True)
[ RUN      ] OpsTest.test_wait_for_server2([], False)
[       OK ] OpsTest.test_wait_for_server2([], False)
[ RUN      ] OpsTest.test_wait_for_server2([1], True)
[       OK ] OpsTest.test_wait_for_server2([1], True)
----------------------------------------------------------------------
Ran 68 tests in 77.762s

OK (skipped=1)
Removing intermediate container 6d6ca3fd49ac
 ---> 5c08b5cda234
Successfully built 5c08b5cda234
Successfully tagged seed_rl:grpc
74c84f964c08b3ec22a99c5db5097ced837b3cd36ade69af47b00080594608cc

How to run sac?

Can you provide a tutorial on using the SAC algorithm? Thank you very much!

How can I train it on multi-GPU

I have tried changing OneDeviceStrategy to MirroredStrategy.

strategy = tf.distribute.OneDeviceStrategy(device=device_name)

But I get the ValueError below.

Could not convert from `tf.VariableAggregation` VariableAggregation.NONE to `tf.distribute.ReduceOp` type
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/values.py", line 1381, in _tensor_conversion_sync_on_read
    return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/values.py", line 1371, in _dense_var_to_tensor
    self.get(), dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/values.py", line 322, in get
    return self._get_cross_replica()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/values.py", line 1346, in _get_cross_replica
    reduce_util.ReduceOp.from_variable_aggregation(self.aggregation),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/reduce_util.py", line 50, in from_variable_aggregation
    "`tf.distribute.ReduceOp` type" % aggregation)
ValueError: Could not convert from `tf.VariableAggregation` VariableAggregation.NONE to `tf.distribute.ReduceOp` type

But when I apply clip_norm to temp_grads, the error disappears. I'm not sure whether this change will cause a synchronization risk.

def apply_gradients(_):
  clip_grads, _ = tf.clip_by_global_norm(temp_grads, 40)
  optimizer.apply_gradients(zip(clip_grads, agent.trainable_variables))
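
For comparison, here is a generic tf.distribute.MirroredStrategy training-step sketch (not the seed_rl learner) showing where gradient clipping and apply_gradients usually sit; the model, optimizer and data below are placeholders of mine.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
  optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(x, y):
  def step_fn(x, y):
    with tf.GradientTape() as tape:
      loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    # Clipping the gradients (as in the workaround above) before applying them.
    grads, _ = tf.clip_by_global_norm(grads, 40.0)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
  # strategy.run is called experimental_run_v2 in older TF 2.x releases.
  per_replica_loss = strategy.run(step_fn, args=(x, y))
  return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_loss, axis=None)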

Unable to Instantiate gRPC Server

I'm trying to instantiate a gRPC server to perform inference for a bunch of CPUs.

However, after cloning seed_rl and then cd-ing into it, I am unable to run the code provided in the README.

I receive the following error.

NotFoundError: /content/seed_rl/grpc/python/../grpc_cc.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl11string_viewESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteIS8_EE

This can be reproduced in a colab notebook.

Colab notebooks use TF 2.3, or whatever the latest version is. Could that be causing the issue?

What version are you guys using?

Don't have access to seed_rl bucket

This may just be me being new to google cloud, but when I follow all the instructions and run train_atari.sh it gives me the error:
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to seed_rl.

Grpc is incompatible with tf2 if tf2 was built from source

Hello, I have tried to use the seed_rl framework on a local machine:

Ubuntu 18.04.5 LTS
AMD Ryzen 9 3900x 12-core
GeForce RTX 3080/PCIe/SSE2 (Gigabyte)

I installed nvidia driver 455, cuda 11.1 and Cudnn 8.0.4 according to instructions from nvidia official website.

As far as I can tell, there are no TensorFlow 2 builds compatible with CUDA 11.1 for now. So I built a tf2 version from source using one of the latest tf-nightly commits (tf-nightly==2.5.0.dev20201025, tf.git_version == v1.12.1-44562-g33335ad96). After building tf2, tf.test.is_gpu_available() returns True, and simple convnet tf2 examples work, as do the tests from the cuDNN samples.

I cannot start training in Docker using my tf2 built from source, because I cannot use my wheel while building the Docker image – it raises the following error: “wheel is not supported on this platform”. So I start training locally (local training without Docker works perfectly fine with CPU only and tf==2.2); however, when using seed_rl with GPU and tf2 built from source, it raises an error:

tensorflow.python.framework.errors_impl.NotFoundError: /home/nono/PycharmProjects/seed_rl/grpc/python/../grpc_cc.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb

So I built grpc using the builder from the seed_rl repository (using tf-nightly==2.5.0.dev20201025 as the tensorflow version), just adapted the script a little according to #14, and it was built successfully and passed the tests. However, I experienced the same error with this grpc build and my tf2 version built from source (from the same commit as tf-nightly==2.5.0.dev20201025).
I want to mention that if I use tf-nightly==2.5.0.dev20201025 as my local tf2 (instead of the one built from source), grpc works fine (however, it is not compatible with CUDA 11.1 and thus with my GPU).

In conclusion: my problem is that the RTX 3080 is compatible only with CUDA 11.1, tf2 compatible with CUDA 11.1 can only be built from source, and tf2 built from source doesn't work with any grpc I built (even one that was built in a container with tf-nightly built from the same commit as my tf2 build).


I also tried CUDA 10.1 and CUDA 11.0.
With CUDA 10.1, seed_rl starts very slowly (this can be partially fixed with CUDA_CACHE_MAXSIZE=2147483648) and gives very strange results: it looks like something is wrong with the computations, as models trained on GPU achieve different (usually very bad) results from those trained on CPU, and sometimes the GPU results look wrong.
With CUDA 11.0, tensorflow gives a lot of warnings and then crashes with an “out of memory” message.

grpc_dockerfile.txt

no server running on /tmp/tmux-0/default

Hi

Thanks for this interesting package.

I'm close to running it but I keep getting this when I run ./run_local.sh atari r2d2 4

Do you know about this one?

+ docker run --gpus all --entrypoint ./docker/run.sh -ti -it -e HOST_PERMS=7619:5560 --name seed --rm seed_rl:atari atari r2d2 4
no current session
no server running on /tmp/tmux-0/default
(the line above repeats several times)
rm /tmp/agent -Rf; python3 ../atari/r2d2_main.py --run_mode=learner --logtostderr --pdb_post_mortem --num_actors=4
no server running on /tmp/tmux-0/default
(the line above repeats several times)
no sessions

ImportError: libGL.so.1 while running locally

Hi, I ran the atari routine following the instructions (i.e. ./run_local.sh atari r2d2 4), then checking my actors I'm getting the following error:

root@a31483e5a60d:/seed_rl/docker# CUDA_VISIBLE_DEVICES='' python3 ../atari/r2d2_main.py --run_mode=actor --logtostderr
--pdb_post_mortem  --num_actors=4 --task=1
Traceback (most recent call last):
  File "../atari/r2d2_main.py", line 24, in <module>
    from seed_rl.atari import env
  File "/seed_rl/atari/env.py", line 23, in <module>
    from seed_rl.atari import atari_preprocessing
  File "/seed_rl/atari/atari_preprocessing.py", line 23, in <module>
    import cv2
  File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Need help. Thanks!

running R2D2 without Docker

I'm trying to run SEED RL (R2D2) without Docker on Ubuntu 18.04. I've tried to decouple the files from Docker as much as I can. When I try to run r2d2_main.py in learner mode in the terminal,
python atari/r2d2_main.py --run_mode=learner --logtostderr --pdb_post_mortem --num_actors=2,

I get this error:

Traceback (most recent call last):
  File "atari/r2d2_main.py", line 27, in <module>
    from seed_rl.agents.r2d2 import learner
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/agents/r2d2/learner.py", line 38, in <module>
    from seed_rl import grpc
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/__init__.py", line 21, in <module>
    from seed_rl.grpc.python.ops import *  
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/python/ops.py", line 25, in <module>
    from seed_rl.grpc.python.ops_wrapper import gen_grpc_ops
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/python/ops_wrapper.py", line 25, in <module>
    gen_grpc_ops = tf.load_op_library(os.path.join(tf.compat.v1.resource_loader.get_data_files_path(), '../grpc_cc.so'))
  File "/home/dave/anaconda3/envs/dave/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 57, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/python/../grpc_cc.so: undefined symbol: _ZN10tensorflow14DataTypeStringENS_8DataTypeE

The only change I made to r2d2_main.py is adding

import sys
sys.path.insert(1, '/home/dave/Documents/AI/2020_seed_rl/')

for path purposes.

Which approach to the on-policy training/inference synchronization is best?

Hi,
I am thinking about a synchronization technique that would be suitable for introducing an on-policy algorithm such as PPO, which alternates between data collection and training. I came up with 3 basic ideas and would very much appreciate it if anyone could help me judge whether any of them is valid. I am leaning towards the last one, as it doesn't use busy waits (i.e., empty while loops on sync variables), although I am not entirely sure how (un)acceptable a busy loop on a GPU is. Although I present simple pseudocode below to demonstrate the ideas, I implemented them minimally outside of SEED to try out synchronization via tf variables inside TF functions called through SEED's gRPC framework, to examine potential problems. One such problem is the default mode of autograph, which, unaware of my synchronization efforts, (re)moves things and thus compromises my intent, but I did some artificial-dependency hackery to work around this (I guess optimizations can be controlled, but ideally I would hope for something like the much-hated C++ volatile qualifier). In any case: thanks for any feedback on whether any of the approaches below are valid, and/or for ideas about alternatives.

# Model is a tf.module with a constructor and 2 tf.functions: infer and train called with gRPC as in SEED.
# tf variables live on cpu of a sole host associated with learner

#1 busy waits, many sync variables
class Model() {

    def Model() {
        var training = tf.Var(False)
        var inferring = tf.Var([False]*NUM_ACTORS)
    }

    def infer(id) {
        while training:
            pass

        inferring[id] = True
        ...
        inferring[id] = False

        return result
    }

    def train() {
        training = True
        while sum(inferring) > 0:
            pass

        ...
        training = False
    }
}


#2 busy waits, a single sync flag, invalidating inference result
class Model() {

    def Model() {
        var training = tf.Var(False)
    }

    # invariant: time(inference) < time(training)
    def infer(id) {
        do {
            while training:
                pass

            ...

        } while not training

        return result
    }

    def train {
        training = True
        ...
        training = False
    }
}

#3 no busy waits, single sync variable, client needs to balance inference calls and reject some of the returns
class Model {

    def Model() {
        var training = tf.Var(False)
    }

    # invariant: time(inference) < time(training)
    def infer(id) {

        if training:
            return None

        ...

        if training:
            return None

        return result
    }

    def train {
        training = True
        ...
        training = False
    }
}
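
For what it's worth, a rough TF2 rendering of approach #3 might look like the sketch below. This is purely illustrative code of mine (not seed_rl code), it deliberately ignores the autograph ordering problems mentioned above, and the network and loss are placeholders.

import tensorflow as tf

class SyncedModel(tf.Module):
  """Illustrative only: a single flag, no busy waits (approach #3)."""

  def __init__(self, num_actions):
    super().__init__()
    self.net = tf.keras.layers.Dense(num_actions)
    self.training_flag = tf.Variable(False, trainable=False)

  @tf.function
  def infer(self, observation):
    # Returns (valid, result); clients drop results where valid is False and
    # re-issue the call, mirroring "return None" in the pseudocode.
    before = tf.identity(self.training_flag)
    result = self.net(observation)
    after = tf.identity(self.training_flag)
    valid = tf.logical_not(tf.logical_or(before, after))
    return valid, result

  @tf.function
  def train(self, observations, targets):
    self.training_flag.assign(True)
    with tf.GradientTape() as tape:
      loss = tf.reduce_mean(tf.square(self.net(observations) - targets))
    grads = tape.gradient(loss, self.net.trainable_variables)
    # ... optimizer.apply_gradients(zip(grads, ...)) would go here ...
    self.training_flag.assign(False)
    return loss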

About sac_main.py

I tried to add a sac_main.py file in atari:

from absl import app
from absl import flags
# from seed_rl.agents.r2d2 import learner
from seed_rl.agents.sac import learner
from seed_rl.agents.sac import networks
from seed_rl.atari import env
# from seed_rl.atari import networks
from seed_rl.common import actor
from seed_rl.common import common_flags  
import tensorflow as tf

FLAGS = flags.FLAGS

# Optimizer settings.
flags.DEFINE_float('learning_rate', 0.00048, 'Learning rate.')
flags.DEFINE_float('adam_epsilon', 1e-3, 'Adam epsilon.')
flags.DEFINE_integer('stack_size', 4, 'Number of frames to stack.')


def create_agent(env_action_s, env_obs_s, parametric_action_distribution):
  return networks.ActorCriticMLP(parametric_action_distribution, 1,[32,32])


def create_optimizer(unused_final_iteration):
  learning_rate_fn = lambda iteration: FLAGS.learning_rate
  optimizer = tf.keras.optimizers.Adam(FLAGS.learning_rate,
                                       epsilon=FLAGS.adam_epsilon)
  return optimizer, learning_rate_fn


def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')
  if FLAGS.run_mode == 'actor':
    actor.actor_loop(env.create_environment)
  elif FLAGS.run_mode == 'learner':
    learner.learner_loop(env.create_environment,
                         create_agent,
                         create_optimizer)
  else:
    raise ValueError('Unsupported run mode {}'.format(FLAGS.run_mode))


if __name__ == '__main__':
  # FLAGS.run_mode = 'learner'

  app.run(main)

But something went wrong:

_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "../atari/sac_main.py", line 61, in main
create_optimizer)
File "/seed_rl/agents/sac/learner.py", line 402, in learner_loop
initialize_agent_variables(agent)
File "/seed_rl/agents/sac/learner.py", line 400, in initialize_agent_variables
create_variables()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in call
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 627, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 506, in _initialize
*args, **kwds))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2667, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
return weak_wrapped_fn().wrapped(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

/seed_rl/agents/sac/learner.py:399 create_variables  *
    agent.get_Q(*decode(input_), action=decode(input_[0]))]
/seed_rl/agents/sac/networks.py:110 get_action  *
    return self.__call__(*args, **kwargs)
/seed_rl/agents/sac/networks.py:126 __call__  *
    action_params = self.get_action_params(prev_action, env_output, state)
/seed_rl/agents/sac/networks.py:101 get_action_params  *
    return self._actor_mlp(self._concat_obs(env_output.observation))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:927 __call__  **
    outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:291 call
    outputs = layer(inputs, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:927 __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py:1183 call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:4346 tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:180 wrapper
    return target(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:2984 matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py:5587 mat_mul
    name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:578 _apply_op_helper
    param_name=input_name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:61 _SatisfiesTypeConstraint
    ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))

TypeError: Value passed to parameter 'a' has DataType uint8 not in list of allowed values: bfloat16, float16, float32, float64, int32, int64, complex64, complex128

I suspect it's because the input is an image (uint8), but this problem doesn't appear with R2D2.
Please help to solve it. Thank you very much!
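
That diagnosis looks plausible: the TypeError above comes from passing uint8 image observations straight into a dense layer (the matmul op only accepts the float/int types listed in the error). A tiny standalone illustration of casting to float32 first; the shapes here are only examples.

import tensorflow as tf

obs = tf.zeros([1, 84, 84, 4], dtype=tf.uint8)   # Atari-like frame stack
x = tf.cast(obs, tf.float32) / 255.0             # uint8 -> float32 in [0, 1]
x = tf.reshape(x, [tf.shape(x)[0], -1])          # flatten for an MLP
out = tf.keras.layers.Dense(32)(x)               # matmul now receives float32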

'GrpcServerResourceHandleOp' is neither a type of a primitive operation nor a name of a function registered in binary running on n-b0fdb3cc-w-0.

We tried scaling up our custom model on the seed_rl codebase, but it seems the TPU integration is incomplete for tf2.3.
I could be wrong, but there seem to be a few missing pieces here and there.

The TensorFlow documentation indicates we need to use tf.config.experimental_connect_to_cluster(resolver), but it isn't referenced anywhere in the seed_rl codebase.
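
For reference, the standard TF 2.x TPU setup from the TensorFlow docs looks roughly like the sketch below (my own summary, not seed_rl code); the TPU name is just the one that appears in the logs further down.

import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='boost-7j6bk')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
# tf.distribute.TPUStrategy in TF 2.3+; tf.distribute.experimental.TPUStrategy before that.
strategy = tf.distribute.TPUStrategy(resolver)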

When experimental_connect_to_cluster is used, the TPU is detected and initialized, and both the training and base strategies are created. But it fails at UnrollStore with a cryptic error message:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/thomas_rl/src/agents/vtrace/thomas/main.py", line 43, in main
    create_optimizer)
  File "/thomas_rl/src/agents/vtrace/learner.py", line 436, in learner_loop
    create_host(i, host, inference_devices)
  File "/thomas_rl/src/agents/vtrace/learner.py", line 328, in create_host
    (action_specs, env_output_specs, agent_output_specs))
  File "/thomas_rl/src/common/utils.py", line 145, in __init__
    timestep_specs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 635, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 635, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/thomas_rl/src/common/utils.py", line 139, in create_unroll_variable
    [num_envs, self._full_length] + spec.shape.dims, dtype=spec.dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 2747, in wrapped
    tensor = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 2806, in zeros
    output = fill(shape, constant(zero, dtype=dtype), name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 239, in fill
    result = gen_array_ops.fill(dims, value, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3402, in fill
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: 'GrpcServerResourceHandleOp' is neither a type of a primitive operation nor a name of a function registered in binary running on n-b0fdb3cc-w-0. Make sure the operation or function is registered in the binary running in this process. [Op:Fill]
Full logs:
rm /tmp/agent -Rf; python3 /thomas_rl/src/agents/vtrace/thomas/main.py --run_mode=learner --job-dir='gs://xxxxxx//XXXXXX_20200921085830/' --logtostderr --pdb_post_mortem --num_envs=16 --env_batch_size=4 --tpu_name=boost-7j6bk
[ASCII-art "tensorflow" banner]

WARNING: You are running this container as root, which can cause new files in                                                                                                                     
mounted volumes to be created as the root user on your host machine.                                                                                                                              
                                                                                                                                                                                                  
To avoid this, run the container by specifying your user's userid:                                                                                                                                
                                                                                                                                                                                                  
$ docker run -u $(id -u):$(id -g) args...                                                                                                                                                         
                                                                                                                                                                                                  
root@484f4c16e62e:/thomas_rl/docker# rm /tmp/agent -Rf; python3 /thomas_rl/src/agents/vtrace/thomas/main.py --run_mode=learner  --job-dir='gs://xxxxxx//XXXXXX_20200921085830/' --logtostderr
 --pdb_post_mortem  --num_envs=16 --env_batch_size=4 --tpu_name=boost-7j6bk                                                                                                                       
2020-09-21 09:58:32.157014: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1                                                 
                                                                                                                                                                                                  
$$$$$$$$\ $$\   $$\  $$$$$$\  $$\      $$\  $$$$$$\   $$$$$$\                                                                                                                                     
\__$$  __|$$ |  $$ |$$  __$$\ $$$\    $$$ |$$  __$$\ $$  __$$\                                                                                                                                    
   $$ |   $$ |  $$ |$$ /  $$ |$$$$\  $$$$ |$$ /  $$ |$$ /  \__|                                                                                                                                   
   $$ |   $$$$$$$$ |$$ |  $$ |$$\$$\$$ $$ |$$$$$$$$ |\$$$$$$\                                                                                                                                     
   $$ |   $$  __$$ |$$ |  $$ |$$ \$$$  $$ |$$  __$$ | \____$$\                                                                                                                                    
   $$ |   $$ |  $$ |$$ |  $$ |$$ |\$  /$$ |$$ |  $$ |$$\   $$ |                                                                                                                                   
   $$ |   $$ |  $$ | $$$$$$  |$$ | \_/ $$ |$$ |  $$ |\$$$$$$  |                                                                                                                                   
   \__|   \__|  \__| \______/ \__|     \__|\__|  \__| \______/  version b97dd88                                                                                                                   
python /thomas_rl/src/agents/vtrace/thomas/main.py --run_mode=learner --job-dir=gs://xxxxxx//XXXXXX_20200921085830/ --logtostderr --pdb_post_mortem --num_envs=16 --env_batch_size=4 --tpu_na
me=boost-7j6bk                                                                                                                                                                                    
I0921 09:58:35.211318 140059370288960 learner.py:193] Starting learner loop                                                                                                                       
inference-batch-size: 16                                                                                                                                                                          
I0921 09:58:35.273476 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
I0921 09:58:35.358564 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
2020-09-21 09:58:35.413014: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1                                                      
2020-09-21 09:58:35.413081: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: UNKNOWN ERROR (-1)                                                                       
2020-09-21 09:58:35.413104: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (484f4c16e62e): /proc/driver/nvidia/version does n
ot exist                                                                                                                                                                                          
I0921 09:58:35.442526 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
I0921 09:58:35.536518 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
2020-09-21 09:58:35.582941: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU inst
ructions in performance-critical operations:  AVX2 FMA                                                                                                                                            
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.                                                                                                       
2020-09-21 09:58:35.592080: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299995000 Hz                                                                               
2020-09-21 09:58:35.594242: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5a57340 initialized for platform Host (this does not guarantee that XLA will be used). Devices:       
2020-09-21 09:58:35.594291: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version                                                                  
2020-09-21 09:58:35.613283: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> 10.240.1.2:8470}                                   
2020-09-21 09:58:35.613358: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localhost -> {0 -> localhost:30102}                                
2020-09-21 09:58:35.629874: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> 10.240.1.2:8470}                                   
2020-09-21 09:58:35.629946: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localhost -> {0 -> localhost:30102}                                
2020-09-21 09:58:35.630474: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:405] Started server with target: grpc://localhost:30102                                                  
I0921 09:58:35.631092 140059370288960 remote.py:218] Entering into master device scope: /job:worker/replica:0/task:0/device:CPU:0                                                                 
1600678715 INFO:   detected a TPU: [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_typ
e='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(na
me='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/
task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU')]                                                                     
INFO:tensorflow:Initializing the TPU system: boost-7j6bk                                                                                                                                          
I0921 09:58:35.632414 140059370288960 tpu_strategy_util.py:73] Initializing the TPU system: boost-7j6bk                                                                                           
INFO:tensorflow:Clearing out eager caches                                                                                                                                                         
I0921 09:58:45.013357 140059370288960 tpu_strategy_util.py:108] Clearing out eager caches                                                                                                         
INFO:tensorflow:Finished initializing TPU system.                                                                                                                                                 
I0921 09:58:45.015173 140059370288960 tpu_strategy_util.py:131] Finished initializing TPU system.                                                                                                 
W0921 09:58:45.015923 140059370288960 tpu_strategy.py:320] `tf.distribute.experimental.TPUStrategy` is deprecated, please use  the non experimental symbol `tf.distribute.TPUStrategy` instead.   
I0921 09:58:45.048566 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
I0921 09:58:45.124522 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
INFO:tensorflow:Found TPU system:                                                                                                                                                                 
I0921 09:58:45.178662 140059370288960 tpu_system_metadata.py:159] Found TPU system:                                                                                                               
INFO:tensorflow:*** Num TPU Cores: 8                                                                                                                                                              
I0921 09:58:45.178842 140059370288960 tpu_system_metadata.py:160] *** Num TPU Cores: 8                                                                                                            
INFO:tensorflow:*** Num TPU Workers: 1                                                                                                                                                            
I0921 09:58:45.178928 140059370288960 tpu_system_metadata.py:161] *** Num TPU Workers: 1                                                                                                          
INFO:tensorflow:*** Num TPU Cores Per Worker: 8                                                                                                 
I0921 09:58:45.178997 140059370288960 tpu_system_metadata.py:163] *** Num TPU Cores Per Worker: 8                                                                                                 
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)                                                                                  
I0921 09:58:45.179058 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)                                
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)                                                                          
I0921 09:58:45.179831 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)                        
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)                                                                                     
I0921 09:58:45.179914 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)                                                                                     
I0921 09:58:45.179981 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)                                                                                     
I0921 09:58:45.180045 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)                                                                                     
I0921 09:58:45.180108 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)                                                                                     
I0921 09:58:45.180172 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)                                                                                     
I0921 09:58:45.180235 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)                                                                                     
I0921 09:58:45.180303 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)                                                                                     
I0921 09:58:45.180362 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)                                                                                     
I0921 09:58:45.180426 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)                                   
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)                                                                       
I0921 09:58:45.180494 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)                     
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)                                                                             
I0921 09:58:45.180557 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)                           
W0921 09:58:45.181079 140059370288960 tpu_strategy.py:320] `tf.distribute.experimental.TPUStrategy` is deprecated, please use  the non experimental symbol `tf.distribute.TPUStrategy` instead.   
I0921 09:58:45.218042 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
I0921 09:58:45.292918 140059370288960 transport.py:157] Attempting refresh to obtain initial access_token                                                                                         
INFO:tensorflow:Found TPU system:                                                                                                                                                                 
I0921 09:58:45.343351 140059370288960 tpu_system_metadata.py:159] Found TPU system:                                                                                                               
INFO:tensorflow:*** Num TPU Cores: 8                                                                                                                                                              
I0921 09:58:45.343567 140059370288960 tpu_system_metadata.py:160] *** Num TPU Cores: 8                                                                                                            
INFO:tensorflow:*** Num TPU Workers: 1                                                                                                                                                            
I0921 09:58:45.343652 140059370288960 tpu_system_metadata.py:161] *** Num TPU Workers: 1                                                                                                          
INFO:tensorflow:*** Num TPU Cores Per Worker: 8                                                                                                                                                   
I0921 09:58:45.343721 140059370288960 tpu_system_metadata.py:163] *** Num TPU Cores Per Worker: 8                                                                                                 
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
I0921 09:58:45.343784 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
I0921 09:58:45.343852 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
I0921 09:58:45.343914 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
I0921 09:58:45.343983 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
I0921 09:58:45.344045 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
I0921 09:58:45.344128 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
I0921 09:58:45.344192 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
I0921 09:58:45.344259 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
I0921 09:58:45.344321 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
I0921 09:58:45.344379 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
I0921 09:58:45.344445 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
I0921 09:58:45.344506 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
I0921 09:58:45.344567 140059370288960 tpu_system_metadata.py:165] *** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
1600678725 INFO:   Creating environment: thomas-base-v1 -- id: 0
1600678725 INFO:   FC layers size : 512, lstm cell size: (256, 256)
2020-09-21 09:58:50.191816: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2020-09-21 09:58:53.162754: W tensorflow/core/distributed_runtime/eager/remote_tensor_handle_data.cc:76] Unable to destroy remote tensor handles. If you are running a tf.function, it usually indicates some op in the graph gets an error: 'GrpcServerResourceHandleOp' is neither a type of a primitive operation nor a name of a function registered in binary running on n-b0fdb3cc-w-0. Make sure the operation or function is registered in the binary running in this process.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/thomas_rl/src/agents/vtrace/thomas/main.py", line 43, in main
    create_optimizer)
  File "/thomas_rl/src/agents/vtrace/learner.py", line 436, in learner_loop
    create_host(i, host, inference_devices)
  File "/thomas_rl/src/agents/vtrace/learner.py", line 328, in create_host
    (action_specs, env_output_specs, agent_output_specs))
  File "/thomas_rl/src/common/utils.py", line 145, in __init__
    timestep_specs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 635, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 635, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/thomas_rl/src/common/utils.py", line 139, in create_unroll_variable
    [num_envs, self._full_length] + spec.shape.dims, dtype=spec.dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 2747, in wrapped
    tensor = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 2806, in zeros
    output = fill(shape, constant(zero, dtype=dtype), name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 239, in fill
    result = gen_array_ops.fill(dims, value, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3402, in fill
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: 'GrpcServerResourceHandleOp' is neither a type of a primitive operation nor a name of a function registered in binary running on n-b0fdb3cc-w-0. Make sure the operation or function is registered in the binary running in this process. [Op:Fill]

 *** Entering post-mortem debugging ***

I initially thought it would be a similar problem to GH-43.

But experimental_connect_to_cluster should already be setting CPU:0 as the default device, and it logs that as well:
I0921 10:14:39.859340 140635681793856 remote.py:218] Entering into master device scope: /job:worker/replica:0/task:0/device:CPU:0

The error sometimes shifts and reports a different custom gRPC op name. The codebase works fine on GPU.

I can provide reproduction steps if necessary. I am unsure whether I am doing something wrong or whether this is a bug, given the limited documentation. Any help is greatly appreciated.
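
For illustration, below is a minimal sketch of the device-scope idea mentioned above, assuming a custom op that is only registered in the local Python binary; the TPU name mirrors the --tpu_name flag in the logs, and this is only an illustration of explicit local placement, not a confirmed fix.

import tensorflow as tf

# Connect to the TPU; after this, eagerly created ops default to the remote
# worker's device scope (the "Entering into master device scope" log line).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='boost-7j6bk')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Custom ops such as the SEED gRPC ops exist only in the local binary, so any
# graph that touches them has to stay on the local job. The device string
# below appears in the device list logged above.
with tf.device('/job:localhost/replica:0/task:0/device:CPU:0'):
  unroll_buffer = tf.zeros([16, 100])  # stand-in for create_unroll_variable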

Installation

I want to know if SEED RL can be installed without Docker?

Any help is greatly appreciated.

Thank you

how to analyse my GPU memory usage details

I really want to know the GPU utilization when training the Atari Pong game. My GPU is a 1080 Ti, and I want to know the detailed usage of its memory. Can anyone give me some guidance?
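
A minimal sketch of one way to inspect this from inside the training process, assuming TF 2.4+ for get_memory_info; outside the process, nvidia-smi reports the (much larger) amount TensorFlow preallocates by default.

import tensorflow as tf

# Memory actually in use by TensorFlow's allocator on the first GPU.
info = tf.config.experimental.get_memory_info('GPU:0')
print('current bytes:', info['current'])
print('peak bytes:', info['peak'])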

run SEED_RL + R2D2 in pure inference/eval mode with one actor

Is there a way to run seed_rl with R2D2 using only one eval actor for inference? I tried and hit this assert error:

assert FLAGS.num_actors > FLAGS.num_eval_actors, (
      'Total number of actors ({}) should be greater than number of actors '
      'reserved to eval ({})'.format(
          FLAGS.num_actors, FLAGS.num_eval_actors))

It seems like currently the only way is to run 2 actors (1 train, 1 eval) at minimum.

Unable to update non-tensor variable in the tf.function

Thanks for this wonderful code, which is very useful for our research. When we tried to add a few lines of code to seed_rl for our purposes, we ran into the following problem: we initialize a non-tensor variable outside a function decorated with tf.function (e.g. a = 0) and try to update it inside the inference function, which is a tf.function (e.g. a += 1). After executing this function from the actor side (using client.inference()), we find that when we read the variable again on the learner side (e.g. print(a)), its value has not been updated (it remains a = 0).

We have tried hard to fix this but have not been able to. We would be very grateful for any guidance.

Thanks
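
A minimal sketch of the behavior described above, assuming standard TF2 tracing semantics (this is not the repository's code): a plain Python variable is baked in when the tf.function is traced, whereas a tf.Variable keeps its updates across calls.

import tensorflow as tf

a = 0

@tf.function
def bump_python():
  global a
  a += 1  # executes only once, while the function is being traced

counter = tf.Variable(0)

@tf.function
def bump_variable():
  counter.assign_add(1)  # state lives in the TF runtime, updated on every call

bump_python(); bump_python()
print(a)                # 1 -- incremented only at trace time
bump_variable(); bump_variable()
print(counter.numpy())  # 2

In the setup above, keeping the counter in a tf.Variable created on the learner should make updates performed inside inference() visible on the learner side.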

Problem running a GPU-based local version

Hi, first of all, thanks for contributing this! It is awesome that more and more high-performance RL implementations are appearing.

I tried to run the local GPU version by following the instructions. My version of Docker is 19.03.5, build 633a0ea838, running in non-sudo mode, CUDA is installed, and in general, I have no problems running GPU-based training.
However, the script ./run_local.sh dmlab vtrace fails shortly after starting, apparently in the learner.

root@c1e8d54c6fd6:/seed_rl/docker# rm /tmp/agent -Rf; python3 ../dmlab/vtrace_main.py --run_mode=learner --logtostderr --pdb_post_mortem  --num_actors=4
2020-02-05 09:30:48.413962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-05 09:30:48.415229: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
I0205 09:30:49.258689 140590619293504 learner.py:192] Starting learner loop                                                                    
2020-02-05 09:30:49.259519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1     
2020-02-05 09:30:49.264996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:                           
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1                                                                       
coreClock: 1.607GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s                                                
2020-02-05 09:30:49.265032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-05 09:30:49.265054: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10                                                                                                                                    
2020-02-05 09:30:49.266461: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-05 09:30:49.266695: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-05 09:30:49.268158: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-05 09:30:49.268922: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10   
2020-02-05 09:30:49.268951: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-05 09:30:49.270040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0                  
2020-02-05 09:30:49.270288: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA                                                                                              
2020-02-05 09:30:49.275845: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3300000000 Hz         
2020-02-05 09:30:49.276734: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5525b50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-05 09:30:49.276753: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version                                                  
2020-02-05 09:30:49.364341: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5473c40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:                                                                                        
2020-02-05 09:30:49.364378: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1                                                                                                                             
2020-02-05 09:30:49.365110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:      
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1                                                                                                                                                                                                            
coreClock: 1.607GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s                                    
2020-02-05 09:30:49.365153: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-05 09:30:49.365168: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10         
2020-02-05 09:30:49.365189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10                                                                                                                                     
2020-02-05 09:30:49.365204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-05 09:30:49.365219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-05 09:30:49.365233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-05 09:30:49.365246: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-05 09:30:49.366538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0      
2020-02-05 09:30:49.366583: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-05 09:30:49.523013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-05 09:30:49.523051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0                                                  
2020-02-05 09:30:49.523057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N                                         
2020-02-05 09:30:49.523844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5769 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capabili
ty: 6.1)                                                                                                                                 
I0205 09:30:49.525754 140590619293504 env.py:136] Creating environment: explore_goal_locations_small                                        
2020-02-05 09:30:51.506836: W tensorflow/python/util/util.cc:319] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.                                                                                                 
E0205 09:30:53.173518022      78 server_chttp2.cc:40]        {"created":"@1580895053.173498150","description":"Only 1 addresses added out of total 2 resolved","file":"external/com_github_grpc_grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":404,"refere
nced_errors":[{"created":"@1580895053.173495496","description":"Address family not supported by protocol","errno":97,"file":"external/com_github_grpc_grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":406,"os_error":"Address family not supported by protocol","
syscall":"socket","target_address":"[::1]:8686"}]}                                                                  
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future ver
sion.                                                                                                                        
Instructions for updating:                                                                                                   
If using Keras pass *_constraint arguments to layers.                                                                                                   
W0205 09:30:54.316198 140590619293504 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is depreca
ted and will be removed in a future version.                                                                   
Instructions for updating:                                                                                        
If using Keras pass *_constraint arguments to layers.               
INFO:tensorflow:Assets written to: /tmp/agent/saved_model/assets
I0205 09:30:54.494515 140590619293504 builder_impl.py:775] Assets written to: /tmp/agent/saved_model/assets               
Actor ids needing reset: [1 0]                                                                                          
2020-02-05 09:30:58.985624: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-05 09:30:59.136033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Actor ids needing reset: [3 2]
2020-02-05 09:30:59.528105: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-02-05 09:30:59.883939: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-02-05 09:30:59.885489: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2020-02-05 09:30:59.886922: F ./tensorflow/core/kernels/conv_2d_gpu.h:453] Non-OK-status: GpuLaunchKernel(ShuffleInTensor3Simple<T, 2, 1, 0>, config.block_count, config.thread_per_block, 0, d.stream(), config.virtual_thread_count, in.data(), combined_dims, out.data()) status
: Internal: out of memory
Fatal Python error: Aborted

Thread 0x00007fdb9dffb700 (most recent call first):
  File "/usr/lib/python3.6/threading.py", line 295 in wait
  File "/usr/lib/python3.6/queue.py", line 164 in get
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 67 in _worker
  File "/usr/lib/python3.6/threading.py", line 864 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fddcdebf740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py", line 593 in _GradientsHelper
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 654 in _backprop_function
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 978 in func_graph_from_py_func
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 664 in _construct_forward_backward
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 616 in forward_backward
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 707 in _rewrite_forward_and_call_backward
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 788 in _backward_function
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/imperative_grad.py", line 77 in imperative_grad
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/backprop.py", line 1029 in gradient
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 332 in _call_unconverted
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 459 in converted_call
  File "/tmp/tmpm7l4ko07.py", line 28 in compute_gradients
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/one_device_strategy.py", line 356 in _call_for_each_replica
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1819 in call_for_each_replica
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 763 in experimental_run_v2
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/one_device_strategy.py", line 180 in experimental_run_v2
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 332 in _call_unconverted
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 459 in converted_call
  File "/tmp/tmpm7l4ko07.py", line 45 in tf__minimize
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 565 in converted_call
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 964 in wrapper
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 439 in wrapped_fn
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 978 in func_graph_from_py_func
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2593 in _create_graph_function
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2703 in _maybe_define_function
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2389 in _get_concrete_function_internal_garbage_collected
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 497 in _initialize
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 615 in _call
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568 in __call__
  File "/seed_rl/agents/vtrace/learner.py", line 474 in learner_loop
  File "../dmlab/vtrace_main.py", line 60 in main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
  File "../dmlab/vtrace_main.py", line 66 in <module>
Aborted (core dumped)

It looks like the GPU is discovered initially, but then the learner process runs out of GPU memory.
Any tips on diagnosing the issue?
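
One generic mitigation for this class of startup failure (hedged: a standard TF2 setting, not necessarily the root cause here) is to enable GPU memory growth so the learner does not preallocate nearly all of the card before cuDNN initializes:

import tensorflow as tf

# Must run before any GPU op: allocate GPU memory on demand instead of
# reserving (almost) all of it up front.
for gpu in tf.config.list_physical_devices('GPU'):
  tf.config.experimental.set_memory_growth(gpu, True)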

EDIT: never mind, the problem disappeared after a reboot.

callbacks in grpc.cc

Hi,
I am new to distributed learning. I am wondering what the 'callback' argument in FnType::operator() is used for in grpc.cc. It seems that the Python code only passes 'args' into the function, yet there is quite a lot of code handling callbacks. Is the 'callback' argument used in seed_rl?

Failed to build GRPC with tf-nightly

Dear all,

I'm trying to build the GRPC package with tf-nightly but got the following error:

ERROR: /custom-op/grpc/BUILD.bazel:4:1: C++ compilation of rule '//grpc:ops/grpc.so' failed (Exit 1)
In file included from bazel-out/k8-opt/bin/grpc/service.grpc.pb.h:21:0,
                 from grpc/ops/grpc.cc:23:
bazel-out/k8-opt/bin/grpc/service.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
 #error This file was generated by an older version of protoc which is
  ^~~~~
bazel-out/k8-opt/bin/grpc/service.pb.h:18:2: error: #error incompatible with your Protocol Buffer headers. Please
 #error incompatible with your Protocol Buffer headers. Please
  ^~~~~
bazel-out/k8-opt/bin/grpc/service.pb.h:19:2: error: #error regenerate this file with a newer version of protoc.
 #error regenerate this file with a newer version of protoc.
  ^~~~~
grpc/ops/grpc.cc: In member function 'virtual void tensorflow::{anonymous}::CreateGrpcClientOp::Compute(tensorflow::OpKernelContext*)':
grpc/ops/grpc.cc:1055:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (int i = 0; i < method_output_signatures_list.size(); ++i) {
                     ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the Dockerfile I use:

FROM tensorflow/tensorflow:nightly-custom-op-ubuntu16 as grpc_compile

RUN git clone https://github.com/tensorflow/custom-op.git
WORKDIR custom-op

RUN ./configure.sh

RUN echo '\n\
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")\n\
\n\
http_archive(\n\
    name = "com_github_grpc_grpc",\n\
    urls = [\n\
        "https://github.com/grpc/grpc/archive/ac1c5de1b36da4a1e3d72ca40b0e43f24266121a.tar.gz",\n\
    ],\n\
    strip_prefix = "grpc-ac1c5de1b36da4a1e3d72ca40b0e43f24266121a",\n\
)\n\
\n\
load("@com_github_grpc_grpc//bazel:grpc_deps.bzl", "grpc_deps")\n\
grpc_deps()\n\
load("@com_github_grpc_grpc//bazel:grpc_extra_deps.bzl", "grpc_extra_deps")\n\
grpc_extra_deps()' >> WORKSPACE

ADD grpc/ grpc/
RUN bazel build grpc:ops/grpc.so grpc:service_py_proto --incompatible_remove_legacy_whole_archive=0

ADD . /seed_rl
RUN cp bazel-bin/grpc/ops/grpc.so /seed_rl/grpc/grpc_cc.so
RUN cp bazel-bin/grpc/service_pb2.py /seed_rl/grpc/service_pb2.py
WORKDIR /seed_rl/

RUN PYTHONPATH=/ python grpc/python/ops_test.py

Question on Vtrace Training Result Stability

I train Atari PongNoFrameskip-v4 using the V-trace method, with the same networks as the DMLab setup (IMPALA CNN and LSTM) and no frame stacking, and I get unstable training results, as shown in the figure. How can I make training stable?
I use the default parameters with the following changes:
learning_rate: 0.00048, adam_epsilon: 3.125e-9, batch_size: 32, num_actors: 64, lambda_: 0.95
[training result figures omitted]

Cannot assign a device for operation Aggregator/Gather

The TF 2.3 update broke something on the learner side.

Here is the error:

2020-09-14 14:52:29.201093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 10605 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2020-09-14 14:52:29.668067: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at grpc.cc:927 : Invalid argument: Cannot assign a device for operation Aggregator/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node Aggregator/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceScatterUpdate: CPU XLA_CPU XLA_GPU 
_Arg: GPU CPU XLA_CPU XLA_GPU 
ResourceGather: GPU CPU XLA_CPU XLA_GPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  aggregator_gather_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  Aggregator/Gather (ResourceGather) 
  Aggregator/ResourceScatterUpdate (ResourceScatterUpdate) 

         [[{{node Aggregator/Gather}}]]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "../football/vtrace_main.py", line 55, in main
    create_optimizer)
  File "/seed_rl/agents/vtrace/learner.py", line 423, in learner_loop
    create_host(i, host, inference_devices)
  File "/seed_rl/agents/vtrace/learner.py", line 417, in create_host
    server.bind([create_inference_fn(d) for d in inference_devices])
  File "/seed_rl/grpc/python/ops.py", line 109, in bind
    output_specs=output_specs_proto.SerializeToString())
  File "<string>", line 212, in grpc_server_bind
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation Aggregator/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node Aggregator/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceScatterUpdate: CPU XLA_CPU XLA_GPU 
_Arg: GPU CPU XLA_CPU XLA_GPU 
ResourceGather: GPU CPU XLA_CPU XLA_GPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  aggregator_gather_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  Aggregator/Gather (ResourceGather) 
  Aggregator/ResourceScatterUpdate (ResourceScatterUpdate) 

         [[{{node Aggregator/Gather}}]] [Op:GrpcServerBind]

Environment:

OS: tested Ubuntu 20 and Debian 9
Machine type: GCP, 8 CPUs, 30 GB RAM
GPU: tested NVIDIA Tesla K80, NVIDIA Tesla V100, and T4

docker version

lancelot@seed-k80:~/seed_rl$ docker version
Client: Docker Engine - Community
 Version:           19.03.9
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        9d988398e7
 Built:             Fri May 15 00:25:20 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.9
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       9d988398e7
  Built:            Fri May 15 00:23:53 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P8    31W / 149W |     14MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1071      G   /usr/lib/xorg/Xorg                             8MiB |
|    0      1404      G   /usr/bin/gnome-shell                           3MiB |
+-----------------------------------------------------------------------------+

command line:
./run_local.sh football vtrace 4

It seems Server.bind is forced to use a GPU while the code only allows a CPU. The error says:
assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]

grpc.cc:893

    Device* cpu_device;
    OP_REQUIRES_OK(ctx, lib->device_mgr()->LookupDevice("CPU:0", &cpu_device));
    int num_args = fdef->signature().input_arg_size();

Loading and running trained models

I've successfully trained a network on my custom environment. Now I'd like to observe the activities of the hidden states as the network navigates the environment. As such, I'm loading the latest checkpoint and trying to mimic the inference/environment step cycle in a single script. I will then gather the agent outputs and analyse them later. The code I'm using is pasted below.

The issue is that while this code produces better-than-uninitialized returns on the environment, they are not nearly as good as the returns I'm getting from the eval agent during training (or even the non-eval agents). So it seems I must be missing something that is done during the training loop.

One possibility is that when loading from the checkpoint, the learner expects both agent AND target_agent, but I'm only loading the target agent: ckpt = tf.train.Checkpoint(target_agent=agent)

# agent, env, strategy, encode/decode and the agent_outputs / env_outputs /
# agent_states buffers are created earlier in my script.
import tensorflow as tf
from seed_rl.common import utils

with strategy.scope():
  @tf.function
  def inference(*args):
    return agent(*decode(args))

observation = env.reset()
reward = 0.0
raw_reward = 0.0
done = False
zeroIndex = tf.constant([0], dtype=tf.int32)

# Reset slot 0 and seed it with the initial recurrent state.
agent_outputs.reset(zeroIndex)
agent_states.replace(zeroIndex, agent.initial_state(1))

while not done:
  env_output = utils.EnvOutput(reward, done, observation)
  env_outputs.replace(zeroIndex, env_output)

  input_ = encode((agent_outputs.read(zeroIndex), env_outputs.read(zeroIndex)))
  agent_output, agent_state = inference(input_, agent_states.read(zeroIndex))

  # Store the chosen action and the new recurrent state for the next step.
  agent_outputs.replace(zeroIndex, agent_output.action)
  agent_states.replace(zeroIndex, agent_state)

  observation, reward, done, info = env.step(agent_output.action.numpy()[0])
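
For reference, a minimal, self-contained sketch of restoring the online network under the name agent rather than target_agent; the Dense layer is only a stand-in for the real agent, the checkpoint directory is hypothetical, and whether the gap above actually comes from loading the lagging target network is just a guess.

import tensorflow as tf

agent = tf.keras.layers.Dense(4)   # stand-in for the real agent network
agent(tf.zeros([1, 8]))            # build the variables before restoring

checkpoint_dir = '/tmp/agent'      # hypothetical learner logdir
ckpt = tf.train.Checkpoint(agent=agent)
latest = tf.train.latest_checkpoint(checkpoint_dir)
if latest:
  # expect_partial() silences warnings about optimizer / target_agent slots
  # present in the file but absent from this Checkpoint object.
  ckpt.restore(latest).expect_partial()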

Question on R2D2 concept & implementation

Hi, I hope I can ask a question here regarding the R2D2 implementation, as I can't find any other complete source code for R2D2 (I don't think the authors made it public? not sure).

Anyway,

Does this comment imply that for one training batch with sequence length (unroll_length - burn_in + 1), the batch might contain sequences from more than one episode (i.e. one episode ends and another begins)? Thus, do we need to reset the agent state to zero before passing it to the network, and is this 'reset' performed by this line?

# If the episode ended, the frame state should be reset before the next.
state = tf.nest.map_structure(
    lambda x, y, done_t=done_t: tf.where(
        tf.reshape(done_t, [done_t.shape[0]] + [1] * (x.shape.rank - 1)),
        x, y),
    zero_state,
    state)
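
To make sure I'm reading it correctly, here is a tiny self-contained sketch of what I understand these lines to do (the shapes and values are made up): for every sequence in the batch whose previous step was terminal, the recurrent state is replaced with zeros, while the others keep their carried-over state.

import tensorflow as tf

batch_size, state_size = 3, 4
state = tf.ones([batch_size, state_size])        # carried-over recurrent state
zero_state = tf.zeros_like(state)                # freshly initialized state
done_t = tf.constant([False, True, False])       # episode ended only for row 1

# Broadcast the done flags over the state dimensions and select per row.
mask = tf.reshape(done_t, [batch_size] + [1] * (state.shape.rank - 1))
state = tf.where(mask, zero_state, state)
# Row 1 is now all zeros; rows 0 and 2 are unchanged.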

Learning type

Hi,
Sorry, I'm new to RL lol

Which RL algorithms are included or used in this repository? E.g. D4PG, MPO, A3C, DQN, ...

PrioritizedReplay importance sampling weights calculation

I want to preface this by saying that it may not be an actual issue, but rather an implementation detail that does not necessarily affect the results; I mainly want to understand the decisions that were made in the code and the resulting implementation.

The importance-sampling weight computation (computed here) uses the normalized probabilities from the replay buffer, but does not normalize the importance-sampling weights over the replay buffer (they are instead normalized over the samples). In the original PER paper (this may have changed going from PER to Ape-X to R2D2, etc.), the index i of the importance-sampling weights runs over the same set as the sum index used for the probabilities (see Algorithm 1, lines 9-10, in the original PER paper).

For my understanding (and for others reading the code), what is the correct way to do this, ignoring optimizations such as the sum-tree and segmenting? In my own implementations I have computed the importance-sampling weights over the replay buffer rather than over the samples; which way is more correct? And if the weights are computed over the samples, does one need to "correct" for this by multiplying by the number of samples instead of the replay buffer size (i.e. replace limit with num_samples)?

Currently, the implementation (1) gathers the sampled entries and then (2) computes the max over them. Following the indices in the PER paper, the alternative would be to (1) compute the max over the whole buffer and then (2) gather, unless I am mistaken.
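
To make the question concrete, here is a small numpy sketch of the two normalizations I am comparing (the priorities, alpha, beta and sampled indices are made up):

import numpy as np

priorities = np.array([1.0, 2.0, 0.5, 4.0, 1.5])  # one priority per buffer slot
alpha, beta = 0.9, 0.6
probs = priorities ** alpha / np.sum(priorities ** alpha)

sampled = np.array([1, 3])                        # indices returned by sampling
N = len(priorities)

# (a) Gather first, then normalize by the max weight over the *sampled batch*
# (how I currently read the SEED implementation -- please correct me if wrong).
w = (N * probs[sampled]) ** (-beta)
w_batch = w / np.max(w)

# (b) Normalize by the max weight over the *whole buffer*, then gather
# (my reading of Algorithm 1 in the PER paper).
w_all = (N * probs) ** (-beta)
w_buffer = w_all[sampled] / np.max(w_all)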

What is bit-packing and is it environment-specific?

I am new to the concept of bit-packing and have limited knowledge of binary vs. decimal representation.

I see in the frame_stack() method there is some bit-packing code:


# Unpacked 'frame_stacking_state'. Ordered from oldest to most recent.
unstacked_state = []
for i in range(stack_size - 1):
  # [batch_size, height, width]
  unstacked_state.append(tf.cast(
      tf.bitwise.bitwise_and(
          tf.bitwise.right_shift(frame_stacking_state, i * 8), 0xFF),
      tf.float32))

and


shifted = tf.bitwise.left_shift(
    tf.cast(stacked_frames[-1, ..., :-1], tf.int32),
    # We want to shift so that MSBs are newest frames.
    [8 * i for i in range(stack_size - 2, -1, -1)])
# This is really a reduce_or, because bits don't overlap.
new_state = tf.reduce_sum(shifted, axis=-1)

I am assuming this is done to make processing faster; however, is this code specific to frame/image-based environments? (I am concerned about the value 8, and about 0xFF, which is 255 in decimal as far as I know.) For example, will the same seed_rl r2d2 frame_stack() method work with CartPole?
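
To check my understanding of the mechanism itself, here is a tiny sketch (with made-up values) of packing three older grayscale frames, each a uint8 value in [0, 255] per pixel, into a single int32 per pixel, using 8 bits per frame:

import tensorflow as tf

old_frames = tf.constant([12, 200, 37])  # one pixel across three older frames
# Shift each frame into its own byte and add them together (this works like a
# bitwise OR because the shifted bit ranges don't overlap).
packed = tf.reduce_sum(
    tf.bitwise.left_shift(tf.cast(old_frames, tf.int32), [16, 8, 0]))

# Unpack frame i by shifting right by 8*i bits and masking the low byte.
unpacked = [
    tf.bitwise.bitwise_and(tf.bitwise.right_shift(packed, 8 * i), 0xFF)
    for i in range(3)]
# unpacked == [37, 200, 12]; which end is "oldest" depends on the shift order.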

grpc error on Windows 10

Hi, I tried to run ./run_local.sh atari r2d2 4 in Git Bash on Windows 10 and got this error:

file_path\seed_rl\grpc\grpc_cc.so is either not designed to run on Windows or it contains an error. Try installing the program again using the original installation media or contact your system administrator or the software vendor for support. Error status 0xc000012f
