
trfl's Introduction

TRFL

TRFL (pronounced "truffle") is a library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Learning agents.

Installation

TRFL can be installed from pip with the following command: pip install trfl

TRFL works with both the CPU and GPU versions of TensorFlow, but to allow for that it does not list TensorFlow as a requirement, so you need to install TensorFlow and TensorFlow Probability separately if you haven't already done so.
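
For example, a CPU-only setup might be as simple as pip install tensorflow tensorflow-probability followed by pip install trfl; the exact packages and versions you need depend on whether you want CPU or GPU support and on which TensorFlow release you are targeting.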

Usage Example

import tensorflow as tf
import trfl

# Q-values for the previous and next timesteps, shape [batch_size, num_actions].
q_tm1 = tf.get_variable(
    "q_tm1", initializer=[[1., 1., 0.], [1., 2., 0.]], dtype=tf.float32)
q_t = tf.get_variable(
    "q_t", initializer=[[0., 1., 0.], [1., 2., 0.]], dtype=tf.float32)

# Action indices, discounts and rewards, shape [batch_size].
a_tm1 = tf.constant([0, 1], dtype=tf.int32)
r_t = tf.constant([1, 1], dtype=tf.float32)
pcont_t = tf.constant([0, 1], dtype=tf.float32)  # the discount factor

# Q-learning loss, and auxiliary data.
loss, q_learning = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)

loss is the tensor representing the loss. For Q-learning, it is half the squared difference between the predicted Q-values and the TD targets, shape [batch_size]. Extra information is in the q_learning namedtuple, including q_learning.td_error and q_learning.target.
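
As a sanity check, for the inputs above the Q-learning quantities work out by hand (arithmetic only, not TRFL output) to:

target   = r_t + pcont_t * max_a q_t  = [1 + 0 * 1, 1 + 1 * 2] = [1., 3.]
qa_tm1   = q_tm1[a_tm1]               = [1., 2.]
td_error = target - qa_tm1            = [0., 1.]
loss     = 0.5 * td_error^2           = [0., 0.5]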

The loss tensor can be differentiated to derive the corresponding RL update.

reduced_loss = tf.reduce_mean(loss)
optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(reduced_loss)
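
In TF1 graph mode, the resulting op can then be run in a session; a minimal sketch (the number of steps and the printing are arbitrary):

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for _ in range(100):
    sess.run(train_op)
  print(sess.run(reduced_loss))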

All loss functions in the package return both a loss tensor and a namedtuple with extra information, using the above convention, but different functions may have different extra fields. Check the documentation of each function below for more information.

Documentation

Check out the full documentation page here.

trfl's People

Contributors

abdel, aslanides, dhruva6, diegolascasas, dwf, hartikainen, kaue, liusiqi43, miljanm, mtthss, n-kats, superbobry, xiaoschannel

trfl's Issues

Legal actions mask bug

Found a bug in epsilon_greedy() in policy_ops.py when applying legal_actions_mask. It fails when masking the action with the highest action value.

For example:

action_values = [2.0, 1.0, 1.0]
legal_actions_mask = [0., 1., 1.]
epsilon = 0.1
result = policy_ops.epsilon_greedy(action_values, epsilon, legal_actions_mask).probs

Outputs:
[0.9 0.05 0.05]
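
For reference, a minimal sketch (plain Python/NumPy, not the trfl implementation) of the behaviour one would expect: illegal actions are pushed to -inf before the argmax, and the epsilon mass is spread over legal actions only.

import numpy as np

def masked_epsilon_greedy_probs(action_values, legal_actions_mask, epsilon):
  # Hypothetical helper, for illustration only.
  values = np.asarray(action_values, dtype=np.float64)
  mask = np.asarray(legal_actions_mask, dtype=np.float64)
  masked_values = np.where(mask > 0, values, -np.inf)  # rule out illegal actions
  greedy = np.zeros_like(values)
  greedy[np.argmax(masked_values)] = 1.0                # greedy over legal actions
  uniform_legal = mask / mask.sum()                     # uniform over legal actions
  return (1.0 - epsilon) * greedy + epsilon * uniform_legal

print(masked_epsilon_greedy_probs([2.0, 1.0, 1.0], [0., 1., 1.], 0.1))
# Expected for this example: [0.   0.95 0.05]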

import trfl not working

I am using Spyder (Python 3.6) on Ubuntu 18.04.
import tensorflow

import trfl

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

Traceback (most recent call last):
  File "", line 1, in <module>
    import trfl
  File "/home/dd/.local/lib/python3.6/site-packages/trfl/__init__.py", line 31, in <module>
    from trfl.dist_value_ops import categorical_dist_double_qlearning
  File "/home/dd/.local/lib/python3.6/site-packages/trfl/dist_value_ops.py", line 33, in <module>
    from trfl import distribution_ops
  File "/home/dd/.local/lib/python3.6/site-packages/trfl/distribution_ops.py", line 30, in <module>
    from trfl import gen_distribution_ops
  File "/home/dd/.local/lib/python3.6/site-packages/trfl/gen_distribution_ops.py", line 2, in <module>
    _op_lib = tf.load_op_library(tf.resource_loader.get_path_to_datafile("_gen_distribution_ops.so"))
  File "/home/dd/.local/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
NotFoundError: /home/dd/.local/lib/python3.6/site-packages/trfl/_gen_distribution_ops.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl11string_viewEPFPNS_8OpKernelEPNS_20OpKernelConstructionEE

Removing tf.contrib

Would you be open to accepting a PR to remove code that uses tf.contrib, as it won't be available in TF 2?

How is deterministic policy gradient being evaluated?

I cannot grasp the steps in lines 87 to 92 of trfl/blob/master/trfl/dpg_ops.py. Why is a target_a being created? The subsequent stop_gradient is understandable, since we don't want to update the Q-network's trainable variables, but then what does the loss on the next line represent?
DPG, to me, is an application of the chain rule. How does optimising this loss help update the network?

I don't know if there is a better way to ask this question, as I could not contact the authors of dpg_ops.py (mainly Matteo Hessel and Miljan Martic) by any other means.
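
For context, the pattern described in the question (building a target_a, stopping gradients through it, and taking half a squared error) is a common surrogate-loss trick for DPG. A minimal sketch of that trick, not necessarily the exact trfl code:

# actions: output of the actor network, shape [batch_size, action_dims].
# q_values: critic evaluated at (state, actions), shape [batch_size].
dqda = tf.gradients(q_values, actions)[0]    # dQ/da
target_a = tf.stop_gradient(dqda + actions)  # treated as a constant target
loss = 0.5 * tf.reduce_sum(tf.square(target_a - actions), axis=-1)
# d(loss)/d(actions) = -(target_a - actions) = -dqda, so minimising this loss
# moves the actions (and, via the chain rule, the actor parameters) in the
# direction that increases Q, which is exactly the DPG update.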

Add/alias dpg critic update

Hi, the DPG critic update (see Algorithm 1 of Lillicrap et al. 2016, https://arxiv.org/abs/1509.02971) is substantively the same as your td_learning function; however, this is currently obscured. I would suggest adding a dpg_qlearning function that aliases td_learning in dpg_ops.py:

from trfl.value_ops import td_learning
...
dpg_qlearning = td_learning

Alternatively, one could add a comment in the DPG actor update function referencing the td_learning function.

Retrace Ops: documented return shapes

Hi, it seems like the documented return shapes for the following functions might be off:

  1. retrace_ops.retrace(...)
  2. retrace_ops.retrace_core(...)
  3. retrace_ops._general_off_policy_corrected_multistep_target(...)

The first two are documented to return shape [B] and the third shape [T, B, num_actions], while they all appear to return [T, B].

Some test code to check.

import numpy as np
import tensorflow as tf

from trfl import retrace_ops, indexing_ops


### Example input data: 
# https://github.com/deepmind/trfl/blob/08ccb293edb929d6002786f1c0c177ef291f2956/trfl/retrace_ops_test.py#L41

lambda_ = 0.9
qs = [
    [[2.2, 3.2, 4.2],
     [5.2, 6.2, 7.2]],
    [[7.2, 6.2, 5.2],
     [4.2, 3.2, 2.2]],
    [[3.2, 5.2, 7.2],
     [4.2, 6.2, 9.2]],
    [[2.2, 8.2, 4.2],
     [9.2, 1.2, 8.2]]
     ]
targnet_qs = [
    [[2., 3., 4.],
     [5., 6., 7.]],
    [[7., 6., 5.],
     [4., 3., 2.]],
    [[3., 5., 7.],
     [4., 6., 9.]],
    [[2., 8., 4.],
     [9., 1., 8.]]
     ]
actions = [
    [2, 0], 
    [1, 2], 
    [0, 1], 
    [2, 0]
    ]
rewards = [
    [1.9, 2.9], 
    [3.9, 4.9], 
    [5.9, 6.9], 
    [np.nan, np.nan]  # nan marks entries we should never use.
    ]
pcontinues = [
    [0.8, 0.9], 
    [0.7, 0.8], 
    [0.6, 0.5], 
    [np.nan, np.nan]
    ]
target_policy_probs = [
    [[np.nan] * 3,
     [np.nan] * 3],
    [[0.41, 0.28, 0.31],
     [0.19, 0.77, 0.04]],
    [[0.22, 0.44, 0.34],
     [0.14, 0.25, 0.61]],
    [[0.16, 0.72, 0.12],
     [0.33, 0.30, 0.37]]
     ]
behaviour_policy_probs = [
    [np.nan, np.nan], 
    [0.85, 0.86], 
    [0.87, 0.88], 
    [0.89, 0.84]
    ]

### Retrace Test: ###
retrace = retrace_ops.retrace(
        lambda_, qs, targnet_qs, actions, rewards,
        pcontinues, target_policy_probs, behaviour_policy_probs)

# qs: shape [(T+1), B, num_actions] 
# https://github.com/deepmind/trfl/blob/08ccb293edb929d6002786f1c0c177ef291f2956/trfl/retrace_ops.py#L85
T = len(qs) - 1  # sequence length
B = len(qs[0])  # batch dimension
N = len(qs[0][0])  # number of actions

# loss: documented shape [B] 
# https://github.com/deepmind/trfl/blob/08ccb293edb929d6002786f1c0c177ef291f2956/trfl/retrace_ops.py#L121
tf.debugging.assert_equal(retrace.loss.shape, [T, B])  # succeeds

### Multi-step target Test: ###
timesteps = tf.shape(qs)[0] # Batch size is qs_shape[1].
timestep_indices_tm1 = tf.range(0, timesteps - 1)
timestep_indices_t = tf.range(1, timesteps)

target_policy_t = tf.gather(target_policy_probs, timestep_indices_t)
behaviour_policy_t = tf.gather(behaviour_policy_probs, timestep_indices_t)
a_t = tf.gather(actions, timestep_indices_t)
r_t = tf.gather(rewards, timestep_indices_tm1)
pcont_t = tf.gather(pcontinues, timestep_indices_tm1)
targnet_q_t = tf.gather(targnet_qs, timestep_indices_t)

c_t = retrace_ops._retrace_weights(
        indexing_ops.batched_index(target_policy_t, a_t),
        behaviour_policy_t) * lambda_

target = retrace_ops._general_off_policy_corrected_multistep_target(
  r_t, pcont_t, target_policy_t, c_t, targnet_q_t, a_t
)

# target: documented shape [T, B, N] 
# https://github.com/deepmind/trfl/blob/08ccb293edb929d6002786f1c0c177ef291f2956/trfl/retrace_ops.py#L241
tf.debugging.assert_equal(target.shape, [T, B])  # succeeds

Trouble Installing TRFL 1.0.1 in Colab

I tried installing trfl version 1.0.1 in Colab and am getting an error:
import trfl

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
<ipython-input-2-dd69192d7d7c> in <module>()
----> 1 import trfl

/usr/local/lib/python3.6/dist-packages/trfl/__init__.py in <module>()
     29 from trfl.discrete_policy_gradient_ops import discrete_policy_gradient_loss
     30 from trfl.discrete_policy_gradient_ops import sequence_advantage_actor_critic_loss
---> 31 from trfl.dist_value_ops import categorical_dist_double_qlearning
     32 from trfl.dist_value_ops import categorical_dist_qlearning
     33 from trfl.dist_value_ops import categorical_dist_td_learning

/usr/local/lib/python3.6/dist-packages/trfl/dist_value_ops.py in <module>()
     31 import tensorflow as tf
     32 from trfl import base_ops
---> 33 from trfl import distribution_ops
     34 
     35 Extra = collections.namedtuple("dist_value_extra", ["target"])

/usr/local/lib/python3.6/dist-packages/trfl/distribution_ops.py in <module>()
     28 import tensorflow as tf
     29 import tensorflow_probability as tfp
---> 30 from trfl import gen_distribution_ops
     31 
     32 

/usr/local/lib/python3.6/dist-packages/trfl/gen_distribution_ops.py in <module>()
      1 import tensorflow as tf
----> 2 _op_lib = tf.load_op_library(tf.resource_loader.get_path_to_datafile("_gen_distribution_ops.so"))
      3 project_distribution = _op_lib.project_distribution
      4 del _op_lib, tf

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/load_library.py in load_op_library(library_filename)
     58     RuntimeError: when unable to load the library or get the python wrappers.
     59   """
---> 60   lib_handle = py_tf.TF_LoadLibrary(library_filename)
     61 
     62   op_list_str = py_tf.TF_GetOpList(lib_handle)

NotFoundError: /usr/local/lib/python3.6/dist-packages/trfl/_gen_distribution_ops.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl11string_viewESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteIS8_EE

I was able to install TRFL previously with Colab. As discussed in earlier issues I installed TF 1.12, reset the runtime, installed TF prob 0.5, and installed TRFL. This was working until recently (past week or so?):
https://colab.research.google.com/drive/1h5QdpZZ-Vz2KdTiiidS4O28b-pU0ihgn

If I specify the TRFL version as 1.0, I am still able to install and run TRFL as I used to:
https://colab.research.google.com/drive/1YoITxCmP-3v-WWKqQxJMR1w3Kyc5nKjw

Clarification of some abbreviations?

Dear Deepminder:

During a group meeting, while I was introducing TRFL to my lab members, a question was raised about the meaning of some abbreviations in the TRFL demo code, so I have to ask it here.

It reads:

q_tm1: the action value in the source state of a transition.
a_tm1: the action that was selected in the source state.

What does "m1" mean here? I know "q" stands for the action value and "t" stands for the time step, but I could not figure out what "m1" stands for; it is not so intuitive.

Could you please help me on that? Thanks a lot.

Unable to install trfl on Windows 10 via Anaconda Prompt

Neither of the two installation options seems to work for me.

The command pip install trfl throws the following error:

Collecting trfl
ERROR: Could not find a version that satisfies the requirement trfl (from versions: none)
ERROR: No matching distribution found for trfl

And the command pip install git+git://github.com/deepmind/trfl.git throws this error:

...
Building wheels for collected packages: trfl
Building wheel for trfl (setup.py) ... error
ERROR: Complete output from command 'c:\users\luis\anaconda3\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\Luis\AppData\Local\Temp\pip-req-build-5488atsp\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\Luis\AppData\Local\Temp\pip-wheel-2xiqoy_g' --python-tag cp36:
ERROR: running bdist_wheel
running build
running build_py
creating build
error: could not create 'build': file exists


ERROR: Failed building wheel for trfl
Running setup.py clean for trfl
Failed to build trfl
Installing collected packages: trfl
Running setup.py install for trfl ... error
ERROR: Complete output from command 'c:\users\luis\anaconda3\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\Luis\AppData\Local\Temp\pip-req-build-5488atsp\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Luis\AppData\Local\Temp\pip-record-82clib4d\install-record.txt' --single-version-externally-managed --compile:
ERROR: running install
running build
running build_py
creating build
error: could not create 'build': file exists
----------------------------------------
ERROR: Command "'c:\users\luis\anaconda3\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\Luis\AppData\Local\Temp\pip-req-build-5488atsp\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Luis\AppData\Local\Temp\pip-record-82clib4d\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Luis\AppData\Local\Temp\pip-req-build-5488atsp

policy_gradient_loss batch_shape requirements

Why does policy_gradient_ops.policy_gradient_loss require batch_shape to be rank 2? Doesn't this limit the policy_gradient_loss operation to single univariate distributions that implement log_prob?

For instance, consider the problem where the actions are multivariate and follow a normal distribution:

>>> import tensorflow as tf; tf.enable_eager_execution()
>>> import tensorflow.contrib.eager as tfe
>>> import tensorflow_probability as tfp
>>> import trfl
>>> loc = tfe.Variable(tf.zeros([5, 5, 2]))
>>> policy = tfp.distributions.Normal(loc=loc, scale=1.)
>>> policy
<tfp.distributions.Normal 'Normal/' batch_shape=(5, 5, 2) event_shape=() dtype=float32>
>>> trfl.policy_gradient_loss(policy, tf.zeros([5, 5, 2]), tf.ones([5, 5]), [loc])
Traceback (most recent call last):
  File "/trfl/policy_gradient_ops.py", line 119, in policy_gradient_loss
    policies_.batch_shape.assert_has_rank(2)
  File "/tensorflow/python/framework/tensor_shape.py", line 728, in assert_has_rank
    raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape (5, 5, 2) must have rank 2

I could understand how this is a requirement for a discrete distribution. But for the sake of supporting other distributions, it may be more structured to allow log_prob to be rank 3 and then perform a summation over the trailing (event) dimension:

>>> policy.log_prob(tf.zeros([5, 5, 2])).shape
TensorShape([Dimension(5), Dimension(5), Dimension(2)])
>>> tf.reduce_sum(policy.log_prob(tf.zeros([5, 5, 2])), axis=-1).shape
TensorShape([Dimension(5), Dimension(5)])
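
As a possible workaround under the current rank-2 requirement, the event dimension can be folded into the distribution itself; a sketch assuming tfp.distributions.Independent, which reinterprets trailing batch dimensions as event dimensions so that log_prob already sums over the action dimension:

>>> policy = tfp.distributions.Independent(
...     tfp.distributions.Normal(loc=loc, scale=1.), reinterpreted_batch_ndims=1)
>>> policy.batch_shape
TensorShape([Dimension(5), Dimension(5)])
>>> policy.log_prob(tf.zeros([5, 5, 2])).shape
TensorShape([Dimension(5), Dimension(5)])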

Thank you for your support and time.

tensorflow.python.framework.errors_impl.NotFoundError: _gen_distribution_ops.so

I cannot run the example file from a basic installation.

repository here: https://github.com/LuisSaybe/trfl-gridworld

output:

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "index.py", line 2, in <module>
    import trfl
  File "/usr/local/lib/python3.6/site-packages/trfl/__init__.py", line 31, in <module>
    from trfl.dist_value_ops import categorical_dist_double_qlearning
  File "/usr/local/lib/python3.6/site-packages/trfl/dist_value_ops.py", line 33, in <module>
    from trfl import distribution_ops
  File "/usr/local/lib/python3.6/site-packages/trfl/distribution_ops.py", line 30, in <module>
    from trfl import gen_distribution_ops
  File "/usr/local/lib/python3.6/site-packages/trfl/gen_distribution_ops.py", line 2, in <module>
    _op_lib = tf.load_op_library(tf.resource_loader.get_path_to_datafile("_gen_distribution_ops.so"))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.6/site-packages/trfl/_gen_distribution_ops.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl11string_viewEPFPNS_8OpKernelEPNS_20OpKernelConstructionEE

I installed trfl with the following dockerfile

FROM centos:latest

ENV TERM xterm
ENV SOURCE_DIRECTORY /root/source
ENV PYTHON_VERSION 3.6.8

RUN yum -y update && \
    yum install -y gcc g++ openssl-devel zlib-devel libffi-devel man-pages man nano wget curl git-all unzip && \
    yum clean all && \

    wget --directory-prefix=/opt https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz && \
    tar -xzf /opt/Python-$PYTHON_VERSION.tgz --directory /opt && \
    rm /opt/Python-$PYTHON_VERSION.tgz && \
    cd /opt/Python-$PYTHON_VERSION && \

    ./configure && \
    make && \
    make install && \

    pip3 install --upgrade pip && \
    pip3 install numpy tensorflow tensorflow_probability trfl && \

    mkdir -p $SOURCE_DIRECTORY

WORKDIR $SOURCE_DIRECTORY

Then I run

docker run -it --rm -v $(pwd)/src:/root/source gridworld-trfl python3 index.py

index.py here

import tensorflow as tf
import trfl

# Q-values for the previous and next timesteps, shape [batch_size, num_actions].
q_tm1 = tf.get_variable(
    "q_tm1", initializer=[[1., 1., 0.], [1., 2., 0.]], dtype=tf.float32)
q_t = tf.get_variable(
    "q_t", initializer=[[0., 1., 0.], [1., 2., 0.]], dtype=tf.float32)

# Action indices, discounts and rewards, shape [batch_size].
a_tm1 = tf.constant([0, 1], dtype=tf.int32)
r_t = tf.constant([1, 1], dtype=tf.float32)
pcont_t = tf.constant([0, 1], dtype=tf.float32)  # the discount factor

# Q-learning loss, and auxiliary data.
loss, q_learning = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)

print('loss', loss)

Questions about retrace implementation

Hey,

I was looking at the retrace ops provided by trfl and there are a couple of implementation details that seem a bit confusing to me.

  1. It seems like trfl retrace drops the discount terms from the 𝔼_π Q(x_t, ·) term. This is in line with the retrace formulation in Equation 13 of the MPO paper [1], but differs from Equation 4 in the original retrace paper [2]. I have included a small test case below that shows this. Is this a bug or a conscious choice? Edit: actually, it seems like at least one of the terms is included in the continuation probs.

  2. In the retrace_ops._general_off_policy_corrected_multistep_target comments, it's mentioned that exp_q_t = 𝔼_π Q(x_{t+1}, ·) and qa_t = Q(x_t, a_t), indicating that exp_q_t should be one timestep ahead of qa_t: https://github.com/deepmind/trfl/blob/e633edbd9d326b8bebc7c7c7d53f37118b48a440/trfl/retrace_ops.py#L252-L253
    However, if I understand this correctly, when those values are actually assigned, they come from the same time indices: https://github.com/deepmind/trfl/blob/e633edbd9d326b8bebc7c7c7d53f37118b48a440/trfl/retrace_ops.py#L263-L264
    It's possible that the target_policy_t values that are used to index for exp_q_t somehow account for this, but I can't wrap my head around how that would work. Am I misunderstanding something here, or is it possible that these indices are actually off?

[1] Abdolmaleki, A., Springenberg, J.T., Tassa, Y., Munos, R., Heess, N. and Riedmiller, M., 2018. Maximum a posteriori policy optimisation. arXiv preprint arXiv:1806.06920.
[2] Munos, R., Stepleton, T., Harutyunyan, A. and Bellemare, M., 2016. Safe and efficient off-policy reinforcement learning. In Advances in Neural Information Processing Systems (pp. 1054-1062).

Code related to question 1:

The test case is simplified (e.g. just one action) and I have used a slightly modified version of trfl to make it compatible with TF2, but all the logic should be correct.

import numpy as np
import tensorflow as tf

from trfl import retrace_ops


lambda_ = 0.99
discount = 0.9
Q_values = np.array([
    [[2.2], [5.2]],
    [[7.2], [4.2]],
    [[3.2], [4.2]],
    [[2.2], [9.2]]], dtype=np.float32)
target_Q_values = np.array([
    [[2.], [5.]],
    [[7.], [4.]],
    [[3.], [4.]],
    [[2.], [9.]]], dtype=np.float32)
actions = np.array([
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0]])
rewards = np.array([
    [1.9, 2.9],
    [3.9, 4.9],
    [5.9, 6.9],
    [np.nan, np.nan],  # nan marks entries we should never use.
], dtype=np.float32)
pcontinues = np.array([
    [0.8, 0.9],
    [0.7, 0.8],
    [0.6, 0.5],
    [np.nan, np.nan]], dtype=np.float32)
target_policy_probs = np.array([
    [[np.nan] * 1, [np.nan] * 1],
    [[1.0], [1.0]],
    [[1.0], [1.0]],
    [[1.0], [1.0]]], dtype=np.float32)
behavior_policy_probs = np.array([
    [np.nan, np.nan],
    [1.0, 1.0],
    [1.0, 1.0],
    [1.0, 1.0]], dtype=np.float32)


def retrace_original_v1(
        lambda_,
        discount,
        target_Q_values,
        actions,
        rewards,
        target_policy_probs,
        behavior_policy_probs):
    actions = actions[1:, ...]
    rewards = rewards[:-1, ...]

    target_policy_probs = target_policy_probs[1:, ...]
    behavior_policy_probs = behavior_policy_probs[1:, ...]

    traces = lambda_ * np.minimum(
        1.0, target_policy_probs / behavior_policy_probs[..., None])

    deltas = (
        rewards[..., None]
        + discount * target_Q_values[1:]
        - target_Q_values[:-1])
    retraces = []
    for i in range(tf.shape(traces)[0]):
        sum_terms = []
        for t in range(i, tf.shape(traces)[0]):
            trace = tf.reduce_prod([
                traces[k]
                for k in range(i + 1, t + 1)
            ], axis=0)
            sum_term = discount ** (t - i) * trace * deltas[t]
            sum_terms.append(sum_term)

        result = tf.reduce_sum(sum_terms, axis=0)
        retraces.append(result)

    retraces = tf.stack(retraces) + target_Q_values[:-1]
    return retraces


output_original_v1 = retrace_original_v1(
    lambda_,
    1.0,
    target_Q_values,
    actions,
    rewards,
    target_policy_probs,
    behavior_policy_probs)
print(f"output_original_v1:\n{output_original_v1.numpy().round(3)}\n")

output_original_discounted_v1 = retrace_original_v1(
    lambda_,
    discount,
    target_Q_values,
    actions,
    rewards,
    target_policy_probs,
    behavior_policy_probs)
print(f"output_original_discounted_v1:\n{output_original_discounted_v1.numpy().round(3)}\n")


output_trfl_v1 = retrace_ops.retrace(
    lambda_,
    Q_values,
    target_Q_values,
    actions,
    rewards,
    tf.ones_like(rewards),
    target_policy_probs,
    behavior_policy_probs,
).extra.target[..., None]


tf.debugging.assert_near(output_original_v1, output_trfl_v1)  # succeeds
tf.debugging.assert_near(output_original_discounted_v1, output_trfl_v1)  # fails

Issue with pip install trfl on MacOs

Hello,

I get the following error when using pip install trfl

Could not find a version that satisfies the requirement trfl (from versions: )
No matching distribution found for trfl

I have tensorflow 1.13.1 & tensorflow-probability 0.60

Do you have an idea what the issue could be?
Thanks in advance for your help

Raise "error: could not create 'build': File exists" while installing

When I first installed trfl, it raised an error almost at the end of the installation:

Failed building wheel for trfl
Running setup.py clean for trfl
Failed to build trfl
Installing collected packages: trfl
Running setup.py install for trfl ... error

The further output is:

running install
running build
running build_py
creating build
error: could not create 'build': File exists

ImportError: cannot import name gen_distribution_ops

When I try to import trfl, similarly to this public trfl colab notebook online, I get

(Note: I tried this in both Python 2 and Python 3 notebooks, with the same results.)

<ipython-input-3-dd69192d7d7c> in <module>()
----> 1 import trfl

/usr/local/lib/python2.7/dist-packages/trfl/__init__.py in <module>()
     29 from trfl.discrete_policy_gradient_ops import discrete_policy_gradient_loss
     30 from trfl.discrete_policy_gradient_ops import sequence_advantage_actor_critic_loss
---> 31 from trfl.dist_value_ops import categorical_dist_double_qlearning
     32 from trfl.dist_value_ops import categorical_dist_qlearning
     33 from trfl.dist_value_ops import categorical_dist_td_learning

/usr/local/lib/python2.7/dist-packages/trfl/dist_value_ops.py in <module>()
     31 import tensorflow as tf
     32 from trfl import base_ops
---> 33 from trfl import distribution_ops
     34 
     35 Extra = collections.namedtuple("dist_value_extra", ["target"])

/usr/local/lib/python2.7/dist-packages/trfl/distribution_ops.py in <module>()
     28 import tensorflow as tf
     29 import tensorflow_probability as tfp
---> 30 from trfl import gen_distribution_ops
     31 
     32 

ImportError: cannot import name gen_distribution_ops

(Also, if I install trfl via pip instead of cloning from git, the error messages look similar, with this added at the end:)


/usr/local/lib/python2.7/dist-packages/trfl/gen_distribution_ops.py in <module>()
      1 import tensorflow as tf
----> 2 _op_lib = tf.load_op_library(tf.resource_loader.get_path_to_datafile("_gen_distribution_ops.so"))
      3 project_distribution = _op_lib.project_distribution
      4 del _op_lib, tf

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.pyc in load_op_library(library_filename)
     59     RuntimeError: when unable to load the library or get the python wrappers.
     60   """
---> 61   lib_handle = py_tf.TF_LoadLibrary(library_filename)
     62 
     63   op_list_str = py_tf.TF_GetOpList(lib_handle)

Help installing on Windows 10 via Anaconda Environment

I ran the recommended install command and got this:

>pip install git+git://github.com/deepmind/trfl.git
Collecting git+git://github.com/deepmind/trfl.git
  Cloning git://github.com/deepmind/trfl.git to c:\users\julius\appdata\local\temp\pip-req-build-8py9u2uh
  Error [WinError 2] The system cannot find the file specified while executing command git clone -q git://github.com/deepmind/trfl.git C:\Users\Julius\AppData\Local\Temp\pip-req-build-8py9u2uh
Cannot find command 'git' - do you have 'git' installed and in your PATH?

I am running Windows 10 and using an Anaconda environment.
