tikhonjelvis / rl-book

Shell 0.01% Nix 1.69% Python 78.24% Jupyter Notebook 10.63% Lua 0.33% TeX 9.10%

rl-book's Issues

requirements.txt issues on Windows

I had a few issues with requirements.txt dependencies. I ran into installation errors with:

pandas == 1.0.3 --> switched it to 1.3.0 and it worked fine.
scipy == 1.4.1 --> switched to 1.7.0 and it worked.

requests 2.24.0 then conflicted with urllib3 == 1.26.5, because it requires an earlier version (below 1.26). I switched to urllib3 1.21.1 to resolve that, but ran into more build errors down the line.

For context, I'm running in an Anaconda environment on Windows, which could be the problem. The requirements may install correctly on a Linux VM.

requests package conflict

ERROR: Cannot install -r requirements.txt (line 66) and urllib3==1.26.5 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested urllib3==1.26.5
requests 2.24.0 depends on urllib3!=1.25.0, !=1.25.1, <1.26 and >=1.21.1

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict
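The constraint in the error message can be checked by hand before editing requirements.txt. The sketch below is only an illustration: the helper names are hypothetical, and the comparison is a simplified dotted-integer check that covers the four clauses quoted above, not a full implementation of pip's version-specifier rules.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.26.5' into (1, 26, 5)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_requests_2_24_0(urllib3_version: str) -> bool:
    """Check urllib3!=1.25.0, !=1.25.1, <1.26 and >=1.21.1 (hypothetical helper)."""
    v = parse_version(urllib3_version)
    excluded = {parse_version("1.25.0"), parse_version("1.25.1")}
    in_range = parse_version("1.21.1") <= v < parse_version("1.26")
    return in_range and v not in excluded


for candidate in ("1.26.5", "1.21.1", "1.25.1"):
    print(candidate, satisfies_requests_2_24_0(candidate))
```

The pinned 1.26.5 fails the `<1.26` clause, which is the conflict pip reports, while 1.21.1 sits at the lower bound and passes.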

Code example missing statement p 50

The 2 code examples for the iterative square root algorithm cause an 'UnboundLocalError' and are missing an assignment; even with local vars initialized, the loop will never terminate as x is never updated.

I agree the use of math.inf is messy but prevents an extra check in the while loop like while x is None or abs()>.01

These are at approximately line 644 of chapter1.md (p50 of the un-numbered chapter of the pdf book), and then again 3 paragraphs later.

import math

def sqrt(a: float) -> float:
    x = math.inf  # previous value; initialized to avoid UnboundLocalError
    x_n = a / 2  # initial guess (current value)
    while abs(x_n - x) > 0.01:
        x = x_n  # <== missing assignment
        x_n = (x + (a / x)) / 2
    return x_n
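For comparison, the alternative mentioned above (an extra check in the loop condition instead of math.inf) might look like the following. This is a sketch, not the book's code, and the function name is made up for illustration.

```python
from typing import Optional


def sqrt_with_none_check(a: float) -> float:
    """Iterative square root that uses None, not math.inf, as the 'no previous value' marker."""
    x: Optional[float] = None  # previous value; None until the first iteration runs
    x_n = a / 2  # initial guess (current value)
    while x is None or abs(x_n - x) > 0.01:
        x = x_n  # remember the current value before updating it
        x_n = (x + (a / x)) / 2
    return x_n


print(sqrt_with_none_check(16.0))
```

The loop condition is longer, but the sentinel no longer has to be a float that happens to compare as "far away" from any guess.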

typo pg. 125

First sentence of the Policy Improvement section should read "Terms such as 'better' ..."

nix installation failed on old (2018) MacOS with Sonoma 14.5

On my older 2018 Intel Mac, installation failed when using the --darwin-use-unencrypted-nix-store-volume flag. Omitting the flag, as the Nix docs do, works fine.

I will make a docs PR.

typo

"its" is written as "it's" multiple times throughout the book, e.g. "distribution of it’s Markov Process" on pg. 194

Issue with feature functions in chapter8/optimal_exercise_bi.py

Dear professors,

I believe there is an issue with the feature functions used in chapter8/optimal_exercise_bi.py, lines 200-202:

ffs += [(lambda s: np.log(1 + np.exp(-s.state / (2 * strike))) *
            lagval(s.state / strike, ident[i]))
            for i in range(num_laguerre)]

It should create different functions, each one being a different Laguerre polynomial multiplied by a softplus-style term, log(1 + exp(-x)).

However, it seems that it is creating the same function multiple times.

Searching for this issue, I found a helpful post on Stack Overflow (https://stackoverflow.com/questions/6076270/lambda-function-in-list-comprehensions) that explains the problem: the loop variable (the i in the code above) is looked up when each lambda is called, not when it is created, so it is effectively captured by reference instead of by value.

It seems one possible solution would be to explicitly capture the loop variable inside the function as an extra argument, like below:

ffs += [(lambda s, coeff=i: np.log(1 + np.exp(-s.state / (2 * strike))) *
            lagval(s.state / strike, ident[coeff]))
            for i in range(num_laguerre)]

It might be that this same issue happens in other places in the codebase. The solution would be similar.

Would you please let me know what you think about this?

Thanks for the code examples and the book.
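The late-binding behavior described in this issue can be reproduced without NumPy or the book's code. The names below are illustrative only.

```python
# Each lambda looks up `i` when it is called, after the comprehension has finished,
# so all three closures see the final value of the loop variable.
late_bound = [lambda x: x * i for i in range(3)]

# Binding the loop variable as a default argument freezes its value per lambda,
# which mirrors the `coeff=i` fix proposed above.
early_bound = [lambda x, k=i: x * k for i in range(3)]

late_results = [f(10) for f in late_bound]
early_results = [f(10) for f in early_bound]

print(late_results)   # [20, 20, 20]
print(early_results)  # [0, 10, 20]
```

functools.partial is another common way to bind the value at creation time; the default-argument form is shown here because it is the one proposed in the issue.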

Mistake in summation for Stationary Distribution

The summation variable for a Stationary Distribution (p74 of pdf) in Chapter2.md should be s not s', i.e. sum over s in N:
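Written out with the sum running over $s$, as this issue proposes, the standard definition of a stationary distribution would read as follows. The symbols $\pi$, $\mathcal{P}$ and $\mathcal{N}$ are assumed here to match the book's notation.

```latex
\pi(s') = \sum_{s \in \mathcal{N}} \pi(s) \cdot \mathcal{P}(s, s')
\quad \text{for all } s' \in \mathcal{N}
```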

Wrong:

Book feature Request -- keep indentation when copying code from pdf

It would be really helpful if indents were kept when copying code snippets from the pdf version of the book. (I don't know how to do it though).
This could enable readers to experiment with code snippets faster and encourage more such behavior.

rl.distribution, etc.

I keep seeing the command:

import rl.distribution

But I can't find which package this command is from. Is it from Keras-rl or gym?

typing or rl.distribution?

Thanks for the book. I am enjoying reading it and its "modular" Python code.

I assume that in Chapter "Markov Processes" (sec:mrp-chapter), in the python code above the "Simple Inventory Example" (page 72 of book pdf), the

from typing import FiniteDistribution, Categorical
needs to be changed to
from rl.distribution import FiniteDistribution, Categorical

Check formula for calculating the new state in Process1

The code has: return Process1.State(price=state.price + up_move * 2 - 1)
whereas it's a logistic function of (L - Xt).
Why is up_move multiplied by 2 and then reduced by 1?
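One likely reading, assuming up_move is a 0/1 draw whose success probability is the logistic function of (L - Xt): the expression up_move * 2 - 1 maps 0 to -1 and 1 to +1, so the logistic function decides the probability of an up move while the price itself changes by exactly one unit. The sketch below uses the standard library instead of NumPy, and the function names and the alpha default are assumptions for illustration.

```python
import math
import random


def up_prob(price: int, level: int, alpha: float = 0.25) -> float:
    """Logistic function of (level - price); alpha is an assumed steepness parameter."""
    return 1.0 / (1.0 + math.exp(-alpha * (level - price)))


def next_price(price: int, level: int, rng: random.Random) -> int:
    """Draw a 0/1 up_move, then map it to a -1/+1 price change."""
    up_move = 1 if rng.random() < up_prob(price, level) else 0
    return price + up_move * 2 - 1  # 0 -> -1 (down), 1 -> +1 (up)


rng = random.Random(0)
print([next_price(100, 100, rng) for _ in range(5)])
```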

chapter numbering in pdf and rl folder

Hi,

The chapter numbers in the book pdf do not match the folder names in rl. For example, the Python code for Chapter 3, "Dynamic Programming Algorithms" ("clearance_pricing_mdp.py"), is located in the chapter4 folder. This is not a big issue, but a quick fix would make the pdf and the code more consistent.

One suggestion is to number the chapters in the rl folder from 0 instead of 1 (like the book folder), so that the pdf matches the code in the rl folder. I am afraid that may break some internal structure you already have, though :)

typo LSPI

In the first paragraph describing LSPI, it should say \bm{\phi}(s,a)^T \cdot \bm{w} instead of \bm{\phi}(s)^T \cdot \bm{w}.

Code example not properly indented p.55

The converge function on page 55 is not properly indented.
Instead of

def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in itertools.pairwise(values):
    yield a
    if abs(a - b) < threshold:
        break

(I assume) it should be:

def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in pairwise(values):
        yield a
        if abs(a - b) < threshold:
            break

expected_value(Coin(), 100)
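A self-contained way to check the corrected indentation of converge is sketched below. The pairwise helper is defined locally so the example does not depend on itertools.pairwise (available from Python 3.10), and the halving generator is a made-up input sequence.

```python
from itertools import tee
from typing import Iterator, Tuple


def pairwise(values: Iterator[float]) -> Iterator[Tuple[float, float]]:
    """Yield consecutive pairs (v0, v1), (v1, v2), ... from an iterator."""
    first, second = tee(values)
    next(second, None)  # advance the second copy by one element
    return zip(first, second)


def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    """Yield values until two consecutive ones differ by less than threshold."""
    for a, b in pairwise(values):
        yield a
        if abs(a - b) < threshold:
            break


def halving(start: float) -> Iterator[float]:
    """Hypothetical infinite input sequence: start, start/2, start/4, ..."""
    x = start
    while True:
        yield x
        x /= 2


result = list(converge(halving(1.0), 0.2))
print(result)  # [1.0, 0.5, 0.25]
```

With the indentation from the first snippet, the function would not even parse, because the for loop has no indented body.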

typo pg. 200

In the paragraph before A Simple Financial Example, tradeoff is spelled as treadeoff.

Inconsistent code snippet imports

pg. 119 has the following code snippet

from typing import Iterator
X = TypeVar('X')

def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    ...

It seems odd to explicitly import Iterator from typing, but not Callable or TypeVar.
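A version of the snippet with all three names imported might read as follows. The body is left elided, as in the original.

```python
from typing import Callable, Iterator, TypeVar

X = TypeVar('X')


def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    ...
```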

State-Reward Sequence

At the end of the first paragraph, page 9 of chapter2.pdf reads

The sequence S0, R1, S1, R1, S2, . . . terminates at...

I'm guessing it's supposed to read

The sequence S0, *R0*, S1, R1, S2, . . . terminates at...
