rl-book's Issues
requirements.txt issues on Windows
I had a few issues with requirements.txt dependencies. I ran into installation errors with:
pandas == 1.0.3 --> switched it to 1.3.0 and it worked fine.
scipy == 1.4.1 --> switched to 1.7.0 and it worked.
requests 2.24.0 then conflicted with urllib3 == 1.26.5, as it depends on earlier versions. Switched to urllib3 1.21.1 to resolve. Ran into some more build errors down the line.
For context, I'm running in an Anaconda environment on Windows, which could be the problem. The requirements may install correctly on a Linux VM.
requests package conflict
ERROR: Cannot install -r requirements.txt (line 66) and urllib3==1.26.5 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested urllib3==1.26.5
requests 2.24.0 depends on urllib3!=1.25.0, !=1.25.1, <1.26 and >=1.21.1
To fix this you could try to:
- loosen the range of package versions you've specified
- remove package versions to allow pip attempt to solve the dependency conflict
Code example missing statement p 50
The 2 code examples for the iterative square root algorithm cause an 'UnboundLocalError' and are missing an assignment; even with local vars initialized, the loop will never terminate as x is never updated.
I agree the use of math.inf is messy, but it avoids an extra check in the while loop such as while x is None or abs()>.01
These are at approximately line 644 of chapter1.md (p. 50 of the un-numbered chapter of the PDF book), and then again 3 paragraphs later.
import math

def sqrt(a: float) -> float:
    x = math.inf  # Avoids UnboundLocalError; x is the previous value, x_n the current value
    x_n = a / 2  # initial guess
    while abs(x_n - x) > 0.01:
        x = x_n  # <== missing
        x_n = (x + (a / x)) / 2
    return x_n
typo pg. 125
First sentence of the Policy Improvement section should read "Terms such as 'better' ..."
nix installation failed on old (2018) MacOS with Sonoma 14.5
On my older 2018 Intel Mac, installation failed when using the --darwin-use-unencrypted-nix-store-volume flag. Excluding this flag, as in the Nix docs, works fine.
I will make a docs PR.
typo
"its" is written as "it's" multiple times throughout the book, e.g. "distribution of it's Markov Process" on pg. 194
Issue with feature functions in chapter8/optimal_exercise_bi.py
Dear professors,
I believe there is an issue with the feature functions used in the chapter8/optimal_exercise_bi.py, in the lines 200-202:
ffs += [(lambda s: np.log(1 + np.exp(-s.state / (2 * strike))) *
         lagval(s.state / strike, ident[i]))
        for i in range(num_laguerre)]
It should create different functions, each one being a different Laguerre polynomial multiplied by a softplus function.
However, it seems that it is creating the same function multiple times.
Searching for this issue on the internet, I found a nice post on Stack Overflow (https://stackoverflow.com/questions/6076270/lambda-function-in-list-comprehensions) that explains the problem: the loop variable (the i in the code above) is looked up when the lambda is called rather than when it is created, so every lambda sees its final value.
It seems one possible solution would be to explicitly capture the loop variable inside the function as an extra argument, like below:
ffs += [(lambda s, coeff=i: np.log(1 + np.exp(-s.state / (2 * strike))) *
         lagval(s.state / strike, ident[coeff]))
        for i in range(num_laguerre)]
It might be that this same issue happens in other places in the codebase. The solution would be similar.
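A minimal, standalone sketch of the same behaviour may help when checking other places in the codebase. It does not use the book's code; the lists late and bound are hypothetical names chosen for illustration.

```python
# Late binding: each lambda looks up `i` when it is called,
# by which time the comprehension has finished and i == 2.
late = [lambda x: x + i for i in range(3)]

# Default argument: `i` is evaluated when each lambda is created,
# so every function keeps its own value in `k`.
bound = [lambda x, k=i: x + k for i in range(3)]

late_results = [f(10) for f in late]    # all three functions behave the same
bound_results = [f(10) for f in bound]  # three distinct functions

print(late_results)
print(bound_results)
```

The default-argument form is the same fix proposed above; functools.partial would be another way to bind the value early.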
Would you please let me know what you think about this?
Thanks for the code examples and the book.
Mistake in summation for Stationary Distribution
Book feature Request -- keep indentation when copying code from pdf
It would be really helpful if indents were kept when copying code snippets from the pdf version of the book. (I don't know how to do it though).
This could enable readers to experiment with code snippets faster and encourage more such behavior.
rl.distribution, etc.
I keep seeing the command:
import rl.distribution
But I can't find which package this command is from. Is it from Keras-rl or gym?
typing or rl.distribution?
Thanks for the book - I am enjoying reading the book and its "modular" python codes.
I assume that in Chapter "Markov Processes" (sec:mrp-chapter), in the python code above the "Simple Inventory Example" (page 72 of book pdf), the
from typing import FiniteDistribution, Categorical
needs to be changed to
from rl.distribution import FiniteDistribution, Categorical
Check formula for calculating the new state in Process1
The code mentions: return Process1.State(price=state.price + up_move * 2 - 1)
Whereas it's a logistic function of (L - X_t).
Why is up_move multiplied by 2 and subtracted by 1?
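One possible reading, sketched below under assumptions: if up_move is a 0/1 draw whose probability of being 1 is the logistic function of (L - X_t), then up_move * 2 - 1 only maps that draw to a price change of -1 or +1. The function names and the parameter values level and alpha are hypothetical, not taken from the book's code.

```python
import math
import random

def up_prob(price: float, level: float = 100.0, alpha: float = 0.25) -> float:
    # Logistic function of (level - price); level and alpha are assumed values.
    return 1.0 / (1.0 + math.exp(-alpha * (level - price)))

def next_price(price: float) -> float:
    # up_move is 1 with probability up_prob(price), otherwise 0.
    up_move = 1 if random.random() < up_prob(price) else 0
    # 0 -> -1 (price moves down), 1 -> +1 (price moves up).
    return price + up_move * 2 - 1

moves = [u * 2 - 1 for u in (0, 1)]
print(moves, up_prob(100.0), next_price(100.0))
```

Under this reading the logistic function sets the probability of the move, while the multiplication and subtraction set its direction.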
chapter numbering in pdf and rl folder
Hi,
When reading the book PDF, say Chapter 3 ("3. Dynamic Programming Algorithms"), the Python code for the chapter ("clearance_pricing_mdp.py") is located in chapter4 of the rl folder. I do not think that is a big issue (you read Chapter 3 and the code is in the chapter4 folder), but a quick fix would make the book PDF more consistent with the code.
One suggestion is to number the chapters in the rl folder from 0 instead of 1 (like the book folder), so that the PDF matches the code in the rl folder. But I am afraid that may break some internal structure you already have :)
typo LSPI
In the first paragraph describing LSPI, it should say \bm{\phi}(s,a)^T \cdot \bm{w} instead of \bm{\phi}(s)^T \cdot \bm{w}.
Code example not properly indented p.55
The converge function on page 55 is not properly indented.
Instead of
def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
for a, b in itertools.pairwise(values):
yield a
if abs(a - b) < threshold:
break
(I assume) it should be:
def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in pairwise(values):
        yield a
        if abs(a - b) < threshold:
            break
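For anyone who wants to run the function, here is a self-contained sketch. The pairwise helper is written out with itertools.tee as an assumption, so that the snippet also runs on Python versions older than 3.10, where itertools.pairwise is not available.

```python
import itertools
from typing import Iterable, Iterator, Tuple

def pairwise(values: Iterable[float]) -> Iterator[Tuple[float, float]]:
    # Yields consecutive pairs: (v0, v1), (v1, v2), ...
    first, second = itertools.tee(values)
    next(second, None)
    return zip(first, second)

def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in pairwise(values):
        yield a
        # Stop once two consecutive values are closer than the threshold.
        if abs(a - b) < threshold:
            break

result = list(converge(iter([1.0, 0.5, 0.25, 0.24, 0.1]), 0.05))
print(result)
```

Note that with this indentation the last value yielded is the first element of the converged pair, not the second.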
About the code in simple_inventory_mdp_nocap.py in chapter3
I am confused about line 40: why is it state.state.inventory_position() instead of state.inventory, and on line 45, why state.state.on_hand instead of state.on_hand?
Dynamic Programming convergence control
It would be good to have the dynamic programming algorithms take a tolerance as an input (e.g. value_iteration_result takes an extra argument tolerance: float).
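As a rough illustration of the request, here is a sketch of value iteration with a tolerance argument on a toy MDP. It does not use the rl library; the function value_iteration, the transitions layout and the toy states are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: transitions[state][action] = [(prob, next_state, reward), ...]
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

def value_iteration(
    transitions: Transitions, gamma: float, tolerance: float = 1e-6
) -> Dict[str, float]:
    values = {s: 0.0 for s in transitions}
    while True:
        # One Bellman optimality backup over all states.
        new_values = {
            s: max(
                sum(p * (r + gamma * values[s1]) for p, s1, r in outcomes)
                for outcomes in actions.values()
            )
            for s, actions in transitions.items()
        }
        # Stop when the largest change across states falls below the tolerance.
        if max(abs(new_values[s] - values[s]) for s in values) < tolerance:
            return new_values
        values = new_values

toy: Transitions = {
    "a": {"stay": [(1.0, "a", 1.0)], "go": [(1.0, "b", 0.0)]},
    "b": {"stay": [(1.0, "b", 2.0)]},
}
fine = value_iteration(toy, gamma=0.5, tolerance=1e-6)
coarse = value_iteration(toy, gamma=0.5, tolerance=1.0)
print(fine, coarse)
```

A looser tolerance returns earlier with a rougher estimate, which is the trade-off the extra argument would expose to the caller.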
Typo in codes RL-book/rl/chapter2/simple_inventory_mrp.py/
Line 48 should be state.state.on_hand instead of state.on_hand
Coin() returns str, but mean() expects a list of numbers
RL-book/rl/chapter1/probability.py
Line 55 in 56ca64d
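A small sketch of the kind of mismatch the title describes, under the assumption that coin samples are strings such as "heads" and "tails". The flip_coin function is a hypothetical stand-in, not the book's Coin class. Mapping outcomes to numbers before averaging is one possible way around it.

```python
import random
import statistics

def flip_coin() -> str:
    # Hypothetical stand-in for sampling a coin distribution that returns strings.
    return random.choice(["heads", "tails"])

samples = [flip_coin() for _ in range(1000)]

# statistics.mean needs numeric data, so map each outcome to 0 or 1 first.
heads_fraction = statistics.mean([1 if s == "heads" else 0 for s in samples])
print(heads_fraction)
```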
typo pg. 200
In the paragraph before A Simple Financial Example, tradeoff is spelled as treadeoff.
Inconsistent code snippet imports
pg. 119 has the following code snippet
from typing import Iterator

X = TypeVar('X')

def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    ...
It seems odd to explicitly import Iterator from typing, but not Callable or TypeVar.
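For comparison, a version of the snippet with all three names imported might look like the sketch below. The body of iterate is an assumption filled in for illustration, since the page shows only an ellipsis.

```python
import itertools
from typing import Callable, Iterator, TypeVar

X = TypeVar('X')

def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    # Assumed body: yield start, step(start), step(step(start)), ...
    state = start
    while True:
        yield state
        state = step(state)

# Take the first four values of repeatedly doubling 1.
doubled = list(itertools.islice(iterate(lambda x: 2 * x, 1), 4))
print(doubled)
```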
State-Reward Sequence
At the end of the first paragraph, page 9 of chapter2.pdf reads
The sequence S0, R1, S1, R1, S2, . . . terminates at...
I'm guessing it's supposed to read
The sequence S0, *R0*, S1, R1, S2, . . . terminates at...