JuliaML / Reinforce.jl
Abstractions, algorithms, and utilities for reinforcement learning in Julia
License: Other
UUID is here: https://github.com/JuliaRegistries/General/blob/master/R/Reinforce/Package.toml#L2
uuid = "0376cc21-f8a9-5fcf-8891-fde1415a4fd3"
It seems like the finished(env, s) function is not exported by Reinforce. Is that intended?
I tried Pkg.add but I get errors on JuliaBox.com:
Updating registry at `/home/jrun/.julia/registries/JuliaPro`
Updating git-repo `https://pkg.juliacomputing.com/registry/JuliaPro`
Fetching: [========================================>] 100.0 %
The following package names could not be resolved:
* Reinforce (not found in project, manifest or registry)
Please specify by known `name=uuid`.
Stacktrace:
[1] pkgerror(::String) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/Types.jl:121
[2] #ensure_resolved#43(::Bool, ::Function, ::Pkg.Types.EnvCache, ::Array{Pkg.Types.PackageSpec,1}) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/Types.jl:895
[3] #ensure_resolved at ./none:0 [inlined]
[4] #add_or_develop#13(::Symbol, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:58
[5] #add_or_develop at ./none:0 [inlined]
[6] #add_or_develop#12 at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:29 [inlined]
[7] #add_or_develop at ./none:0 [inlined]
[8] #add_or_develop#11(::Base.Iterators.Pairs{Symbol,Symbol,Tuple{Symbol},NamedTuple{(:mode,),Tuple{Symbol}}}, ::Function, ::Array{String,1}) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:28
[9] #add_or_develop at ./none:0 [inlined]
[10] #add_or_develop#10 at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:27 [inlined]
[11] #add_or_develop at ./none:0 [inlined]
[12] #add#18 at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:68 [inlined]
[13] add(::String) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:68
[14] top-level scope at In[1]:2
Hi
I have an issue with plots that are not drawn. I tried within a Jupyter notebook and in a regular Julia console.
Taking the MountainCar example: if I run the line gui(plot(env)) after episode! finishes, the plot is displayed just fine, but nothing is drawn during the execution.
Do you know what could be the problem here?
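Not a maintainer, but one workaround sketch, assuming Reinforce.jl's exported MountainCar, RandomPolicy, and Episode together with Plots on the GR backend: drive the episode yourself and force a window refresh each step with gui().

```julia
# Possible workaround sketch (assumes Reinforce.jl and Plots/GR): iterate
# the episode manually and refresh the plot window every step.
using Reinforce, Plots
gr()

env = MountainCar()
for (s, a, r, s′) in Episode(env, RandomPolicy())
    gui(plot(env))   # without gui(), some backends buffer the plot until
                     # control returns to the REPL, so nothing appears mid-run
    sleep(0.01)      # give the backend time to redraw
end
```

Whether frames actually appear mid-run still depends on the backend; in Jupyter you would instead display each frame or record an animation.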
We need a way to conveniently provide initial knowledge for a policy by hand.
For example, suppose we must choose a sequence of cells on a hexagonal grid, and we know it is certainly never correct to make the first pick right at the grid edges. With a getter and a setter we could both view the current edge probabilities and set them to zero.
Is such functionality in line with the intended directions?
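To illustrate the getter+setter idea with hypothetical names (this is not Reinforce.jl API), a tabular policy could expose its action probabilities directly:

```julia
# Hypothetical sketch: a tabular policy whose action probabilities can be
# inspected and overwritten before training, e.g. zeroing out known-bad picks.
struct TabularPolicy
    probs::Dict{Int,Vector{Float64}}   # state index => action probabilities
end

# getter: inspect the current probabilities for a state
action_probs(p::TabularPolicy, s::Int) = p.probs[s]

# setter: overwrite them, renormalizing so they remain a distribution
function set_action_probs!(p::TabularPolicy, s::Int, ps::Vector{Float64})
    p.probs[s] = ps ./ sum(ps)
end

p = TabularPolicy(Dict(1 => fill(0.25, 4)))
set_action_probs!(p, 1, [0.5, 0.5, 0.0, 0.0])  # forbid actions 3 and 4
```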
Hey team,
I am really interested in this project, since I feel Julia is a superior language for my use cases vs. Python. However, I am wondering whether you have also seen performance gains (memory and latency) over those libraries, on both CPU and GPU, specifically for DQN.
Thanks,
Mark
I am not sure that the function state(env) is defined in lines 50 and 70 of the CartPoleEnv module.
It should probably be replaced by env.state. Am I wrong?
Reinforce.jl/src/envs/cartpole.jl
When using epsilon-greedy methods to take an action, a neural network predicts which action to take based on the input state. Recently, there have been developments wherein, instead of one state (i.e. the current state), the neural network takes the difference between the current state and the state one timestep before (s_t - s_{t-1}). Or it may accept a set of states as input and predict an action.
I am guessing that if such an action function is to be implemented, we need to modify the call to action. I am interested in developing this functionality. Can anyone point me in the right direction?
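One direction that avoids changing the action signature, sketched with entirely hypothetical names (not Reinforce.jl API): a wrapper policy remembers the previous state and hands the difference s_t - s_{t-1} to its model.

```julia
# Hypothetical sketch: a wrapper policy that feeds its model the state
# difference s_t - s_{t-1} instead of the raw state.
mutable struct DiffStatePolicy{F}
    model::F                             # maps a state difference to an action
    prev::Union{Nothing,Vector{Float64}} # previous state, nothing at episode start
end
DiffStatePolicy(model) = DiffStatePolicy(model, nothing)

function act(p::DiffStatePolicy, s::Vector{Float64})
    Δs = p.prev === nothing ? zero(s) : s .- p.prev  # s_t - s_{t-1}
    p.prev = copy(s)
    return p.model(Δs)
end

p = DiffStatePolicy(Δ -> sum(Δ))   # stand-in for a neural network
act(p, [1.0, 2.0])                 # first call sees a zero difference
act(p, [2.0, 4.0])                 # second call sees [1.0, 2.0]
```

The same pattern extends to stacking a window of recent states rather than a single difference.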
Hello guys,
I am experiencing the following issue when trying to run a gym. The first state of an episode is of type Array{Int64,0} and the rest are Int64. Because of this I need to implement a workaround to keep the type consistent.
Simple example:
using OpenAIGym
import Reinforce.action

env = GymEnv("Taxi-v2")

struct NewRandomPolicy <: AbstractPolicy end

function action(policy::NewRandomPolicy, r, s, A′)
    println("Current state: $s, Type: $(typeof(s))")
    rand(A′)
end

reset!(env)
ep = Episode(env, NewRandomPolicy())
println(state(ep.env))
i = 0
for (s, a, r, sp) in ep
    i += 1
    i > 3 && break
end
I get the following output:
72
Current state: 183, Type: Array{Int64,0}
Current state: 83, Type: Int64
Current state: 63, Type: Int64
Current state: 83, Type: Int64
I was also wondering why state(ep.env) is showing a different state from the one I get in my action (72 vs 183).
Thanks!
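A small workaround sketch for the type inconsistency: dispatch on the 0-dimensional array case and unwrap it with s[], so every state the policy sees is a plain scalar.

```julia
# Workaround sketch: unwrap a 0-dimensional array state to its element,
# pass plain scalars through unchanged.
unwrap(s::AbstractArray{T,0}) where {T} = s[]   # 0-dim array -> its element
unwrap(s) = s                                   # already a scalar: pass through

s0 = fill(183)      # an Array{Int64,0}, like the first Taxi-v2 state above
unwrap(s0)          # 183::Int64
unwrap(83)          # 83, unchanged
```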
After a chat with @tbreloff, we roughly decided on the following core API methods:
actions(::AbstractEnvironment, s::State) -> A(s): returns the set A(s) that contains all valid actions from the state s.
step(::AbstractEnvironment, a::Action) -> s', r: returns the next state s' and the reward r associated with taking action a. Note that the current state s is a field of the AbstractEnvironment; this method is also responsible for setting that state field to s'.
action(::AbstractPolicy, s'::State, r::Reward, A(s')::Actions) -> a': should be implemented by each subtype of AbstractPolicy; it observes a state transition (the result of calling the step method with the previous action a) and outputs the next action a'.
@tbreloff also mentioned an episode(::AbstractEnvironment) method, but I wasn't clear on its purpose, so I'll let him fill in the blanks there.
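To make the proposal concrete, here is an illustrative toy environment implementing the three methods. Type and method names mirror the discussion above, not the final Reinforce.jl source.

```julia
# Non-normative toy implementation of the proposed core API: a 1-D corridor
# where moving right until position 10 earns a reward of 1.
abstract type AbstractEnvironment end
abstract type AbstractPolicy end

mutable struct Corridor <: AbstractEnvironment
    s::Int          # current state, a field of the environment
end

# actions(env, s) -> A(s): all valid actions from state s
actions(env::Corridor, s::Int) = (-1, +1)

# step(env, a) -> (s', r): advance the environment and store the new state
# (defining our own step shadows Base.step in a fresh session)
function step(env::Corridor, a::Int)
    s′ = clamp(env.s + a, 0, 10)
    r = s′ == 10 ? 1.0 : 0.0
    env.s = s′      # step is responsible for updating the state field
    return s′, r
end

struct GoRight <: AbstractPolicy end

# action(policy, s', r, A(s')) -> a': observe the transition, pick an action
action(::GoRight, s′, r, A) = +1

env = Corridor(8)
a = action(GoRight(), env.s, 0.0, actions(env, env.s))
s′, r = step(env, a)    # (9, 0.0)
```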
I was playing with Pendulum-v0 from OpenAIGym.jl, which has a continuous action space. It turns out that Reinforce.jl's assertion for checking the bounds of the action chosen by the policy (iterators.jl, line 40) does not support the continuous action space of Pendulum-v0, which is of type LearnBase.IntervalSet{Array{Float64,1}}.
EDIT: It works; I didn't pass the data in the required form :(
Hello,
I wish to get started with this package.
Where can I find your documentation?
Thank you!
It is not actively maintained, nor particularly useful, since we now have the dedicated org JuliaReinforcementLearning.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
With the latest release of Reinforce.jl and a Julia version newer than 0.6, the Mountain Car example should work and produce a plot.
Using the current Julia version causes an old version of the src files (including linspace instead of range) to be installed. Even after installing the git version with Pkg.clone([url]) for the most current files, the example seems to be broken. Running it from within Julia throws the following message:
MethodError: no method matching sin(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}})
Closest candidates are:
sin(!Matched::BigFloat) at mpfr.jl:683
sin(!Matched::Missing) at math.jl:1070
sin(!Matched::Complex{Float16}) at math.jl:1019
...
Stacktrace:
[1] height(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at /home/mbauer/.julia/dev/Reinforce/src/envs/mountain_car.jl:84
[2] macro expansion at /home/mbauer/.julia/dev/Reinforce/src/envs/mountain_car.jl:101 [inlined]
[3] macro expansion at /home/mbauer/.julia/packages/RecipesBase/Uz5AO/src/RecipesBase.jl:312 [inlined]
[4] macro expansion at /home/mbauer/.julia/dev/Reinforce/src/envs/mountain_car.jl:99 [inlined]
[5] apply_recipe(::Dict{Symbol,Any}, ::MountainCar) at /home/mbauer/.julia/packages/RecipesBase/Uz5AO/src/RecipesBase.jl:275
[6] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/pipeline.jl:83
[7] macro expansion at ./logging.jl:305 [inlined]
[8] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:171
[9] #plot#132(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::MountainCar) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:57
[10] plot at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:51 [inlined]
[11] episode!(::MountainCar, ::BasicCarPolicy) at ./In[3]:12
[12] top-level scope at In[3]:16
# Steps to Reproduce the Problem
This is due to the shortcut defined in line 84 of mountain_car.jl
height(xs) = sin(3*xs)*0.45+0.55
since this operation will not work for the vector in line 100:
xs = range(min_position, max_position, length=100)
ys = height(xs)
Anyway, eliminating this error by using a vectorized function call and LinRange instead of range, like this:
xs = LinRange(min_position, max_position, 100)
ys = height.(xs)
will cause compilation to fail with the following message:
MethodError: no method matching +(::Array{Float64,1}, ::Float64)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502
+(!Matched::Bool, ::T<:AbstractFloat) where T<:AbstractFloat at bool.jl:112
+(!Matched::Float64, ::Float64) at float.jl:395
...
Stacktrace:
[1] apply_recipe(::Dict{Symbol,Any}, ::MountainCar) at /home/mbauer/.julia/packages/RecipesBase/Uz5AO/src/RecipesBase.jl:314
[2] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/pipeline.jl:83
[3] macro expansion at ./logging.jl:305 [inlined]
[4] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:171
[5] #plot#132(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::MountainCar) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:57
[6] plot at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:51 [inlined]
[7] episode!(::MountainCar, ::BasicCarPolicy) at ./In[2]:12
[8] top-level scope at In[2]:17
Obviously there is now some line where a non-elementwise addition involving the now array-valued ys causes an error. Since I am new to Julia, that's basically all I could figure out. A little help (or a fix) would be much appreciated!
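One possible fix sketch: broadcast inside height itself, so it accepts both the scalar calls in the recipe and the range from line 100. The numeric bounds below stand in for min_position and max_position (the usual MountainCar limits), used here only for illustration.

```julia
# Broadcast every operation in height so scalars and ranges both work.
height(xs) = sin.(3 .* xs) .* 0.45 .+ 0.55

xs = range(-1.2, 0.6, length=100)  # stands in for min_position, max_position
ys = height(xs)                    # 100-element Vector{Float64}, no MethodError
height(0.5)                        # scalar input still returns a scalar
```

The later `+(::Array{Float64,1}, ::Float64)` MethodError suggests another scalar-style `+` inside the plot recipe that would need the same `.+` treatment.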
The following setup was used:
I have played around with and enjoyed this package and OpenAIGym.jl for a few days now. When trying to make my code run a bit faster, I checked for type instabilities etc. in my code, and I came to the conclusion that a lot of performance was lost in the for (s, a, r, s′) in ep iteration. The problem seems to be that the compiler doesn't know the types of s, a, r, s′. I mitigated the issue somewhat by writing (in my case) for (s::Vector{Float64}, a::Int, r::Float64, s1::Vector{Float64}) in ep, after which the compiler can optimize for the types I declared.
Maybe speed is not the goal of this package, but RL methods tend to require some time, so it would be nice if things ran fast. Maybe this is something to keep in mind while continuing development on Reinforce.jl.
ep = Episode(env,policy)
@code_warntype next(ep,1)
Variables:
#self#::Base.#next
ep::Reinforce.Episode
i::Int64
env::Any
s::Any
A::Any
r::Any
a::Any
last_reward::Any
s′::Any
#temp#@_11::Int64
_::Int64
#temp#@_13::Any
...
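Besides annotating the loop variables, a common mitigation for this kind of instability is a function barrier: forward each transition to a typed inner function so the compiler can specialize the hot path. A sketch, with handle and run_episode as illustrative names:

```julia
# Function-barrier sketch: the loop stays type-unstable, but each transition
# is forwarded to a typed inner function that the compiler can specialize.
function handle(s::Vector{Float64}, a::Int, r::Float64, s′::Vector{Float64})
    # performance-critical work runs here with fully concrete types
    return r
end

function run_episode(ep)
    total = 0.0
    for (s, a, r, s′) in ep           # s, a, r, s′ are inferred as Any here
        total += handle(s, a, r, s′)  # one dynamic dispatch per step only
    end
    return total
end
```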
I know that Reinforce.jl is not trying to emulate OpenAI Gym exactly, but I'm curious about the reasoning behind a couple of interface decisions that seem inconsistent with gym's.
First, why doesn't reset!(env) return a state or observation for convenience? From personal experience, when I was using OpenAIGym.jl, reset!(env) was always returning false. This was happening because Julia returns the value of the last line of a function by default, which happened to come from env.done = false. I had to look through the source code to figure out what was happening. Returning a state/observation would be consistent with gym and would avoid any confusion for new users.
Second, why does step!(env, s, a) return r, s' instead of s', r? This is a minor difference in ordering, but once again, I had an expectation from gym for what step! should return.
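A toy sketch (not the actual OpenAIGym.jl source) of why reset! returned false, and how an explicit return of the state would match gym's behaviour:

```julia
# ToyEnv is a hypothetical stand-in illustrating the implicit-return pitfall.
mutable struct ToyEnv
    state::Vector{Float64}
    done::Bool
end

function reset!(env::ToyEnv)
    env.state = zeros(2)
    env.done = false       # if this were the last line, reset! returns false
    return env.state       # explicit return: now callers get the state
end

env = ToyEnv([1.0, 2.0], true)
reset!(env)                # returns [0.0, 0.0] instead of false
```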
I want to design a new environment with two continuous actions. While implementing the function actions(env, s) I run into an iteration error; shouldn't I be able to use a vector here?
actions(env::NewEnvironment, s) = IntervalSet([-min_action_a, -min_action_b], [max_action_a, max_action_b])
My error states:
Error showing value of type LearnBase.IntervalSet{Array{Float64,1}}:
ERROR: MethodError: no method matching iterate(::LearnBase.IntervalSet{Array{Float64,1}})
Closest candidates are:
iterate(::ExponentialBackOff) at error.jl:252
iterate(::ExponentialBackOff, ::Any) at error.jl:252
iterate(::Base.AsyncGenerator, ::Base.AsyncGeneratorState) at asyncmap.jl:382
...
Stacktrace:
[1] isempty(::LearnBase.IntervalSet{Array{Float64,1}}) at .\essentials.jl:737
[2] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::LearnBase.IntervalSet{Array{Float64,1}}) at .\show.jl:149
[3] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:214
[4] display(::REPL.REPLDisplay, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:218
[5] display(::Any) at .\multimedia.jl:328
[6] #invokelatest#1 at .\essentials.jl:710 [inlined]
[7] invokelatest at .\essentials.jl:709 [inlined]
[8] print_response(::IO, ::Any, ::Bool, ::Bool, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:238
[9] print_response(::REPL.AbstractREPL, ::Any, ::Bool, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:223
[10] (::REPL.var"#do_respond#54"{Bool,Bool,Atom.var"#246#247",REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:822
[11] #invokelatest#1 at .\essentials.jl:710 [inlined]
[12] invokelatest at .\essentials.jl:709 [inlined]
[13] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\LineEdit.jl:2355
[14] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:1144
[15] (::REPL.var"#38#42"{REPL.LineEditREPL,REPL.REPLBackendRef})() at .\task.jl:356
It would be nice to access something like
ACTION_BOUND_HI = actions(env, env.state).hi
ACTION_BOUND_LO = actions(env, env.state).lo
and get Arrays of the upper and lower bounds for both actions.
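A self-contained illustration of the requested bounds access. Box below is a hand-rolled stand-in for LearnBase.IntervalSet{Array{Float64,1}}, which, as far as I can tell, stores the same lo/hi fields but is not iterable, which is what the show/iterate MethodError above was about.

```julia
# Hand-rolled stand-in for a vector-valued interval set with .lo/.hi access.
struct Box
    lo::Vector{Float64}
    hi::Vector{Float64}
end

# hypothetical actions method for an environment with two continuous actions
toy_actions(s) = Box([-1.0, -2.0], [1.0, 2.0])

A = toy_actions(nothing)
ACTION_BOUND_LO = A.lo     # array of lower bounds for both actions
ACTION_BOUND_HI = A.hi     # array of upper bounds for both actions
```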
Pendulum.jl uses Plots in line 83 without importing it, which throws an error on plotting:
annotations := [(0, -0.2, Plots.text("a: $(env.a)", :top))]
The error it throws:
ERROR: UndefVarError: Plots not defined
Hi guys, excited to see a reinforcement learning interface for Julia! I have a question about the interface. Is the state part of the environment?
If the state is part of the environment, then what is the reason for having it as an additional argument to step!? If the state is not part of the environment, then why would the environment be mutated in a call to step!?
The only RL package that I am really familiar with is OpenAI Gym, where the state is part of the environment. Perhaps it would make sense to follow their example, since people are familiar with it, it has been successful, and it would allow simple interaction between environments and solvers written in Julia and Python.
Waiting on JuliaLang/METADATA.jl#8108 so that all deps are registered.