JuliaML / Reinforce.jl
Abstractions, algorithms, and utilities for reinforcement learning in Julia
License: Other
UUID is here: https://github.com/JuliaRegistries/General/blob/master/R/Reinforce/Package.toml#L2
uuid = "0376cc21-f8a9-5fcf-8891-fde1415a4fd3"
It seems like the finished(env, s) function is not exported by Reinforce. Is that intended?
I tried Pkg.add but I get errors on JuliaBox.com:
Updating registry at `/home/jrun/.julia/registries/JuliaPro`
Updating git-repo `https://pkg.juliacomputing.com/registry/JuliaPro`
Fetching: [========================================>] 100.0 %
The following package names could not be resolved:
* Reinforce (not found in project, manifest or registry)
Please specify by known `name=uuid`.
Stacktrace:
[1] pkgerror(::String) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/Types.jl:121
[2] #ensure_resolved#43(::Bool, ::Function, ::Pkg.Types.EnvCache, ::Array{Pkg.Types.PackageSpec,1}) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/Types.jl:895
[3] #ensure_resolved at ./none:0 [inlined]
[4] #add_or_develop#13(::Symbol, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:58
[5] #add_or_develop at ./none:0 [inlined]
[6] #add_or_develop#12 at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:29 [inlined]
[7] #add_or_develop at ./none:0 [inlined]
[8] #add_or_develop#11(::Base.Iterators.Pairs{Symbol,Symbol,Tuple{Symbol},NamedTuple{(:mode,),Tuple{Symbol}}}, ::Function, ::Array{String,1}) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:28
[9] #add_or_develop at ./none:0 [inlined]
[10] #add_or_develop#10 at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:27 [inlined]
[11] #add_or_develop at ./none:0 [inlined]
[12] #add#18 at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:68 [inlined]
[13] add(::String) at /opt/julia-1.0.0/share/julia/stdlib/v1.0/Pkg/src/API.jl:68
[14] top-level scope at In[1]:2
Hi
I have an issue with plots that are not drawn. I tried within a Jupyter notebook and in a regular Julia console.
Taking the MountainCar example: if I run the line gui(plot(env)) after episode! finishes, the plot is displayed just fine, but nothing is drawn during the execution.
Do you know what could be the problem here?
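Not a maintainer, but one workaround sketch, assuming Reinforce.jl's exported MountainCar, RandomPolicy, and Episode together with Plots on the GR backend: drive the episode yourself and force a window refresh each step with gui().

```julia
# Possible workaround sketch (assumes Reinforce.jl and Plots/GR): iterate
# the episode manually and refresh the plot window every step.
using Reinforce, Plots
gr()

env = MountainCar()
for (s, a, r, s′) in Episode(env, RandomPolicy())
    gui(plot(env))   # without gui(), some backends buffer the plot until
                     # control returns to the REPL, so nothing appears mid-run
    sleep(0.01)      # give the backend time to redraw
end
```

Whether frames actually appear mid-run still depends on the backend; in Jupyter you would instead display each frame or record an animation.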
We need a way to conveniently provide initial knowledge for a policy by hand.
For example, suppose we must choose a sequence of cells on a hexagonal grid, and we know it is certainly never correct to make the first pick right at the grid edges. With a getter and a setter we could both view the current edge probabilities and set them to zero.
Is such functionality in line with the intended directions?
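To illustrate the getter+setter idea with hypothetical names (this is not Reinforce.jl API), a tabular policy could expose its action probabilities directly:

```julia
# Hypothetical sketch: a tabular policy whose action probabilities can be
# inspected and overwritten before training, e.g. zeroing out known-bad picks.
struct TabularPolicy
    probs::Dict{Int,Vector{Float64}}   # state index => action probabilities
end

# getter: inspect the current probabilities for a state
action_probs(p::TabularPolicy, s::Int) = p.probs[s]

# setter: overwrite them, renormalizing so they remain a distribution
function set_action_probs!(p::TabularPolicy, s::Int, ps::Vector{Float64})
    p.probs[s] = ps ./ sum(ps)
end

p = TabularPolicy(Dict(1 => fill(0.25, 4)))
set_action_probs!(p, 1, [0.5, 0.5, 0.0, 0.0])  # forbid actions 3 and 4
```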
Hey team,
I am really interested in this project, since I feel Julia is a superior language for my use cases vs. Python. However, I am wondering whether you have also seen performance gains (memory and latency) over those libraries, on both CPU and GPU, specifically for DQN.
Thanks,
Mark
I am not sure that the function state(env) is defined in lines 50 and 70 of the CartPoleEnv module.
It should probably be replaced by env.state. Am I wrong?
Reinforce.jl/src/envs/cartpole.jl
When using epsilon-greedy methods to take an action, a neural network predicts which action to take based on the input state. Recently, there have been developments wherein, instead of one state (i.e. the current state), the neural network takes the difference between the current state and the state one timestep before (s_t - s_{t-1}). Or it may accept a set of states as input and predict an action.
I am guessing that if such an action function is to be implemented, we need to modify the call to action. I am interested in developing this functionality. Can anyone point me in the right direction?
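One direction that avoids changing the action signature, sketched with entirely hypothetical names (not Reinforce.jl API): a wrapper policy remembers the previous state and hands the difference s_t - s_{t-1} to its model.

```julia
# Hypothetical sketch: a wrapper policy that feeds its model the state
# difference s_t - s_{t-1} instead of the raw state.
mutable struct DiffStatePolicy{F}
    model::F                             # maps a state difference to an action
    prev::Union{Nothing,Vector{Float64}} # previous state, nothing at episode start
end
DiffStatePolicy(model) = DiffStatePolicy(model, nothing)

function act(p::DiffStatePolicy, s::Vector{Float64})
    Δs = p.prev === nothing ? zero(s) : s .- p.prev  # s_t - s_{t-1}
    p.prev = copy(s)
    return p.model(Δs)
end

p = DiffStatePolicy(Δ -> sum(Δ))   # stand-in for a neural network
act(p, [1.0, 2.0])                 # first call sees a zero difference
act(p, [2.0, 4.0])                 # second call sees [1.0, 2.0]
```

The same pattern extends to stacking a window of recent states rather than a single difference.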
Hello guys,
I am experiencing the following issue when trying to run a gym. The first state of an episode is of type Array{Int64,0} and the rest are Int64. Because of this I need to implement a workaround to keep the type consistent.
Simple example:
using OpenAIGym
import Reinforce.action

env = GymEnv("Taxi-v2")

struct NewRandomPolicy <: AbstractPolicy end

function action(policy::NewRandomPolicy, r, s, A′)
    println("Current state: $s, Type: $(typeof(s))")
    rand(A′)
end

reset!(env)
ep = Episode(env, NewRandomPolicy())
println(state(ep.env))
i = 0
for (s, a, r, sp) in ep
    i += 1
    i > 3 && break
end
I get the following output:
72
Current state: 183, Type: Array{Int64,0}
Current state: 83, Type: Int64
Current state: 63, Type: Int64
Current state: 83, Type: Int64
I was also wondering why state(ep.env) is showing a different state from the one I get in my action (72 vs 183).
Thanks!
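A small workaround sketch for the type inconsistency: dispatch on the 0-dimensional array case and unwrap it with s[], so every state the policy sees is a plain scalar.

```julia
# Workaround sketch: unwrap a 0-dimensional array state to its element,
# pass plain scalars through unchanged.
unwrap(s::AbstractArray{T,0}) where {T} = s[]   # 0-dim array -> its element
unwrap(s) = s                                   # already a scalar: pass through

s0 = fill(183)      # an Array{Int64,0}, like the first Taxi-v2 state above
unwrap(s0)          # 183::Int64
unwrap(83)          # 83, unchanged
```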
After a chat with @tbreloff, we roughly decided on the following core API methods:
actions(::AbstractEnvironment, s::State) -> A(s): returns the set A(s) that contains all valid actions from the state s.
step(::AbstractEnvironment, a::Action) -> s', r: returns the next state s' and the reward r associated with taking action a. Note that the current state s is a field of the AbstractEnvironment; this method is also responsible for setting that state field to s'.
action(::AbstractPolicy, s'::State, r::Reward, A(s')::Actions) -> a': should be implemented by each subtype of AbstractPolicy; it observes a state transition (the result of calling the step method with the previous action a) and outputs the next action a'.
@tbreloff also mentioned an episode(::AbstractEnvironment) method, but I wasn't clear on its purpose, so I'll let him fill in the blanks there.
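To make the proposal concrete, here is an illustrative toy environment implementing the three methods. Type and method names mirror the discussion above, not the final Reinforce.jl source.

```julia
# Non-normative toy implementation of the proposed core API: a 1-D corridor
# where moving right until position 10 earns a reward of 1.
abstract type AbstractEnvironment end
abstract type AbstractPolicy end

mutable struct Corridor <: AbstractEnvironment
    s::Int          # current state, a field of the environment
end

# actions(env, s) -> A(s): all valid actions from state s
actions(env::Corridor, s::Int) = (-1, +1)

# step(env, a) -> (s', r): advance the environment and store the new state
# (defining our own step shadows Base.step in a fresh session)
function step(env::Corridor, a::Int)
    s′ = clamp(env.s + a, 0, 10)
    r = s′ == 10 ? 1.0 : 0.0
    env.s = s′      # step is responsible for updating the state field
    return s′, r
end

struct GoRight <: AbstractPolicy end

# action(policy, s', r, A(s')) -> a': observe the transition, pick an action
action(::GoRight, s′, r, A) = +1

env = Corridor(8)
a = action(GoRight(), env.s, 0.0, actions(env, env.s))
s′, r = step(env, a)    # (9, 0.0)
```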
I was playing with Pendulum-v0 from OpenAIGym.jl, which has a continuous action space. It turns out that Reinforce.jl's assertion for checking the bounds of the action chosen by the policy (iterators.jl, line 40) does not support the continuous action space of Pendulum-v0, which is of type LearnBase.IntervalSet{Array{Float64,1}}.
EDIT: It works; I didn't pass the data in the required form :(
Hello,
I wish to get started with this package.
Where can I find your documentation?
Thank you!
It is not actively maintained, nor particularly useful, since we now have the dedicated org JuliaReinforcementLearning.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
With the latest release of Reinforce.jl and a Julia version newer than 0.6, the Mountain Car example should work and produce a plot.
Using the current Julia version causes an old version of the src files (including linspace instead of range) to be installed. Even after installing the git version with Pkg.clone([url]) for the most current files, the example seems to be broken. Running it from within Julia throws the following message:
MethodError: no method matching sin(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}})
Closest candidates are:
sin(!Matched::BigFloat) at mpfr.jl:683
sin(!Matched::Missing) at math.jl:1070
sin(!Matched::Complex{Float16}) at math.jl:1019
...
Stacktrace:
[1] height(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at /home/mbauer/.julia/dev/Reinforce/src/envs/mountain_car.jl:84
[2] macro expansion at /home/mbauer/.julia/dev/Reinforce/src/envs/mountain_car.jl:101 [inlined]
[3] macro expansion at /home/mbauer/.julia/packages/RecipesBase/Uz5AO/src/RecipesBase.jl:312 [inlined]
[4] macro expansion at /home/mbauer/.julia/dev/Reinforce/src/envs/mountain_car.jl:99 [inlined]
[5] apply_recipe(::Dict{Symbol,Any}, ::MountainCar) at /home/mbauer/.julia/packages/RecipesBase/Uz5AO/src/RecipesBase.jl:275
[6] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/pipeline.jl:83
[7] macro expansion at ./logging.jl:305 [inlined]
[8] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:171
[9] #plot#132(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::MountainCar) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:57
[10] plot at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:51 [inlined]
[11] episode!(::MountainCar, ::BasicCarPolicy) at ./In[3]:12
[12] top-level scope at In[3]:16
# Steps to Reproduce the Problem
This is due to the shortcut defined in line 84 of mountain_car.jl
height(xs) = sin(3*xs)*0.45+0.55
since this operation will not work for the vector in line 100:
xs = range(min_position, max_position, length=100)
ys = height(xs)
Anyway, eliminating this error by using a vectorized function call and LinRange instead of range, like this:
xs = LinRange(min_position, max_position, 100)
ys = height.(xs)
will cause compilation to fail with the following message:
MethodError: no method matching +(::Array{Float64,1}, ::Float64)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502
+(!Matched::Bool, ::T<:AbstractFloat) where T<:AbstractFloat at bool.jl:112
+(!Matched::Float64, ::Float64) at float.jl:395
...
Stacktrace:
[1] apply_recipe(::Dict{Symbol,Any}, ::MountainCar) at /home/mbauer/.julia/packages/RecipesBase/Uz5AO/src/RecipesBase.jl:314
[2] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/pipeline.jl:83
[3] macro expansion at ./logging.jl:305 [inlined]
[4] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{MountainCar}) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:171
[5] #plot#132(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::MountainCar) at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:57
[6] plot at /home/mbauer/.julia/packages/Plots/rmogG/src/plot.jl:51 [inlined]
[7] episode!(::MountainCar, ::BasicCarPolicy) at ./In[2]:12
[8] top-level scope at In[2]:17
Obviously there is now some line where a non-elementwise addition involving the now array-valued ys causes an error. Since I am new to Julia, that's basically all I could figure out. A little help (or a fix) would be much appreciated!
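One possible fix sketch: broadcast inside height itself, so it accepts both the scalar calls in the recipe and the range from line 100. The numeric bounds below stand in for min_position and max_position (the usual MountainCar limits), used here only for illustration.

```julia
# Broadcast every operation in height so scalars and ranges both work.
height(xs) = sin.(3 .* xs) .* 0.45 .+ 0.55

xs = range(-1.2, 0.6, length=100)  # stands in for min_position, max_position
ys = height(xs)                    # 100-element Vector{Float64}, no MethodError
height(0.5)                        # scalar input still returns a scalar
```

The later `+(::Array{Float64,1}, ::Float64)` MethodError suggests another scalar-style `+` inside the plot recipe that would need the same `.+` treatment.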
The following setup was used:
I have played around with and enjoyed this package and OpenAIGym.jl for a few days now. When trying to make my code run a bit faster, I checked for type instabilities etc. in my code, and I came to the conclusion that a lot of performance was lost in the for (s, a, r, s′) in ep iteration. The problem seems to be that the compiler doesn't know the types of s, a, r, s′. I mitigated the issue somewhat by writing (in my case) for (s::Vector{Float64}, a::Int, r::Float64, s1::Vector{Float64}) in ep, after which the compiler can optimize for the types I declared.
Maybe speed is not the goal of this package, but RL methods tend to require some time, so it would be nice if things ran fast. Maybe this is something to keep in mind while continuing development on Reinforce.jl.
ep = Episode(env,policy)
@code_warntype next(ep,1)
Variables:
#self#::Base.#next
ep::Reinforce.Episode
i::Int64
env::Any
s::Any
A::Any
r::Any
a::Any
last_reward::Any
s′::Any
#temp#@_11::Int64
_::Int64
#temp#@_13::Any
...
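Besides annotating the loop variables, a common mitigation for this kind of instability is a function barrier: forward each transition to a typed inner function so the compiler can specialize the hot path. A sketch, with handle and run_episode as illustrative names:

```julia
# Function-barrier sketch: the loop stays type-unstable, but each transition
# is forwarded to a typed inner function that the compiler can specialize.
function handle(s::Vector{Float64}, a::Int, r::Float64, s′::Vector{Float64})
    # performance-critical work runs here with fully concrete types
    return r
end

function run_episode(ep)
    total = 0.0
    for (s, a, r, s′) in ep           # s, a, r, s′ are inferred as Any here
        total += handle(s, a, r, s′)  # one dynamic dispatch per step only
    end
    return total
end
```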
I know that Reinforce.jl is not trying to emulate OpenAI Gym exactly, but I'm curious about the reasoning behind a couple of interface decisions that seem inconsistent with gym's.
First, why doesn't reset!(env) return a state or observation for convenience? From personal experience, when I was using OpenAIGym.jl, reset!(env) was always returning false. This was happening because Julia returns the value of the last line of a function by default, which happened to come from env.done = false. I had to look through the source code to figure out what was happening. Returning a state/observation would be consistent with gym and would avoid any confusion for new users.
Second, why does step!(env, s, a) return r, s' instead of s', r? This is a minor difference in ordering, but once again, I had an expectation from gym for what step! should return.
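A toy sketch (not the actual OpenAIGym.jl source) of why reset! returned false, and how an explicit return of the state would match gym's behaviour:

```julia
# ToyEnv is a hypothetical stand-in illustrating the implicit-return pitfall.
mutable struct ToyEnv
    state::Vector{Float64}
    done::Bool
end

function reset!(env::ToyEnv)
    env.state = zeros(2)
    env.done = false       # if this were the last line, reset! returns false
    return env.state       # explicit return: now callers get the state
end

env = ToyEnv([1.0, 2.0], true)
reset!(env)                # returns [0.0, 0.0] instead of false
```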
I want to design a new environment with two continuous actions. While implementing the function actions(env, s) I run into an iteration error; shouldn't I be able to use a vector here?
actions(env::NewEnvironment, s) = IntervalSet([-min_action_a, -min_action_b], [max_action_a, max_action_b])
My error states:
Error showing value of type LearnBase.IntervalSet{Array{Float64,1}}:
ERROR: MethodError: no method matching iterate(::LearnBase.IntervalSet{Array{Float64,1}})
Closest candidates are:
iterate(::ExponentialBackOff) at error.jl:252
iterate(::ExponentialBackOff, ::Any) at error.jl:252
iterate(::Base.AsyncGenerator, ::Base.AsyncGeneratorState) at asyncmap.jl:382
...
Stacktrace:
[1] isempty(::LearnBase.IntervalSet{Array{Float64,1}}) at .\essentials.jl:737
[2] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::LearnBase.IntervalSet{Array{Float64,1}}) at .\show.jl:149
[3] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:214
[4] display(::REPL.REPLDisplay, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:218
[5] display(::Any) at .\multimedia.jl:328
[6] #invokelatest#1 at .\essentials.jl:710 [inlined]
[7] invokelatest at .\essentials.jl:709 [inlined]
[8] print_response(::IO, ::Any, ::Bool, ::Bool, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:238
[9] print_response(::REPL.AbstractREPL, ::Any, ::Bool, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:223
[10] (::REPL.var"#do_respond#54"{Bool,Bool,Atom.var"#246#247",REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:822
[11] #invokelatest#1 at .\essentials.jl:710 [inlined]
[12] invokelatest at .\essentials.jl:709 [inlined]
[13] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\LineEdit.jl:2355
[14] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:1144
[15] (::REPL.var"#38#42"{REPL.LineEditREPL,REPL.REPLBackendRef})() at .\task.jl:356
It would be nice to access something like
ACTION_BOUND_HI = actions(env, env.state).hi
ACTION_BOUND_LO = actions(env, env.state).lo
and get Arrays of the upper and lower bounds for both actions.
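A self-contained illustration of the requested bounds access. Box below is a hand-rolled stand-in for LearnBase.IntervalSet{Array{Float64,1}}, which, as far as I can tell, stores the same lo/hi fields but is not iterable, which is what the show/iterate MethodError above was about.

```julia
# Hand-rolled stand-in for a vector-valued interval set with .lo/.hi access.
struct Box
    lo::Vector{Float64}
    hi::Vector{Float64}
end

# hypothetical actions method for an environment with two continuous actions
toy_actions(s) = Box([-1.0, -2.0], [1.0, 2.0])

A = toy_actions(nothing)
ACTION_BOUND_LO = A.lo     # array of lower bounds for both actions
ACTION_BOUND_HI = A.hi     # array of upper bounds for both actions
```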
Pendulum.jl uses Plots in line 83 without importing it, which throws an error on plotting:
annotations := [(0, -0.2, Plots.text("a: $(env.a)", :top))]
The error it throws:
ERROR: UndefVarError: Plots not defined
Hi guys, excited to see a reinforcement learning interface for Julia! I have a question about the interface. Is the state part of the environment?
If the state is part of the environment, then what is the reason for having it as an additional argument to step!? If the state is not part of the environment, then why would the environment be mutated in a call to step!?
The only RL package that I am really familiar with is OpenAI Gym, where the state is part of the environment. Perhaps it would make sense to follow their example, since people are familiar with it, it has been successful, and it would allow simple interaction between environments and solvers written in Julia and Python.
Waiting on JuliaLang/METADATA.jl#8108 so that all deps are registered.