
Policy initialization (reinforce.jl, open, 6 comments)

juliaml (Marcus Appelros) commented on Jul 31, 2016

Require a way to conveniently manually provide initial knowledge for a policy.

For example, say we have a hexagonal grid on which we are tasked to choose a sequence of cells, and it is certainly never correct to make the first pick right at the grid edges. With a getter and a setter we could both view the current edge probabilities and set them to zero.

Is such functionality in line with the intended directions?
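One possible reading of the proposal, as a hypothetical Julia sketch. The type and function names are invented here for illustration and are not part of Reinforce.jl; a square grid stands in for the hexagonal one:

```julia
# Hypothetical sketch, not Reinforce.jl's actual API: a tabular policy
# with a getter/setter for first-pick probabilities per cell.
struct TabularPolicy
    probs::Dict{Tuple{Int,Int},Float64}   # cell => probability of picking it first
end

getprob(p::TabularPolicy, cell) = get(p.probs, cell, 0.0)     # getter: query as usual
setprob!(p::TabularPolicy, cell, v) = (p.probs[cell] = v; p)  # setter: inject knowledge

# Initialize uniformly over a toy 5x5 grid.
n = 5
policy = TabularPolicy(Dict((r, c) => 1 / n^2 for r in 1:n, c in 1:n))

# Encode the prior: the first pick is never right at the grid edges.
for r in 1:n, c in 1:n
    if r == 1 || r == n || c == 1 || c == n
        setprob!(policy, (r, c), 0.0)
    end
end

getprob(policy, (1, 1))   # 0.0 after the prior is applied
# (A real implementation would renormalize the remaining probabilities.)
```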

Comments (6)

tbreloff commented on Jul 31, 2016

I would say that I haven't settled on a policy API yet... I've been a little more focused on the environments. If you have time, could you write out a little example code of how you see initializing policies? Looking forward to what you come up with.


jhlq (Marcus Appelros) commented on Jul 31, 2016

The getters are straightforward: just query the policy as usual. The setters are a form of supervised learning, so it would make sense to save every set value as a training example. Then we can have a basic implementation, and a user who builds up a large library of samples can easily plug their favorite supervised-learning library into the setter system.
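A hypothetical sketch of that setter-as-training-data idea (all names invented, not Reinforce.jl API): the setter both updates the policy and logs the correction as a supervised example.

```julia
struct TaughtPolicy
    table::Dict{Any,Float64}               # current beliefs
    examples::Vector{Tuple{Any,Float64}}   # training set accumulated from setter calls
end
TaughtPolicy() = TaughtPolicy(Dict{Any,Float64}(), Tuple{Any,Float64}[])

value(p::TaughtPolicy, state) = get(p.table, state, 0.0)   # getter: query as usual

function teach!(p::TaughtPolicy, state, v)                 # setter: update and record
    p.table[state] = v
    push!(p.examples, (state, v))
    return p
end

p = TaughtPolicy()
teach!(p, :edge_cell, 0.0)
value(p, :edge_cell)   # 0.0
# Later, p.examples can be fed to any supervised-learning library to fit
# a function approximator to the accumulated human-provided labels.
```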

tbreloff commented on Jul 31, 2016

I think that, without sample code, I'll have a hard time understanding what a "getter/setter" is. Do you mean a lookup table for states and actions? If so, my interest lies much more in RL through function approximation, so I don't have much need for table-lookup APIs (though they could certainly be supported if others want them).
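For concreteness, a toy illustration of the two approaches being contrasted (invented code, not Reinforce.jl's API):

```julia
using LinearAlgebra   # for dot

# Table lookup: one stored value per (state, action) pair; each entry
# must be learned (or set) individually.
Q_table = Dict{Tuple{Int,Int},Float64}()
q_lookup(s, a) = get(Q_table, (s, a), 0.0)

# Function approximation: values come from a parameterized function,
# here a linear model over hand-built features, so similar states
# generalize instead of each being learned separately.
features(s, a) = [1.0, s, a, s * a]
θ = zeros(4)                          # learned parameters
q_approx(s, a) = dot(θ, features(s, a))
```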


jhlq (Marcus Appelros) commented on Jul 31, 2016

Let's say our child is practicing math and we have prepared a challenging problem. The getter would be asking what they think the answer is, and the setter would be telling them the answer.

tbreloff commented on Jul 31, 2016

So that's not really reinforcement learning. You should check out our effort in JuliaML if you're interested in more general machine learning. In RL there are no "answers", only rewards.
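A toy illustration of the distinction (invented code): a supervised update is given the correct answer, while an RL update only sees a scalar reward for the action actually taken.

```julia
# Supervised learning: the correct label y is provided for each input.
loss_supervised(ŷ, y) = (ŷ - y)^2

# Reinforcement learning: no label, only a reward r for the chosen action a;
# good actions must be discovered through trial and error.
update_rl!(Q, a, r; α = 0.1) = (Q[a] += α * (r - Q[a]); Q)
```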


jhlq (Marcus Appelros) commented on Jul 31, 2016

Yes, as mentioned, that part is supervised, and people would be able to plug in their favorite ML library.

Connecting the two is the goal: schools don't let students work entirely on their own, and neither do teachers lead them through every single problem. A mix allows the AI to explore on its own, with intermittent interventions from more knowledgeable intelligences.

Reinforcement learning is key for robust AI, and just as mixing trace elements into a metal can create strong alloys, adding specks of supervision will significantly hasten progress.
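A minimal self-contained sketch of that mix on a toy multi-armed bandit (everything here is invented for illustration, not Reinforce.jl code): the agent learns ε-greedily from rewards, while every 50th step a teacher forces the known-good arm and the intervention is logged as a supervised example.

```julia
n_arms = 5
true_means = randn(n_arms)
Q = zeros(n_arms)                  # value estimates learned from reward
counts = zeros(Int, n_arms)
examples = Int[]                   # supervised examples logged from teacher picks
teacher_arm = argmax(true_means)   # stand-in for outside knowledge

for step in 1:1000
    if step % 50 == 0              # intermittent supervised intervention
        a = teacher_arm
        push!(examples, a)
    elseif rand() < 0.1            # explore
        a = rand(1:n_arms)
    else                           # exploit current estimates
        a = argmax(Q)
    end
    r = true_means[a] + randn()    # stochastic reward from the environment
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a] # incremental-mean RL update
end
```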
