keon / deep-q-learning
1.3K stars · 63 watchers · 451 forks · 1.81 MB

Minimal Deep Q Learning (DQN & DDQN) implementations in Keras

Home Page: https://keon.io/deep-q-learning

License: MIT License

Python 100.00%
deep-reinforcement-learning deep-q-network dqn reinforcement-learning deep-learning ddqn


deep-q-learning's Issues

IndexError

I added two convolutional layers and trained this on MiniWorld (another Gym environment), but I keep getting this:
IndexError                                Traceback (most recent call last)
in
     39
     40     if len(agent.memory) > batch_size:
---> 41         agent.replay(batch_size)
     42
     43     if e % 10 == 0:

in replay(self, batch_size)
     59
     60
---> 61     target_f[0][action] = target
     62     self.model.fit(state, target_f, epochs=1, verbose=0)

IndexError: index 17447 is out of bounds for axis 0 with size 60
I don't know why I got the index 17447...

Should the weights be updated every time step?

Should the weights be updated every time step? (I think it would be better to update the weights every N steps, for instance whenever time_step % 10 == 0, and then save them; see the sketch below.) But in the code this only happens every 10 episodes?
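For reference, a minimal sketch (not the repository's code) of training every time step and checkpointing the weights every N steps; SAVE_EVERY and the save path are illustrative, the rest follows dqn.py:

SAVE_EVERY = 10  # illustrative checkpoint interval, in time steps

for e in range(EPISODES):
    state = np.reshape(env.reset(), [1, state_size])
    for t in range(500):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)          # update the weights every time step
        if t % SAVE_EVERY == 0:
            agent.save("cartpole-dqn.h5")     # illustrative file name
        if done:
            break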

ValueError: cannot reshape array of size 2 into shape (1,4)

I get this numpy error while running the script - dqn.py

2022-10-06 23:47:28.547558: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2022-10-06 23:47:28.547772: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/keras/optimizers/optimizer_v2/adam.py:114: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  super().__init__(name, **kwargs)
/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/_asarray.py:102: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
  File "ddqn.py", line 100, in <module>
    state = np.reshape(state, [1, state_size])
  File "<__array_function__ internals>", line 5, in reshape
  File "/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 299, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 55, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/home/akshayparanjape/PhD/deep-q-learning/venv_dqn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 44, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: cannot reshape array of size 2 into shape (1,4)
Has anybody encountered the same issue?
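This error usually appears with newer Gym releases, where env.reset() returns an (observation, info) tuple instead of a bare observation, so the length-2 tuple cannot be reshaped to (1, 4). A minimal compatibility sketch (not part of the repository):

# Unpack the (obs, info) tuple that Gym >= 0.26 returns from reset().
reset_result = env.reset()
state = reset_result[0] if isinstance(reset_result, tuple) else reset_result
state = np.reshape(state, [1, state_size])

# Note: in the same Gym versions, env.step() returns five values
# (obs, reward, terminated, truncated, info), so the step-unpacking
# lines need a similar adjustment (done = terminated or truncated).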

k frame

@keon Thanks for your very useful code. Just one question: how can we add K frames to this, as described in the last sentences of the first paragraph of Section 4.1 of Mnih et al., Nature 2015?
4.1 Preprocessing and Model Architecture
Working directly with raw Atari frames, which are 210 × 160 pixel images with a 128 color palette, can be computationally demanding, so we apply a basic preprocessing step aimed at reducing the input dimensionality. The raw frames are preprocessed by first converting their RGB representation to gray-scale and down-sampling it to a 110 × 84 image. The final input representation is obtained by cropping an 84 × 84 region of the image that roughly captures the playing area. The final cropping stage is only required because we use the GPU implementation of 2D convolutions from [11], which expects square inputs. For the experiments in this paper, the function φ from Algorithm 1 applies this preprocessing to the last 4 frames of a history and stacks them to produce the input to the Q-function.
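For what it's worth, a minimal sketch of that preprocessing and 4-frame stacking (not part of this repository; the crop offsets and the use of OpenCV for resizing are illustrative choices):

from collections import deque
import cv2
import numpy as np

def preprocess(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)  # 210x160x3 RGB -> 210x160 gray
    small = cv2.resize(gray, (84, 110))             # down-sample to 110x84
    return small[18:102, :]                         # crop an 84x84 playing area (offsets illustrative)

frames = deque(maxlen=4)                            # phi keeps the last 4 preprocessed frames

def stacked_input(frame):
    frames.append(preprocess(frame))
    while len(frames) < 4:                          # pad at the start of an episode
        frames.append(frames[-1])
    return np.stack(frames, axis=-1)                # (84, 84, 4) input to the Q-network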

Question: Is this some form of reward engineering?

This would break in environments that return the state as more/less than 4 values for unpacking.

  1. If not essential can we just remove this?
  2. If it's essential, would someone explain why and/or reference the paper for this?
    This seems specific to CartPole. I wasn't sure if the implementation's goal was to only solve CartPole.
r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8  
r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5  
reward = r1 + r2

memory for state

Thanks, Keon, for your great code!
I have two questions:
1. What does the [0] mean in self.model.predict(next_state)[0] and return np.argmax(act_values[0])? Does it mean the first element of the batch?
2. If, in addition to batching, I need my state to be the states from the last K time steps, what changes are necessary? I want to feed state = state[i-k+1], ..., state[i-1], state[i], not only one state. How can I do this? (See the sketch below.)

Thanks again
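On question 1: states are reshaped to (1, state_size), so predict() returns a (1, action_size) array and [0] picks that single row of the batch. On question 2, a minimal sketch (not the repository's code) of keeping the last K observations and feeding them as one flattened state; K and the helper name are illustrative:

from collections import deque
import numpy as np

K = 4                                    # illustrative number of past observations
history = deque(maxlen=K)

def stacked_state(obs, state_size):
    history.append(obs)
    while len(history) < K:              # pad at the start of an episode
        history.append(obs)
    return np.concatenate(history).reshape(1, K * state_size)

The network's input layer would then need input_dim=K * state_size instead of state_size.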

Making new predictions

This is extremely helpful code, thanks for sharing! I have a bit of a hypothetical question. Let's say that after training the agent using your code I want to be able to predict the q-values for moving to the right or left given a new combination of inputs. (i.e. do some type of model.predict(new_input), or test the code on new data). Where in the code would this go? Could you do model.predict(new_input) at the end of your main function outside of the for loop?

I ask because I wonder where the model parameters are being saved and if this affects where you call model.predict(new_input) for new data. Let me know if anything is unclear!
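A minimal sketch of one way to do this after (or outside) the training loop, assuming the load()/save() helpers defined in dqn.py; the file name and new_observation are illustrative:

agent = DQNAgent(state_size, action_size)
agent.load("cartpole-dqn.h5")                         # weights previously written by agent.save(...)

new_input = np.reshape(new_observation, [1, state_size])
q_values = agent.model.predict(new_input)[0]          # one Q-value per action
best_action = np.argmax(q_values)                     # e.g. 0 = push left, 1 = push right in CartPole

The parameters live in agent.model, so as long as that model is kept in memory or reloaded from disk, predict() can be called anywhere after training.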

missing the initialization of target action value and refreshing the Qhat

I have several questions:
1. When I compare with the algorithm presented in "Human-level control through deep reinforcement learning", I cannot find the third initialization (the initial target action-value function Q̂), nor the last step "every C steps reset Q̂ = Q". Could you please explain where they are, or how this code differs? These steps seem essential! (A sketch follows below.)
2. I have my own environment. If I want to use a state = [a, b, c] as the DQN's input, instead of a single value representing the state, what should I do?
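On question 1, a minimal sketch of the "every C steps set Q̂ = Q" part, assuming a target_model attribute and the update_target_model() helper as in ddqn.py; C and the step counter are illustrative additions, not the repository's exact code:

C = 100                                        # illustrative sync interval, in time steps
total_steps = 0

# inside the per-time-step training loop:
total_steps += 1
if total_steps % C == 0:
    agent.update_target_model()                # copies the online weights into the target network (Q_hat <- Q)

For question 2, a state such as [a, b, c] simply means state_size = 3: reshape it with np.reshape(state, [1, 3]) before feeding it to the network, and build the model with input_dim=3.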

Minor issue with globally scoped variable `env`

I found a minor issue on line 42.

Currently:

    return env.action_space.sample()

Should be:

    return self.env.action_space.sample()

p.s. It's better practice to not put a bunch of stuff in the global namespace (e.g., under if __name__ == '__main__':). It's safer to use an actual main() method.
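A minimal sketch of both suggestions (the agent holding its own env reference, and the script body living in a main() function); the constructor signature shown here is an assumption, not the repository's current one:

import gym
import numpy as np

class DQNAgent:
    def __init__(self, state_size, action_size, env):
        self.state_size = state_size
        self.action_size = action_size
        self.env = env                        # stored so act() no longer reads a global
        self.epsilon = 1.0
        # self.model = self._build_model()    # built as in dqn.py (omitted in this sketch)

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return self.env.action_space.sample()   # uses the instance's env, not a global
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])

def main():
    env = gym.make('CartPole-v1')
    agent = DQNAgent(env.observation_space.shape[0], env.action_space.n, env)
    # ... training loop as in dqn.py ...

if __name__ == '__main__':
    main()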

a hidden bug in your code

for e in range(EPISODES):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    for time in range(500):
        # env.render()
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            print("episode: {}/{}, score: {}, e: {:.2}"
                  .format(e, EPISODES, time, agent.epsilon))
            break
    if len(agent.memory) > batch_size:
        agent.replay(batch_size)

Hi, I found a bug in your code.

The agent.replay(batch_size) call should be inside the inner loop, i.e., train_on_batch at every time step, not once per episode.

Your version can pass CartPole, but not LunarLander (also from OpenAI Gym).
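A minimal sketch of the proposed change (the same loop as quoted above, with the replay call moved into the time-step loop; not the repository's code):

for e in range(EPISODES):
    state = np.reshape(env.reset(), [1, state_size])
    for time in range(500):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)      # minibatch update every time step
        if done:
            break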

The formal algorithm follows:

[Image: the DQN algorithm from "Human-level control through deep reinforcement learning", FYI]

Go jackets!

Saving/reloading weight does not seem to work

Hi,

I uncommented lines 69, 90 and 91 (in dqn.py), but it seems that the weights are not reloaded: the score restarts at a very low value. The file ddqn.py seems to have the same issue.

Kind regards,
Sylvain.

Not learning

Hi, is it just me, or is the algorithm not learning? I collected all the rewards for the episodes and they converge to 10.

ddqn_batch

Hi. I tried to change the ddqn code to update in batches like dqn_batch, but with this change there is no learning at all. I don't have any idea why; it is a simple change, and I even set the batch size to 1, so it should behave exactly like no batching.

Speeding the replay

First, thank you for this wonderful code.

In the replay function, there is one model.fit(state, target_f) call per sample in the minibatch (i.e., if there are 32 samples, then there are 32 fits).

I think all samples of the minibatch could be used in a single update with one train_on_batch(states, targets_f) call, which would speed up the processing time.
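A minimal sketch of what a batched replay could look like (not the repository's code; it assumes the same DQNAgent attributes as dqn.py, with random and numpy imported there):

def replay(self, batch_size):
    minibatch = random.sample(self.memory, batch_size)
    # stored states have shape (1, state_size); stack them into (batch_size, state_size)
    states = np.vstack([m[0] for m in minibatch])
    next_states = np.vstack([m[3] for m in minibatch])
    actions = np.array([m[1] for m in minibatch])
    rewards = np.array([m[2] for m in minibatch])
    dones = np.array([m[4] for m in minibatch])

    targets_f = self.model.predict(states)                     # current Q-values, (batch_size, action_size)
    next_q = np.amax(self.model.predict(next_states), axis=1)  # max_a' Q(s', a')
    targets = rewards + self.gamma * next_q * (1 - dones)      # no bootstrap on terminal states
    targets_f[np.arange(batch_size), actions] = targets

    self.model.train_on_batch(states, targets_f)               # one update for the whole minibatch
    if self.epsilon > self.epsilon_min:
        self.epsilon *= self.epsilon_decay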

Would it make sense to restrict the action to what's possible?

If the cart is already all the way to the right, we can't really select that action (pushing right). So would it make sense to disallow it, either in the random case (by sampling again) or in the network case (by choosing the next-highest Q-value that the network predicts)?
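One possible sketch of such a mask for CartPole (the validity test, margin, and helper name are all illustrative; CartPole's observation is [x, x_dot, theta, theta_dot], and actions 0/1 push the cart left/right):

import numpy as np

def act_masked(agent, state, env):
    x = state[0][0]                                    # cart position
    valid = np.ones(agent.action_size, dtype=bool)
    if x >= env.x_threshold * 0.95:                    # near the right edge: forbid pushing right
        valid[1] = False
    if x <= -env.x_threshold * 0.95:                   # near the left edge: forbid pushing left
        valid[0] = False

    if np.random.rand() <= agent.epsilon:
        return int(np.random.choice(np.flatnonzero(valid)))   # sample only among valid actions
    q_values = agent.model.predict(state)[0]
    q_values[~valid] = -np.inf                         # falls back to the next-highest valid Q-value
    return int(np.argmax(q_values))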

Possible error in DQN & DDQN files

The DQN algorithm from the Nature paper uses a target network to compute the target Q-value for training.

So I think the code in ddqn.py is really the code for the DQN algorithm.

Plot Image

First of all, thank you very much for your work; it was really helpful for me to understand RL. I would like to ask how you got the image display of the game, as I didn't find it in the code. Thank you very much!
