Hi Steve, I feel like asking you another newbie question. I am havin

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

to clarify what I mean by [i=1, j=26], I do: <div class="snippet-clipboard-conten

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

segmentation fault with fast_cts.pyx about tensorflow-rl HOT 7 OPEN

stevekapturowski commented on June 9, 2024

segmentation fault with fast_cts.pyx

from tensorflow-rl.

Comments (7)

steveKapturowski commented on June 9, 2024

@dhfromkorea hmm, that's interesting. I've run this code on mac and ubuntu without any issue. The fast_cts code is significantly faster; I'll have to double check on the exact speedup, but I I would recommend against trying to use the pure python version. 'resize' is coming from the scikit-image library though; I wonder if there could be a problem with that installation?

from tensorflow-rl.

dhfromkorea commented on June 9, 2024

Hm... okay, so it's probably about compilation of the extension...

resize' is coming from the scikit-image library though
-> it worked fine when using the original cts ... so I doubt it's the cause.

I would recommend against trying to use the pure python version.
-> hm... how do I know if this is the case? I thought I am importing the compiled version of the extension ...

from tensorflow-rl.

steveKapturowski commented on June 9, 2024

Can you let me know the python and library versions that you're using?
Also, when you tried the the original cts model can you describe precisely which changes you made to the code?

from tensorflow-rl.

dhfromkorea commented on June 9, 2024

Hi, I am using python 3.5 and cython 0.25.2. Here you can check the code: https://github.com/RL-ninja/beating-montezuma/blob/master/algorithms/paac_cts.py

FYI, I have not changed the cython code at all. The error happens when I consume the code as it is.

A correction: the line causing the error was not resize(). It appears having to do with cts_update:

cdef double cts_update(CTSStruct* cts, int[:] context, int symbol):    
    cts[0]._time += 1.0
    cts[0].log_alpha = log(1.0 / (cts[0]._time + 1.0))
    cts[0].log_1_minus_alpha = log(cts[0]._time / (cts[0]._time + 1.0))
    cdef double log_prob = node_update(cts[0]._root, context, symbol)
    return log_prob

Along the call trace, the seg fault happens either in node_update or further down.

cdef double node_update(CTSNodeStruct* node, int[:] context, int symbol):
    lp_estimator = estimator_update(node[0].estimator, symbol)

    # If not a leaf node, recurse, creating nodes as needed.
    cdef CTSNodeStruct* child
    cdef double lp_child
    cdef double lp_node

    if context.shape[0] > 0:
        child = node_get_child(node, context[context.shape[0]-1])
        lp_child = node_update(child, context[:context.shape[0]-1], symbol)
        lp_node = node_mix_prediction(node, lp_estimator, lp_child)

        node_update_switching_weights(node, lp_estimator, lp_child)

        return lp_node
    else:
        node[0]._log_stay_prob = 0.0
        return lp_estimator

I am in the process of setting up gdb for cython so I can trace down the error.

For some reason, the seg fault error consistently occurs at a specific "factor" which is [i=1, j=26] where height and width are 42.

I hope to figure out the bug (most likely my bad.. :-) asap...

from tensorflow-rl.

dhfromkorea commented on June 9, 2024

to clarify what I mean by [i=1, j=26], I do:

    cpdef (double, double) _update(self, int[:, :] obs):
       ...
        for i in range(self.height):
            for j in range(self.width):
                ...
                print(i, j)
                log_prob += cts_update(&self.cts_factors[i][j], context, obs[i, j])
                log_recoding_prob += cts_log_prob(&self.cts_factors[i][j], context, obs[i, j])
        return log_prob, log_recoding_prob

And I get consistently:
(1, 24)
(1, 25)
(1, 26)
[1] 3918 segmentation fault python train.py -g montezuma_revenge -df logs/montezuma/ -algo paac_cts

Funny if I change the game to Breakout, I get, again, consistently:
(1, 8)
(1, 9)
(1, 10)
[1] 3623 segmentation fault python train.py -g breakout -df logs/breakout -algo paac_cts

from tensorflow-rl.

dhfromkorea commented on June 9, 2024

Hi, I got it fixed. The problem was the state (game screen image) came as being of a uint8 type... and I just had to convert it to float64.

Thanks for bearing the stupid question. Thanks again for your advice and insight!

UPDATE: with fast_cts, I am getting 272.2787954246189 steps/s avg. (with 4-frame skipping)

This still feels very slow, compared to what I normally get without cts: 2000 steps/s.

Do you have any thought how we can make this even faster? (e.g. running on a GPU)

from tensorflow-rl.

steveKapturowski commented on June 9, 2024

@dhfromkorea I could imagine parallelizing the per-pixel CTS model updates, but as there are really no matrix operations going on I'm not sure that running it on a GPU would be a huge help. On my system I found that the Double DQN updates were actually a more significant bottleneck than the density model updates. If you comment out the density model update in my implementation what sort of performance do you get?

Also, sorry for the delay-- I've been travelling and haven't had much time on my computer before ICML

from tensorflow-rl.

segmentation fault with fast_cts.pyx about tensorflow-rl HOT 7 OPEN

Comments (7)

Related Issues (19)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent