
Comments (46)

mstimberg commented on May 22, 2024

A nice overview of the current solutions to interface with C code (excluding weave): http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html

I wonder whether it makes sense to maybe also use Cython for the spikequeue (instead of SWIG)?

thesamovar commented on May 22, 2024

Nice, I'll take a look at that on my flight back to Boston. :)

Yeah, you might be right about Cython for the spikequeue - the advantage would be that it would probably handle all the data types automatically, which is easier than writing it with templates in C++ and then trying to make that work with SWIG. It does mean rewriting it from scratch, though.

mstimberg commented on May 22, 2024

Just a small thing: I guess if we add Cython as a codegen target we should also rename "cpp" to "weave" (and maybe have cpp as a common superclass for all targets that deal with C++ code -- I guess they'll have a lot in common and only the interface to Python is different?)

mstimberg commented on May 22, 2024

I'm wondering whether we should actually bother with Cython or rather go with numba. We would basically have to write a new Python target (one that doesn't use vectorization but instead loops, so rather similar to the current C code) and then wrap the function in autojit -- this simple approach seems to give comparable performance to highly optimized Cython code: http://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/
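
Roughly, what I have in mind is something like the following (an untested sketch -- the array and parameter names are made up, and autojit is just numba.jit in later numba versions):

import numpy as np
from numba import autojit   # numba.jit in later numba versions

@autojit
def _state_update(_array_neurongroup_v, dt, tau):
    # an explicit loop instead of vectorised numpy operations
    for _idx in range(_array_neurongroup_v.shape[0]):
        _array_neurongroup_v[_idx] += dt * (-_array_neurongroup_v[_idx] / tau)

# called once per time step with the arrays from the namespace
_state_update(np.linspace(0, 1, 100), 0.0001, 0.01)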

But this is not a high-priority item for now, anyway.

thesamovar commented on May 22, 2024

I think for runtime code generation rather than standalone, it's not much work to implement a new language in any case, so we can just see how it goes later on.

mstimberg commented on May 22, 2024

Material for a very recent Cython tutorial can be found here: https://public.enthought.com/~ksmith/scipy2013_cython/scipy-2013-cython-tutorial.zip

thesamovar commented on May 22, 2024

OK, I started work on this in the cython_codegen branch. I have something very basic working already, but it's extremely slow. I think the problem might be that Cython is not quite as smart as weave about caching: it does a lot more work before it gets to the cached version. But I might be wrong. If caching really is the problem, the only way to fix it would be to define a function inline, return the function, and manage the passing of variables into the namespace ourselves.

mstimberg commented on May 22, 2024

What did you use for testing with Cython? I used a "pure state update" example like this

from brian2 import *
import numpy as np
G = NeuronGroup(100, 'dv/dt=-v / (10*ms) : 1')
G.v = np.linspace(0, 1, 100)

And it is indeed very slow, much slower than pure Python. In this case, the reason seems to be mostly the exp function, though: it is using our unitsafe variant, i.e. it has to convert the argument to a Python object, then does unit checking and so on.
But judging from the profiler output, you are probably right about the caching as well. Our situation is quite different from the typical inline use case (regardless of Cython or weave): we know that our code will not change during a run, and we call inline many, many times. It might therefore make sense to interface with Cython a bit more directly and use this knowledge to only do the full preparation in the before_run phase and call the code directly during the run. I had a quick look at weave.inline vs. cython.inline: there's not much optimisation possible in weave, but cython.inline does not really implement much caching and does a lot of parsing etc. at every call. So there's a lot of improvement possible and necessary here.

mstimberg commented on May 22, 2024

I looked into cython.inline a bit more and I don't think we can use it... I wonder whether it is worth the effort to write our own method for invoking Cython -- maybe we should have a look at a numba codegen target first?

rossant commented on May 22, 2024

I looked into cython.inline a bit more and I don't think we can use it...

Why?

Also, have you looked at this?

maybe we should have a look at numba codegen first

I think it is also a good idea. In my experience numba is still young and a bit buggy, but if the generated code is simple enough, then it might work very well.

thesamovar commented on May 22, 2024

I think it's a bad idea for us to start rolling our own version of cython.inline, for maintainability reasons. On the other hand, you can define a function in a cython.inline call with an explicit type signature, have that function returned from the inline call, and then each timestep just call that function rather than calling cython.inline. Hopefully that should work and be efficient, although I haven't tried it yet. Maybe we can even pass the namespace dictionary directly to the function so we don't have to provide arguments, e.g. something like this (untested):

def func(_namespace):
    cdef numpy.ndarray[numpy.float64_t, ndim=1] _array_neurongroup_v = _namespace['_array_neurongroup_v']
    ...
return func
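
On the calling side, the pattern would then be roughly the following (equally untested; the array name and the dummy update are just placeholders for whatever codegen produces):

import numpy as np
import cython

code = '''
def func(_namespace):
    cdef double[::1] _array_neurongroup_v = _namespace['_array_neurongroup_v']
    cdef int _idx
    for _idx in range(_array_neurongroup_v.shape[0]):
        _array_neurongroup_v[_idx] *= 0.999   # stand-in for the generated state update
return func
'''

func = cython.inline(code)                            # compiled (and cached) only once
namespace = {'_array_neurongroup_v': np.zeros(100)}
for _ in range(1000):                                 # every timestep: a plain function call
    func(namespace)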

The trouble with numba is that it's pretty reliant on the Continuum (Anaconda) Python distribution. I think we should implement a numba target as well as the weave and cython ones, since it's not actually that much effort to write new runtime targets.

@rossant - what's the idea of that link?

mstimberg commented on May 22, 2024

When I said we can't use it, I didn't mean that we can't use Cython at all. But the way we currently use weave.inline doesn't work for cython.inline -- we call it at every time step to execute the generated code, and weave.inline does a very quick lookup for code it already knows. Cython, on the other hand, analyses the code completely, processes parameters, etc., which just doesn't scale when it is called every timestep.

mstimberg commented on May 22, 2024

On the other hand, you can define a function in a cython.inline call with an explicit type signature, have that function returned from the inline call, and then each timestep you just call that function rather than calling cython.inline

This sounds like a reasonable approach. With our codegen infrastructure, the knowledge about types etc., this shouldn't be too difficult to implement, I guess.

rossant commented on May 22, 2024

@mstimberg Yes, I guess you'd need to build the generated code once, and then call the "compiled" function at every time step.

@thesamovar This extension lets you compile Cython code at runtime and then call the compiled function whenever you want. Cython is needed for the compilation, not at every function call. It seems to be close to what you want to achieve, doesn't it? (Now, I don't know the code generation stuff much, so I may be completely wrong!)

thesamovar commented on May 22, 2024

@rossant Yep, that's exactly what we want. How stable would you say their code is? I just have in mind the horrible situation with sparse matrices in Brian 1, and don't ever want to have to deal with that again. My feeling is that IPython is generally not that stable; they tend to change things quite a lot.

rossant commented on May 22, 2024

@thesamovar I was thinking about taking this code and adapting the relevant portion to your needs. Take it as a "how-to-compile-cython-code-on-the-fly" example.

thesamovar commented on May 22, 2024

Right, it's not too complicated (same with the cython.inline function). However, that means using undocumented internal APIs which may well change in future versions, which is something I'm wary of doing. (That said, cython.inline is effectively undocumented as far as I can tell.)

rossant commented on May 22, 2024

Which APIs are internal and undocumented? I think these features are as stable as we can get right now in the Python ecosystem, but I may be wrong. Might be worth getting in touch with the Cython devs...

thesamovar commented on May 22, 2024

Interesting, OK - that might work then. @mstimberg you were worried about it being a lot of effort, but looking at the code in that link, which is very similar to the cython_inline function called by cython.inline, it's actually pretty straightforward and we could probably simplify it even more for our case. I'll try out a few more things either today or this weekend and see what's possible here.

mstimberg commented on May 22, 2024

@mstimberg you were worried about it being a lot of effort

@thesamovar: you said it would be a bad idea :) But yes, the code doesn't look too difficult (even though it would be nicer if cython.inline were refactored so that this code duplication weren't necessary), so why not.

thesamovar commented on May 22, 2024

OK, I've been playing around with Cython a bit more and managed to make a bit of headway. So I wrote a quickly modified version of cython_inline to see whether making a function object and calling that, thereby skipping all the parsing that Cython does, makes a difference or not. The answer: yes, but it's still super slow (results below).

Another problem is that there is no support for being intelligent about functions like exp and sin. If you want to use them in a loop, you'd better do a from libc.math cimport exp, sin. There's no support for doing this with cython.inline but I added it to my modified version of inline. OK, that makes another big speed improvement but it's still slow.

So I read that by default Cython will still do bounds checking and allow for wraparound indices, which slows it down. I disabled that. It's still incredibly slow. See my example Brian2/dec/ideas/cython_playing_around.py. It uses the new brian2.codegen.runtime.cython_rt.modified_inline.modified_cython_inline function.
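
For reference, the generated module now starts with something roughly like this (illustrative only, not the exact generated code; the variable names are made up):

# cython: boundscheck=False
# cython: wraparound=False
from libc.math cimport exp, sin   # C math functions instead of the unit-checking Python ones

def update(double[::1] v, double dt, double tau):
    cdef int i
    for i in range(v.shape[0]):
        v[i] = v[i] * exp(-dt / tau)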

Despite all these optimisations, here are some timings on the state update of the jeffress example:

cython_modified_inline: 0.39
numpy: 0.31
weave: 0.01

In other words, Cython is still slower than just using numpy, and weave is an order of magnitude faster.

Maybe I missed something but I think I did all the Cython optimisations I know how to do. Do you guys want to give it a try and see if I missed something?

rossant commented on May 22, 2024

Optimizing Cython code in the dark is indeed really painful. Annotations are helpful: they show you the unoptimized lines. You can also use annotations in the IPython notebook (%%cython -a).

rossant commented on May 22, 2024

Also: can you test pure Cython code (for instance in the notebook) without using the codegen machinery? In other words, generate the whole Cython script and benchmark that in the notebook. It would make it easier to test and debug things.

What is the type of _array_neurongroup_*? Regular NumPy arrays are awfully slow in Cython; you should use this if you don't already. AFAIK that's the fastest option as long as you don't use vectorized operations (just access to individual items).

mstimberg commented on May 22, 2024

Just as a little addition to what Cyrille said: I think the current code has a lot of overhead, since it does a large amount of work dealing with the parameters passed to the function. Since these do not actually vary (except for maybe t and similar variables), creating a class/object should be much faster. For every code object we would have a corresponding class and object in Cython: we'd call a prepare function at the beginning, receiving all the parameters, and then call a parameterless update function every time step.
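
Something along these lines, just to illustrate the idea (completely untested, and all the names are made up):

cdef class StateUpdater:
    cdef double[::1] v
    cdef double dt, tau

    def prepare(self, namespace):
        # done once, in before_run: pull everything out of the namespace
        self.v = namespace['_array_neurongroup_v']
        self.dt = namespace['dt']
        self.tau = namespace['tau']

    def update(self):
        # called every time step, without any arguments
        cdef int i
        for i in range(self.v.shape[0]):
            self.v[i] *= 1.0 - self.dt / self.tau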

A minor comment on optimisation flags: setting cdivision to True should also help a bit; otherwise Cython checks for division by zero and raises a Python exception in that case.

thesamovar commented on May 22, 2024

OK, well, I made some progress. Long story short, I now have this (on a 10x larger problem than before):

cython_modified_inline: 0.21
numpy: 3.33
weave: 0.15

So now Cython is almost as fast as weave. I think almost none of the difference is overhead, because the timing is based on just 100 function calls and setting N=1 gives times of 0.01 for each of them. Also, tests on the earlier, slower version showed that the time was scaling with N, so overhead is definitely not part of the explanation.

The big change was...

It's unexpected...

Setting the -ffast-math compiler option. Nothing else made much difference at all. The other compiler flags: small differences. Fiddling around with cython options like cdivision=True: small differences.

My guess is that the default implementations of sin and exp are super slow, and this is partially confirmed by the fact that removing the transcendental functions from the equations speeds things up by about the same amount.

This raises some questions:

  • Does numpy by default use the slow implementation? Can we force it to use the fast one for a huge speed boost in pure Python mode?
  • Are there accuracy implications? Should we be using the slow versions? How do we find out?
  • What will happen on msvc instead of gcc?

Incidentally, I also tried the following optimisation:

cdef double* _cy_array_neurongroup_a = &(_array_neurongroup_a[0])

Then use the _cy_ version to completely avoid any bounds checking etc. Given this, and looking at the generated C code, I'm actually surprised that the Cython and weave versions aren't the same speed. I guess it's a difference in the transcendental functions.
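
In context, the generated loop then looks something like this (a sketch -- the function signature and the update itself are made up):

from libc.math cimport exp

def update(double[::1] _array_neurongroup_a, double dt, double tau):
    # take a raw pointer to the buffer once, then index through it directly
    cdef double* _cy_array_neurongroup_a = &(_array_neurongroup_a[0])
    cdef int _idx
    for _idx in range(_array_neurongroup_a.shape[0]):
        _cy_array_neurongroup_a[_idx] = exp(-dt / tau) * _cy_array_neurongroup_a[_idx]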

For the final version, if we decide to continue with it, I like Marcel's idea of wrapping everything up in a class.

mstimberg commented on May 22, 2024

OK, that's great news, so we all have to apologise to Cython (especially you, for your commit comments ;) ). I'm also quite surprised by the magnitude of the change with -ffast-math -- I actually had it switched off in a branch for quite a while, since it didn't work with Anaconda (before adding the include directory), and I didn't notice such drastic performance drops (I didn't do any benchmarking, though). But I think a big reason is also that your current example is a pure state update, while the examples I tried spend a lot of time in synaptic code.

Anyway, about the _cy_ versions: if you are talking about the cython_playing_around example, you are not actually using those versions afterwards. If I use them, I get the following times:

cython_modified_inline: 0.26
numpy: 2.76
weave: 0.24

So that seems to be pretty close to weave now.
About the accuracy implications: yes, we have reduced accuracy -- that's kind of the point of this optimization flag, isn't it? Given my Linux bias, I'm not that interested in MSVC, but given all this, I'd be very interested in the performance of icc+MKL.

thesamovar commented on May 22, 2024

OK, on fixing it to use the _cy_ versions I get the same times for Cython and weave. I'm not apologising to Cython, though; I still don't like it. ;)

On the accuracy point, my question is more: does the loss in accuracy matter for us? Maybe we should try this out in some examples. Basically, apart from knowing that -ffast-math can give different results, does it actually give significantly less accurate results, or just results that do not conform to the IEEE standard? I'm not even sure if the question is meaningful.

mstimberg commented on May 22, 2024

My impression is that we don't have to worry too much. AFAICT there are two kinds of optimizations that -ffast-math does: reducing error checking (e.g. for valid function arguments) and arithmetical re-arrangements (e.g. replacing x/y by x*(1/y) if 1/y is a common subexpression). I think we don't care much about the error-checking part, and in general we do not guarantee any faithful translation of abstract code into C code in the first place. We parse equations with SymPy and do all sorts of term re-arrangements; I think the changes introduced by the optimization flag are rather insignificant in comparison. Nevertheless, it can't hurt to run some comparisons, I guess.

PS: Bertrand says hi!

mstimberg commented on May 22, 2024

Just a minor remark about performance: I tested with the Intel compiler/MKL libraries and did not see any change in performance for weave in the cython_playing_around example. I guess this example is too simple to really get any performance benefit beyond what -ffast-math does.

thesamovar commented on May 22, 2024

Actually, there was something else going on here; see #173 for an explanation.

thesamovar commented on May 22, 2024

OK so for the moment to make Cython run fast we have to use our own modified Cython inline function. However, what this function does is actually pretty minimal and maybe we can make it work with normal Cython. I think the key thing is just whether or not we can force Cython to use the -ffast-math flag. I couldn't find a way to do it, but there did appear to be some code in there suggesting it should be possible. Any thoughts?

mstimberg commented on May 22, 2024

However, what this function does is actually pretty minimal and maybe we can make it work with normal Cython.

What do you mean by "work with normal Cython"? I don't think there's a way to use cython.inline directly, since, as you say, it doesn't allow specifying cflags, and we would also need some workaround such as having the Cython code return a function, as you suggested earlier. Our own stripped-down, simplified version of cython_inline doesn't sound too bad, though, and I don't think we need to use anything internal: we just create the file ourselves and compile it as a Cython extension.
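
i.e. roughly something like this (a sketch using the standard distutils/cythonize machinery; the module name and flags are just placeholders):

import numpy
from distutils.core import Extension, setup
from Cython.Build import cythonize

# write the generated code to a .pyx file ourselves, then build it like any
# other Cython extension -- this gives us full control over the compiler flags
ext = Extension('_brian_codeobj',
                sources=['_brian_codeobj.pyx'],
                include_dirs=[numpy.get_include()],
                extra_compile_args=['-O3', '-ffast-math'])
setup(name='_brian_codeobj',
      ext_modules=cythonize([ext]),
      script_args=['build_ext', '--inplace'])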

thesamovar commented on May 22, 2024

Yeah, our own version is not too bad; it's just nicer not to have code like that if we can avoid it. Maybe there is a way to pass the C flags that I didn't see? Or maybe the Cython people could be encouraged to include it in a future release? But you're right, it's not terrible to have our own version.

mstimberg commented on May 22, 2024

Maybe there is a way to include the c flags that I didn't see? Or maybe the Cython people could be encouraged to include it for a future release?

Looking at the code, there does not seem to be a way. But it would be an easy patch to add a new argument to allow for it and I'm sure they'd be happy to include it ;) I don't know how long their development cycle is, though.

thesamovar commented on May 22, 2024

I started work on this in a new branch, cython_codegen2, because the old branch was too severely out of date. Rather than the modified version of inline that we were using before, I'm now using a modified version of the IPython cython cell magic, and it seems to be much simpler and more controllable. So far, the only template I've written is stateupdate, but apart from that I think it should basically work.

@mstimberg, the big thing that is missing is support for function implementations. You're more familiar with that code, so if you have some time it would be great if you could have a look at it. At the moment, it runs very slowly because functions are calling back to Python, but it is running at least. If you don't have time or want to focus on other stuff, I'll try to work it out.

mstimberg commented on May 22, 2024

Cool, I'll try to have a look at the function thing soon.

mstimberg commented on May 22, 2024

I pushed a commit to the branch, making functions mostly work. The way the FunctionImplementation containers are supposed to work is that there are three possibilities:

  • name==None and code==None -- function exists under the same name in the target language
  • name!=None and code==None -- function exists under a different name in the target language
  • name==None and code!=None -- function is implemented as code

What "code" means for the third option is language-specific: for numpy, we provide a Python function, for C++, we provide code as a string. For Cython we should probably support both (this is what I added), either code as a string or a Python function (as a fallback for user-defined functions). For demonstration purposes, clip uses a code string, while floor etc. use the Python functions -- they should all be implemented as strings, though.

Two major things are still missing:

  • The call of argumentless functions (e.g. randn()) needs to be changed to take vectorisation_idx as an argument (even though we are not making use of it yet...); in weave we do this using a #define, but I guess in Cython we have to replace it in the string ourselves
  • Function implementations can also have a namespace that needs to be included; TimedArray is the major use case for this

thesamovar commented on May 22, 2024

For name==None and code!=None I agree we should have Python fallback options as well as code strings. I left a space in the template for "support code" but didn't yet put anything in to implement this. Is this what you used?

mstimberg commented on May 22, 2024

I left a space in the template for "support code" but didn't yet put anything in to implement this. Is this what you used?

Yes, exactly. I moved it above the main function; I don't know whether this matters in Cython, though.

thesamovar commented on May 22, 2024

We should probably not use #define in our C++ code either.

thesamovar commented on May 22, 2024

Ugh, trying to implement Cython support really is like bashing your head against a wall. It seems you can't create buffers with a bool dtype, so you have to work around it by using uint8. And so on and so forth. I wonder whether it's really worth the effort, and how the community settled on this as better than weave.
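
For the record, the uint8 workaround looks something like this (sketch; the variable name is made up):

import numpy as np

not_refractory = np.ones(100, dtype=bool)
# Cython buffers/memoryviews reject dtype=bool, so view the same memory as uint8
not_refractory_uint8 = not_refractory.view(np.uint8)
# ...and declare it in the generated Cython code as, e.g.,
#     cdef numpy.uint8_t[:] _not_refractory = _buf
# treating any nonzero value as True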

thesamovar commented on May 22, 2024

OK, so I made some progress, but there are still lots of things to fix, and it feels like I'm basically not using Cython but rather working around it, trying to coax it into producing the C++ code that I want. I'm wondering whether there are other options, given that weave is not being ported to Python 3. For example, we could continue to use Cython but only use it to wrap a separately compiled C++ module. This has the advantage that we can basically just reuse our weave code but add an extra Cython wrapping stage. Or we could try to ditch Cython and weave and find our own way to do something effectively equivalent to weave but more restricted in scope to what we need. Or we could continue trying to make Cython work.

Any thoughts? @mstimberg @rossant

rossant commented on May 22, 2024

I haven't followed the discussion in detail, but I would say that using Cython to wrap a separately compiled C++ module seems like a good idea. I think it wouldn't require too much work, but I may be wrong.


thesamovar commented on May 22, 2024

OK I'm making some progress on Cython finally. It's still a bit hacky, particularly choosing the right names for dtypes and handling all the different types of Variable, but it now seems to work and is reasonably efficient (not quite as efficient as weave but not far off).

The major thing still remaining is to implement all the different templates in Cython. This can be quite an exercise in frustration, but with the existing working examples as a reference it shouldn't take too long now. Then we need to test for efficiency and correctness, and it can be merged relatively soon. Testing for efficiency is usually quite easy: either it runs almost as fast as weave, or hugely slower if you've forgotten to handle type definitions correctly in some part of the code.

mstimberg commented on May 22, 2024

Cool, I'll try to have a look at the Variable stuff quite soon. I updated setup.py in the branch; the test suite should now run on Travis.

thesamovar commented on May 22, 2024

OK I made a pull request for this, let's continue there.
