
genn-team / genn


GeNN is a GPU-enhanced Neuronal Network simulation environment based on code generation for Nvidia CUDA.

Home Page: http://genn-team.github.io/

License: GNU Lesser General Public License v2.1

Makefile 0.22% Shell 0.32% C++ 77.33% Objective-C 0.57% Batchfile 0.12% C 10.23% Python 11.18% Dockerfile 0.04%
computational-neuroscience hacktoberfest nvidia-cuda simulation spiking-neural-networks

genn's People

Contributors

9inpachi, ajaysub110, ajc158, alandiamond, alexdewar, anindyaghosh, bdevans, brad-mengchi, braintimeexception, chanokin, costrau, edward-cf, esinyavuz, fabianschubert, jamesturner246, kanishk16, karlhuang007, kernfel, lebedov, mstimberg, neworderofjamie, obaid51, stevinson, tnowotny


genn's Issues

Small DT and single precision floating point

If DT is small (e.g. 0.01) and single-precision floating point is used, I have observed that

int riT= (int) (runtime/DT);

results in riT == 0 for runtime = 0.01, and hence no time stepping takes place.
I suggest amending all user projects that perform a similar conversion to use:

int riT= (int) (runtime/DT+1e-6);

Avoiding creating unused variables

I'm renaming this thread because we have other instances of unused-variable problems, and it would be nice to take care of them more systematically during code generation.

The previous thread consisted of the warnings on:
Unused variable warning for ipost, ipre for GLOBALG used with NSYNAPSE model

This happens in the SynDelay project. It is not as straightforward as it seems: indexing is not necessary when GLOBALG is used, but if we had a synapse model with a variable that uses these indices, they would be needed. These are only warnings at the moment, but it would be more elegant to perform a proper check.

Another example of this kind of problem is the unnecessary post-to-pre arrays for sparse connectivity. They are (presumably) only needed when the learning kernel is used.

Creating unnecessary variables wastes memory, and we should avoid it.

Better global memory access

According to the CUDA programming guide, the __align__ qualifier allows a copy of several consecutive values in an array to run as fast as a single-element copy, because the accesses can be coalesced into one wider memory transaction. This may be especially useful for sparse connectivity.

Code-generated function createSparseFromDense<synapseName> only works for g

@ajc158 pointed out that this function is problematic when the model does not have any variable called g. This function should be changed to either:

  • provide an individual function for every variable of every synapse,

or

  • take the variable name as an argument to the function.

A workaround would be to disable the lines that write this function in generateRunner.cc. An equivalent function that is independent of any variable or synapse group is available in $GENN_PATH/lib/include/sparseUtils.cc.

If you are using this function, please be aware that it will be changed.

Arbitrary compilation problem with stable release 1.1.1 on ubuntu 14.04

Sometimes projects fail to compile, giving an nvlink error. This is probably a gcc/CUDA version problem, and we do not see it in the development branch. It will be fixed in the next release; until then, please use the development branch if you experience this problem.

In the MBody_userdef example gRaw is not initialised correctly

gRawKCDN should be initialised as the inverse "g-function" of the initial gKCDN values that are read from file. This is currently not done, and gRawKCDN keeps the standard initial value from the model definition function. For an example of how it can be done, see the MBody1 example in map_classol::read_kcdnsyns().

Make all the synapses user-defined

All the standard models have already been rewritten as user-defined models in the MBody_userdef example. We need to move these to utils.h or somewhere similar.

Indentation in user-defined code snippets

The first line is aligned with the indentation of the generated code, but the other lines are not. This makes the generated code a bit ugly, but there are no functional problems.

padSumSynapseTrgN etc.

We define these (sumSynapseTrgN, padSumSynapseTrgN, padSumSynapseKrnl) in what used to be called NNmodel::initDerivedSynapsePara, now renamed to NNmodel::registerSynapsePopulation. However, they do not seem to be used anywhere. If they are no longer necessary, we should remove them.

Revisit blocksize optimisation

  • Investigate instruction-level parallelism.

  • Investigate the performance gap between small models and larger models.

  • Revisit the optimisation goal (currently device occupancy only), e.g. memory bandwidth, memory sizes, etc.

If more than one sparse synapse population is defined and maxConn is defined for only one of them, the second population reads out of array bounds when looking up maxConn

To solve this, setMaxConn could be called for every synapse population during synapse creation, instead of pushing into a vector only when it is needed. The current workaround is to define maxConn for every sparse population in the model definition (an upper limit such as the number of postsynaptic neurons is fine), or not to set it for any population at all; the latter sets maxConn to the number of postsynaptic neurons.

New blocksize optimisation

I have created a new way of acquiring ptxas information so that we can build a more robust blocksize optimisation procedure.
While doing so I noticed that the current logic relies on a very old method for testing whether learning is present (a test for == LEARN1SYNAPSE). This needs to be updated. Also, the order of kernels in the optimisation is odd: synapse, learning, neuron. If there is no learning, the second kernel slot runs on zeros (hopefully without any detrimental effects).
This needs testing and cleaning up.

Sparse connectivity with networks bigger than the block size is not supported for GPUs with sm < 2.0

This is because atomicAdd on floating-point types is not supported by older GPUs.

I can think of three workarounds:

1. Provide an overloaded atomicAdd function for floats if sm < 2.0 (perhaps in the longer run; this is not planned at the moment). Users who want this can implement their own method; for an example see: https://devtalk.nvidia.com/default/topic/458062/atomicadd-float-float-atomicmul-float-float-/

Also considering issue #29, we can do one of the following:

2. Warn users that their device may not support certain functionality (this seems like the best option at the moment).

3. If more issues of this kind arise, we may consider providing a GeNN functionality table/checklist for each GPU on the computer. Based on this and the user's model, we could improve automatic device selection.

Creating sparse connectivity from dense under different floating types

The new code-generated version of createSparseFromDense calls the function

countEntriesAbove(DATATYPE * Array, int sz, float includeAbove)

which picks out all values that are non-zero (more precisely, above includeAbove) to create the sparse connectivity.

We want includeAbove to be slightly higher than zero to guard against floating-point errors and divergence problems with zero values in our learning synapse model. The problem is that this value is declared as a float, so when the dense array uses double precision we are mixing floating-point types. Older versions used a pre-defined value (asGoodAsZero), a float whose value was set in global.h. In the code-generated version this value is set to the minimum of float or double according to the model's precision, and when double precision is used the compiler complains that the number does not fit in a float.

I pushed a workaround that makes asGoodAsZero a double instead of a float and modified countEntriesAbove to use double as well, but it is still a bit messy.

Provide more modern alternatives for addNeuronPopulation etc

This is only a suggestion, but may be something to consider: should we provide overloaded functions in which users pass C++ vectors rather than C arrays for all parameters and initial values? For these we could then check that the number of provided parameters is correct, whereas with the current low-tech versions users are on their own.
