Comments (11)
Hi @pucicu ,
could you please elaborate on which data you used and how your code snippet looks like, so that we can reproduce it?
Greetings
from pyunicorn.
from pyunicorn.
Update:
Problem appears to be within cython modules
error message:
src/pyunicorn/timeseries/visibility_graph.py in retarded_local_clustering(self)
254 norm = retarded_degree * (retarded_degree - 1) / 2.
255
--> 256 _retarded_local_clustering(N, A, norm, retarded_clustering)
257 return retarded_clustering
258
src/pyunicorn/timeseries/_ext/numerics.pyx in pyunicorn.timeseries._ext.numerics._retarded_local_clustering()
ValueError: Buffer dtype mismatch, expected 'ADJ_t' but got 'short'
from pyunicorn.
problem:
pyunicorn/timeseries/visibility_graph.py
in self.retarded_local_clustering()
- (line 256) error when calling
_retarded_local_clustering(N, A, norm, retarded_clustering)
:A
has dtypeint16
(short
in C) instead of expectedADJ_t
(~int8
as defined inpyunicorn.core._ext.types
)norm
andretarded_clustering
have dtypefloat64
(double
in C) instead of expectedFIELD_t
(~float 32
as defined inpyunicorn.core._ext.types
)
- same problem with
self.advanced_local_clustering()
quick fix/workaround
pyunicorn/timeseries/visibility_graph.py
in self.retarded_local_clustering()
- (line 256) instead of
_retarded_local_clustering(N, A, norm, retarded_clustering)
do_retarded_local_clustering(N, A.astype("int8"), norm.astype("float32"), retarded_clustering.astype("float32"))
- (line 277) same for
self.advanced_local_clustering()
thorough fix:
[resolved] dtype of norm
and retarded_clustering
:
initialize as np.zeros(self.N, dtype="float32")
within VisibilityGraph (lines 219 and 244)
[not resolved] dtype of A
:
pyunicorn/timeseries/visibility_graph.py
in self.__init__()
- (line 99)
vg.visibility_relations()
returns adjacency matrixA
of dtypeint8
- (line 104)
A
is then passed toInteractingNetworks.__init__(self, A)
, which returnsself.adjacency
of dtypeint16
pyunicorn/core/network.py
in self.set_adjacency()
- (line 482) dtype of adjacency matrix is set to
int16
(orint32
if N is bigger) - why is that, when adjacency matrix should be binary?
from pyunicorn.
Thank you for the detailed analysis, @fkuehlein! You have identified two oversights in 3dab5bf.
The easy thing first: The original types for the arguments norm
and retarded_clustering
were the deprecated FLOATTYPE=np.float
, which should have been translated as DFIELD=np.float64
instead of FIELD=np.float32
in the Cython function signature. I have fixed that now for all such cases in 2b489da.
For the argument A
, the appropriate solution is to downcast to ADJ_t
at the call site, using the helper core._ext.types.to_cy()
that exists for this purpose. I will leave that to you, and kindly ask you to add a test that triggers all possible scalar types for the adjacency matrix, in a similar fashion to test_generic.simple_instances()
. Please look out for other places where a similar pattern might occur.
There are several layers of causation to this bug, which I document here in hopes that it will inform the resolution and mitigation of other similar bugs:
- The offending commit included a large number of type conversions in an attempt to make the Python/Cython interface more coherent. This involved avoiding officially deprecated type aliases, making explicit many implicit type casts, and unifying previously inconsistent type uses throughout the codebase via semantic type aliases.
- Because of a significant lack of unit tests, in many cases these changes could not be automatically tested. This requires a systematic effort to extend the code coverage of unit tests to at least trigger all methods published in tutorials and in papers by the core authors.
- Furthermore, these changes have so far not been carefully reviewed due to a lack of resources. @fkuehlein, given the code analysis you have already done for this issue, it would be wonderful if you could extend your review to the entire associated change set.
- As you explained, the dynamically changing numeric type of sparse adjacency matrices is the specific issue underlying this bug. As far as I can tell, this is a choice from early on in the library's development and is motivated by some operations on sparse matrices, e.g., row-wise or column-wise sums, which would otherwise result in overflow w.r.t. the scalar type holding the binary values of the sparse matrix. Of course, circumventing the overflow problem by upcasting the binary values is a very resource-inefficient and not a failsafe solution. Ideally, the adjacency should always be represented using the smallest possible type and potentially overflowing sparse matrix operations should be rewritten, or at least implemented using only local and temporary upcasts. But this would constitute a significant change to the core of the library and would require extensive testing, reinforcing the point above.
- More generally, this issue is a prime example for the general need to either test much more rigorously or to rewrite the library in a strongly typed way. The offending commit attempted to find a compromise with very limited resources, by forcing type casts to fail only at the Python/Cython interface when they don't conform to some "standard" behaviour that was encoded explicitly in the semantic type aliases, under the assumption that no dynamic type changes of significance will occur inside the Cython functions.
from pyunicorn.
Thanks for your help and the detailed additional info, @ntfrgl!
The original issue should be resolved as of 1651a2b
. For now, that commit remains on my fork and I will create a pull request as soon as I have cleared at least the rest of VisibilityGraph
of similar possible occurrences.
I have two questions remaining in order to proceed on this:
-
Appropriate solution for these cases
As I have understood, the minimal solution (which you have pursued so far) would be to generally pass arguments
arg
of a cython module_ext.numerics._mod(arg)
with help of_ext.numerics._mod(to_cy(arg, ARGTYPE))
to make sure they are cast to the expected typeARGTYPE
(as is done throughoutcore.Network
).All remaining variables that might be affected by this within
VisibilityGraph
, however, are already explicitly set to the required dtype on initialization. Namely:-
The empty placeholder for the adjacency matrix
A
, which is correctly initialized to be ofdtype=MASK
. I guess I keep that as it is. -
VisibilityGraph.time_series
andVisibilityGraph.timings
, which are initialized to be ofdtype=np.float32
. Keep that, or putdtype=FIELD
there instead?
-
-
Implementation of a respective test
I'm not sure what you mean by "trigger all possible scalar types for the adjacency matrix".
self.adjacency
is a property of theNetwork
class. So maybe the idea is to minimally initialize all classes that inherit fromNetwork
and check what scalar types they return forself.adjacency
. Does that make sense? Or else, could you elaborate on that?
from pyunicorn.
Short-term answer
- Solution:
- Apart from mistakes like the ones above, the original commit introducing semantic type aliases followed a manual, defensive approach. The goal was to use typed initialisation where it is apparently valid (i.e., the variable is initialised within the current scope and no type-altering methods are applied before the Cython call), and to type cast directly at the call site otherwise.
- This means that most of the work should already be done, but it is expected that a few mistakes not triggered in the current test suite need to be fixed, and that some unnecessary copy/cast operations might be avoidable with a more precise type inspection.
- It would be preferable to replace the remaining non-alias types with the semantic aliases where possible.
- Testing:
- In the current bug's case, the problematic type mutation occurs depending on the node count, which means that in order to guard against future instances of similar bugs, the test for this method should be parametrised with several different network sizes to ensure that all possible cases of Cython calls are covered.
- The reliability and reusability of the test suite would be improved if this was done using parametric test fixtures, which exist for such scenarios.
- And btw, if in the process of looking for other similar situations you find it helpful to search for
ADJ_t
, then the semantic type aliases are fulfilling part of their purpose.
Long-term answer (worth discussing and creating a new issue for)
- Solution:
- The above is really an incomplete substitute for a proper type analysis, which has traditionally not been available in the dynamically typed Python language, and is held together by non-mechanised reasoning and incomplete testing. Given recent developments in the Python language and its library ecosystem, a much more reliable and efficient approach would be to use the new NumPy API type annotations. These could be introduced incrementally throughout the codebase, and would allow static type checking of the numerical types. I expect that this will be straightforward for the most part with few exceptions, that it might expose some well hidden bugs, and that it would make the library a lot more sustainable.
- Testing:
from pyunicorn.
Thanks a lot for your help and elaborations.
I fixed the above case accordingly in VisibilityGraph
. I also looked through all of pyunicorn.timeseries
and found no more conflicting types of the adjacency matrix, they're all safely cast to ADJ
already.
I will therefore make a pull request now so the original issue can be fixed.
Then I will look into creating an appropriate test for these cases. Thanks for the hint to the existence of parametric test fixtures! Maybe we can briefly discuss your 'long-term anser' next week with @jdonges' help.
from pyunicorn.
Ok, actually found another dtype=np.uint8
in timeseries._ext.numerics._visibility()
. The function in question is only used within test_timeseries.testVisibility()
though, which is currently commented out.
Fixed both the test and the cython function with 91836d4
for a start, on a new branch. Still, the test only covers a specific testing method, so test coverage is lacking (let alone parametrisation). I will therefore further look into enhancements.
Putting this here just for documentation.
from pyunicorn.
Let me add to the critical observations in your latest commit message, since it doesn't have an associated issue yet:
_visibility()
: The core of this function is a loop with a 1-bit accumulator, and should not be allocating an array and calling several NumPy methods on it.testVisibility()
: A test which essentially replicates the code under test is almost useless.
from pyunicorn.
Resolved by #183.
from pyunicorn.
Related Issues (20)
- MAINT: review funcionality and use of `CouplingAnalysis.get_nearest_neighbors()` and add a specific test
- ENH: Theiler window option for diagonal RQA measures in `RecurrencePlot` HOT 2
- CI: set up Travis CI for macOS and Windows builds
- MAINT: drop `climate.MapPlots` in favour of `climate.CartopyPlots` HOT 2
- ENH: Remove old examples folder
- `Network.nsi_local_clustering()` might encounter divide by zero when `typical_weight` is given HOT 1
- module 'numpy' has no attribute 'int' HOT 1
- Failed building wheel for pyunicorn while installing HOT 4
- Printing `ClimateData` object throws `KeyError` from `h5netcdf` HOT 2
- Add `typical_weight` argument for all n.s.i. measures to enable their "corrected" versions
- numpy module not found during installation (Win10) HOT 1
- DOC: sphinx `Didn't find <class attribute> in <module>` when building docs HOT 2
- ModuleNotFoundError from pip install (version 0.6.1) HOT 2
- Complete implementation of new caching system, and extend
- 'pip install pyunicorn' no longer works - fails to build wheel HOT 6
- Python 3.13: igraph tests fail with SystemError (method returned NULL without setting an exception) HOT 3
- MAINT: migrate to Numpy 2.0 HOT 1
- BUG: Potentially endless while loops in experimental methods `Network.GrowPreferentially`
- MAINT: port remaining C modules to Cython in `climate`, `core` and `timeseries` extension modules
- BUG: `PartialCorrelationClimateNetwork._calculate_correlation()` throws dubious divide-by-zero warning on Windows builds only
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyunicorn.