Comments (5)
I could finally take a look at the code. Apologies, I don't have a concrete answer or a working patch yet, but here's a theory about what may be happening:
LOC 1
NearestNeighbors.jl/src/knn.jl
Line 36 in ac0338c
At this point in the code we have initialized the indices array with `-1`, and the distances with `Inf`. The code is probably assuming that by now the indices have all become valid. With a distance of `NaN`, though, we still have a `-1` index there, and then we assign `tree.indices[-1]` to `idx[j]`, which is allowed by the `@inbounds`. This is how we get crazy values in the output.
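A minimal sketch of that failure mode, assuming a heavily simplified single-neighbor search. The function `nearest_sketch` is hypothetical and is not the code in `knn.jl`; it only mirrors the sentinel initialization described above:

```julia
# Hypothetical stand-in for a k = 1 search over precomputed distances.
function nearest_sketch(dists::Vector{Float64})
    best_idx = -1      # sentinel index, as in the description above
    best_dist = Inf    # sentinel distance
    for (i, d) in enumerate(dists)
        if d < best_dist   # never true when d is NaN
            best_idx = i
            best_dist = d
        end
    end
    return best_idx, best_dist
end

nearest_sketch([3.0, 1.5, 2.0])   # (2, 1.5)
nearest_sketch([NaN, NaN, NaN])   # (-1, Inf): the sentinel index survives
```

If a caller then does `tree.indices[best_idx]` under `@inbounds`, the `-1` is used without a bounds check.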
LOC 2
NearestNeighbors.jl/src/kd_tree.jl
Line 208 in ac0338c
I believe the ultimate reason this ends up happening is that any comparison with a distance of `NaN` returns false, including `NaN < Inf`. Notice it's the same for `Inf` in this line. Here is potentially where this may be happening.
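The comparison behavior is easy to check at the REPL. This is plain IEEE 754 semantics in Julia, nothing specific to the package:

```julia
@show NaN < Inf          # false
@show NaN > Inf          # false
@show NaN == NaN         # false
@show Inf < Inf          # false: an Inf distance never beats the Inf sentinel
@show isnan(NaN)         # true: the reliable way to detect NaN
@show isless(Inf, NaN)   # true: isless orders NaN after every other value
```

So a strict `<` test against the initial `Inf` rejects both `NaN` and `Inf` candidates.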
I think the solution may actually involve some decisions about how the whole thing should behave. If the metric function returned `Inf` when we get `NaN`, this might help, but I'm not sure it would guarantee the proper initialization of the indices. We might actually have to initialize them with e.g. `1` as well, and then in the end we would get `(1, Inf)` for such `NaN` points. Or we could even do something nifty and use `nothing` as the index, since this should be the neat way to implement an optional type for integers, which in the end is kind of what `NaN` is. In any case, we probably want to do something to prevent invalid array accesses based on these `NaN` values (and `Inf`s?), which unfortunately are all valid `Float64` values. Actually testing the inputs to detect `NaN`s would be the other path...
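To make the first two options concrete, here is a rough sketch. The names `sanitize`, `nearest_valid` and `has_nan` are made up for illustration and are not part of the package; it also assumes the distance vector is non-empty:

```julia
# Hypothetical helper: treat an undefined distance as "infinitely far".
sanitize(d::Float64) = isnan(d) ? Inf : d

# Same simplified k = 1 search idea, but the index starts at 1, so the
# returned index is always valid (assuming dists is non-empty).
function nearest_valid(dists::Vector{Float64})
    best_idx = 1
    best_dist = Inf
    for (i, d) in enumerate(dists)
        d = sanitize(d)
        if d < best_dist
            best_idx, best_dist = i, d
        end
    end
    return best_idx, best_dist
end

# The "other path": reject bad input up front instead.
has_nan(data) = any(isnan, data)

nearest_valid([NaN, NaN])    # (1, Inf)
nearest_valid([NaN, 2.0])    # (2, 2.0)
has_nan([1.0 NaN; 2.0 3.0])  # true
```

With this variant the caller has to look at the returned distance to tell a real match from the `(1, Inf)` placeholder.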
from nearestneighbors.jl.
I think initializing with `1` should be just fine: any match might be considered "good" for `NaN` or `Inf` distances, as long as we have that distance value along with the result to judge what happened. It's a good thing if we guarantee always-valid indices. The other approach is a neat handling of `NaN` as an optional type, either returning `nothing` or a guaranteed invalid index such as `0`, which I'm not really a big fan of. In my opinion, returning `1` with an `Inf` distance, and mapping `NaN` distances to `Inf`, should be fine for this code.
I don't believe there really is a Julian way to do this, because part of it is about application domain decisions. Although using `nothing` is indeed more or less the Julian way to implement an optional type, similar to modern C++ `std::optional`, or the Scala `Option` etc., or the hacky way we use `None`, `null` and null pointers in Python, JavaScript, C etc. So returning a `nothing` index would probably be a neat way to do it, but I personally think ensuring valid indices would be better.
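For comparison, this is roughly what the `nothing` variant could look like. Again a sketch only, with a hypothetical `nearest_optional` function rather than anything from the package:

```julia
# The index is either an Int or `nothing` when no candidate was accepted.
function nearest_optional(dists::Vector{Float64})
    best_idx::Union{Int,Nothing} = nothing
    best_dist = Inf
    for (i, d) in enumerate(dists)
        if d < best_dist   # NaN and Inf never pass this test
            best_idx = i
            best_dist = d
        end
    end
    return best_idx, best_dist
end

idx, dist = nearest_optional([NaN, NaN])
if idx === nothing
    println("no valid neighbor found")
else
    println("nearest neighbor is point ", idx, " at distance ", dist)
end
```

The caller is forced to handle the missing case, since indexing an array with `nothing` throws an error instead of silently reading some element.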
from nearestneighbors.jl.
I also experience the problem of non-existent large returned indices. Will have to investigate further.
Maybe #78 might be related?
from nearestneighbors.jl.
Clearly, something is going bananas somewhere. I think this just has to be honestly debugged with print statements and whatnot to find out where things go bad.
from nearestneighbors.jl.
Handling of nothing/null/NaN seems to be a never-ending story. Initialization with `1` has the problem that, in the worst case, when not checking the distance, one might just index into `1` and obtain a wrong result. I prefer initialization with `nothing`, but this can lead to exceptions if one doesn't expect `nothing`, though that probably is what should happen (and it's more informative than some random Int). However, I cannot tell how this affects performance... A dirty compromise (still an Int, but throwing out-of-bounds exceptions) might be returning `0`.
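A quick illustration of the `0` compromise, using an ordinary `Vector` as a stand-in for the point data. Note the caveat: this only fails loudly when bounds checking is active, and `@inbounds` code may skip the check.

```julia
points = ["a", "b", "c"]   # stand-in for the indexed data

# Julia arrays are 1-based, so index 0 is out of bounds.
err = try
    points[0]
catch e
    e
end

println(err isa BoundsError)   # true
```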
Is there a Julian way to deal with NaN/nothings?
from nearestneighbors.jl.