Comments (18)
I use a marker that counts the distance to the original bucket. I currently throw an exception when that marker would overflow. Instead, I could give the maximum value of the marker a special meaning where I switch to a slow linear search instead. That way there should be almost no performance penalty in the normal case, and I could still handle the extreme case.
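A minimal sketch of that idea (hypothetical names, not the library's actual internals): the per-slot marker stores the probe distance, and instead of throwing on overflow, the marker saturates at a special maximum value that means "fall back to a plain linear scan":

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not robin_hood's real layout: each slot carries a
// one-byte marker holding the distance to the element's home bucket.
constexpr std::uint8_t kMaxDistance = 0xFF; // saturated: "use linear search"

// Bump a probe-distance marker; instead of throwing on overflow, saturate it.
inline std::uint8_t bump_distance(std::uint8_t d) {
    return d == kMaxDistance ? kMaxDistance
                             : static_cast<std::uint8_t>(d + 1);
}

// A lookup would check the marker first: a saturated marker means the fast
// distance-based early exit is invalid and a full linear scan is required.
inline bool needs_linear_fallback(std::uint8_t d) {
    return d == kMaxDistance;
}
```

In the normal case the marker never saturates, so the fast path pays only for one extra comparison.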
from robin-hood-hashing.
Well, it's always Robin Hood hashing, even when there are lots of collisions. Collision handling is always a linear search; when you remove the colliding elements, there is less linear searching.
from robin-hood-hashing.
and without losing performance
If this is not possible, I could imagine a solution that lets the user choose: either pure performance (throw on overflow, as it is now) or slightly lower performance with graceful error handling. Of course, if it is possible to have both graceful overflow handling and full performance, this option would be obsolete.
from robin-hood-hashing.
I think we can achieve this by sacrificing time or space efficiency only when the extreme situations above occur. That way we maintain optimal performance in most cases, but the map keeps working under extreme conditions.
from robin-hood-hashing.
I could give the maximum value of the marker a special meaning where I switch to a slow linear search instead
Does this mean that in that case, all items in the container degrade into linear probing? If so, is the container likely to automatically promote itself back to Robin Hood hashing later?
from robin-hood-hashing.
If I understand correctly, you mean there will be two linear search algorithms: the fast one currently used, and a slower one used to handle overflow (when the marker reaches its maximum), right?
So in the new implementation, the faster linear search may be replaced by the slower one. Then:
- Does this replacement occur globally for the whole container, or only in the parts of the container where collisions are more intense?
- Is there a reverse process for this replacement? I.e., if collisions in a container are no longer so intense, can the linear search be automatically switched back to the fast one?
from robin-hood-hashing.
Is there any update on this? I'm currently using your excellent implementation for sparse grids. The problem is that I sometimes get an overflow exception, which unfortunately forces me back to the much slower and more memory-hungry, but perfectly stable, std::unordered_map ...
I would be very happy if you added an optional setting that makes the map 100% stable. Even if it made the map a little slower, it would still be much faster than the STL version!
from robin-hood-hashing.
What hash are you using? With a reasonably good hash this should never happen. Can you post just the hash code?
from robin-hood-hashing.
I'm using your default hash function with values computed as follows:
uint32_t cellX = static_cast<uint32_t>((point.x - bounds.min.x) / bounds.size.x * gridResolution);
uint32_t cellY = static_cast<uint32_t>((point.y - bounds.min.y) / bounds.size.y * gridResolution);
uint32_t cellZ = static_cast<uint32_t>((point.z - bounds.min.z) / bounds.size.z * gridResolution);
uint32_t cellIndex = cellX | (cellY << 10) | (cellZ << 20);
auto& entry = grid[cellIndex]; // grid is a robin_hood::unordered_map<uint32_t, ...>
Edit: I just remembered that I "fixed" the exceptions by changing my key to the code above. I used the following before, and it caused the exception more often:
cellIndex = cellX + cellY * gridResolution + cellZ * gridResolution * gridResolution
Also pretty interesting is that the overflow occurs when I simply copy all the elements from one map into another:
for (const auto& entry : source.grid) {
    target.grid[entry.first] = entry.second;
}
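As an aside on the bit-packed key above: it allots 10 bits per axis, so it is only collision-free while each cell coordinate stays below 1024 (i.e., roughly gridResolution <= 1024). A small sketch of the pack/unpack, with hypothetical helper names not taken from the code above:

```cpp
#include <cassert>
#include <cstdint>

// 10 bits per axis => each coordinate must be < 1024 for the packing
// cellX | (cellY << 10) | (cellZ << 20) to be collision-free.
inline std::uint32_t packCell(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    assert(x < 1024 && y < 1024 && z < 1024);
    return x | (y << 10) | (z << 20);
}

inline std::uint32_t cellX(std::uint32_t key) { return key & 0x3FF; }
inline std::uint32_t cellY(std::uint32_t key) { return (key >> 10) & 0x3FF; }
inline std::uint32_t cellZ(std::uint32_t key) { return (key >> 20) & 0x3FF; }
```

Note that neighboring cells along x produce sequential key values, so the map's hash must mix the low bits of the key well.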
from robin-hood-hashing.
Can you share a reproducer? What compiler are you using, and do you compile for 64 or 32 bit?
from robin-hood-hashing.
I wrote you an email with the link to the minimal reproducer. I'm using Visual Studio 16.4.2 and only compile for 64 bit. Thanks!
from robin-hood-hashing.
Thanks for the reproducer! I could reproduce it; here's the issue:
for (const auto& entry : sourceNode.grid) {
    targetNode.grid[entry.first] = entry.second;
}
You iterate a large hashmap and insert all the entries into the targetNode.grid map. All hashmaps use the same hash function, so you are effectively iterating from lowest to highest hash value. When inserting, since targetNode.grid is small, all the inserted elements first cause a collision (because their hashes are so small), so eventually robin_hood throws in the towel and raises an exception. std::unordered_map doesn't throw, but insertion becomes slow because it boils down to a linear search.
There is no ideal way to fix this problem. In your case you can do this in PointCloudConverter.cpp:
void PointCloudConverter::MergeOctreesInternal(Node& targetNode, Node& sourceNode,
                                               uint32_t depth, uint32_t partitionDepth, uint32_t partitionIndex) {
    targetNode.grid.reserve(targetNode.grid.size() + sourceNode.grid.size());
    for (const auto& entry : sourceNode.grid) {
        targetNode.grid[entry.first] = entry.second;
    }
With that reserve() call the map already has plenty of room for all the data to insert, so it doesn't feel the effect of the hash collisions caused by the linear insertion order. With that change it works for me.
from robin-hood-hashing.
On second thought, this is really an issue of bad quality of the hash. If you replace robin_hood's hash_int with this code, this issue also doesn't happen (then there is also no need for reserve()):
inline size_t hash_int(uint64_t obj) noexcept {
#if ROBIN_HOOD(HAS_UMUL128)
    uint64_t h;
    uint64_t l = detail::umul128(obj ^ UINT64_C(0xa0761d6478bd642f), UINT64_C(0xe7037ed1a0b428db), &h);
    return h ^ l;
#elif ROBIN_HOOD(BITNESS) == 32
    uint64_t const r = obj * UINT64_C(0xca4bcaa75ec3f625);
    auto h = static_cast<uint32_t>(r >> 32U);
    auto l = static_cast<uint32_t>(r);
    return h + l;
#else
    // murmurhash 3 finalizer
    uint64_t h = obj;
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccd;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53;
    h ^= h >> 33;
    return static_cast<size_t>(h);
#endif
}
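For experimenting outside the library, the portable fallback (the murmurhash 3 finalizer) can be copied as a standalone function. Since each step (xor-shift, multiply by an odd constant) is invertible, the whole function is a bijection on 64-bit values, so distinct keys can never collide before the map reduces the hash to a bucket index:

```cpp
#include <cassert>
#include <cstdint>

// Standalone copy of the murmurhash 3 finalizer (the portable branch above).
// Every step is invertible, so this is a bijection on 64-bit values.
inline std::uint64_t murmur3_fmix64(std::uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}
```

One quirk worth knowing: it maps 0 to 0, so the single key 0 gets no mixing at all.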
from robin-hood-hashing.
Finally got around to trying your other hash function, and it seems to work for all the datasets I have. In my opinion, a fallback to a slower linear search should still be added for absolute stability. The issue is that if my program needs to run for multiple hours, I need to be sure it won't crash due to a very rare sequence of hashes.
from robin-hood-hashing.
You are of course right, a linear search fallback would be important to have. I don't currently have the time for this, though. But I'm at least working on an improved hash.
from robin-hood-hashing.
I believe I've (finally) got a simple & fast reproducer:
TEST_CASE("overflow_finder_simple") {
    robin_hood::unordered_set<uint64_t> set;
    for (uint64_t i = 0; i < UINT64_C(1000000); ++i) {
        auto h = robin_hood::hash<uint64_t>{}(i);
        if (UINT64_C(0) == (h & UINT64_C(0x1ff))) {
            set.insert(i);
        }
    }
}
This shows that the lower bits of the hash are very bad, which is what can lead to the overflow.
Mitigation strategy: add an initial, very simple mixing hash that's part of the robin_hood table and basically multiplies the value by a random odd number; the result is then passed on to the robin_hood hash. When an overflow happens, add an odd constant to the mix multiplier and rehash the map.
Advantages/disadvantages of this approach:
- no need to refactor the infobyte array, which would cause some slowdown
- only a single extra multiplication is needed when hashing => not much of a performance slowdown
- allows using the hash's address as the initial multiplier, to be a bit more resilient against overflow attacks
- rehashing needs to be implemented - but it can probably be done as a copy of the map
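A minimal sketch of that pre-mixing idea (hypothetical names; this is not the library's actual implementation). One detail: the multiplier has to stay odd for the multiplication mod 2^64 to be a bijection, so this sketch perturbs it by an even constant rather than an odd one:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a per-table pre-mixing step (hypothetical, not robin_hood's code).
struct PreMixer {
    // Must stay odd: multiplication by an odd number mod 2^64 is a bijection,
    // so no two keys are mapped to the same pre-mixed value.
    std::uint64_t multiplier = 0x9E3779B97F4A7C15ULL; // arbitrary odd seed

    // Pre-mix the key; the result would then be fed to the table's normal hash.
    std::uint64_t mix(std::uint64_t key) const {
        return key * multiplier;
    }

    // On overflow: perturb the multiplier (keeping it odd); the caller then
    // rehashes the whole map, e.g. by copying it into a fresh table.
    void bump() {
        multiplier += 2; // adding an even constant preserves oddness
    }
};
```

After bump(), every key lands in a different pre-mixed position, which breaks up whatever collision cluster triggered the overflow.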
from robin-hood-hashing.
Sorry to step in, but I'm having trouble hashing grid indices (two ints; I'm working on a discrete spatial grid). Should I be able to fix the error by swapping in the hash_int proposed above? @martinus
I'm getting map overflow exceptions. I tried the above solution, but it seems to slow down my algorithm.
Do you know of more specific and efficient hash functions for a contiguous two-dimensional range of integers (i.e., grid coordinates)?
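One common pattern for this (a sketch, not part of robin-hood-hashing): pack both coordinates losslessly into a uint64_t, then run it through a strong 64-bit mixer such as the splitmix64 finalizer before the value reaches the table:

```cpp
#include <cassert>
#include <cstdint>

// splitmix64 finalizer: a well-known strong mixer for 64-bit integers.
inline std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27; x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return x;
}

// Hash two grid coordinates: pack them losslessly into 64 bits, then mix.
// Packing and mix64 are both bijections, so distinct cells can never
// collide before the map reduces the hash to a bucket index.
inline std::uint64_t hashCell(std::int32_t x, std::int32_t y) {
    std::uint64_t packed =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x)) << 32)
        | static_cast<std::uint32_t>(y);
    return mix64(packed);
}
```

You can also just use the packed 64-bit value as the map key; pre-mixing like this mainly helps when the map's own integer hash is weak in the low bits.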
from robin-hood-hashing.
I have been getting this exception extremely rarely when inserting SHA-256-hashed data -- truly random data, really -- into the hash table (I just take the first 64 bits of the 256-bit number)!!
Is this still a known issue? I have seen it even with so-called "good" hash functions, but extremely rarely.
from robin-hood-hashing.