Comments (17)
Not sure if this fork wants to try this instead:
LuaJIT/LuaJIT#294
from luajit2.
We won't consider O(n) hashing in this branch either. I'd suggest avoiding Lua strings in the first place for large string buffers (for example by introducing a string.buffer builtin data structure or using FFI C buffers).
from luajit2.
Thanks for the troubleshooting @stone-wind. It is a valid concern for your specific use case; unfortunately, properly fixing it inside OpenResty would be harder.
The original string hash algorithm in LuaJIT is very fast: it has constant time complexity for strings of arbitrary length. The CRC32 variant https://github.com/openresty/luajit2 uses is intended to make hash collision attacks harder by randomizing the positions the hash function samples while calculating the hash. It is slower than the original hash function, but nevertheless still constant time.
To actually address your issue of a large number of strings with the same length and almost identical content, we have to resort to an O(n) hash algorithm, because we have no way of knowing beforehand where the differences in the strings are.
The CRC32 hash implementation does use every byte for the CRC32 calculation for sizes up to 127 bytes. So if you can change your program to read only 127 bytes of data from the socket at a time, it should not suffer this many collisions.
If that is not acceptable, another fix we may consider is allowing users to compile LuaJIT to use all bytes of the string for the CRC32 calculation, downgrading the algorithm to O(n) but making it more resilient against collisions between very similar long strings.
Also a word of caution about using LuaJIT/LuaJIT#294: we have never tested it in OpenResty and cannot guarantee it does not introduce bugs in the LuaJIT core.
from luajit2.
we have to resort to an O(n) hash algorithm
Just $0.02 but my intuition is that constant factors will be at least as important as asymptotic cost:
- Throughput: How many bytes can we hash per CPU cycle?
- Fixed cost: How many cycles is the per-string cost, excluding the per-byte cost?
Does anybody know what is the current state of the art hashing algorithm and how it performs in these ways? Or know what numbers would be acceptable for OpenResty? Or what numbers we achieve today?
I am especially wondering whether an afternoon of Googling might turn up a .h file containing a good all-round hash function that solves the problem for all workloads.
Here is the little bit of hard data I've been able to turn up right now:
- CRC32 performance is limited in hardware to 8 bytes per cycle. On Intel CPUs the 64-bit CRC32 instruction takes three cycles to complete, and the CPU can execute three of them in parallel, and so the throughput limit is 64 bits per cycle. (Source: Agner).
- Our current CRC32 implementation issues two CRC32 instructions in parallel and so it should be limited to 5.33 bytes per cycle of throughput i.e. completes max 2 x 64-bit CRC instructions every three cycles.
- Meow Hash claims to achieve 16 bytes per cycle of throughput. That would be 2x the hardware limit for CRC32 and 3x the limit of our current routine. (Just picked that one from a random google search.)
These are only upper bounds on throughput. Maybe actual performance will be more dominated by fixed costs or memory access. I'm just trying to frame the problem in quantitative terms and establish what level of performance we want and what we should realistically be able to achieve. That might help to decide whether it's better to hash whole strings or if we really have to sample them.
from luajit2.
Just an aside thought:
Could an interesting feature be string creation without interning?
I am thinking that the cost of hashing and interning strings of binary data probably outweighs the benefit. The interning would be valuable if the string were already present in the table, or was going to be used as a table key, or was going to be used for equality tests against other strings. However none of these things seems to be true for strings containing e.g. raw binary data received from a socket.
So the idea would be to create those strings using a special reserved hash value (-1) that means they have not been hashed, have not been interned, cannot be used as table keys (error), cannot be tested for equality using pointer value. However, they can be used for I/O, they can be used for string library functions, and data extracted from them will be ordinary strings.
This would introduce a certain amount of new complexity - probably too much - but it would also simplify some things e.g. not having to worry about the cost of hashing/interning buffers of binary data, and not having to account for hashing while benchmarking applications e.g. accidentally sending test data that will intern much better/worse than live traffic.
(I suppose this is the same distinction in many other languages where some string data is interned and others is not and go by names like strings, string buffers, symbols, binaries, etc.)
EDIT: Could also be a step towards being able to use string module functions to operate on strings allocated in FFI memory. I have the feeling when I write OpenResty code that really I'd prefer binary data to be stored in C memory, but that's in conflict with the idiom of operating on it using Lua string routines.
from luajit2.
@lukego Do you mean an FFI version of the cosocket API? Something like socket:recv(bufp, len)?

It is possible to implement a string module equivalent on top of an FFI char buffer; it is just plain old C work. And we can also implement an FFI version of the ngx.re API.
from luajit2.
So the idea would be to create those strings using a special reserved hash value (-1) that means they have not been hashed, have not been interned, cannot be used as table keys (error), cannot be tested for equality using pointer value. However, they can be used for I/O, they can be used for string library functions, and data extracted from them will be ordinary strings.
I already made a string buffer system that works in place of strings for some of the string library functions, and experimented with having the buffers pretend to be string objects.
I also tried to extend it to work in the FFI system so it could work with binary data.
from luajit2.
Do you mean an FFI version of the cosocket API? Something like socket:recv(bufp, len)?
That would be the direct approach and probably a better idea if it's feasible to migrate to that.
I was musing here about an indirect approach of adding a special sub-type of Lua strings that are non-interned and more suitable for binary data while mostly compatible with existing APIs. This would avoid the problem on this issue #60 because you would skip the checksum on binary network data, and that would reduce the pressure to settle for a collision-prone checksum algorithm on other strings. Everybody wins? It might be easy to implement on the VM side but definitely introduces new complexity for users.
(I'm thinking about Erlang where on the C side they had efficient strings but on the Erlang side they had inefficient ones and they invented a new type - "binary" - that worked as a good compromise.)
from luajit2.
Cool @fsfod!
from luajit2.
I have made an example hash-test.lua for testing:
local ori = "I was musing here about an indirect approach of adding a special sub-type of Lua strings that are non-interned and more suitable for binary data while mostly compatible with existing APIs. This would avoid the problem on this issue #60 because you would skip the checksum on binary network data, and that would reduce the pressure to settle for a collision-prone checksum algorithm on other strings. Everybody wins?"
local cnt = 1000
local rm = {}
for i = 1, cnt do
  rm[i] = tostring(math.random(100000000000, 1000000000000)) .. "-"
  rm[i] = rm[i] .. rm[i] .. rm[i]
end

local s = os.clock()
local pre, cur = s
local raw = {}
for i = 1, cnt do
  raw[i] = ori .. rm[i] .. "The return and break statements can only be written as the last statement of a block."
  if i > cnt - 5 then  -- time the last five iterations individually
    cur = os.clock()
    print(cur - pre)
    pre = cur
  end
end
local e = os.clock()
print("total:", e - s)
when cnt is 2000:
0.154724
0.000195
0.000195
0.00017900000000001
0.000196
total: 0.155491
when cnt is 20000:
14.531935
0.0015210000000003
0.0015210000000003
0.0014779999999988
0.0015999999999998
total: 14.53806
Obviously, collisions become more frequent as cnt grows. Then I changed the hash function and tested again:
when cnt is 1000:
0.00488
0.000135
1.2999999999999e-05
9.0000000000003e-06
2.9000000000001e-05
total: 0.005086
when cnt is 20000:
0.030095
2.4000000000003e-05
4.9999999999981e-06
3.9999999999971e-06
3.000000000003e-06
total: 0.030134
when cnt is 200000:
0.163455
2.9000000000001e-05
4.000000000004e-06
2.9999999999752e-06
3.0000000000308e-06
total: 0.163497
Obviously better; you can verify it yourself. The new hash function is wyhash. I changed lj_str_hash in lj_str_hash_x64.h:
#include "wyhash.h"

static LJ_AINLINE uint32_t lj_str_hash(const char *str, size_t len)
{
  static uint64_t seed;
  if (!seed) seed = wygrand();
  return wyhash(str, len, seed);
}
Wyhash is better than crc32_hw: it does not need SSE, it is simple (header-only, less C code), fast, and solid, with fewer collisions. Below is the performance data from SMHasher:
| Hash function | MiB/sec | cycles/hash | Quality problems |
|---|---|---|---|
| crc32 | 544.84 | 92.65 | insecure, 8589.93x collisions, distrib |
| crc32_hw | 9310.60 | 24.63 | insecure, 100% bias, collisions, distrib, machine-specific (x86 SSE4.2) |
| crc32_hw1 | 30145.28 | 30.98 | insecure, 100% bias, collisions, distrib, machine-specific (x86 SSE4.2) |
| crc64_hw | 10181.36 | 23.46 | insecure, 100% bias, collisions, distrib, machine-specific (x86_64 SSE4.2) |
| wyhash | 16528.58 | 17.67 | |
| wyhash32lo | 15933.47 | 17.77 | |
LuaJIT has done so much work on interning that I think we should add a function for strings to retrieve the hash value, so we can fully use the results. The hash value is useful, and wyhash is also a good PRNG.
from luajit2.
@lukego LuaJIT/LuaJIT#294 has been used in Tarantool's LuaJIT fork in production for several years, since this version of the patch was created. (In fact Nick, the patch co-author who helped me debug the patch and suggested some improvements, was a member of the Tarantool core team at that moment.)
https://github.com/tarantool/LuaJIT
https://tarantool.io
from luajit2.
The hash function used in the patch is not essential; you could replace it. The main part is the collision detection and the switch to O(n) hashing.
from luajit2.
I mean "dynamically switch to O(n) hashing if a long collision chain is detected". 99% of the time hashing remains constant-time (O(1)). Only if bad things happen and a lot of strings with the same hash value and length are inserted is O(n) hashing used, and only for strings with a similar hash value (filtered with a Bloom filter). And if such strings are collected by the GC, the filter is cleared as well.
from luajit2.
This looks very complex and I don't think we should go down this route. My hunch is that it would be very difficult to debug when bad things happen in such a sophisticated implementation.
from luajit2.
@agentzh did you read the patch? It is not that sophisticated.
Sorry for being noisy. I'll shut up at this point.
from luajit2.
@funny-falcon Yes, I read the patch. It's quite far-reaching; it even touches the GC part.
from luajit2.
chunk_num = 16;
chunk_sz = len / chunk_num;
chunk_sz_log2 = log2_floor(chunk_sz);

pos1 = get_random_pos_unsafe(chunk_sz_log2, 0);
pos2 = get_random_pos_unsafe(chunk_sz_log2, 1);

h1 = lj_crc32_u32(0, len ^ seed);
h2 = 0;

/* loop over 14 chunks, 2 chunks at a time */
for (i = 0, chunk_ptr = str; i < (chunk_num / 2 - 1);
     chunk_ptr += chunk_sz, i++) {
  v = *cast_uint64p(chunk_ptr + pos1);
  h1 = lj_crc32_u64(h1, v);
  v = *cast_uint64p(chunk_ptr + chunk_sz + pos2);
  h2 = lj_crc32_u64(h2, v);
}
The comment says the loop processes two chunks at a time, but each iteration advances chunk_ptr by only one chunk_sz. Is this a bug?
from luajit2.