catid / leopard Goto Github PK

View Code? Open in Web Editor NEW

132.0 12.0 25.0 3.35 MB

Leopard-RS : O(N Log N) MDS Reed-Solomon Block Erasure Code for Large Data

License: BSD 3-Clause "New" or "Revised" License

C++ 94.35% C 4.03% CMake 1.62%

leopard's People

Contributors

Stargazers

Watchers

leopard's Issues

recovery_count > original_count

As the title says, I have small original count, large recovery and need to pad up to 32k/32k. The complexity is not pretty (though still best one can get for a MDS).
Would it be possible to arrange the field so as to avoid the overhead?

I can see some attempts in encodeL(), but indeed as the prototype code says, it doesn't seem to work, or would need specialized decoder. Is this inherent limitation of the algorithm, or just not implemented?

Doesn't compile on apple silicone / m1

Leopard does currently not compile on the m1 chips. (Rosetta does work but then you can't use the lib together with m1 native binaries)

Request: Allow recovery blocks > source blocks

The linked papers go a bit over my head. Is there an intrinsic reason we can't use recovery blocks > source blocks so long as source + recovery < 65535?

Request: support of generic CPUs (w/o SSSE3)

It will be great if Leopard can in runtime detect whether CPU supports AVX2/SSSE3/none and executes the appropriate code

Port to ARM NEON

I had one contact who may have found this useful on cellphones so I should port it to ARM NEON

Decoder to correct both erasure and error

I found a new paper about this FFT based Reed-Solomon Codes. They present erasure-and-error decoding of an (n, k) RS code. The decoding complexity is with only O(n log n + (n − k) log2 (n − k)). As reference, complexity of current erasure decoding is O(n log n).

"On fast Fourier transform-based decoding of Reed-Solomon codes"
by Yunghsiang S. Han, Chao Chen, Sian-Jheng Lin, Baoming Bai
http://ct.ee.ntust.edu.tw/IJAHUC2021.pdf

I post the link here as a notice. If you know it already, just ignore this issue. I cannot understand the theory in the paper. If you implement the error correction also in your leopard-RS library, it may be useful rarely. While erasure correction is enough mostly, possibility of error correction is welcome.

Request: support >65536 pieces

65536 data+parity blocks limit is too low for some PAR3-style applications. I will be be glad to see it supported.

Question regarding the API

I'm writing a go-wrapper for this implementation; BTW the C wrapper makes that really convenient, thanks for that!

I struggle to completely understand the API though. What I'm trying to do is quite basic: given n original data buffers or symbols, I want to generate n "parity symbols" s.t. that I can loose any n (of the now 2*n total) symbols and still recover the data.

I mainly experimented with the API in banchmarks.cpp: OK, sounds like I would need to call leo_encode with original_count == recovery_count for that, right? leo_encode_work_count returns 2*original_count for these params and calling leo_encode yields encode_work_data with the size 2*original_count as expected. So far so good. What is surprising though: I would have expected encode_work_data to contain the original data too instead it does seem to contain 2*original_count recovery or parity symbols. So is there no way I can achieve what described above using the public C API or am I using recovery_count wrong?

Thanks in advance :-)

Investigate Frobenius Additive FFT

I believe this can be a drop-in replacement for the FFT that is currently being used:
https://arxiv.org/pdf/1802.03932.pdf

More background:
http://www.texmacs.org/joris/ffft/ffft.pdf

New benchmark for ECC libraries

https://github.com/Bulat-Ziganshin/ECC-Benchmark

I made ECC benchmark that includes mainly your libraries :) The reason is that I don't know better alternatives for them, so if anyone has better ideas, I'm all ears.

Currently it lacks better build system (just compile.cmd works for me), as well as automatic data gathering/publishing system (I work on it).

Nevertheless, for me it looks better that your own comparison of your own libraries, so you may like to link it from your readme and/or publish yourself any data gathered this way.

BTW, the benchmark can be build with -fopenmp too.

Progressive/incremental encoding ?

Hi, is it possible to do progressive/incremental encoding with this? It would significantly reduce the memory usage if we could feed in one block of the input file at a time instead of having to keep the whole file in memory.

No description how to restore missed original data after decoding

I've looked into benchmark.cpp and the only place decode_work_data is used is at

leopard/tests/benchmark.cpp

Line 493 in b58d1ea

if (!CheckPacket(decode_work_data[i], params.buffer_bytes))

Thus i conjecture decode_work_data[i] has CRC32 in first 4 bytes, then the length of data in next 4 bytes, and then the data (which correspond to missed original_data[i]) itself in remaining (length) data(?).

But if this hypothesis is correct then we should have extra 8 bytes room in each decode_work_data[i], but the code uses buffer_bytes only (exactly as for any other buffer).

If the hypothesis is wrong then where should we extract missed original_data[i] from?

Allow more parity symbols aka losses than data

We've run into some hiccups making this work with more parity symbols than data symbols.

There is a check leopard.cpp:135 to prevent this and if removed it segfaults

#0  leopard::SIMDSafeFree (ptr=0xf8913455fd9fcc4a) at /tmp/leopard/tests/../LeopardCommon.h:480
#1  Benchmark (params=...) at /tmp/leopard/tests/benchmark.cpp:452

due to benchmark/test code assuming all lost symbols come from the data symbols at benchmark.cpp:452. I suppose test cases should cover loosing only some of the original data, but it'll presumably be straightforward to (randomly?) replace less than losses data symbols with parity symbols though.

We also run into some strange slowdown in leo_encode when adjusting the code to produce more parity symbols than data symbols. At first blush, there is seemingly just some runaway code that fills up many gigabytes of memory, which sounds strange since allocations seemingly occur outside leo_encode.

I've not yet fully understood the algorithm so I'd love your input on whether you think either the encode or decode algorithm should run into any performance or other hiccups when handling more parity chunks than data chunks. Thoughts?

Warnings from newer GCC versions

See https://travis-ci.org/nemequ/leopard/jobs/395897224

tests/experiments.cpp warns about argc and argv and the encodeL function being unused. Trivial fixes.
tests/benchmark.cpp warns about an implicit fallthrough. Fixes are available, but compiler-specific until C++17… see HEDLEY_FALL_THROUGH.

/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1480:41: error: a brace-enclosed initializer is not allowed here before ‘{’ token
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1480:42: sorry, unimplemented: non-static data member initializers
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1480:42: error: ‘constexpr’ needed for in-class initialization of static data member ‘Words’ of non-integral type
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1484:46: error: a brace-enclosed initializer is not allowed here before ‘{’ token
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1484:47: sorry, unimplemented: non-static data member initializers
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1484:47: error: ‘constexpr’ needed for in-class initialization of static data member ‘BigWords’ of non-integral type
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1487:43: error: a brace-enclosed initializer is not allowed here before ‘{’ token
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1487:44: sorry, unimplemented: non-static data member initializers
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1487:44: error: ‘constexpr’ needed for in-class initialization of static data member ‘BiggestWords’ of non-integral type
/home/travis/build/nemequ/leopard/LeopardFF16.cpp: In member function ‘void leopard::ff16::ErrorBitfield::Set(unsigned int)’:
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1492:9: error: ‘Words’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp: In member function ‘bool leopard::ff16::ErrorBitfield::IsNeeded(unsigned int, unsigned int) const’:
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1504:26: error: ‘BiggestWords’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1509:26: error: ‘BigWords’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1511:22: error: ‘Words’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp: In member function ‘void leopard::ff16::ErrorBitfield::Prepare()’:
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1528:24: error: ‘Words’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1545:32: error: ‘Words’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1551:9: error: ‘BigWords’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1563:28: error: ‘BigWords’ was not declared in this scope
/home/travis/build/nemequ/leopard/LeopardFF16.cpp:1569:5: error: ‘BiggestWords’ was not declared in this scope
make[2]: *** [CMakeFiles/libleopard.dir/LeopardFF16.cpp.o] Error 1
make[2]: Leaving directory `/home/travis/build/nemequ/leopard/build'
make[1]: *** [CMakeFiles/libleopard.dir/all] Error 2
make[1]: Leaving directory `/home/travis/build/nemequ/leopard/build'
make: *** [all] Error 2

4.7+ works.

Leopard for communications RS(255,239) or RS(255,223)?

I am working in the field of communications.
Can the Leopard Library used for the RS(255,239) or RS(255,223) codes?
What should be the input parameters for these codes?
Thanks in advance.

catid / leopard Goto Github PK

leopard's People

Contributors

Stargazers

Watchers

Forkers

leopard's Issues

Recommend Projects

Recommend Topics

Recommend Org