emmanuel-marty / apultra Goto Github PK
View Code? Open in Web Editor NEWFree open-source compressor for apLib with 5-7% better ratios
License: Other
Free open-source compressor for apLib with 5-7% better ratios
License: Other
my optimization
int NUM_ELEMENTS = 0;
// Get a pointer to an array
apultra_arrival** pDestSlotsPtr = &pDestSlots;
// Size of the entire pointer array
size_t arraySize = sizeof(*pDestSlotsPtr);
// Size of one pointer
size_t elemSize = sizeof(*pDestSlotsPtr[0]);
// Calculate the number of elements
NUM_ELEMENTS = arraySize / elemSize;
int left = 0;
int right = NUM_ELEMENTS - 1; // NUM_ELEMENTS - size of the pDestSlots array
while (left <= right) {
int mid = left + (right - left) / 2;
if (pDestSlots[mid].cost < nCodingChoiceCost) {
left = mid + 1;
}
else {
right = mid - 1;
}
}
int n = left; // found index
after this code
if (nCodingChoiceCost < pDestSlots[nArrivalsPerPosition - 1].cost ||
(nCodingChoiceCost == pDestSlots[nArrivalsPerPosition - 1].cost && nScore < pDestSlots[nArrivalsPerPosition - 1].score &&
nRepOffset != pDestSlots[nArrivalsPerPosition - 1].rep_offset)) {
int exists = 0;
I adapted i8080 decompressor for GBZ80: https://github.com/untoxa/UnaPACK.GBZ80 you may add a link to it, if you wish. thank you.
Hi Emmanuel, thank you for this project, it is so wonderful to see a pure-C packer for the aPLib format the can be included in other projects (particularly development for old comupters/consoles)!
You may wish to check out my aplpak repository which includes a 6502 aPLib decompressor that executes approximately 2.5x the speed of Peter Ferrie's 65C02 code that is included in Jorgen Ibsen's aPLib distriution (at the cost of 27 bytes - 256 vs 229).
My aPLib code is decompressing at approx 75 cycles-per-byte, which compares favorably to Peter Ferrie's 6502 LZSA2 decompressor, which is decompressing at 77 cycles-per-byte with the same test file (gpl-3.0.txt).
Are you open to making some (optional) alterations to the aPLib format to improve the decompression speed a little bit more?
One obvious change (IMHO) would be to nibble-pack the 4-bit offset in the copy-one-byte-from-within-15-bytes "111 nnnn" encoding.
Another change would be to use seperate bit-buffers for the main 2-or-3-bit compression-type code, and the Elias gamma-codes, because the gamma-codes are always read 2-bits-per-loop, and so the routine that reads them would only need to do a single check for reading a new byte if the 2-bits were always 2-bit-aligned in the bit-buffer.
Does that explanation make sense?
Are you open to tweaking the format a little?
Hi!
thanks for making apultra, pretty easy to integrate. I noticed a strange behavior with unaplib_68000.S and some packed files. At first I though it was a packer bug but it seems to be a depacker bug. ( the specific file is depacking properly using another 68k aplib decoder ). I didn't investigated the issue but I attached problematic file if you're interested. ( just pack it, and it doesn't depack properly using unaplib_68000.S: The first byte is literal (normal) and the second token try to depack a string with "0" offset (producing bad output data).
cheers
Arnaud
sprite.zip
Hi,
I am need to set up the compression level in apultra, but I am having some trouble. I am using the source code, and I not see feature for this.
Can you please help me?
Thanks
Hey,
for better reach of the community, I would like to suggest to add the following topic to the repository: motorola-68000
Apultra is an excellent compression algorithm with great compression ratios. As an optimization enthusiast, I would like to propose a way to significantly improve Apultra's performance while maintaining the integrity of compression.
The idea is to invert certain cycles to reduce processor cache misses. For example, the following cycle can be inverted:
for (int n = 0; n < numElements; n++) {
if (condition) {
// ...
}
}
Into:
for (int n = numElements - 1; n >= 0; n--) {
if (condition) {
// ...
}
}
Benefits of this approach:
With cycle inversion, we can optimize hot loops that dominate the compression ratios like the ones in apultra_optimize_forward, hash search cycles, and other auxiliary algorithms.
My benchmarks demonstrate [2-4x] performance improvements with these optimizations while maintaining bit-exact output on multiple datasets.
It would be great if you can assess the viability of including cycle inversion in Apultra's core optimization pipeline. Please let me know if any other details are required. These micro-optimizations can significantly improve the real-world speed of compression without affecting accuracy.
I optimize apultra_optimize_forward
and receive good results.
Hi,
Having problems converting the small 8088 depacker to x86. For an EXE file of 814232 bytes, it fails to decode file after processing 811240 bytes. It works fine with smaller files, so have you any suggestions on how to fix it?
I think it better code ๐
static void apultra_insert_forward_match(apultra_compressor *pCompressor, const unsigned char *pInWindow, const int i, const int nMatchOffset, const int nStartOffset, const int nEndOffset, const int nArrivalsPerPosition, const int nDepth) {
unsigned char slots_mask = 0xFF;
const apultra_arrival *arrival = pCompressor->arrival + ((i - nStartOffset) * nArrivalsPerPosition);
const int *rle_len = (const int*)pCompressor->intervals /* reuse */;
apultra_visited* visited = ((apultra_visited*)pCompressor->pos_data) - nStartOffset /* reuse */;
int j;
for (j = 0; j < nArrivalsPerPosition; j++) {
if (!(slots_mask & (1 << j))) {
if (arrival[j].follows_literal) {
const int nRepOffset = arrival[j].rep_offset;
if (nMatchOffset != nRepOffset) {
const int nRepPos = arrival[j].rep_pos;
if (nRepPos >= nStartOffset &&
(nRepPos + 1) < nEndOffset &&
visited[nRepPos] != nMatchOffset) {
visited[nRepPos] = nMatchOffset;
apultra_match* fwd_match = pCompressor->match + ((nRepPos - nStartOffset) << MATCHES_PER_INDEX_SHIFT);
if (fwd_match[NMATCHES_PER_INDEX - 1].length == 0) {
if (nRepPos >= nMatchOffset) {
const unsigned char* pInWindowStart = pInWindow + nRepPos;
if (!memcmp(pInWindowStart, pInWindowStart - nMatchOffset, 2)) {
if (nRepOffset) {
const int nLen0 = rle_len[nRepPos - nMatchOffset];
const int nLen1 = rle_len[nRepPos];
const int nMinLen = (nLen0 < nLen1) ? nLen0 : nLen1;
unsigned int nMaxRepLen = nEndOffset - nRepPos;
if (nMaxRepLen > LCP_MAX)
nMaxRepLen = LCP_MAX;
const unsigned char* pInWindowMax = pInWindowStart + nMaxRepLen;
const unsigned char* pInWindowAtRepOffset = pInWindowStart + nMinLen;
if (pInWindowAtRepOffset > pInWindowMax)
pInWindowAtRepOffset = pInWindowMax;
while ((pInWindowAtRepOffset + 8) < pInWindowMax && !memcmp(pInWindowAtRepOffset, pInWindowAtRepOffset - nMatchOffset, 8))
pInWindowAtRepOffset += 8;
while ((pInWindowAtRepOffset + 4) < pInWindowMax && !memcmp(pInWindowAtRepOffset, pInWindowAtRepOffset - nMatchOffset, 4))
pInWindowAtRepOffset += 4;
while (pInWindowAtRepOffset < pInWindowMax && pInWindowAtRepOffset[0] == pInWindowAtRepOffset[-nMatchOffset])
pInWindowAtRepOffset++;
const unsigned int nCurRepLen = (const unsigned int)(pInWindowAtRepOffset - pInWindowStart);
unsigned short* fwd_depth = pCompressor->match_depth + ((nRepPos - nStartOffset) << MATCHES_PER_INDEX_SHIFT);
int r;
for (r = 0; fwd_match[r].length; r++) {
if (fwd_match[r].offset == nMatchOffset && (fwd_depth[r] & 0x3fff) == 0) {
if (fwd_match[r].length < nCurRepLen) {
fwd_match[r].length = nCurRepLen;
fwd_depth[r] = 0;
}
break;
}
}
if (fwd_match[r].length == 0) {
fwd_match[r].length = nCurRepLen;
fwd_match[r].offset = nMatchOffset;
fwd_depth[r] = 0;
if (nDepth < 9)
apultra_insert_forward_match(pCompressor, pInWindow, nRepPos, nMatchOffset, nStartOffset, nEndOffset, nArrivalsPerPosition, nDepth + 1);
}
}
}
}
}
}
}
}
}
slots_mask >>= 1;
}
}```
Getting this warning when compiling expand.c on a Debian 9 server with gcc 6.3.0
@ulukai:~/proj/apultra$ make
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/apultra.c -o obj/src/apultra.o
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/expand.c -o obj/src/expand.o
src/../src/expand.c:103:19: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:118:22: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:148:25: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:173:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:177:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:181:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:185:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:228:19: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:242:22: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:309:25: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:355:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:359:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:363:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
src/../src/expand.c:367:28: warning: comparison of unsigned expression < 0 is
always false [-Wtautological-compare]
if (nResult < 0) return -1;
~~~~~~~ ^ ~
14 warnings generated.
gcc version:
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Like I did with your LZSA2 depacker before, I adapted your "size-optimized aPLib decompressor for 8088" for use in my 86-DOS kernel/executable format. It is available in my hgweb repo.
The changes that I had to do:
Salut Emmanuel,
I tried to use the 6502 depackers so I think I don't use it well.
I need the depacked data to be at a given address ($1000) so I have setup up:
apl_srcptr: end address of the packed data
apl_dstptr: start address of the given location I need the depacked data
the packed code is at $3000 and the destination address is $1000. Here is how I call the routine:
lda #$00
sta $fc
lda #$30
sta $fd
lda #$00
sta $fe
lda #$10
sta $ff
jsr apl_decompress
But after decompressing, the result at $1000 doesn't look like the original data. What's my mistake?
Thanks!
Bonjour Emmanuel,
I just have been using LZSA previously and one feature that I love is that you can get the "Safe Distance" for decrunching data over itself. For me, this information is great for making a better use of the limited ram in my games for old 8 bit computers.
Would it be possible to add this feature to apultra? Because the compression ratio of apultra is even better, it would be an improvement for my projects in those limited machines.
Merci! :)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.