emmanuel-marty / apultra Goto Github PK

View Code? Open in Web Editor NEW

96.0 11.0 14.0 387 KB

Free open-source compressor for apLib with 5-7% better ratios

License: Other

Makefile 0.19% C 68.40% CMake 7.80% Assembly 23.61%

compression aplib c 8-bit retrocomputing aplib-format compression-algorithm 6502 z80 arm

apultra's People

Contributors

Stargazers

Watchers

Forkers

gavz dwedit odzhan b0nefish dougmasten clayne ruguevara snaphat gmh5225 octoate stahta01

apultra's Issues

NUM_ELEMENTS optimization for apultra_optimize_forward

my optimization

int NUM_ELEMENTS = 0;

// Get a pointer to an array
apultra_arrival** pDestSlotsPtr = &pDestSlots;

// Size of the entire pointer array
size_t arraySize = sizeof(*pDestSlotsPtr);

// Size of one pointer
size_t elemSize = sizeof(*pDestSlotsPtr[0]);

// Calculate the number of elements
NUM_ELEMENTS = arraySize / elemSize;

int left = 0;
int right = NUM_ELEMENTS - 1; // NUM_ELEMENTS - size of the pDestSlots array

while (left <= right) {
    int mid = left + (right - left) / 2;

    if (pDestSlots[mid].cost < nCodingChoiceCost) {
        left = mid + 1;
    }
    else {
        right = mid - 1;
    }
}

int n = left; // found index

after this code

   if (nCodingChoiceCost < pDestSlots[nArrivalsPerPosition - 1].cost ||
      (nCodingChoiceCost == pDestSlots[nArrivalsPerPosition - 1].cost && nScore < pDestSlots[nArrivalsPerPosition - 1].score &&
         nRepOffset != pDestSlots[nArrivalsPerPosition - 1].rep_offset)) {
      int exists = 0;

Game boy decompressor

I adapted i8080 decompressor for GBZ80: https://github.com/untoxa/UnaPACK.GBZ80 you may add a link to it, if you wish. thank you.

aPLib format changes for improving decompression on 8-bit micros.

Hi Emmanuel, thank you for this project, it is so wonderful to see a pure-C packer for the aPLib format the can be included in other projects (particularly development for old comupters/consoles)!

You may wish to check out my aplpak repository which includes a 6502 aPLib decompressor that executes approximately 2.5x the speed of Peter Ferrie's 65C02 code that is included in Jorgen Ibsen's aPLib distriution (at the cost of 27 bytes - 256 vs 229).

My aPLib code is decompressing at approx 75 cycles-per-byte, which compares favorably to Peter Ferrie's 6502 LZSA2 decompressor, which is decompressing at 77 cycles-per-byte with the same test file (gpl-3.0.txt).

Are you open to making some (optional) alterations to the aPLib format to improve the decompression speed a little bit more?

One obvious change (IMHO) would be to nibble-pack the 4-bit offset in the copy-one-byte-from-within-15-bytes "111 nnnn" encoding.

Another change would be to use seperate bit-buffers for the main 2-or-3-bit compression-type code, and the Elias gamma-codes, because the gamma-codes are always read 2-bits-per-loop, and so the routine that reads them would only need to do a single check for reading a new byte if the 2-bits were always 2-bit-aligned in the bit-buffer.

Does that explanation make sense?

Are you open to tweaking the format a little?

strange bug in unaplib_68000.S

Hi!
thanks for making apultra, pretty easy to integrate. I noticed a strange behavior with unaplib_68000.S and some packed files. At first I though it was a packer bug but it seems to be a depacker bug. ( the specific file is depacking properly using another 68k aplib decoder ). I didn't investigated the issue but I attached problematic file if you're interested. ( just pack it, and it doesn't depack properly using unaplib_68000.S: The first byte is literal (normal) and the second token try to depack a string with "0" offset (producing bad output data).
cheers
Arnaud
sprite.zip

Help with setting up compression level in apultra

Hi,

I am need to set up the compression level in apultra, but I am having some trouble. I am using the source code, and I not see feature for this.

Can you please help me?

Thanks

topic addition for better community reach

Hey,
for better reach of the community, I would like to suggest to add the following topic to the repository: motorola-68000

How to do this

Proposal to optimize performance with cycle inversion

Apultra is an excellent compression algorithm with great compression ratios. As an optimization enthusiast, I would like to propose a way to significantly improve Apultra's performance while maintaining the integrity of compression.

The idea is to invert certain cycles to reduce processor cache misses. For example, the following cycle can be inverted:

for (int n = 0; n < numElements; n++) {
  if (condition) {
    // ...
  }
}

Into:

for (int n = numElements - 1; n >= 0; n--) {
  if (condition) {
    // ... 
  }
}

Benefits of this approach:

Improves memory locality and cache utilization by accessing memory sequentially
Reduces branch mispredictions
Enables better instruction pipelining

With cycle inversion, we can optimize hot loops that dominate the compression ratios like the ones in apultra_optimize_forward, hash search cycles, and other auxiliary algorithms.

My benchmarks demonstrate [2-4x] performance improvements with these optimizations while maintaining bit-exact output on multiple datasets.

It would be great if you can assess the viability of including cycle inversion in Apultra's core optimization pipeline. Please let me know if any other details are required. These micro-optimizations can significantly improve the real-world speed of compression without affecting accuracy.

I optimize apultra_optimize_forward and receive good results.

Converting 8088 Depackers to x86

Hi,

Having problems converting the small 8088 depacker to x86. For an EXE file of 814232 bytes, it fails to decode file after processing 811240 bytes. It works fine with smaller files, so have you any suggestions on how to fix it?

aplib_x86_small.asm.zip

apultra_insert_forward_match optimize

I think it better code 👍

static void apultra_insert_forward_match(apultra_compressor *pCompressor, const unsigned char *pInWindow, const int i, const int nMatchOffset, const int nStartOffset, const int nEndOffset, const int nArrivalsPerPosition, const int nDepth) {

    unsigned char slots_mask = 0xFF;

   const apultra_arrival *arrival = pCompressor->arrival + ((i - nStartOffset) * nArrivalsPerPosition);
   const int *rle_len = (const int*)pCompressor->intervals /* reuse */;
   apultra_visited* visited = ((apultra_visited*)pCompressor->pos_data) - nStartOffset /* reuse */;
   int j;

   for (j = 0; j < nArrivalsPerPosition; j++) {
       if (!(slots_mask & (1 << j))) {
           if (arrival[j].follows_literal) {
               const int nRepOffset = arrival[j].rep_offset;

               if (nMatchOffset != nRepOffset) {
                   const int nRepPos = arrival[j].rep_pos;

                   if (nRepPos >= nStartOffset &&
                       (nRepPos + 1) < nEndOffset &&
                       visited[nRepPos] != nMatchOffset) {

                       visited[nRepPos] = nMatchOffset;

                       apultra_match* fwd_match = pCompressor->match + ((nRepPos - nStartOffset) << MATCHES_PER_INDEX_SHIFT);

                       if (fwd_match[NMATCHES_PER_INDEX - 1].length == 0) {
                           if (nRepPos >= nMatchOffset) {
                               const unsigned char* pInWindowStart = pInWindow + nRepPos;

                               if (!memcmp(pInWindowStart, pInWindowStart - nMatchOffset, 2)) {
                                   if (nRepOffset) {
                                       const int nLen0 = rle_len[nRepPos - nMatchOffset];
                                       const int nLen1 = rle_len[nRepPos];
                                       const int nMinLen = (nLen0 < nLen1) ? nLen0 : nLen1;

                                       unsigned int nMaxRepLen = nEndOffset - nRepPos;
                                       if (nMaxRepLen > LCP_MAX)
                                           nMaxRepLen = LCP_MAX;

                                       const unsigned char* pInWindowMax = pInWindowStart + nMaxRepLen;
                                       const unsigned char* pInWindowAtRepOffset = pInWindowStart + nMinLen;

                                       if (pInWindowAtRepOffset > pInWindowMax)
                                           pInWindowAtRepOffset = pInWindowMax;

                                       while ((pInWindowAtRepOffset + 8) < pInWindowMax && !memcmp(pInWindowAtRepOffset, pInWindowAtRepOffset - nMatchOffset, 8))
                                           pInWindowAtRepOffset += 8;
                                       while ((pInWindowAtRepOffset + 4) < pInWindowMax && !memcmp(pInWindowAtRepOffset, pInWindowAtRepOffset - nMatchOffset, 4))
                                           pInWindowAtRepOffset += 4;
                                       while (pInWindowAtRepOffset < pInWindowMax && pInWindowAtRepOffset[0] == pInWindowAtRepOffset[-nMatchOffset])
                                           pInWindowAtRepOffset++;

                                       const unsigned int nCurRepLen = (const unsigned int)(pInWindowAtRepOffset - pInWindowStart);

                                       unsigned short* fwd_depth = pCompressor->match_depth + ((nRepPos - nStartOffset) << MATCHES_PER_INDEX_SHIFT);
                                       int r;

                                       for (r = 0; fwd_match[r].length; r++) {
                                           if (fwd_match[r].offset == nMatchOffset && (fwd_depth[r] & 0x3fff) == 0) {
                                               if (fwd_match[r].length < nCurRepLen) {
                                                   fwd_match[r].length = nCurRepLen;
                                                   fwd_depth[r] = 0;
                                               }
                                               break;
                                           }
                                       }

                                       if (fwd_match[r].length == 0) {
                                           fwd_match[r].length = nCurRepLen;
                                           fwd_match[r].offset = nMatchOffset;
                                           fwd_depth[r] = 0;

                                           if (nDepth < 9)
                                               apultra_insert_forward_match(pCompressor, pInWindow, nRepPos, nMatchOffset, nStartOffset, nEndOffset, nArrivalsPerPosition, nDepth + 1);
                                       }
                                   }
                               }
                           }
                       }
                   }
               }
           }
       }
       slots_mask >>= 1;
   }
}```

gcc 6.3.0 gives warning: comparison of unsigned expression < 0 is always false

Getting this warning when compiling expand.c on a Debian 9 server with gcc 6.3.0

@ulukai:~/proj/apultra$ make
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/apultra.c -o obj/src/apultra.o
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/expand.c -o obj/src/expand.o
src/../src/expand.c:103:19: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
      if (nResult < 0) return -1;
          ~~~~~~~ ^ ~
src/../src/expand.c:118:22: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
         if (nResult < 0) return -1;
             ~~~~~~~ ^ ~
src/../src/expand.c:148:25: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
            if (nResult < 0) return -1;
                ~~~~~~~ ^ ~
src/../src/expand.c:173:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
src/../src/expand.c:177:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
src/../src/expand.c:181:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
src/../src/expand.c:185:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
src/../src/expand.c:228:19: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
      if (nResult < 0) return -1;
          ~~~~~~~ ^ ~
src/../src/expand.c:242:22: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
         if (nResult < 0) return -1;
             ~~~~~~~ ^ ~
src/../src/expand.c:309:25: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
            if (nResult < 0) return -1;
                ~~~~~~~ ^ ~
src/../src/expand.c:355:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
src/../src/expand.c:359:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
src/../src/expand.c:363:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
src/../src/expand.c:367:28: warning: comparison of unsigned expression < 0 is
      always false [-Wtautological-compare]
               if (nResult < 0) return -1;
                   ~~~~~~~ ^ ~
14 warnings generated.

gcc version:

gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I adapted your code to an 8086 (segmented memory) depacker

Like I did with your LZSA2 depacker before, I adapted your "size-optimized aPLib decompressor for 8088" for use in my 86-DOS kernel/executable format. It is available in my hgweb repo.

The changes that I had to do:

Use bp for stack frame to access variables throughout
Count down remaining lengths of source and destination (needed to detect success or failure with overlapping buffers, also provides simple error checking)
Use far pointers throughout, which are normalised after every use
Add stack frame variables for the high words of distance and gamma
Allow matches with >= 64 KiB distance and/or >= 64 KiB length
Allow overlapping buffers by checking the pointers after matches and match bytes

Question: how to use the depacker (fw) to depack at a given address?

Salut Emmanuel,

I tried to use the 6502 depackers so I think I don't use it well.
I need the depacked data to be at a given address ($1000) so I have setup up:
apl_srcptr: end address of the packed data
apl_dstptr: start address of the given location I need the depacked data

the packed code is at $3000 and the destination address is $1000. Here is how I call the routine:

          lda #$00                                                                                  
          sta $fc                                                                                   
          lda #$30                                                                                  
          sta $fd                                                                                   
          lda #$00                                                                                  
          sta $fe                                                                                   
          lda #$10                                                                                  
          sta $ff                                                                                   
          jsr apl_decompress

But after decompressing, the result at $1000 doesn't look like the original data. What's my mistake?

Thanks!

Is it possible to print the "Safe Distance" as in LZSA?

Bonjour Emmanuel,

I just have been using LZSA previously and one feature that I love is that you can get the "Safe Distance" for decrunching data over itself. For me, this information is great for making a better use of the limited ram in my games for old 8 bit computers.

Would it be possible to add this feature to apultra? Because the compression ratio of apultra is even better, it would be an improvement for my projects in those limited machines.

Merci! :)