Comments (15)

BenjaminAster commented on May 27, 2024

There is currently no straightforward method to compress raw binary data without all the stream stuff, which I guess does make sense, since networking is where compression is usually used and needed. But yeah, I would wish for one myself, especially to be able to read and write .zip, .docx, ... files for a possible future rich-text editor project of mine (see also @ndesmic's article about that). Such an async function has been suggested in e.g. WICG/compression#8; maybe it will become available in the future. As for ways to implement your own wrapper function: using Responses and .pipeThrough() seems to be inefficient; the spec suggests an example of how to do it more performantly using writers and readers.
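
For reference, the short Response/pipeThrough wrapper shape looks something like this (simple, though reportedly the less efficient route; gzipBytes is just an illustrative name):

export async function gzipBytes(data: Uint8Array): Promise<Uint8Array> {
  // Blob -> stream -> CompressionStream -> Response is the short one-shot
  // route available today; the spec's writer/reader approach avoids the
  // intermediate Blob and Response objects.
  const readable = new Blob([data]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(readable).arrayBuffer());
}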

Regarding the TypeScript definitions, CompressionStreams will be in v5.1 (you can already try it out with its beta version); in the meantime, you can use my new-javascript type definitions, where they're also included.

Offroaders123 commented on May 27, 2024

Thanks for letting me know about the performance differences; I had no idea that was the case! I will look into using the ArrayBuffer loading setup instead of passing the data into a Blob constructor like I'm currently doing.

Offroaders123 commented on May 27, 2024

Oh, yeah, I did see that, super awesome! The only thing is that deflate-raw isn't supported in Node as of now, and I guess some browsers from an update or few back might not have it quite yet. It probably won't be anything to worry about in six months or so though; that'll be great.

I do wish there were async functions to go along with the stream-based parts of the API too. Then it would be simpler to use with Uint8Array and ArrayBuffer objects directly, without having to convert them to streams and back every time. It could probably do that under the hood either way, but it does seem a bit cumbersome to need to convert using either a Blob constructor or a Response object. Any suggestions on how you would do that? I've just been using my own async wrapper functions for it. It is a minor detail though, so I guess it's not too bad to have these in every project that does compression? Still feels a bit awkward to me though.

Sorry for the rant, guess I'm just not sure how I should manage complete feature parity, while also using modern APIs.

After thinking about it, I guess it's no different than having to make your own file picker function that wraps things like the <input type="file"> element, plus the File System Access API when it's present. That also feels like it's in that compatibility gray area, where it has to work either way, but you have to do some work to make it streamlined.
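
A minimal sketch of that kind of picker wrapper, assuming the File System Access API's showOpenFilePicker() with an <input type="file"> fallback:

async function pickFile(): Promise<File | null> {
  // Prefer the File System Access API when the browser provides it.
  // (Cast needed because it isn't in TypeScript's DOM lib yet; fitting, given the topic.)
  if ("showOpenFilePicker" in window) {
    const [handle] = await (window as any).showOpenFilePicker();
    return handle.getFile();
  }
  // Fall back to a detached <input type="file"> element everywhere else.
  return new Promise<File | null>((resolve) => {
    const input = document.createElement("input");
    input.type = "file";
    input.onchange = () => resolve(input.files?.[0] ?? null);
    input.click();
  });
}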

Oh yeah, and TypeScript doesn't have support for it yet either. Have you heard any news as to when this API will become part of the standard library type definitions?

Sorry this is so long, I didn't mean for that. Guess I've been unsure about how to properly set this all up here, at least what the 'best' way to do it should be. I'm curious if there are any performance differences between the two methods I've tried out here?

// My first version

export type CompressionFormat = "gzip" | "deflate" | "deflate-raw";

export interface CompressionOptions {
  format?: CompressionFormat;
}

/**
 * Transforms a Uint8Array through a specific compression format. If a format is not provided, the gzip format will be used.
 */
export async function compress(data: Uint8Array, { format = "gzip" }: CompressionOptions = {}){
  const stream = new CompressionStream(format);
  const readable = new Blob([data]).stream().pipeThrough(stream);
  const buffer = await new Response(readable).arrayBuffer();
  return new Uint8Array(buffer);
}

/**
 * Transforms a Uint8Array through a specific decompression format. If a format is not provided, the gzip format will be used.
 */
export async function decompress(data: Uint8Array, { format = "gzip" }: CompressionOptions = {}){
  const stream = new DecompressionStream(format);
  const readable = new Blob([data]).stream().pipeThrough(stream);
  const buffer = await new Response(readable).arrayBuffer();
  return new Uint8Array(buffer);
}

// My second version
// I think I changed my mind on 'gzip' being the default compression format.

export type CompressionFormat = "gzip" | "deflate" | "deflate-raw";

export interface CompressionOptions {
  format: CompressionFormat;
}

/**
 * Compresses a Uint8Array using a specific compression format.
 */
export async function compress(data: Uint8Array | ArrayBufferLike, { format }: CompressionOptions): Promise<Uint8Array> {
  const { body } = new Response(data instanceof Uint8Array ? data : new Uint8Array(data));
  const readable = body!.pipeThrough(new CompressionStream(format));
  const buffer = await new Response(readable).arrayBuffer();
  return new Uint8Array(buffer);
}

/**
 * Decompresses a Uint8Array using a specific decompression format.
 */
export async function decompress(data: Uint8Array | ArrayBufferLike, { format }: CompressionOptions): Promise<Uint8Array> {
  const { body } = new Response(data instanceof Uint8Array ? data : new Uint8Array(data));
  const readable = body!.pipeThrough(new DecompressionStream(format));
  const buffer = await new Response(readable).arrayBuffer();
  return new Uint8Array(buffer);
}
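
Usage is the same for either version; a quick round-trip sketch with throwaway data:

const original = new Uint8Array([1, 2, 3, 4]);
const compressed = await compress(original, { format: "deflate" });
const restored = await decompress(compressed, { format: "deflate" });
// restored should match original byte-for-byte.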

Offroaders123 commented on May 27, 2024

Yeah, this is my main concern with it. While it is supported in the most recent versions, not everyone has moved up to the latest browser versions quite yet.

CompressionStream API - Caniuse

Offroaders123 commented on May 27, 2024

Ok, I decided to make a comparison for these two, and it came up with some interesting results!
It would probably also make sense to add an average for all of them; I only log out each result individually here (see the averaging sketch after the script).

You can save this locally as Compression-Streams-Speed.mts, then call npx tsx ./Compression-Streams-Speed.mts to run it in Node.

// clear; npx tsx ./Compression-Streams-Speed.mts

const DATA = new Uint8Array(Array.from({ length: 0x1000 },() => Math.floor(Math.random() * 10)));
console.log(DATA,"\n");

const BLOB = new Blob([DATA]);

const TEST_REPEATS = 0x1000;

for (let i = 0; i < TEST_REPEATS; i++){
  const COMPRESSED_DATA =
  await timer(`#${i} ArrayBuffer Compress  `,async () => compressArrayBuffer(DATA,"deflate"));
  await timer(`#${i} ArrayBuffer Decompress`,async () => decompressArrayBuffer(COMPRESSED_DATA,"deflate"));

  const COMPRESSED_BLOB =
  await timer(`#${i} Blob Compress         `,async () => compressBlob(BLOB,"deflate"));
  await timer(`#${i} Blob Decompress       `,async () => decompressBlob(COMPRESSED_BLOB,"deflate"));

  console.log();
}

async function timer<T>(label: string, callback: () => Promise<T>){
  console.time(label);
  const result: T = await callback();
  console.timeEnd(label);
  return result;
}

// 8.2. Deflate-compress an ArrayBuffer to a Uint8Array

async function compressArrayBuffer(input: BufferSource, format: CompressionFormat){
  const cs = new CompressionStream(format);
  const writer = cs.writable.getWriter();
  writer.write(input);
  writer.close();
  const output: Uint8Array[] = [];
  const reader = cs.readable.getReader();
  let totalSize = 0;
  while (true){
    const { done, value } = await reader.read();
    if (done) break;
    output.push(value);
    totalSize += value.byteLength;
  }
  const concatenated = new Uint8Array(totalSize);
  let offset = 0;
  for (const array of output){
    concatenated.set(array,offset);
    offset += array.byteLength;
  }
  return concatenated;
}

// Demo: decompress ArrayBuffer

async function decompressArrayBuffer(input: BufferSource, format: CompressionFormat){
  const ds = new DecompressionStream(format);
  const writer = ds.writable.getWriter();
  writer.write(input);
  writer.close();
  const output: Uint8Array[] = [];
  const reader = ds.readable.getReader();
  let totalSize = 0;
  while (true){
    const { done, value } = await reader.read();
    if (done) break;
    output.push(value);
    totalSize += value.byteLength;
  }
  const concatenated = new Uint8Array(totalSize);
  let offset = 0;
  for (const array of output){
    concatenated.set(array,offset);
    offset += array.byteLength;
  }
  return concatenated;
}

// 8.3. Gzip-decompress a Blob to Blob

async function decompressBlob(blob: Blob, format: CompressionFormat){
  const ds = new DecompressionStream(format);
  const decompressionStream = blob.stream().pipeThrough(ds);
  return new Response(decompressionStream).blob();
}

// Demo: compress Blob

async function compressBlob(blob: Blob, format: CompressionFormat){
  const cs = new CompressionStream(format);
  const compressionStream = blob.stream().pipeThrough(cs);
  return new Response(compressionStream).blob();
}

declare global {
  type CompressionFormat = "deflate" | "deflate-raw" | "gzip";
  class CompressionStream extends TransformStream<BufferSource,Uint8Array> {
    constructor(format: CompressionFormat);
  }
  class DecompressionStream extends TransformStream<BufferSource,Uint8Array> {
    constructor(format: CompressionFormat);
  }
}

export {};
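
Since the script only logs each run individually, averages could be bolted on with a small variant of timer() that accumulates per label (a sketch; you'd want to drop the #${i} prefix from the labels so runs group together):

const totals = new Map<string, { sum: number; count: number }>();

async function timedAverage<T>(label: string, callback: () => Promise<T>): Promise<T> {
  // Same shape as timer() above, but accumulates elapsed time per label.
  const start = performance.now();
  const result = await callback();
  const elapsed = performance.now() - start;
  const entry = totals.get(label) ?? { sum: 0, count: 0 };
  entry.sum += elapsed;
  entry.count += 1;
  totals.set(label, entry);
  return result;
}

// After the test loop finishes:
// for (const [label, { sum, count }] of totals){
//   console.log(`${label}: ${(sum / count).toFixed(3)} ms average over ${count} runs`);
// }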

Offroaders123 commented on May 27, 2024

I recently discovered that the chunking function implementation doesn't appear to work with the deflate-raw format. I wrote some demo code to demonstrate this; it's originally from the commit I mentioned just above this comment, 'Auto Deflate-Raw Support'.

I have a lot more about this find in the original commit message. Two of the most notable parts of this find are:

  • This same code works if you swap the implementation of the pipeThroughCompressionStream() function for the original new Response()-based implementation

  • This error happens both in Node and in Chrome itself, so it doesn't have to do with the deflate-raw polyfill I set up for Node using node:zlib. They both report a similarly-worded issue, related to the zlib header of the file.

    [Screenshots: zlib error in Node; zlib error in Chrome]

(I'm going to re-open this issue to track this a bit more, even though it isn't quite related to the original topic of the issue.)

// clear; npx tsx ./Auto-Unzip.mts

const data = Uint8Array.from({ length: 25 },() => 25);
console.log(data);

const compressed = await compress(data,"deflate-raw");
console.log(compressed);

const unzipped = await unzip(compressed);
console.log(unzipped);

async function unzip(data: Uint8Array): Promise<Uint8Array> {
  try {
    return await decompress(data,"gzip");
  } catch {
    try {
      return await decompress(data,"deflate");
    } catch {
      try {
        return await decompress(data,"deflate-raw");
      } catch {
        throw new Error("Could not unzip the buffer data");
      }
    }
  }
}

async function compress(data: Uint8Array, format: CompressionFormat): Promise<Uint8Array> {
  try {
    const compressionStream = new CompressionStream(format);
    return pipeThroughCompressionStream(data,compressionStream);
  } catch (error){
    if (format !== "deflate-raw") throw error;
    // @ts-expect-error
    const { deflateRawSync } = await import("node:zlib");
    return new Uint8Array(deflateRawSync(data));
  }
}

async function decompress(data: Uint8Array, format: CompressionFormat): Promise<Uint8Array> {
  try {
    const decompressionStream = new DecompressionStream(format);
    return pipeThroughCompressionStream(data,decompressionStream);
  } catch (error){
    if (format !== "deflate-raw") throw error;
    // @ts-expect-error
    const { inflateRawSync } = await import("node:zlib");
    return new Uint8Array(inflateRawSync(data));
  }
}

// // This original implementation does not cause the error
// async function pipeThroughCompressionStream(data: Uint8Array, stream: CompressionStream | DecompressionStream): Promise<Uint8Array> {
//   const { body } = new Response(data);
//   const readable = body!.pipeThrough(stream);
//   const buffer = await new Response(readable).arrayBuffer();
//   return new Uint8Array(buffer);
// }

async function pipeThroughCompressionStream(data: Uint8Array, { readable, writable }: CompressionStream | DecompressionStream): Promise<Uint8Array> {
  const writer = writable.getWriter();

  writer.write(data);
  writer.close();

  const chunks: Uint8Array[] = [];
  let byteLength = 0;

  const generator = (Symbol.asyncIterator in readable) ? readable : readableStreamToAsyncGenerator(readable as ReadableStream<Uint8Array>);

  for await (const chunk of generator){
    chunks.push(chunk);
    byteLength += chunk.byteLength;
  }

  const result = new Uint8Array(byteLength);
  let byteOffset = 0;

  for (const chunk of chunks){
    result.set(chunk,byteOffset);
    byteOffset += chunk.byteLength;
  }

  return result;
}

async function* readableStreamToAsyncGenerator<T>(readable: ReadableStream<T>): AsyncGenerator<T,void,void> {
  const reader = readable.getReader();
  try {
    while (true){
      const { done, value } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    reader.releaseLock();
  }
}

declare global {
  interface CompressionStream {
    readonly readable: ReadableStream<Uint8Array>;
    readonly writable: WritableStream<BufferSource>;
  }

  interface DecompressionStream {
    readonly readable: ReadableStream<Uint8Array>;
    readonly writable: WritableStream<BufferSource>;
  }

  interface ReadableStream<R> {
    [Symbol.asyncIterator](): AsyncGenerator<R>;
  }
}

Offroaders123 commented on May 27, 2024

Welp, I think I may have found it! This was a sneaky one for sure.

Turns out calling write() and close() on a WritableStreamDefaultWriter actually each return a Promise, so those have to be awaited! It fixed the issue right away.

WritableStreamDefaultWriter Method Return Types
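
Concretely, that means the two calls in pipeThroughCompressionStream() become (sketch of the change):

  await writer.write(data);
  await writer.close();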

Offroaders123 commented on May 27, 2024

Hmm, maybe it's not fixed so soon. Having both of the await calls present caused the decompression to stop working in the browser, while it had fixed it in Node. I just discovered this while trying to implement NBTify 1.60.0 into Dovetail, where it would cause all compressed files to fail silently in the opening process. I think it's because the Promise for one of these doesn't resolve in the browser for some reason, while it does in Node.

Offroaders123 commented on May 27, 2024

I'm curious if this has to do with a possible difference in the underlying stream implementations between Node and the browser?

This caught my eye over on the WritableStreamDefaultWriter docs.

Note that what "success" means is up to the underlying sink; it might indicate that the chunk has been accepted, and not necessarily that it is safely saved to its ultimate destination.

Does this also mean that it's possible that it won't always resolve either?
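
One plausible explanation (my assumption, based on how backpressure works for transform streams, not anything confirmed by either implementation): writer.write() only resolves once the transform side consumes the chunk, so awaiting it before anything reads from readable can stall forever if the chunk fills the internal queue. A pattern that sidesteps this is to start the writes, drain the readable, then await the writer promises at the end; a sketch:

async function pipeThroughStream(data: Uint8Array, stream: CompressionStream | DecompressionStream): Promise<Uint8Array> {
  const writer = stream.writable.getWriter();
  // Start the write and close, but don't await them yet; nothing is reading
  // from the other side of the transform at this point.
  const pending = Promise.all([writer.write(data), writer.close()]);
  // Drain the readable side first...
  const buffer = await new Response(stream.readable).arrayBuffer();
  // ...then settle the writer promises so rejections aren't left floating.
  await pending;
  return new Uint8Array(buffer);
}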

Offroaders123 commented on May 27, 2024

Ok, this is getting more interesting. It's looking like those might not have been the issue all along? It's starting to look like it was instead the changes that I made before I brought back the original implementations in the 'Compression Headers Revert' commit for NBTify.

So adding await to the WritableStreamDefaultWriter method calls introduced this new blocking-Promise bug, and the original 'incorrect header check' issue might actually have been caused by my then-'updated' try-catch handling for the various compression formats, rather than by the pre-read header check implementations (which is what going back will fix).

This seems to be what my tests are showing: if I only update the Read module (that's where the auto-detect compression logic is), the error comes back, and the blocking behavior isn't present on either platform if I remove the await calls from the current release.

This message is starting to feel less clear the more I write it; I'm just going to keep testing, then come back lol.

Offroaders123 commented on May 27, 2024

Ok, looks like this is the offending code:

export async function read(data,{ name, endian, compression, bedrockLevel, strict } = {}){

  // Snippet of the function body; incomplete

  if (compression === undefined){
    try {
      return await read(data,{ name, endian, compression: null, bedrockLevel, strict });
    } catch (error){
      try {
        return await read(data,{ name, endian, compression: "gzip", bedrockLevel, strict });
      } catch {
        try {
          return await read(data,{ name, endian, compression: "deflate", bedrockLevel, strict });
        } catch {
          try {
            return await read(data,{ name, endian, compression: "deflate-raw", bedrockLevel, strict });
          } catch {
            throw error;
          }
        }
      }
    }
  }
}
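
For comparison, the same fallback chain can be flattened into a loop (just a sketch of the shape, not NBTify's actual code), which keeps the first error instead of nesting try-catch blocks:

async function readAutoDetect(data, options){
  const formats = [null, "gzip", "deflate", "deflate-raw"];
  let firstError;
  for (const compression of formats){
    try {
      return await read(data,{ ...options, compression });
    } catch (error){
      // Keep the first failure; it's usually the most meaningful one.
      firstError ??= error;
    }
  }
  throw firstError;
}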

Offroaders123 commented on May 27, 2024

Yep, that's it! Yay! Glad it's actually NBTify, and not the chunking code; that makes a lot more sense haha.

Here are the tests for comparison; both are with the await method calls removed:

[Screenshot: test using the offending code]

[Screenshot: test without the offending code]

Notably, leaving the offending code in place and running it in addition to the await-calls 'fix' allows the tests to all decompress correctly in Node, while causing the unresponsive-Promise issue in the browser (hence, I think, the order in which this slipped past me, too).

Offroaders123 commented on May 27, 2024

It would've been ideal to track this in NBTify instead, I guess, but that's ok. I thought it had to do with the chunking code originally.

Offroaders123 commented on May 27, 2024

Ok, turns out it does have to do with the writer.write() and writer.close() calls. See this commit message for more intel on this.

Offroaders123 commented on May 27, 2024

For a better reference on NBTify's error handling here, see the demo code in this commit, referenced from the same commit as above.

[Screenshots: Promise results in Node, Chrome, Safari, and Firefox]
