Comments (15)
There is currently no straightforward method to compress raw binary data without all the stream machinery. I guess that does make sense, since networking is where compression is usually needed, but I would still wish for one myself, especially to be able to read and write .zip, .docx, ... files for a possible future rich-text editor project of mine (see also @ndesmic's article about that). Such an async function has been suggested in e.g. WICG/compression#8, so it may become available in the future. As for implementing your own wrapper function: using Responses and .pipeThrough() seems to be inefficient; the spec suggests an example of how to do it in a more performant way using writers and readers.
Regarding the TypeScript definitions, CompressionStream support will be in v5.1 (you can already try it out with its beta version); in the meantime, you can use my new-javascript type definitions, where they're also included.
from dovetail.
Thanks for letting me know about the performance differences, I had no idea that was the case! I will look into using the ArrayBuffer loading setup instead of passing it into a Blob constructor like I'm currently doing.
from dovetail.
Oh, yeah I did see that, super awesome! The only thing is that deflate-raw isn't supported in Node as of now, and I guess some browsers an update or few back might not quite have it yet either. It probably won't be anything to worry about in six months or so though; that'll be great.
I do wish there were async functions to go along with the stream-based parts of the API too. Then it would be simpler to use with Uint8Array and ArrayBuffer objects directly, without having to convert them to streams and back every time. It could probably do that under the hood either way, but it does seem a bit cumbersome to need to convert using either a Blob constructor or a Response object. Any suggestions on how you would do that? I've just been using my own async wrapper functions for it. It is a minor detail though, so I guess it's not too bad to have these in every project that does compression? It still feels a bit awkward to me though.
Sorry for the rant; I guess I'm just not sure how I should manage complete feature parity while also using modern APIs.
After thinking about it, I guess it's no different than having to make your own file-picker wrapper around things like the <input type="file"> element, plus the File System Access API when it's present. That also feels like it's in that compatibility gray area, where it has to work either way, but you have to do some work to make it streamlined.
Oh yeah, and TypeScript doesn't have support for it yet either. Have you heard any news as to when this API will become part of the standard library type definitions?
Sorry this is so long, I didn't mean for that. I guess I've been unsure of how to properly set this all up here, or at least what the 'best' way to do it should be. I'm curious if there are any performance differences between the two methods I've tried out here?
// My first version
export type CompressionFormat = "gzip" | "deflate" | "deflate-raw";
export interface CompressionOptions {
format?: CompressionFormat;
}
/**
* Transforms a Uint8Array through a specific compression format. If a format is not provided, the gzip format will be used.
*/
export async function compress(data: Uint8Array, { format = "gzip" }: CompressionOptions = {}){
const stream = new CompressionStream(format);
const readable = new Blob([data]).stream().pipeThrough(stream);
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
/**
* Transforms a Uint8Array through a specific decompression format. If a format is not provided, the gzip format will be used.
*/
export async function decompress(data: Uint8Array, { format = "gzip" }: CompressionOptions = {}){
const stream = new DecompressionStream(format);
const readable = new Blob([data]).stream().pipeThrough(stream);
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
// My second version
// I think I changed my mind on 'gzip' being the default compression format.
export type CompressionFormat = "gzip" | "deflate" | "deflate-raw";
export interface CompressionOptions {
format: CompressionFormat;
}
/**
* Compresses a Uint8Array using a specific compression format.
*/
export async function compress(data: Uint8Array | ArrayBufferLike, { format }: CompressionOptions): Promise<Uint8Array> {
const { body } = new Response(data instanceof Uint8Array ? data : new Uint8Array(data));
const readable = body!.pipeThrough(new CompressionStream(format));
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
/**
* Decompresses a Uint8Array using a specific decompression format.
*/
export async function decompress(data: Uint8Array | ArrayBufferLike, { format }: CompressionOptions): Promise<Uint8Array> {
const { body } = new Response(data instanceof Uint8Array ? data : new Uint8Array(data));
const readable = body!.pipeThrough(new DecompressionStream(format));
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
from dovetail.
Yeah, this is my main concern for it. While it is supported in the most recent versions, not quite everyone has moved up to the latest browser versions yet.
CompressionStream API - Caniuse
from dovetail.
Ok, I decided to make a comparison of these two, and it came up with some interesting results! It would probably also make sense to add an average for all of them; I only log each result out individually here.
You can save this locally as Compression-Streams-Speed.mts, then call npx tsx ./Compression-Streams-Speed.mts to run it in Node.
// clear; npx tsx ./Compression-Streams-Speed.mts
const DATA = new Uint8Array(Array.from({ length: 0x1000 },() => Math.floor(Math.random() * 10)));
console.log(DATA,"\n");
const BLOB = new Blob([DATA]);
const TEST_REPEATS = 0x1000;
for (let i = 0; i < TEST_REPEATS; i++){
const COMPRESSED_DATA =
await timer(`#${i} ArrayBuffer Compress `,async () => compressArrayBuffer(DATA,"deflate"));
await timer(`#${i} ArrayBuffer Decompress`,async () => decompressArrayBuffer(COMPRESSED_DATA,"deflate"));
const COMPRESSED_BLOB =
await timer(`#${i} Blob Compress `,async () => compressBlob(BLOB,"deflate"));
await timer(`#${i} Blob Decompress `,async () => decompressBlob(COMPRESSED_BLOB,"deflate"));
console.log();
}
async function timer<T>(label: string, callback: () => Promise<T>){
console.time(label);
const result: T = await callback();
console.timeEnd(label);
return result;
}
// 8.2. Deflate-compress an ArrayBuffer to a Uint8Array
async function compressArrayBuffer(input: BufferSource, format: CompressionFormat){
const cs = new CompressionStream(format);
const writer = cs.writable.getWriter();
writer.write(input);
writer.close();
const output: Uint8Array[] = [];
const reader = cs.readable.getReader();
let totalSize = 0;
while (true){
const { done, value } = await reader.read();
if (done) break;
output.push(value);
totalSize += value.byteLength;
}
const concatenated = new Uint8Array(totalSize);
let offset = 0;
for (const array of output){
concatenated.set(array,offset);
offset += array.byteLength;
}
return concatenated;
}
// Demo: decompress ArrayBuffer
async function decompressArrayBuffer(input: BufferSource, format: CompressionFormat){
const ds = new DecompressionStream(format);
const writer = ds.writable.getWriter();
writer.write(input);
writer.close();
const output: Uint8Array[] = [];
const reader = ds.readable.getReader();
let totalSize = 0;
while (true){
const { done, value } = await reader.read();
if (done) break;
output.push(value);
totalSize += value.byteLength;
}
const concatenated = new Uint8Array(totalSize);
let offset = 0;
for (const array of output){
concatenated.set(array,offset);
offset += array.byteLength;
}
return concatenated;
}
// 8.3. Gzip-decompress a Blob to Blob
async function decompressBlob(blob: Blob, format: CompressionFormat){
const ds = new DecompressionStream(format);
const decompressionStream = blob.stream().pipeThrough(ds);
return new Response(decompressionStream).blob();
}
// Demo: compress Blob
async function compressBlob(blob: Blob, format: CompressionFormat){
const cs = new CompressionStream(format);
const compressionStream = blob.stream().pipeThrough(cs);
return new Response(compressionStream).blob();
}
declare global {
type CompressionFormat = "deflate" | "deflate-raw" | "gzip";
class CompressionStream extends TransformStream<BufferSource,Uint8Array> {
constructor(format: CompressionFormat);
}
class DecompressionStream extends TransformStream<BufferSource,Uint8Array> {
constructor(format: CompressionFormat);
}
}
export {};
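To get the averages mentioned at the start of this comment, one option (a sketch of my own; timeAsync, totals, and printAverages are hypothetical names, not part of the script above) is to accumulate durations per label with performance.now() instead of console.time():

```typescript
// Sketch: accumulate per-label durations with performance.now(), then print
// averages at the end. These helpers are hypothetical, not from the script above.
const totals = new Map<string, { sum: number; runs: number }>();

async function timeAsync<T>(label: string, callback: () => Promise<T>): Promise<T> {
  const start = performance.now();
  const result = await callback();
  const elapsed = performance.now() - start;
  const entry = totals.get(label) ?? { sum: 0, runs: 0 };
  entry.sum += elapsed;
  entry.runs += 1;
  totals.set(label, entry);
  return result;
}

function printAverages(): void {
  for (const [label, { sum, runs }] of totals) {
    console.log(`${label}: ${(sum / runs).toFixed(3)}ms average over ${runs} runs`);
  }
}
```

Swapping timeAsync in for timer and calling printAverages after the loop would summarize all TEST_REPEATS runs rather than only logging each one individually.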
from dovetail.
I recently discovered that the chunking function implementation doesn't appear to work with the deflate-raw format. I wrote some demo code to demonstrate this; it's originally from the commit I mentioned just above this comment, 'Auto Deflate-Raw Support'.
I have a lot more about this find in the original commit message. Two of the main notable parts about this find are:
- This same code works if you change out the implementation of the pipeThroughCompressionStream() function for the original new Response()-based implementation.
- This error happens both in Node and in Chrome itself, so it doesn't have to do with the deflate-raw polyfill I set up for Node using node:zlib. They both return a similarly-worded error, related to the zlib header of the file.
(I'm going to re-open this issue to track this a bit more, even though it isn't quite related to the original topic of the issue.)
// clear; npx tsx ./Auto-Unzip.mts
const data = Uint8Array.from({ length: 25 },() => 25);
console.log(data);
const compressed = await compress(data,"deflate-raw");
console.log(compressed);
const unzipped = await unzip(compressed);
console.log(unzipped);
async function unzip(data: Uint8Array): Promise<Uint8Array> {
try {
return await decompress(data,"gzip");
} catch {
try {
return await decompress(data,"deflate");
} catch {
try {
return await decompress(data,"deflate-raw");
} catch {
throw new Error("Could not unzip the buffer data");
}
}
}
}
async function compress(data: Uint8Array, format: CompressionFormat): Promise<Uint8Array> {
try {
const compressionStream = new CompressionStream(format);
return pipeThroughCompressionStream(data,compressionStream);
} catch (error){
if (format !== "deflate-raw") throw error;
// @ts-expect-error
const { deflateRawSync } = await import("node:zlib");
return new Uint8Array(deflateRawSync(data));
}
}
async function decompress(data: Uint8Array, format: CompressionFormat): Promise<Uint8Array> {
try {
const decompressionStream = new DecompressionStream(format);
return pipeThroughCompressionStream(data,decompressionStream);
} catch (error){
if (format !== "deflate-raw") throw error;
// @ts-expect-error
const { inflateRawSync } = await import("node:zlib");
return new Uint8Array(inflateRawSync(data));
}
}
// // This original implementation does not cause the error
// async function pipeThroughCompressionStream(data: Uint8Array, stream: CompressionStream | DecompressionStream): Promise<Uint8Array> {
// const { body } = new Response(data);
// const readable = body!.pipeThrough(stream);
// const buffer = await new Response(readable).arrayBuffer();
// return new Uint8Array(buffer);
// }
async function pipeThroughCompressionStream(data: Uint8Array, { readable, writable }: CompressionStream | DecompressionStream): Promise<Uint8Array> {
const writer = writable.getWriter();
writer.write(data);
writer.close();
const chunks: Uint8Array[] = [];
let byteLength = 0;
const generator = (Symbol.asyncIterator in readable) ? readable : readableStreamToAsyncGenerator(readable as ReadableStream<Uint8Array>);
for await (const chunk of generator){
chunks.push(chunk);
byteLength += chunk.byteLength;
}
const result = new Uint8Array(byteLength);
let byteOffset = 0;
for (const chunk of chunks){
result.set(chunk,byteOffset);
byteOffset += chunk.byteLength;
}
return result;
}
async function* readableStreamToAsyncGenerator<T>(readable: ReadableStream<T>): AsyncGenerator<T,void,void> {
const reader = readable.getReader();
try {
while (true){
const { done, value } = await reader.read();
if (done) return;
yield value;
}
} finally {
reader.releaseLock();
}
}
declare global {
interface CompressionStream {
readonly readable: ReadableStream<Uint8Array>;
readonly writable: WritableStream<BufferSource>;
}
interface DecompressionStream {
readonly readable: ReadableStream<Uint8Array>;
readonly writable: WritableStream<BufferSource>;
}
interface ReadableStream<R> {
[Symbol.asyncIterator](): AsyncGenerator<R>;
}
}
from dovetail.
Welp, I think I may have found it! This was a sneaky one for sure.
Turns out calling write() and close() on a WritableStreamDefaultWriter actually each return Promises, so those have to be awaited! It fixed the issue right away.
from dovetail.
Hmm, maybe it's not fixed so soon. Having both of the await calls present caused the decompression to stop working in the browser, while it had fixed it in Node. I just discovered this while trying to implement NBTify 1.60.0 into Dovetail, where it would cause all compressed files to fail silently in the opening process. I ended up discovering that I think it's because the Promise for one of these doesn't resolve in the browser for some reason, while it does in Node.
from dovetail.
I'm curious if this has to do with possible differences between the underlying stream implementations in Node and the browser?
This caught my eye over on the WritableStreamDefaultWriter docs:
"Note that what 'success' means is up to the underlying sink; it might indicate that the chunk has been accepted, and not necessarily that it is safely saved to its ultimate destination."
Does this also mean that it's possible that it won't always resolve either?
from dovetail.
Ok, this is getting more interesting. It's looking like those might not have been the issue all along? It's starting to look like it's instead the changes that I made before I brought back their original implementations in the 'Compression Headers Revert' commit for NBTify.
So adding await to the WritableStreamDefaultWriter method calls introduced this new blocking-Promise bug, and the original 'incorrect header check' issue might have actually been because of my, then 'updated', try-catch handling for the various compression formats, rather than the pre-read header check implementations (which going back to will fix).
This seems to be what my tests are showing: if I only update the Read module (that's where the auto-detect compression logic is), the error comes back, and this blocking behavior isn't present on either platform if I remove the await calls from the current release.
This message is starting to feel less clear the more I write it, I'm just going to keep testing, then come back lol.
from dovetail.
Ok, looks like this is the offending code:
export async function read(data,{ name, endian, compression, bedrockLevel, strict } = {}){
// Snippet of the function body; incomplete
if (compression === undefined){
try {
return await read(data,{ name, endian, compression: null, bedrockLevel, strict });
} catch (error){
try {
return await read(data,{ name, endian, compression: "gzip", bedrockLevel, strict });
} catch {
try {
return await read(data,{ name, endian, compression: "deflate", bedrockLevel, strict });
} catch {
try {
return await read(data,{ name, endian, compression: "deflate-raw", bedrockLevel, strict });
} catch {
throw error;
}
}
}
}
}
}
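The nested cascade above can also be flattened into a loop, which makes the rethrow behavior easier to see (a sketch; readAutoDetect and readWith are hypothetical names, not NBTify's API):

```typescript
// Sketch: the nested try/catch cascade flattened into a loop. `readWith` is a
// hypothetical stand-in for calling read() with one fixed compression option.
type Compression = null | "gzip" | "deflate" | "deflate-raw";

async function readAutoDetect<T>(
  readWith: (compression: Compression) => Promise<T>
): Promise<T> {
  const attempts: Compression[] = [null, "gzip", "deflate", "deflate-raw"];
  let firstError: unknown;
  for (const compression of attempts) {
    try {
      return await readWith(compression);
    } catch (error) {
      // Keep the error from the first (uncompressed) attempt, matching the
      // `throw error` in the outermost catch above.
      firstError ??= error;
    }
  }
  throw firstError;
}
```

Note the `return await` inside the try: without the await, a rejected Promise would escape the catch entirely and the later formats would never be attempted.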
from dovetail.
Yep, that's it! Yay! Glad it's actually NBTify, and not the chunking code; that makes a lot more sense haha.
Here's the tests for comparison; both are with the await method calls removed:
Test using offending code
Test without offending code
Notably in this case, leaving the offending code in place and running it in addition to the await-call 'fix' allows the tests to all decompress correctly in Node, while it causes the unresponsive-Promises issue in the browser (hence, I think this is the order it went in to slip past me, too).
from dovetail.
It would've been ideal to track this in NBTify instead, I guess, but that's ok. I originally thought it had to do with the chunking code.
from dovetail.
Ok, turns out it does have to do with the writer.write() and writer.close() calls. See this commit message for more intel on this.
from dovetail.
For a better reference on the error handling for NBTify here, see the demo code in this commit here, referenced from the same commit as above.
from dovetail.