
Comments (9)

IbrahimTanyalcin commented on April 27, 2024

If the client sends a Range request header and also an Accept request header with the application/gzip MIME type, then it is the client's responsibility to take into account whether the sent byte ranges correspond to the zipped file or not. This is the intuitive behavior.

Off the top of my head, I can tell a couple of cases where this change might break applications that depend on genomics data and need to handle huge files, such as https://github.com/igvteam/igv.js/, which makes requests to .bgz, fasta, or bam files. These files arrive at the client compressed, are decompressed by the browser, and the client then knows that the bytes correspond to the uncompressed payload. You can't just dump a 5 GB file to the client with compression enabled if there happens to be a range request.

The decompression is handled by the browser, and the final receiver gets the bytes of the payload that correspond to the uncompressed form. It has always worked this way. For a lot of developers, intuition & usability >>> RFC compliance. Am I misinterpreting something?

PS: Not everyone uses Next.js or related Azure services. Some of us have to serve GBs of files that compress fairly well per request (1-10 MB chunks), such as genomic data, and CANNOT compute the total length of the compressed file ahead of time for performance reasons. That means those who do use compression for the large files that would benefit most from this library will have to drop it or filter these files out. How these half-baked requirements make their way into an RFC is really appalling sometimes.


IbrahimTanyalcin commented on April 27, 2024

Is it possible to add a config object such as:

app.use(compression({ enforceRFC7233: true }))

which would default to false by primum non nocere, thus avoiding breaking apps that rely on the current behavior. When set to true, it would drop the Content-Range header and set the status to 200 whenever a Range request is encountered for a response with Transfer-Encoding. This would make life easier for those who want to be fully RFC-compliant (just set a config prop), and it would spare those for whom the cost of staying compliant (calculating the gzipped size of large files) outweighs the benefit from having to drop compression altogether.
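
For illustration, here is a minimal sketch of what such an option could do internally. This is hypothetical: enforceRFC7233 is not an existing compression option, and the header checks below are just one way to express the proposed rule.

    // Hypothetical hook, run before the compressed body is written.
    // If the client asked for a byte range but the response is being
    // transformed (compressed), drop the range semantics entirely.
    function enforceRfc7233(req, res) {
      if (req.headers['range'] && res.getHeader('Content-Encoding')) {
        res.removeHeader('Content-Range');
        res.statusCode = 200; // full representation instead of 206 Partial Content
      }
    }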


dougwilson commented on April 27, 2024

Hi @IbrahimTanyalcin, you can make any rule you like to determine whether the response should be compressed or not using the filter option. You can find more details and an example at https://github.com/expressjs/compression#filter-1
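
For reference, the example in the linked README follows this shape (the x-no-compression rule is the README's own illustration; any other rule can go in its place):

    const compression = require('compression');

    function shouldCompress(req, res) {
      if (req.headers['x-no-compression']) {
        // don't compress responses with this request header
        return false;
      }
      // fall back to the standard filter function
      return compression.filter(req, res);
    }

    app.use(compression({ filter: shouldCompress }));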


IbrahimTanyalcin commented on April 27, 2024

@dougwilson I understand that I can pass a filter function, but that means dropping compression for the plain-text file types (fasta etc.) that would actually benefit the most from it. The benefit of compressing a, say, 300 kB Next.js bundle is nothing compared to the benefit of compressing a 10 MB chunk of a 5 GB genomic sequence file. Am I wrong in my reasoning? It would be so nice if we could devise a solution that wouldn't break the apps of people like me and would also allow @mscbpi and others to achieve what they want.


dougwilson commented on April 27, 2024

I'm not sure I understand. If you return true from the filter, then it would compress those files you would like compressed. I'm not sure what the default behavior is for the files you are referring to, however. If you think they should be compressed by default, without needing to write a filter function, please let us know the details on how to detect that type of file for compression.


IbrahimTanyalcin commented on April 27, 2024

In genomics we deal with large files, let's say mysequence.fasta. This is a plain-text file that reads like ATTTCCGGTTT... and is around 5 GB:

  • These files have accompanying files called indexes that record the byte ranges of each line. This index file ends with the .fai extension (mysequence.fai) and is generally small (10-100 kB)
  • Tools like https://github.com/igvteam/igv.js calculate the portion of the sequence that fits the user's viewport and make a Range request based on the index file; let's say they request a 5 MB portion (see the sketch after this list)
  • Using your compression library, this 5 MB portion is extracted from the static file by Express, compressed, and sent to the browser.
  • The browser decompresses the payload, and IGV knows that the requested byte range corresponds to the uncompressed file, not the compressed one. From a technical standpoint, this is the only cost-effective way to combine Range requests with compression without burning server resources: the client looks up offsets in the index file instead of telling the server "calculate the correct Content-Range for me", which is impractical.
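
A rough client-side sketch of that workflow (the file name and offsets are hypothetical; the byte range comes from the .fai index and refers to the UNCOMPRESSED file):

    // Byte range looked up in the .fai index, e.g. a ~5 MB slice.
    const response = await fetch('/static/mysequence.fasta', {
      headers: { Range: 'bytes=1048576-6291455' }
    });
    // If the server compressed the response, the browser has already
    // decompressed it by the time the body is read here.
    const chunk = await response.text();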

The RFC requires the server to know the gzipped size beforehand so that the Content-Range can be correctly calculated. In truth it is just semantic compliance, and it will force genomic apps that use Express to drop compression, because they obviously do not want Express to send a 200 and dump the whole file, nor do they want to gzip the entire file just to calculate the correct Content-Range header from it. There are a plethora of other apps that rely on the current behavior.

I think a solution that preserves backwards compatibility without resorting to filter (which effectively drops compression for that file type), and that also allows people like @mscbpi to achieve what they want (because they rely on the behavior of Next.js and other CDNs), is more reasonable. At the end of the day, it is your call. Thanks for the time.


IbrahimTanyalcin commented on April 27, 2024

OK, here is some more info. In my route file for serving static content, I have something like:

app.use('/static', compression(), _static, function (req, res, next) { /* downstream middleware ... */ });

_static is the Express static middleware.
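
For completeness, a self-contained version of this setup might look as follows (assuming _static is created with the standard express.static, and './public' is a hypothetical directory):

    const express = require('express');
    const compression = require('compression');

    const app = express();
    const _static = express.static('public'); // serve files from ./public

    app.use('/static', compression(), _static, function (req, res, next) {
      // downstream middleware ...
      next();
    });

    app.listen(3000);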

I also dug into the request headers for such large fasta files and compared them to those of regular js/html/css files.

Here are the request headers for the usual js/html/css files:
[screenshot: expressjs-compress-ss2]

And here are the request headers I send for a supposedly large genomic file:
[screenshot: expressjs-compress-ss3]

So it turns out the client logic is sending an Accept-Encoding: identity header, for which the compression library does NOT compress. The request is passed on to static, which doesn't know what to do with a .fasta file, so it adds Content-Type: application/octet-stream.

This means that in my case the change wouldn't seem to break the behavior with large files, as they are already not compressed. I was wrong. (There might still be other genomics apps that do not send the correct encoding request header and expect it to work 🤷)
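
One quick way to verify this on the server is a small diagnostic middleware placed before compression (hypothetical; it just logs what the client actually sends):

    // Log Range and Accept-Encoding for every /static request.
    app.use('/static', function (req, res, next) {
      console.log(req.path,
        'Range:', req.headers['range'],
        'Accept-Encoding:', req.headers['accept-encoding']);
      next();
    });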


mscbpi commented on April 27, 2024

If the client sends a Range request header and also an Accept request header with the application/gzip MIME type, then it is the client's responsibility to take into account whether the sent byte ranges correspond to the zipped file or not. This is the intuitive behavior.

Hi @IbrahimTanyalcin thanks a lot,

This is not Microsoft Azure's point of view/interpretation of the standard, but they may themselves be wrong.

https://learn.microsoft.com/en-us/azure/frontdoor/front-door-caching?pivots=front-door-standard-premium#delivery-of-large-files

Tip: If your origin compresses the response, ensure that the Content-Range header value matches the actual length of the compressed response.

Other resources / SR lead to the same observation.


IbrahimTanyalcin commented on April 27, 2024

@mscbpi yes, you are correct; I never objected to the RFC. However, from a technical standpoint it is very costly to compute the total zipped size and adjust the Content-Range value.

I wish a provisional header like Content-Range-Origin: transformed | identity existed, so that the ambiguity would be resolved and servers would be given a choice. But it is OK; I guess the safest bet is to send Accept-Encoding: identity, or to use the filter option to drop compression for that file type (as sketched below) and hope for the best.
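
A filter along those lines could look like this (illustrative; the extension list and the skip-on-Range rule are assumptions, not recommendations from the maintainers):

    const compression = require('compression');

    app.use(compression({
      filter: function (req, res) {
        // skip compression for range requests and indexed genomic formats
        if (req.headers['range'] || /\.(fasta|bgz|bam)$/i.test(req.path)) {
          return false;
        }
        return compression.filter(req, res); // standard filter otherwise
      }
    }));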

