Giter Site home page Giter Site logo

Use Vary on 304 about compression HOT 37 CLOSED

expressjs avatar expressjs commented on April 25, 2024
Use Vary on 304

from compression.

Comments (37)

dougwilson avatar dougwilson commented on April 25, 2024

Weird. I just added a test that verifies that we do indeed set the Vary header on a 304. Perhaps there is something more going on beyond the report? Is is possible you can provide code to reproduce?

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Actually, I see that typically you would not send a Content-Type header with a 304 response, thus by default we would filter out the request (since we only vary responses with content types we support compression on). I need to read some more on this...

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

From reading the paragraph:

The server generating a 304 response MUST generate any of the
following header fields that would have been sent in a 200 (OK)
response to the same request: [...] Vary.

It seems like it may actually be impossible for this module to comply with that, because without the Content-Type response header sent, how can we know if a 200 response would have thus included a Vary header or not?

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Right, yeah...the issue is the typical lack of Content-Type and the problem with complying is that we can't know what the 200 would've done. I'm not sure what the solution is either (or if there even is a decent generic one); I'll think on it more too. For now, I just wanted to put the issue out there.

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

For now, I just wanted to put the issue out there.

Indeed, yes! I never even thought about it :) I'll also be thinking on it, haha, which is why I re-opened it when I realized why it would be occurring (with the default lack of Content-Type).

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Hey @ethanresnick , just passing by this issue to see if you have any additional thoughts?

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

I haven't had much time to think about it, but here's one (perhaps naive) idea that comes to mind: have a map that persists between requests and that stores, for each 200 response, whether it was compressible; if so, the module would add Accept-Encoding to Vary to subsequent requests for the same resource even if those requests 304.

One tricky bit here would be figuring out how to make the map's key to identify a request, but that seems doable. Off the top of my head, it seems like we'd want it to be a composite of the request URI and the value of any headers that the user put in Vary. I don't think including the http method in the key should be necessary, since the imaginary request that the 304 response's headers are based on is always the GET.

The advantage of this approach is that its totally transparent to users of the module. The downside is that it would require some extra memory.

I think this downside could be mitigated though by also allowing the user to provide a function that, if provided, would act as a replrcement for adding to/looking things up in the map. The function would be called on 304s and would receive the same information about the request that would've gone into the map's cache key, and would be responsible for returning true or false (indicating whether the 200 response was cacheable). These functions could be quite simple too--always returning true in the most basic case (e.g. for a server that only serves Json), or switching on a file extension pulled from the request uri, or similar.

Hopefully, this could kind of get the best of both worlds: a totally transparent solution for people who don't mind a bit of extra memory and a space efficient solution for those who need it.

Again, just one idea. I'm curious to hear others too.

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

That sounds like introducing a lot of complication to this module, though, and it also doesn't work out-of-the-box with horizontal scale, which is not something I want to publish (just like we, as npm community, don't publish modules that do fs.*Sync at runtime by default, we don't publish things that prevent horizontal scaling by default). It also has a lot of issues with even being consistent:

  1. It only reliably works where the cache has existed for all time. Once the cache is cleared (server restart with that default implementation), then requests for a specific route will always 304 without Vary until finally a request comes in that resulted in a 200 for us to capture the information.
  2. It falls apart as soon as someone doesn't always serve the same Content-Type for everyone from the same URL or someone has a more complex filter argument. Default things in the library need to always work reliably, no matter what the other options to the same library are.

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

You're definitely right about the horizontal scaling and that the design assumes the cache has been on forever. I was thinking a bit about these issues when I made the proposal, but I just assumed it would be better for the module to sometimes do the right thing than to never do so. That's probably not right.

One solution then would be to make the user-provided function I talked about above mandatory. It would be called on 304 and asked to return whether the content from the corresponding 200 would've been compressible.

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

One solution then would be to make the user-provided function I talked about above mandatory. It would be called on 304 and asked to return whether the content from the corresponding 200 would've been compressible.

Does the filter function not fit this? We call it on 304 currently, and they can say yay or nay.

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Does the filter function not fit this?

It does! I'd forgotten about that function entirely, since I've always just left the default.

I think we'd want to tweak it's description in the docs to explain this case (since the user's not really "considering for compression" a 304 response), and maybe expand the default filter function to make some decent guesses when Content-Type isn't set (e.g. by checking if the request URI ends in a compressible extension). But that seems like a workable approach!

EDIT That said, I can also see a UX argument for making this a separate function, since a lot more people will probably need to customize this than will need to customize the default filter function

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

At this point, we may as well simply always set Vary.

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Maybe just make that the default (i.e. the default function just returns true)? That way, people who care about maximizing cache hits can use the shorter Vary on non-compressible resources?

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Maybe just make that the default (i.e. the default function just returns true)?

You mean just make filter always return true by default?

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Yup!

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

(On 304s, that is; the existing logic would remain for other responses)

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Yup! (On 304s, that is; the existing logic would remain for other responses)

Well, that won't work, because it's still a violation of the RFC, which states:

The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request: Cache-Control, Content-Location, Date, ETag, Expires, and Vary.

Making the change you suggest is still just as much in violation that the module is in right now. All you're doing is change it so 304s will always include a Vary header and 200s will sometimes includes a Vary header.

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Neither of my proposals was intended to obviate the need for some people to customize to achieve RFC compliance. Under the current setup, though, no one gets RFC compliance out of the box; with a default, at least some people do (i.e. those whose servers fit the default's logic)—and the others are no worse off than they are now.

So my proposals were about trying to find a decent default with the above in mind. I proposed the “always return true” default because I thought you were saying that the more sophisticated default (of checking URI endings etc) was too complicated, and obviously we want to balance minimizing the number of people that need to customize with not making the base library overly complicated.

If you don't want anyone to have to customize to achieve compliance, I think the only way to get it is to make the default the filter function work the same way regardless of whether the response is a 304…but that would mean not relying on the Content-Type header, which would likely mean that far fewer compressible requests would get compressed by default. I don’t think that’s a worthwhile tradeoff.

Maybe there’s some entirely different option here, but I’m not seeing it...

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Ok, well here's the problem here: I need you to provide a specific problem to solve here. Your original post was that we are not RFC compliant, which is true. Thus if that is the issue, then the solution must bring us into RFC compliance, which is a MUST and thus we cannot sometimes be right, we MUST always be right, which with the current code means always setting Vary above the filter. I will be committing and releasing that change today unless you can provide a different problem statement.

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Hey Doug. I apologize if my messages here have been a bit confused—I've been trying to attend to this thread while simultaneously doing a million other things.

You're right that always setting Vary: Accept-Encoding, regardless of whether the request ends up being compressible (i.e. passing the filter), does bring us into RFC compliance on the point I quoted at the start of the thread.

What I was trying to say here though is that just always setting Vary comes with its own tradeoff, namely: resources that are not in fact compressible become less cacheable, because caches have to use the (often-poorly-normalized) Accept-Encoding header as part of their cache keys. So, if we can avoid that consequence, that should be part of the problem statement too.

Maybe we can rewrite onHeaders like so:

onHeaders(res, function(){
  // on loading the module, the user can optionally provide a function that
  // determines for a 304 response whether the corresponding 200 would've varied.
  // It lives in the variable `wouldCompress` (as opposed to `shouldCompress`)

  var toVary = 
    // always vary if the user didn't provide `wouldCompress`, 
    // since we can't be smarter.
    typeof wouldCompress !== "function" || 

    // othewise determine whether to vary on 304s using wouldCompress,
    // and using filter (its non-304 counterpart) for other responses.
    (res.status() === 304 ? wouldCompress(req, res) : filter(req, res));

  if(toVary) {
    vary(res, 'Accept-Encoding');
  }

  // determine if request is filtered
  if (!filter(req, res)) {
    nocompress('filtered')
    return
  }

  // do compress...
})

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

All of that wouldCompress logic the user can literally do with filter, so we are not going to add another function for no reason. There are only three courses of action I'm going to really entertain from here, so this thread can end:

  1. We close the issue and leave the behavior as it is now, where the user can provide a filter that works how they like and we set the Vary based on their filter function.
  2. We move the Vary header setting above the filter. The user now can only control the Vary getting set based on where the middleware is located or using a middleware wrapping module.
  3. You define the exact reason we need to make the change for. If it is RFC compliance, then number 2 above the the absolute only solution. If it is just to interop with some third-part caching software, please let me know so we can ensure that the solution works for that software.

I don't mean to be sort, but this issue is just going in circles. If you even want to speak over the phone or Skype or something, you can email me. I'll be releasing number 2 tonight unless I hear a very compelling argument why we should not be RFC compliant.

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Basically, can you explain how, exactly, the code in #44 (comment) allows you to in any way do something different than simply defining your own filter function? Your code, by default, will act just like it does now. Otherwise, it'll call a different function for 304s, yet you can easily make that code patch branch in your own filter function, so it does absolutely nothing different than filter, only makes it more complicated. TL;DR your code suggestion in #44 (comment) is just asking for option number 1: do nothing and close this issue.

app.use(compresion({
  filter: function (req, res) {
    if (res.statusCode !== 304) return /* your filter function or compression.filter(req, res) */
    else /* your wouldCompress function */
  }
}))

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

@dougwilson My code does number 2 by default. The user doesn't pass in would compress, so Vary always gets set.

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Gotcha. So number 2 it is!

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

@dougwilson Right, of your three options, I'm definitely for the second. The reason I introduced a new wouldCompress function is because, if you do option two without it, then the user has to use filter to remove certain Vary headers that they don't want added (but that were put on previously when option 2 was applied). And then suddenly filter isn't just a simple predicate, but is this ugly side effect thing

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Ok, but like I said: we are not adding a second weird function. They don't have to use filer to remove from Vary, they can re-arrange the middleware to be in a more sane location or use on-header in their code to remove from Vary.

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Ok, then option 2 it is.

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

If you like, though, I can add an option that you can set to true or false for even adding to Vary at all and then people can set it to false and use their own logic if they like. How does that sound?

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Yeah, I'd be in favor of adding that option too. Would simplify things for people who really want to manage vary.

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

So basically we get:

onHeaders(res, function(){
  if(options.setVary) {
    vary(res, 'Accept-Encoding');
  }

  // determine if request is filtered
  if (!filter(req, res)) {
    nocompress('filtered')
    return
  }

  // do compress...
})

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

I apologize again for all the going in circles. I'm gonna close this now, and then head out. Enjoy your 4th of July :)

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Ok. For posterity, here is why it was located after the filter: senchalabs/connect#848 people say that before it was breaking Akami. I guess we'll just go back to breaking Akami again :)

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Then it seems like we've gotten to a pretty good place with following the rfc by default but adding an option :) [I'm not using Akamai, and wasn't arguing based on that issue, but it's good to know about.]

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Cool. Yes, happy 4th! I'll think on this issue a bit, because I'm not sure about breaking Akami by default all of a sudden (we can, but it may have to be a major version in a few months down the road instead of this weekend).

from compression.

ethanresnick avatar ethanresnick commented on April 25, 2024

Gotcha. If possible we should first check whether Akamai's already fixed its behavior. Because they may well have---it's been over a year since that other issue, and, more importantly, the creation of RFC 7232 may have brought this problem to their attention (the prior RFC 2068 actually did not have this requirement; it had a slightly weaker one instead). Now, I'm not sure how to go about checking this given the private kbs, but maybe someone you know could help?

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Right, I don't know how to check, either. I may be able to find someone with Akamai access I can ask, but it could be a few weeks if that's OK.

from compression.

dougwilson avatar dougwilson commented on April 25, 2024

Though I doubt Akamai works any different today than a year ago, seeing as Drupal just made a commit to fix this very thing only about 15 days ago: http://cgit.drupalcode.org/cdn/commit/?id=a3408dc

from compression.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.