cacherules's Issues

Incorrect min-fresh?

Hi there,

Whilst source sifting I stumbled upon this:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L64-L74

It is used in "validate_allow_stale", i.e. "may I serve this request as stale?". However, min-fresh means that the response needs to remain fresh for at least this long when you return it. A response that satisfies a min-fresh value can never be stale, as by definition it would then no longer be fresh.

In fact, you COULD use min-fresh to reduce the maximum age before a response becomes stale, and if I understand it correctly, that is what you have implemented here (in inverse, by adding it to the current age, acting as if the response is older):

CacheRules/lib/helpers.rb

Lines 317 to 324 in 537667a

def helper_min_fresh
  Proc.new {|request, freshness_lifetime, current_age|
    if request && request['min-fresh']
      token = request['min-fresh']['token']
      freshness_lifetime.to_i >= (current_age + token.to_i)
    end
  }
end

Now, since that is correct, the unfortunate result is that if min-fresh is given and the response is fresh, the validate_allow_stale function gives... true? Min-fresh should probably not have anything to do with this.
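A minimal sketch of the point above, assuming ages and lifetimes are plain integer seconds. The name `fresh_enough?` is hypothetical and not part of CacheRules; it only illustrates that min-fresh tightens the freshness check and says nothing about whether a stale response may be served.

```ruby
# Hypothetical helper, not part of CacheRules.
# Returns true when the stored response will still be fresh for at least
# `min_fresh` more seconds (the meaning of the min-fresh request directive).
def fresh_enough?(freshness_lifetime, current_age, min_fresh)
  freshness_lifetime.to_i >= current_age.to_i + min_fresh.to_i
end

# A response that fails this check is simply "not fresh enough" for this
# request; the result should not be fed into an allow-stale decision.
fresh_enough?(60, 30, 20) # lifetime 60s, age 30s, client wants 20s more
```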

p.s. I love what you've done!

Older user agents might not understand 307 responses

When redirecting to another URI, the client is redirected using the 307 HTTP status code.

This might not work for user agents using HTTP/1.0 or older.

  • One solution is to change the code to 302
  • Another solution would be to detect the version and send 302 or 307
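The second option could be sketched roughly as follows. `redirect_status` is a hypothetical helper, and it assumes the request's protocol version is available as a string such as "HTTP/1.0"; how CacheRules would obtain that string is not covered here.

```ruby
# Hypothetical helper, not part of CacheRules.
# 307 is defined as part of HTTP/1.1, so anything older gets a 302 instead.
def redirect_status(http_version)
  # "HTTP/1.1" -> [1, 1]; a missing minor part (e.g. "HTTP/2") counts as 0.
  major, minor = http_version.to_s.sub(%r{\AHTTP/}i, '').split('.').map(&:to_i)
  ([major.to_i, minor.to_i] <=> [1, 1]) >= 0 ? 307 : 302
end
```

An unknown or empty version falls back to 302 in this sketch, which is the more conservative choice for old user agents.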

Errors caused by empty HTTP headers

This was discovered thanks to Amazon S3 sending an invalid HTTP Content-Type header, which in fact is not even allowed to be empty according to the RFCs.

Location: https://s3.amazonaws.com/production.s3.rubygems.org/specs.4.8.gz

HTTP/1.1 200 OK
x-amz-id-2: alURqgFXa/nVfPpLEJQVOs3fJ8t9C9Z+MlAdMAaC62d1ZsccK06NVpqRLxlD1KVB
x-amz-request-id: 1581E6BF3225E185
Date: Wed, 29 Apr 2015 15:22:57 GMT
x-amz-version-id: qc5dRNqFJO68y1JxMODSS1fQf.8WZ8IF
Last-Modified: Wed, 29 Apr 2015 15:21:33 GMT
ETag: "c64eff68233a552b9737972ad8c2fb86"
Accept-Ranges: bytes
Content-Type: 
Content-Length: 2364063
Server: AmazonS3

I vote 👍 to just drop empty headers.
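Dropping empty headers could look something like this, assuming headers arrive as a plain Hash of name to value. `drop_empty_headers` is a hypothetical name used for illustration only.

```ruby
# Hypothetical helper, not part of CacheRules.
# Removes headers whose value is nil or contains only whitespace, such as
# the empty Content-Type shown in the S3 response above.
def drop_empty_headers(headers)
  headers.reject { |_name, value| value.nil? || value.to_s.strip.empty? }
end

drop_empty_headers('Content-Type' => '', 'Content-Length' => '2364063')
```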

Cached headers are not returned

When sending a 200 or 304 response, we need to return the original cached headers with existing headers overwritten by the revalidated response.

At the moment only the following are returned: Cache-Control, Content-Location, Date, ETag, Expires, Vary.

This was by design, but it's wrong.
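A rough sketch of the intended behaviour, assuming both header sets are plain Hashes. `merge_cached_headers` is a hypothetical name, and lowercasing the header names is an assumption made here so that "ETag" and "etag" are treated as the same field.

```ruby
# Hypothetical helper, not part of CacheRules.
# Starts from all cached headers and lets the revalidated response
# overwrite any header it also provides.
def merge_cached_headers(cached, revalidated)
  # Header names are case-insensitive, so compare them in lowercase.
  normalize = ->(h) { h.each_with_object({}) { |(k, v), out| out[k.to_s.downcase] = v } }
  normalize.call(cached).merge(normalize.call(revalidated))
end
```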

Improper validation of max-stale

Detected this 🐛 in our production setup.

Validation always returns a 504 Gateway Timeout / EXPIRED result when no Cache-Control: max-stale header is provided. This is an error as seen in the RFC:

If no value is assigned to max-stale, then the client is willing to accept a stale response of any age.
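One way to express that rule, as a sketch only. `stale_allowed?` is a hypothetical name, and the shape of `directives` (a Hash where each directive maps to a Hash with a 'token' key) is an assumption modelled on the `request['min-fresh']['token']` access seen in the library.

```ruby
# Hypothetical helper, not part of CacheRules.
# `directives` is assumed to look like { 'max-stale' => { 'token' => '30' } }.
def stale_allowed?(directives, freshness_lifetime, current_age)
  # Without a max-stale directive the client did not ask for stale responses.
  return false unless directives && directives.key?('max-stale')

  token = directives['max-stale'] && directives['max-stale']['token']
  # max-stale without a value: a stale response of any age is acceptable.
  return true if token.nil? || token.to_s.empty?

  # max-stale=N: acceptable if the lifetime is exceeded by at most N seconds.
  (current_age.to_i - freshness_lifetime.to_i) <= token.to_i
end
```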

Revalidation on no match incorrectly gives back cached value, even on deleted resource

Related to #15

In the decision table:

rule                  result
500 error             no
matches precondition  no
> Host: "https://example.org"
> Location: /my-resource
> Method: HEAD # or GET
> If-None-Match: "W/myetag"
< Host: "https://example.org"
< Location: /my-resource
< HTTP 200 Ok
< ETag: "W/newetag"

Currently it returns a 200 with the "cached" result, but actually a GET should be done to fetch the resource again. The server would return a 304 (correct in the table as well) if the preconditions match, and in that case it may use the result from the cache, but on a 200 it means that the resource has changed and should be served from the origin server.

I propose changing it to MISS. You might as well do a GET for revalidation. In case of 304, you'll get a head :not_modified, so you don't really gain anything here from doing a HEAD. Additionally, I don't think the 200 status here is correct. It should report whichever status was returned from the server. Example:

> Host: "https://example.org"
> Location: /my-resource
> Method: HEAD # or GET
> If-None-Match: "W/myetag"
< Host: "https://example.org"
< Location: /my-resource
< HTTP 410 Gone

In this case, on revalidation the resource is reported as gone. It is correctly not a "stale-allowing-error" (your implementation of validate_is_error) and definitely not REVALIDATED. It should be MISS with the code 410. This is true for anything that's not a server error (500..599).

All of the above counts when the preconditions do not match, or are not even present in the first place (revalidation is then a simple fetch, as per #15).
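The proposal above could be sketched as a small mapping from the origin's status code to a result. `revalidation_outcome` and the symbols it returns are hypothetical and do not correspond to the library's actual decision table.

```ruby
# Hypothetical helper, not part of CacheRules.
# Maps the origin server's status during revalidation to [lookup, code].
def revalidation_outcome(status)
  code = status.to_i
  case code
  when 304      then [:revalidated, 304] # cached body may be reused
  when 500..599 then [:error, code]      # server error; stale rules decide elsewhere
  else               [:miss, code]       # 200, 404, 410, ...: pass the origin's answer on
  end
end
```

With this shape, a 200 with a new ETag and a 410 Gone both end up as a MISS carrying the status the origin actually returned.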

Revalidation is "EXPIRED" when there is no precondition

Hi again,

When you hit the revalidation code, the following rules are applied:
https://github.com/aw/CacheRules/blob/master/lib/cache_rules.rb#L127-L133

Which internally calls:
https://github.com/aw/CacheRules/blob/master/lib/helpers.rb#L365-L369

Now, I could not find anything in the RFC that makes such a precondition mandatory. This might be a design choice in your library, but I don't think a Gateway Timeout is appropriate for all cases here.

Let's consider the case of a simple must-revalidate request.

< Cache-Control: must-revalidate, max-age=60
< Date: Fri, 13 Jul 2018 16:40:00 +0000
< HTTP 200 Ok

Meaning, fresh for 60 seconds, MUST NOT use stale when it has expired. If requested past 16:41, it SHOULD just retry the request. In this case, because no ETag or Last-Modified is present in the cached response, nor is there an If-None-Match in the request, it gives us a 504, but we have not even tried to reach the origin server.

I think you implemented it as such because of https://tools.ietf.org/html/rfc7234#section-4.3.1 where it says

   When sending a conditional request for cache validation, a cache
   sends one or more precondition header fields containing validator
   metadata from its stored response(s), which is then compared by
   recipients to determine whether a stored response is equivalent to a
   current representation of the resource.

However, when you don't have these headers, you would not send a conditional request, but a regular one. This is how both Chrome and Firefox have implemented it. It is mentioned in the mozilla docs: It is either validated or fetched again.

Because of the careful wording in the RFC, and not using a capitalized MUST/SHOULD in this paragraph, I believe you must always try to revalidate in the flow, regardless of the presence of the preconditions. It becomes, semantically, a conditional request if one of the headers is present, but otherwise it's a regular fetch request (and will always return a non-304 result).
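As an illustration of that flow, the request headers for revalidation could be built like this. `revalidation_headers` is a hypothetical name, and the lowercase keys of `cached_headers` are an assumption about how the stored response is represented.

```ruby
# Hypothetical helper, not part of CacheRules.
# Builds the extra request headers for revalidating a stored response.
# An empty result means "no validators available": send a regular request
# to the origin instead of giving up with a 504.
def revalidation_headers(cached_headers)
  headers = {}
  headers['If-None-Match'] = cached_headers['etag'] if cached_headers['etag']
  headers['If-Modified-Since'] = cached_headers['last-modified'] if cached_headers['last-modified']
  headers
end
```

For the must-revalidate example above, which has neither ETag nor Last-Modified, this yields an empty Hash, so the revalidation is just an unconditional fetch.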


Posted the RFC entry just for ease. The other mentions are only "triggering" extra invalidation/rules, but nothing says anything about an ETag / Last-Modified being mandatory.

https://tools.ietf.org/html/rfc7234#section-5.2.2.1   

   The "must-revalidate" response directive indicates that once it has
   become stale, a cache MUST NOT use the response to satisfy subsequent
   requests without successful validation on the origin server.

   The must-revalidate directive is necessary to support reliable
   operation for certain protocol features.  In all circumstances a
   cache MUST obey the must-revalidate directive; in particular, if a
   cache cannot reach the origin server for any reason, it MUST generate
   a 504 (Gateway Timeout) response.

   The must-revalidate directive ought to be used by servers if and only
   if failure to validate a request on the representation could result
   in incorrect operation, such as a silently unexecuted financial
   transaction.

Recently cached responses are considered STALE

This continues from #9 which didn't fully fix the problem.

After some local tests, we've observed this:

"last-modified"=>"Thu, 30 Apr 2015 13:17:42 GMT", # HIT
"last-modified"=>"Fri, 01 May 2015 02:45:29 GMT", # STALE

That's obviously a massive error and completely backwards. I believe it's related to the freshness_lifetime calculation. Looking into it.

Valid cached files are labeled as expired

For some strange reason, all recently cached files are automatically labeled as "expired" if requested without any cache-control or expiry headers.

I have a feeling this is wrong, should investigate..

Max-age header isn't fully validated

It appears there is no validation on the max-age client request header.

There is also only an if max-age == 0 check on the cached response header. This means if a client or server sends Cache-Control: max-age=10 and the response's Age field is greater than 10, it will be ignored. Oops.
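The missing comparison might look like the sketch below. `max_age_exceeded?` is a hypothetical name, and it assumes max-age and the current age are available as integers or numeric strings in seconds.

```ruby
# Hypothetical helper, not part of CacheRules.
# True when the response is older than the max-age directive allows.
# `max_age` is nil when no max-age directive was sent.
def max_age_exceeded?(max_age, current_age)
  return false if max_age.nil?
  current_age.to_i > max_age.to_i
end

max_age_exceeded?('10', 11) # Cache-Control: max-age=10 with Age: 11
```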

Caching rules are inconsistent

The tests passed, so everything must be fine, right? Wrong!

In fact, I'm certain the tests are wrong and the cache rules are not being applied correctly. Investigating..

Revalidation with matching If-Modified-Since is 200 instead of 304

Hi there,

Currently, on revalidation, a 304 will only be generated if the precondition matches:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L124-L127

However, this does not handle all the preconditions:

https://tools.ietf.org/html/rfc7234#section-4.3.2

   If an If-None-Match header field is not present, a request containing
   an If-Modified-Since header field (Section 3.3 of [RFC7232])
   indicates that the client wants to validate one or more of its own
   stored responses by modification date.  A cache recipient SHOULD
   generate a 304 (Not Modified) response (using the metadata of the
   selected stored response) if one of the following cases is true: 1)
   the selected stored response has a Last-Modified field-value that is
   earlier than or equal to the conditional timestamp; 2) no
   Last-Modified field is present in the selected stored response, but
   it has a Date field-value that is earlier than or equal to the
   conditional timestamp; or, 3) neither Last-Modified nor Date is
   present in the selected stored response, but the cache recorded it as
   having been received at a time earlier than or equal to the
   conditional timestamp.

On such a request, CacheRules will return a 200 with REVALIDATED as lookup value.

The funny thing is, you have actually already written this:
https://github.com/aw/CacheRules/blob/master/lib/validations.rb#L29

A simple replacement of validator_match with precond_match here should fix this:
https://github.com/aw/CacheRules/blob/master/lib/cache_rules.rb#L89
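For reference, the three cases from the RFC paragraph quoted above can be sketched as a single check. This is not the library's precond_match; `not_modified_since?` is a hypothetical name, and the lowercase header keys and the optional `received_at` Time are assumptions.

```ruby
require 'time'

# Hypothetical helper, not part of CacheRules.
# Returns true when a 304 would be appropriate for an If-Modified-Since
# request, following the three cases in RFC 7234 section 4.3.2.
def not_modified_since?(if_modified_since, stored_headers, received_at = nil)
  return false if if_modified_since.nil?

  conditional = Time.httpdate(if_modified_since)
  reference =
    if stored_headers['last-modified']   # case 1: Last-Modified is present
      Time.httpdate(stored_headers['last-modified'])
    elsif stored_headers['date']         # case 2: fall back to Date
      Time.httpdate(stored_headers['date'])
    else                                 # case 3: time the cache received it
      received_at
    end

  !reference.nil? && reference <= conditional
rescue ArgumentError
  # Unparseable dates are treated as "no match" in this sketch.
  false
end
```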
