-
Notifications
You must be signed in to change notification settings - Fork 603
Add ignore_vary option #2318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add ignore_vary option #2318
Conversation
|
Cannot create a git commit message from PR title and description. Error while parsing PR description body: Problematic parser input: Please see PR title and description formatting requirements for more details. This message was added by Anubis bot. Anubis will add a new message if the error text changes. Anubis will remove M-failed-description label when there are no corresponding failures to report. |
yadij
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR: investigation with redbot.org shows the example URL you provided displaying different response contents for different Accept-Encoding header values. There is no ETag to tell the responses apart and they fluctuate between 139 and 158 bytes in length. Even though they appear to be the same resource (and even bytes) when leaving the origin server - they are being changed by the CDN server in ways indicated by the Vary header, at the time indicated by the Last-Modified header.
This means that your situation is one in which it is absolutely NOT safe to ignore the Vary header and treat the 2+ responses as being the same object.
Response headers involved with content negotiation in general are not safe to just ignore. It is far too late to do anything besides abort the caching decision once the upstream server has done its negotiation choices.
Instead you should be using request_header_access Accept-Encoding deny all on your Squid proxies that sit in front of these problematic CDN servers. This will at least ensure that only one variant is ever requested according to the CDN viewpoint.
Also, having your origin set an ETag header so Squid has something more reliable to tell the variants apart than expanding Vary header.
At the very least to be considered for general use this option should also:
- enforce that a valid ETag header is provided before ignoring Vary.
- have code wrapped in USE_HTTP_VIOLATIONS protection
- be an ACL driven setting for targeted application (not a global ON/OFF)
- be implemented with an on-request action to strip specific request headers, then to only ignore matching response
Vary:which list (any subset of) those headers which are removed from the request.
rousskov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I flagged several implementation problems, but I am against adding this directive and this overarching concern should be addressed first: I have no doubts that the proposed directive works around a particular problem in a particular environment, but it is too specific to that problem and that environment. Squid history is full of such "itch scratching" hacks and bug workarounds, with many of them becoming costly to maintain long-term (or remove).
If fixing underlying Vary support bugs is not currently feasible, then I believe we should address the problem this PR is working around using a more general approach: Add an incoming_reply_header_access or a similar directive that would apply to all HTTP response headers received by Squid (from peers and adaptation services) and generated by Squid itself (e.g., in error messages). The new directive can then be used to achieve the same overall effect as the proposed ignore_vary:
incoming_reply_header_access deny Vary all
I would like to hear others feedback regarding the alternative solution above before deciding whether to block the addition of a [fixed] ignore_vary implementation (and ask you to add incoming_reply_header_access or a similar general directive instead). CC: @kinkie, @yadij
P.S. The existing reply_header_access applies to being sent (rather than received) response headers AFAICT.
| option causes Squid to completely ignore the Vary header and | ||
| go ahead and cache the response using normal rules. | ||
|
|
||
| WARNING: Doing this this VIOLATES the HTTP standard. Enabling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add IFDEF: USE_HTTP_VIOLATIONS to the directive metadata then, right?
| Some HTTP servers send a Vary header even though the clients | ||
| never send requests with different request headers. This | ||
| option causes Squid to completely ignore the Vary header and | ||
| go ahead and cache the response using normal rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you agree with the following assertion, please adjust PR description to explain that the proposed ignore_vary on differs from the already supported reply_header_access Vary deny all because the new feature is applied to all responses, while existing reply_header_access directive only applies to responses sent by Squid.
| DOC_START | ||
| Some HTTP servers send a Vary header even though the clients | ||
| never send requests with different request headers. This | ||
| option causes Squid to completely ignore the Vary header and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this description misleads the reader into thinking that Squid is just going to ignore the Vary header. In reality, Squid is going to delete that header upon receipt and, hence, never forward it. The header will not even appear in well-known level-2 response header dumps (as if it was never received)!
Also, it may surprise some admins that Squid will send some Vary headers even if this directive is enabled (because Squid adds Vary to some of its generated responses).
Please rephrase to clarify these effects of the proposed directive. Renaming the directive to delete_received_vary or similar may be required.
P.S. AFAICT, the header is going to be counted in Squid named header statistics, arguably contradicting the "completely ignore" promise as well.
| cached. It is intended as a workaround until HTTP/1.1 is fully | ||
| implemented and only for use when client behavior is fully | ||
| controlled. | ||
| DOC_END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also document how the new ignore_vary option interacts with the existing vary_ignore_expire. Warning the admin when both directives are enabled is probably a good idea!
|
|
||
| if (Config.onoff.ignore_vary) { | ||
| String rawVary; | ||
| if (getByIdIfPresent(Http::HdrType::VARY, &rawVary)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this code is missing Http::HdrType::HDR_X_ACCELERATOR_VARY treatment or the new directive documentation is missing a disclaimer about the difference in treatment of those two Vary-like headers (that are treated the same way in many other Squid contexts). The former is more likely.
| if (Config.onoff.ignore_vary) { | ||
| String rawVary; | ||
| if (getByIdIfPresent(Http::HdrType::VARY, &rawVary)) { | ||
| delById(Http::HdrType::VARY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, this low-level hack will trigger "no leftovers" assertion in Adaptation::Icap::ModXact::prepEchoing() if ignore_vary is enabled during reconfiguration that happened after the affected response header has been received but before ICAP decided to echo that response.
Most likely, this code should be moved to (i.e. triggered from) higher level code to avoid such difficult-to-find bugs.
|
I have not seen Amos' review when posting mine. FWIW, the alternative solution I have proposed would effectively work around concerns in Amos' review -- those concerns would apply to a particular Squid configuration instead (and addressing them would be up to the admin deploying that configuration, of course -- Squid Project reviewers would not stand in the way). |
I outlined in my review a few implementation requirements we need not to break HTTP quite badly. |
We are in agreement on that, but it may take a very long time and requires special expertise... The alternative |
AIUI, this alternative meets the ACL-driven requirement. "generally useful" is a bad thing - the protocol impact is expanded from just Vary to all possible content negotiation headers, including future/unknown ones. Making the important request-filtering action highly difficult to implement correctly. |
|
Well! I didn't expect this to be accepted as-is, but I also didn't expect such vigorous responses. I was hoping for some small hints to make it better, but it looks like you're both asking for different things and both of them are very complex and time consuming. I will probably instead just leave this applied as a patch on our project's Squid distribution until the Vary behavior in Squid improves, if it does. We all agree that the better thing would be to fix the Vary bug(s), but as @rousskov said, that requires specialized knowledge, and I don't have that. My applications also don't need Vary at all so I can't justify spending significant resources on it. @yadij I'm not surprised that Cloudflare behaves as it's supposed to with varied request headers, but I know that our applications don't send any so I'm not worried about that. I will indeed include @rousskov I was wishing for something like an |
FWIW, I am not asking for very complex or very time consuming things.
I understand. FWIW, when considering adding new features, I recommend asking first.
I disagree with Amos "generally useful is a bad thing" opinion: IMO, correct/good applications justify the inevitable abuses in @yadij, do you expect to veto a quality PR that adds a general filtering directive like |
I thought about that but I knew it was a known problem, and I needed assurance that I could find a workaround. Once I found one I thought it would be best to propose it and see what came of it. |
|
I would also like to hear the answer to this question from Alex:
|
I never plan to veto anyones contributions. Whether I do depends on what "quality" it actually is (low? terrible?). As mentioned, the description so far has major hurdles that need to be jumped and I doubt any of us have time to see that one through to "reasonable" quality. |
|
@rousskov, please recall that embedding user-specific menus and profile/login info into web content is one of the oldest use-cases for HTTP content negotiation and still widely used. Sharing per-user content between all visitors and changing the headers such that the rest of the web caches in the chain do the same - is CVE fodder (at least CWE-359). |
There are many ways to use
@yadij, assuming that I do not see any legitimate "major hurdles" in adding such a simple header filtering feature. Apparently, we disagree on that. I want to understand whether that disagreement is enough to trigger your veto of the proposed feature. |
This adds a configuration option to completely ignore all
Varyheaders.It is well known that squid does not fully implement
Varyheaders, in particular with shared memory caching as mentioned in bug 4445 comment 3. We are running into this now that we have our larger installations using SMP. We control our servers and clients, and the servers never sendVaryand the clients never vary their request types, but sometimes we insert a Cloudflare CDN reverse proxy between our forward proxy squid and server, where we have little control, and those insertVary: accept-encoding. That causes those responses to always be treated as stale by squid after their first expiration. I have not been able to find another way to avoid that besides this option.To reproduce, set
http_proxyto point to a squid configured for SMP and runAfter it expires in 61 seconds, run it again a few times. Without this option the
Cache-Statusalways includesstale. With this option, it isstaleonce, thenmiss, thenhit. I don't know why there's an intermediatemiss, but it's not nearly as big of a problem for us to have that one extra miss as to have every request be stale.As an aside: we usually use
collapsed_forwarding, but this problem happens with or withoutcollapsed_forwarding. Our server does not sendLast-Modified, but Cloudflare inserts that too. I know that SMP does not supportIf-Modified-Sincewithcollapsed_forwarding(bug #4890), but we work around that by prohibiting sending anyIf-Modified-Sincerequests with