Add ignore_vary option #2318

DrDaveD · 2025-12-11T20:30:08Z

This adds a configuration option to completely ignore all Vary headers.

It is well known that squid does not fully implement Vary headers, in particular with shared memory caching as mentioned in bug 4445 comment 3. We are running into this now that we have our larger installations using SMP. We control our servers and clients, and the servers never send Vary and the clients never vary their request types, but sometimes we insert a Cloudflare CDN reverse proxy between our forward proxy squid and server, where we have little control, and those insert Vary: accept-encoding. That causes those responses to always be treated as stale by squid after their first expiration. I have not been able to find another way to avoid that besides this option.

To reproduce, set http_proxy to point to a squid configured for SMP and run

curl -vs http://s1cern-cvmfs.openhtc.io/cvmfs/info/v1/meta.json | wc -c

After it expires in 61 seconds, run it again a few times. Without this option the Cache-Status always includes stale. With this option, it is stale once, then miss, then hit. I don't know why there's an intermediate miss, but it's not nearly as big of a problem for us to have that one extra miss as to have every request be stale.

As an aside: we usually use collapsed_forwarding, but this problem happens with or without collapsed_forwarding. Our server does not send Last-Modified, but Cloudflare inserts that too. I know that SMP does not support If-Modified-Since with collapsed_forwarding (bug #4890), but we work around that by prohibiting sending any If-Modified-Since requests with

request_header_access If-Modified-Since deny all

squid-anubis · 2025-12-11T20:31:19Z

Cannot create a git commit message from PR title and description.

Error while parsing PR description body: the line is too long 73>72

Problematic parser input:

This adds a configuration option to completely ignore all `Vary` headers.

Please see PR title and description formatting requirements for more details.

This message was added by Anubis bot. Anubis will add a new message if the error text changes. Anubis will remove M-failed-description label when there are no corresponding failures to report.

yadij

FTR: investigation with redbot.org shows the example URL you provided displaying different response contents for different Accept-Encoding header values. There is no ETag to tell the responses apart and they fluctuate between 139 and 158 bytes in length. Even though they appear to be the same resource (and even bytes) when leaving the origin server - they are being changed by the CDN server in ways indicated by the Vary header, at the time indicated by the Last-Modified header.

This means that your situation is one in which it is absolutely NOT safe to ignore the Vary header and treat the 2+ responses as being the same object.

Response headers involved with content negotiation in general are not safe to just ignore. It is far too late to do anything besides abort the caching decision once the upstream server has done its negotiation choices.

Instead you should be using request_header_access Accept-Encoding deny all on your Squid proxies that sit in front of these problematic CDN servers. This will at least ensure that only one variant is ever requested according to the CDN viewpoint.

Also, having your origin set an ETag header so Squid has something more reliable to tell the variants apart than expanding Vary header.

At the very least to be considered for general use this option should also:

enforce that a valid ETag header is provided before ignoring Vary.
have code wrapped in USE_HTTP_VIOLATIONS protection
be an ACL driven setting for targeted application (not a global ON/OFF)
be implemented with an on-request action to strip specific request headers, then to only ignore matching response Vary: which list (any subset of) those headers which are removed from the request.

rousskov

I flagged several implementation problems, but I am against adding this directive and this overarching concern should be addressed first: I have no doubts that the proposed directive works around a particular problem in a particular environment, but it is too specific to that problem and that environment. Squid history is full of such "itch scratching" hacks and bug workarounds, with many of them becoming costly to maintain long-term (or remove).

If fixing underlying Vary support bugs is not currently feasible, then I believe we should address the problem this PR is working around using a more general approach: Add an incoming_reply_header_access or a similar directive that would apply to all HTTP response headers received by Squid (from peers and adaptation services) and generated by Squid itself (e.g., in error messages). The new directive can then be used to achieve the same overall effect as the proposed ignore_vary:

incoming_reply_header_access deny Vary all

I would like to hear others feedback regarding the alternative solution above before deciding whether to block the addition of a [fixed] ignore_vary implementation (and ask you to add incoming_reply_header_access or a similar general directive instead). CC: @kinkie, @yadij

P.S. The existing reply_header_access applies to being sent (rather than received) response headers AFAICT.

rousskov · 2025-12-11T21:04:51Z

src/cf.data.pre

+	option causes Squid to completely ignore the Vary header and
+	go ahead and cache the response using normal rules.
+
+	WARNING: Doing this this VIOLATES the HTTP standard.  Enabling


We should add IFDEF: USE_HTTP_VIOLATIONS to the directive metadata then, right?

rousskov · 2025-12-11T21:08:22Z

src/cf.data.pre

+	Some HTTP servers send a Vary header even though the clients
+	never send requests with different request headers.  This
+	option causes Squid to completely ignore the Vary header and
+	go ahead and cache the response using normal rules.


If you agree with the following assertion, please adjust PR description to explain that the proposed ignore_vary on differs from the already supported reply_header_access Vary deny all because the new feature is applied to all responses, while existing reply_header_access directive only applies to responses sent by Squid.

rousskov · 2025-12-11T21:13:20Z

src/cf.data.pre

+DOC_START
+	Some HTTP servers send a Vary header even though the clients
+	never send requests with different request headers.  This
+	option causes Squid to completely ignore the Vary header and


I believe this description misleads the reader into thinking that Squid is just going to ignore the Vary header. In reality, Squid is going to delete that header upon receipt and, hence, never forward it. The header will not even appear in well-known level-2 response header dumps (as if it was never received)!

Also, it may surprise some admins that Squid will send some Vary headers even if this directive is enabled (because Squid adds Vary to some of its generated responses).

Please rephrase to clarify these effects of the proposed directive. Renaming the directive to delete_received_vary or similar may be required.

P.S. AFAICT, the header is going to be counted in Squid named header statistics, arguably contradicting the "completely ignore" promise as well.

rousskov · 2025-12-11T21:38:52Z

src/cf.data.pre

+	cached.  It is intended as a workaround until HTTP/1.1 is fully
+	implemented and only for use when client behavior is fully
+	controlled.
+DOC_END


Please also document how the new ignore_vary option interacts with the existing vary_ignore_expire. Warning the admin when both directives are enabled is probably a good idea!

rousskov · 2025-12-11T21:41:33Z

src/HttpHeader.cc


+    if (Config.onoff.ignore_vary) {
+        String rawVary;
+        if (getByIdIfPresent(Http::HdrType::VARY, &rawVary)) {


Either this code is missing Http::HdrType::HDR_X_ACCELERATOR_VARY treatment or the new directive documentation is missing a disclaimer about the difference in treatment of those two Vary-like headers (that are treated the same way in many other Squid contexts). The former is more likely.

rousskov · 2025-12-11T21:56:15Z

src/HttpHeader.cc

+    if (Config.onoff.ignore_vary) {
+        String rawVary;
+        if (getByIdIfPresent(Http::HdrType::VARY, &rawVary)) {
+            delById(Http::HdrType::VARY);


AFAICT, this low-level hack will trigger "no leftovers" assertion in Adaptation::Icap::ModXact::prepEchoing() if ignore_vary is enabled during reconfiguration that happened after the affected response header has been received but before ICAP decided to echo that response.

Most likely, this code should be moved to (i.e. triggered from) higher level code to avoid such difficult-to-find bugs.

rousskov · 2025-12-11T22:11:08Z

I have not seen Amos' review when posting mine. FWIW, the alternative solution I have proposed would effectively work around concerns in Amos' review -- those concerns would apply to a particular Squid configuration instead (and addressing them would be up to the admin deploying that configuration, of course -- Squid Project reviewers would not stand in the way).

yadij · 2025-12-11T22:13:26Z

I would like to hear others feedback regarding the alternative solution above before deciding whether to block the addition of a [fixed] ignore_vary implementation (and ask you to add incoming_reply_header_access or a similar general directive instead). CC: @kinkie, @yadij

I outlined in my review a few implementation requirements we need not to break HTTP quite badly.
IMHO though it would be better spending time to fix the Vary related bugs (4445, 4525, and 4890) that cause the unwanted behaviour.

rousskov · 2025-12-11T22:16:12Z

it would be better spending time to fix the Vary related bugs (4445, 4525, and 4890) that cause the unwanted behaviour.

We are in agreement on that, but it may take a very long time and requires special expertise... The alternative incoming_reply_header_access directive is generally useful and does not have those problems.

yadij · 2025-12-11T22:29:24Z

The alternative incoming_reply_header_access directive is generally useful and does not have those problems.

AIUI, this alternative meets the ACL-driven requirement. "generally useful" is a bad thing - the protocol impact is expanded from just Vary to all possible content negotiation headers, including future/unknown ones. Making the important request-filtering action highly difficult to implement correctly.

DrDaveD · 2025-12-12T00:17:18Z

Well! I didn't expect this to be accepted as-is, but I also didn't expect such vigorous responses. I was hoping for some small hints to make it better, but it looks like you're both asking for different things and both of them are very complex and time consuming. I will probably instead just leave this applied as a patch on our project's Squid distribution until the Vary behavior in Squid improves, if it does.

We all agree that the better thing would be to fix the Vary bug(s), but as @rousskov said, that requires specialized knowledge, and I don't have that. My applications also don't need Vary at all so I can't justify spending significant resources on it.

@yadij I'm not surprised that Cloudflare behaves as it's supposed to with varied request headers, but I know that our applications don't send any so I'm not worried about that. I will indeed include request_header_access Accept-Encoding deny all to make sure nobody sends that just in case, thanks for that hint. One of our two major applications streams data from a database, so it is impossible for it to calculate an ETag in time to insert it as a header.

@rousskov I was wishing for something like an incoming_reply_header_access feature and I definitely would have used it if it were available, but I didn't know if it would be desired so I started with something much simpler. It sounds like @yadij doesn't want it though. If he did I could possibly look into seeing what it would take to implement that based on existing code and a few hints such as where to hook in.

rousskov · 2025-12-12T14:11:37Z

Well! I didn't expect this to be accepted as-is, but I also didn't expect such vigorous responses. I was hoping for some small hints to make it better, but it looks like you're both asking for different things and both of them are very complex and time consuming.

FWIW, I am not asking for very complex or very time consuming things.

@rousskov I was wishing for something like an incoming_reply_header_access feature and I definitely would have used it if it were available, but I didn't know if it would be desired so I started with something much simpler.

I understand. FWIW, when considering adding new features, I recommend asking first.

It sounds like @yadij doesn't want it though. If he did I could possibly look into seeing what it would take to implement that based on existing code and a few hints such as where to hook in.

I disagree with Amos "generally useful is a bad thing" opinion: IMO, correct/good applications justify the inevitable abuses in incoming_reply_header_access case. However, this disagreement is of secondary importance here. What really matters is the answer to the following question:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

DrDaveD · 2025-12-12T21:33:18Z

when considering adding new features, I recommend asking first.

I thought about that but I knew it was a known problem, and I needed assurance that I could find a workaround. Once I found one I thought it would be best to propose it and see what came of it.

DrDaveD · 2025-12-18T21:14:01Z

I would also like to hear the answer to this question from Alex:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

yadij · 2025-12-21T09:29:43Z

I would also like to hear the answer to this question from Alex:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

I never plan to veto anyones contributions. Whether I do depends on what "quality" it actually is (low? terrible?). As mentioned, the description so far has major hurdles that need to be jumped and I doubt any of us have time to see that one through to "reasonable" quality.

yadij · 2025-12-21T09:55:45Z

@rousskov, please recall that embedding user-specific menus and profile/login info into web content is one of the oldest use-cases for HTTP content negotiation and still widely used. Sharing per-user content between all visitors and changing the headers such that the rest of the web caches in the chain do the same - is CVE fodder (at least CWE-359).

rousskov · 2025-12-21T16:04:08Z

@rousskov, please recall that embedding user-specific menus and profile/login info into web content is one of the oldest use-cases for HTTP content negotiation and still widely used. Sharing per-user content between all visitors and changing the headers such that the rest of the web caches in the chain do the same - is CVE fodder (at least CWE-359).

There are many ways to use incoming_reply_header_access correctly, without running into problems you are describing. There are many ways to use incoming_reply_header_access incorrectly as well. IMO, the latter does not matter in this "should we add incoming_reply_header_access feature?" discussion context -- it is admin responsibility to use a known-to-be-dangerous (but still generally useful) tool correctly.

I would also like to hear the answer to this question from Alex:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

I never plan to veto anyones contributions. Whether I do depends on what "quality" it actually is (low? terrible?). As mentioned, the description so far has major hurdles that need to be jumped and I doubt any of us have time to see that one through to "reasonable" quality.

@yadij, assuming that incoming_reply_header_access implementation simply removes matching header fields, without any additional header-specific checks/protections, would you veto that feature addition?

I do not see any legitimate "major hurdles" in adding such a simple header filtering feature. Apparently, we disagree on that. I want to understand whether that disagreement is enough to trigger your veto of the proposed feature.

Add ignore_vary option

e61a762

squid-anubis added the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Dec 11, 2025

yadij requested changes Dec 11, 2025

View reviewed changes

rousskov requested changes Dec 11, 2025

View reviewed changes

rousskov added the S-waiting-for-author author action is expected (and usually required) label Dec 11, 2025

Add ignore_vary option #2318

Are you sure you want to change the base?

Add ignore_vary option #2318

Conversation

DrDaveD commented Dec 11, 2025

Uh oh!

squid-anubis commented Dec 11, 2025

Uh oh!

yadij left a comment

Choose a reason for hiding this comment

Uh oh!

rousskov left a comment

Choose a reason for hiding this comment

Uh oh!

rousskov Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

rousskov Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

rousskov Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

rousskov Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

rousskov Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

rousskov Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

rousskov commented Dec 11, 2025

Uh oh!

yadij commented Dec 11, 2025

Uh oh!

rousskov commented Dec 11, 2025

Uh oh!

yadij commented Dec 11, 2025

Uh oh!

DrDaveD commented Dec 12, 2025

Uh oh!

rousskov commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrDaveD commented Dec 12, 2025

Uh oh!

DrDaveD commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yadij commented Dec 21, 2025

Uh oh!

yadij commented Dec 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rousskov commented Dec 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rousskov commented Dec 12, 2025 •

edited

Loading

DrDaveD commented Dec 18, 2025 •

edited

Loading

yadij commented Dec 21, 2025 •

edited

Loading

rousskov commented Dec 21, 2025 •

edited

Loading