Skip to content

Conversation

@DrDaveD
Copy link
Contributor

@DrDaveD DrDaveD commented Dec 11, 2025

This adds a configuration option to completely ignore all Vary headers.

It is well known that squid does not fully implement Vary headers, in particular with shared memory caching as mentioned in bug 4445 comment 3. We are running into this now that we have our larger installations using SMP. We control our servers and clients, and the servers never send Vary and the clients never vary their request types, but sometimes we insert a Cloudflare CDN reverse proxy between our forward proxy squid and server, where we have little control, and those insert Vary: accept-encoding. That causes those responses to always be treated as stale by squid after their first expiration. I have not been able to find another way to avoid that besides this option.

To reproduce, set http_proxy to point to a squid configured for SMP and run

curl -vs http://s1cern-cvmfs.openhtc.io/cvmfs/info/v1/meta.json | wc -c

After it expires in 61 seconds, run it again a few times. Without this option the Cache-Status always includes stale. With this option, it is stale once, then miss, then hit. I don't know why there's an intermediate miss, but it's not nearly as big of a problem for us to have that one extra miss as to have every request be stale.

As an aside: we usually use collapsed_forwarding, but this problem happens with or without collapsed_forwarding. Our server does not send Last-Modified, but Cloudflare inserts that too. I know that SMP does not support If-Modified-Since with collapsed_forwarding (bug #4890), but we work around that by prohibiting sending any If-Modified-Since requests with

request_header_access If-Modified-Since deny all

@squid-anubis squid-anubis added the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Dec 11, 2025
@squid-anubis
Copy link
Collaborator

Cannot create a git commit message from PR title and description.

Error while parsing PR description body: the line is too long 73>72

Problematic parser input:

This adds a configuration option to completely ignore all `Vary` headers.

Please see PR title and description formatting requirements for more details.

This message was added by Anubis bot. Anubis will add a new message if the error text changes. Anubis will remove M-failed-description label when there are no corresponding failures to report.

Copy link
Contributor

@yadij yadij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR: investigation with redbot.org shows the example URL you provided displaying different response contents for different Accept-Encoding header values. There is no ETag to tell the responses apart and they fluctuate between 139 and 158 bytes in length. Even though they appear to be the same resource (and even bytes) when leaving the origin server - they are being changed by the CDN server in ways indicated by the Vary header, at the time indicated by the Last-Modified header.

This means that your situation is one in which it is absolutely NOT safe to ignore the Vary header and treat the 2+ responses as being the same object.

Response headers involved with content negotiation in general are not safe to just ignore. It is far too late to do anything besides abort the caching decision once the upstream server has done its negotiation choices.

Instead you should be using request_header_access Accept-Encoding deny all on your Squid proxies that sit in front of these problematic CDN servers. This will at least ensure that only one variant is ever requested according to the CDN viewpoint.

Also, having your origin set an ETag header so Squid has something more reliable to tell the variants apart than expanding Vary header.

At the very least to be considered for general use this option should also:

  • enforce that a valid ETag header is provided before ignoring Vary.
  • have code wrapped in USE_HTTP_VIOLATIONS protection
  • be an ACL driven setting for targeted application (not a global ON/OFF)
  • be implemented with an on-request action to strip specific request headers, then to only ignore matching response Vary: which list (any subset of) those headers which are removed from the request.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I flagged several implementation problems, but I am against adding this directive and this overarching concern should be addressed first: I have no doubts that the proposed directive works around a particular problem in a particular environment, but it is too specific to that problem and that environment. Squid history is full of such "itch scratching" hacks and bug workarounds, with many of them becoming costly to maintain long-term (or remove).

If fixing underlying Vary support bugs is not currently feasible, then I believe we should address the problem this PR is working around using a more general approach: Add an incoming_reply_header_access or a similar directive that would apply to all HTTP response headers received by Squid (from peers and adaptation services) and generated by Squid itself (e.g., in error messages). The new directive can then be used to achieve the same overall effect as the proposed ignore_vary:

incoming_reply_header_access deny Vary all

I would like to hear others feedback regarding the alternative solution above before deciding whether to block the addition of a [fixed] ignore_vary implementation (and ask you to add incoming_reply_header_access or a similar general directive instead). CC: @kinkie, @yadij

P.S. The existing reply_header_access applies to being sent (rather than received) response headers AFAICT.

option causes Squid to completely ignore the Vary header and
go ahead and cache the response using normal rules.

WARNING: Doing this this VIOLATES the HTTP standard. Enabling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add IFDEF: USE_HTTP_VIOLATIONS to the directive metadata then, right?

Some HTTP servers send a Vary header even though the clients
never send requests with different request headers. This
option causes Squid to completely ignore the Vary header and
go ahead and cache the response using normal rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree with the following assertion, please adjust PR description to explain that the proposed ignore_vary on differs from the already supported reply_header_access Vary deny all because the new feature is applied to all responses, while existing reply_header_access directive only applies to responses sent by Squid.

DOC_START
Some HTTP servers send a Vary header even though the clients
never send requests with different request headers. This
option causes Squid to completely ignore the Vary header and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this description misleads the reader into thinking that Squid is just going to ignore the Vary header. In reality, Squid is going to delete that header upon receipt and, hence, never forward it. The header will not even appear in well-known level-2 response header dumps (as if it was never received)!

Also, it may surprise some admins that Squid will send some Vary headers even if this directive is enabled (because Squid adds Vary to some of its generated responses).

Please rephrase to clarify these effects of the proposed directive. Renaming the directive to delete_received_vary or similar may be required.

P.S. AFAICT, the header is going to be counted in Squid named header statistics, arguably contradicting the "completely ignore" promise as well.

cached. It is intended as a workaround until HTTP/1.1 is fully
implemented and only for use when client behavior is fully
controlled.
DOC_END
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also document how the new ignore_vary option interacts with the existing vary_ignore_expire. Warning the admin when both directives are enabled is probably a good idea!


if (Config.onoff.ignore_vary) {
String rawVary;
if (getByIdIfPresent(Http::HdrType::VARY, &rawVary)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either this code is missing Http::HdrType::HDR_X_ACCELERATOR_VARY treatment or the new directive documentation is missing a disclaimer about the difference in treatment of those two Vary-like headers (that are treated the same way in many other Squid contexts). The former is more likely.

if (Config.onoff.ignore_vary) {
String rawVary;
if (getByIdIfPresent(Http::HdrType::VARY, &rawVary)) {
delById(Http::HdrType::VARY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, this low-level hack will trigger "no leftovers" assertion in Adaptation::Icap::ModXact::prepEchoing() if ignore_vary is enabled during reconfiguration that happened after the affected response header has been received but before ICAP decided to echo that response.

Most likely, this code should be moved to (i.e. triggered from) higher level code to avoid such difficult-to-find bugs.

@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Dec 11, 2025
@rousskov
Copy link
Contributor

I have not seen Amos' review when posting mine. FWIW, the alternative solution I have proposed would effectively work around concerns in Amos' review -- those concerns would apply to a particular Squid configuration instead (and addressing them would be up to the admin deploying that configuration, of course -- Squid Project reviewers would not stand in the way).

@yadij
Copy link
Contributor

yadij commented Dec 11, 2025

I would like to hear others feedback regarding the alternative solution above before deciding whether to block the addition of a [fixed] ignore_vary implementation (and ask you to add incoming_reply_header_access or a similar general directive instead). CC: @kinkie, @yadij

I outlined in my review a few implementation requirements we need not to break HTTP quite badly.
IMHO though it would be better spending time to fix the Vary related bugs (4445, 4525, and 4890) that cause the unwanted behaviour.

@rousskov
Copy link
Contributor

it would be better spending time to fix the Vary related bugs (4445, 4525, and 4890) that cause the unwanted behaviour.

We are in agreement on that, but it may take a very long time and requires special expertise... The alternative incoming_reply_header_access directive is generally useful and does not have those problems.

@yadij
Copy link
Contributor

yadij commented Dec 11, 2025

The alternative incoming_reply_header_access directive is generally useful and does not have those problems.

AIUI, this alternative meets the ACL-driven requirement. "generally useful" is a bad thing - the protocol impact is expanded from just Vary to all possible content negotiation headers, including future/unknown ones. Making the important request-filtering action highly difficult to implement correctly.

@DrDaveD
Copy link
Contributor Author

DrDaveD commented Dec 12, 2025

Well! I didn't expect this to be accepted as-is, but I also didn't expect such vigorous responses. I was hoping for some small hints to make it better, but it looks like you're both asking for different things and both of them are very complex and time consuming. I will probably instead just leave this applied as a patch on our project's Squid distribution until the Vary behavior in Squid improves, if it does.

We all agree that the better thing would be to fix the Vary bug(s), but as @rousskov said, that requires specialized knowledge, and I don't have that. My applications also don't need Vary at all so I can't justify spending significant resources on it.

@yadij I'm not surprised that Cloudflare behaves as it's supposed to with varied request headers, but I know that our applications don't send any so I'm not worried about that. I will indeed include request_header_access Accept-Encoding deny all to make sure nobody sends that just in case, thanks for that hint. One of our two major applications streams data from a database, so it is impossible for it to calculate an ETag in time to insert it as a header.

@rousskov I was wishing for something like an incoming_reply_header_access feature and I definitely would have used it if it were available, but I didn't know if it would be desired so I started with something much simpler. It sounds like @yadij doesn't want it though. If he did I could possibly look into seeing what it would take to implement that based on existing code and a few hints such as where to hook in.

@rousskov
Copy link
Contributor

rousskov commented Dec 12, 2025

Well! I didn't expect this to be accepted as-is, but I also didn't expect such vigorous responses. I was hoping for some small hints to make it better, but it looks like you're both asking for different things and both of them are very complex and time consuming.

FWIW, I am not asking for very complex or very time consuming things.

@rousskov I was wishing for something like an incoming_reply_header_access feature and I definitely would have used it if it were available, but I didn't know if it would be desired so I started with something much simpler.

I understand. FWIW, when considering adding new features, I recommend asking first.

It sounds like @yadij doesn't want it though. If he did I could possibly look into seeing what it would take to implement that based on existing code and a few hints such as where to hook in.

I disagree with Amos "generally useful is a bad thing" opinion: IMO, correct/good applications justify the inevitable abuses in incoming_reply_header_access case. However, this disagreement is of secondary importance here. What really matters is the answer to the following question:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

@DrDaveD
Copy link
Contributor Author

DrDaveD commented Dec 12, 2025

when considering adding new features, I recommend asking first.

I thought about that but I knew it was a known problem, and I needed assurance that I could find a workaround. Once I found one I thought it would be best to propose it and see what came of it.

@DrDaveD
Copy link
Contributor Author

DrDaveD commented Dec 18, 2025

I would also like to hear the answer to this question from Alex:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

@yadij
Copy link
Contributor

yadij commented Dec 21, 2025

I would also like to hear the answer to this question from Alex:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

I never plan to veto anyones contributions. Whether I do depends on what "quality" it actually is (low? terrible?). As mentioned, the description so far has major hurdles that need to be jumped and I doubt any of us have time to see that one through to "reasonable" quality.

@yadij
Copy link
Contributor

yadij commented Dec 21, 2025

@rousskov, please recall that embedding user-specific menus and profile/login info into web content is one of the oldest use-cases for HTTP content negotiation and still widely used. Sharing per-user content between all visitors and changing the headers such that the rest of the web caches in the chain do the same - is CVE fodder (at least CWE-359).

@rousskov
Copy link
Contributor

rousskov commented Dec 21, 2025

@rousskov, please recall that embedding user-specific menus and profile/login info into web content is one of the oldest use-cases for HTTP content negotiation and still widely used. Sharing per-user content between all visitors and changing the headers such that the rest of the web caches in the chain do the same - is CVE fodder (at least CWE-359).

There are many ways to use incoming_reply_header_access correctly, without running into problems you are describing. There are many ways to use incoming_reply_header_access incorrectly as well. IMO, the latter does not matter in this "should we add incoming_reply_header_access feature?" discussion context -- it is admin responsibility to use a known-to-be-dangerous (but still generally useful) tool correctly.

I would also like to hear the answer to this question from Alex:

@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?

I never plan to veto anyones contributions. Whether I do depends on what "quality" it actually is (low? terrible?). As mentioned, the description so far has major hurdles that need to be jumped and I doubt any of us have time to see that one through to "reasonable" quality.

@yadij, assuming that incoming_reply_header_access implementation simply removes matching header fields, without any additional header-specific checks/protections, would you veto that feature addition?

I do not see any legitimate "major hurdles" in adding such a simple header filtering feature. Apparently, we disagree on that. I want to understand whether that disagreement is enough to trigger your veto of the proposed feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels S-waiting-for-author author action is expected (and usually required)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants