Add AC hit rate metrics with prometheus labels (alternative implementation with decorator)#358
Add AC hit rate metrics with prometheus labels (alternative implementation with decorator)#358ulrfa wants to merge 2 commits intobuchgr:masterfrom
Conversation
This is a draft, not ready to be merged. I wait with test cases until getting feedback about the main functionality. Same functionality as in buchgr#350 - Prometheus counter for cache hit ratio of only AC requests. - Support for prometheus labels based on custom HTTP and gRPC headers. but implemented in an alternative way: - disk.io implements two interfaces: cache.CasAcCache, cache.Stats - cache.metricsdecorator is decorator for cache.CasAcCache and provides prometheus metrics. Pros with this alternative implementation: - Should allow non-prometheus metrics as requested in buchgr#355 - Avoid the question about if the counters should be placed in disk.go or http.go/grpc*.go. If placing in disk.go, there are issues about how to avoid incrementing counter twice for the same request (both in Get and in GetValidatedActionResult) and at the same time count found AC but missing CAS, as cache miss. - Makes headers available also for logic in disk.go, which could be useful for other functionality in the future. - Metrics can be separated from logging, and still not require injecting counter increment invocations in tons of places. Incrementing from a few well defined places minimize the risk for bugs in the metrics.
mostynb
left a comment
There was a problem hiding this comment.
I haven't had time to fully digest this or the original version yet, but this feels a bit cleaner so far. I'll get back to you late this week with some more feedback.
| // an error. | ||
| // TODO Consider separating implementation of cache.AcStore interface, to open up | ||
| // possibilities combining that functionality with proxies, and also allow | ||
| // bazel-remote configurations with proxy but no local disk storage? |
There was a problem hiding this comment.
A proxy-only mode has been on my todo-list for a while. I have considered doing this as a short-circuit inside disk.Cache (it might be time to rename or split the package at that point).
There was a problem hiding this comment.
@mostynb Does it make sense to bring back the cache.Cache interface that the proxies and disk cache used to implement for something like that?
There was a problem hiding this comment.
@mostynb Is it something more than implementation of GetValidatedActionResult, that you expect will be needed from disk.io, in a proxy-only-mode?
| validateAC := !c.DisableHTTPACValidation | ||
| h := server.NewHTTPCache(diskCache, accessLogger, errorLogger, validateAC, c.EnableACKeyInstanceMangling, gitCommit) | ||
|
|
||
| h := server.NewHTTPCache(casAcCache, diskCache, accessLogger, errorLogger, validateAC, c.EnableACKeyInstanceMangling, gitCommit) |
There was a problem hiding this comment.
It seems a bit strange to pass both casAcCache and diskCache here. Doesn't the former wrap the latter?
There was a problem hiding this comment.
NewHTTPCache takes two interfaces: cache.BlobAcStore and cache.Stats.
The file disk.go implements both those interfaces.
But metricsdecorator is a decorator only for cache.BlobAcStore. The main reason is that I imagine using other metricsdecorator instances wrapping also the proxies, replacing the current counters in the proxies. But the proxies does not implement cache.Stats and therfore metricsdecorator also does not. (Using metricsdecorator instances also for proxies would involve setting counter name in call to NewMetricsDecorator, optional non-AC metrics support , etc)
The GRPC server is not using the cache.Stats interface.
| // TODO Document interface | ||
| type CasAcCache interface { | ||
| // TODO change to io.ReadCloser? | ||
| Put(kind EntryKind, hash string, size int64, rdr io.Reader, reqCtx RequestContext) error |
There was a problem hiding this comment.
I wonder if we could use a map[string]string instead of a RequestContext interface in these functions?
There was a problem hiding this comment.
I tried that first, but ended up with cache.RequestContext because:
-
gRPC and HTTP/2 is using canonical lowercase header names, but the golang HTTP/1.1 library normalize header names to start with capital letter. I want to hide that difference, but avoid translating headers that are never accessed, and instead translate on demand in:
Lines 395 to 398 in 4b2b8b2
-
Minimize overhead for creating cache.RequestContext instance in general (only allocate struct and store a pointer)
-
Allows extending cache.RequestContext with additional methods that returns non-header meta information. Such as Host, RemoteAddr or Cookie from HTTP requests. Or any other information that could be added by bazel-remote’s HTTP/GRPC servers, like enum indicating if original request were HTTP or GRPC.
-
Allows extending cache.RequestContext to allow proxies to cancel outgoing requests, if associated incomming request is canceled. I have not looked into details, but seems to be something related to that in underlying GRPC and HTTP contexts.
-
Allows extending cache.RequestContext in general, instead of having to update all Get/Put/Contains invocations with additional parameters, all over bazel-remote codebase.
-
Allows propagating custom information in general between a chain of different cache.BlobStore implementations (such as proxies, decorators, …)
| } | ||
|
|
||
| // TODO Document interface | ||
| type AcStore interface { |
There was a problem hiding this comment.
Is this interface used anywhere outside BlobAcStore?
There was a problem hiding this comment.
No. I’m only thinking that having a separate AcStore interface might be more generic and flexible. Especially if GetValidatedActionResult method would be extracted from disk.go in the future. Or if there would be decorators wrapping only AcStore. But I don’t have much experience of embedding interfaces and structs in golang. It might work with only BlobStore and BlobAcStore interface, if you think it is preferable to avoid AcStore?
|
Hi @mostynb. I would be happy to continue working on this patch, what do you think? |
|
@ulrfa Is it possible to break this into two PRs? Basically thinking:
I think it will be easier to digest and review. If not I understand and will carve out some time later to look it over. |
|
Hi @bdittmer,
That is all changes except the three files listed below.
That is the changes in the three source files:
Perhaps the list of source files above is enough for decision about if you would want me to go forward with this. If you decide yes, then I will add test cases and can also split into several PRs if you want. Is that OK? I will probably not have time to work on this until after next week. |
This is a draft, not ready to be merged. I wait with test cases until
getting feedback about the main functionality.
Same functionality as in #350
Prometheus counter for cache hit ratio of only AC requests.
Support for prometheus labels based on custom HTTP and gRPC headers.
but implemented in an alternative way:
disk.io implements two interfaces: cache.CasAcCache, cache.Stats
cache.metricsdecorator is decorator for cache.CasAcCache and
provides prometheus metrics.
Pros with this alternative implementation:
Should allow non-prometheus metrics as requested in
Metrics interface(s) #355
Avoid the question about if the counters should be placed in
disk.go or http.go/grpc*.go. If placing in disk.go, there are
issues about how to avoid incrementing counter twice for the
same request (both in Get and in GetValidatedActionResult)
and at the same time count found AC but missing CAS, as cache miss.
Makes headers available also for logic in disk.go, which could be
useful for other functionality in the future.
Metrics can be separated from logging, and still not require
injecting counter increment invocations in tons of places.
Incrementing from a few well defined places minimize the risk
for bugs in the metrics.