Before Creating the Enhancement Request
Summary
After PR #10443 introduces static AttributeKey instances, the next layer of allocation in the metrics hot path is the Attributes object itself. For repeated parameter combinations (e.g. same (topic, messageType, isSystem) or same (requestCode, responseCode, isLongPolling, result)), the broker keeps rebuilding identical immutable Attributes objects on every message/RPC. This issue proposes a two-level cache (volatile inline cache + ConcurrentHashMap) to reuse those Attributes objects.
It also includes a concurrency fix for RemotingCodeDistributionHandler's inline cache, which is reachable from multiple Netty EventLoop threads via @Sharable and would otherwise have a torn-read window between two independent volatile fields.
Motivation
JFR heap dump analysis on a running broker shows:
BrokerMetricsManager.incTopicMessage is called once per send. Each call rebuilds an Attributes for the same (topic, messageType, isSystem) tuple. With ~50 unique combinations under steady-state and 100k QPS, ~99.95% of these Attributes objects are exact duplicates.
- Each
Attributes.builder().put().put().build() allocates ~200–300 bytes (builder, internal ArrayList<Object>, sort/dedup temporaries, final immutable backing array). 100k QPS × 250 B ≈ 25 MB/s metrics-only allocation rate.
RemotingMetricsManager builds a 4-tag Attributes for every RPC response, also dominated by repeats.
RemotingCodeDistributionHandler already maintains a per-code LongAdder map; it would benefit from an inline cache to avoid ConcurrentHashMap.get on every report.
Describe the Solution You'd Like
Two-level cache structure (used in both broker and remoting):
inline cache (volatile) → near-100% hit in single-tenant scenario
│ miss
↓
ConcurrentHashMap<key, Attributes> → multi-tenant fallback
BrokerMetricsManager.getOrBuildTopicAttributes(topic, messageType, isSystem) — 4-field volatile inline cache + CHM keyed by topic|messageType|isSystem.
RemotingMetricsManager.getOrBuildAttributes(requestCode, responseCode, isLongPolling, result) — long-packed cache key (35 bits across 4 args, no String allocation), volatile inline cache + CHM.
RemotingCodeDistributionHandler — replace two independent volatile fields with an immutable CachedCounter { final int code; final LongAdder adder; } holder published through a single volatile CachedCounter. Single volatile read per inc() returns a self-consistent snapshot, eliminating a torn-read between lastCode and lastAdder under @Sharable concurrent access.
Describe Alternatives You've Considered
- String concatenation as cache key (e.g.
topic + "|" + msgType + "|" + isSystem) — rejected because each lookup would itself allocate a String + char[], defeating the purpose of caching.
- Single
volatile Attributes cache without CHM fallback — rejected because in multi-tenant scenarios it would constantly be replaced and degenerate to a per-call build.
- Two independent
volatile fields for RemotingCodeDistributionHandler — initially proposed, rejected after review identified a torn-read between writing lastCode and lastAdder.
Additional Context
Before Creating the Enhancement Request
Summary
After PR #10443 introduces static
AttributeKeyinstances, the next layer of allocation in the metrics hot path is theAttributesobject itself. For repeated parameter combinations (e.g. same(topic, messageType, isSystem)or same(requestCode, responseCode, isLongPolling, result)), the broker keeps rebuilding identical immutableAttributesobjects on every message/RPC. This issue proposes a two-level cache (volatile inline cache +ConcurrentHashMap) to reuse thoseAttributesobjects.It also includes a concurrency fix for
RemotingCodeDistributionHandler's inline cache, which is reachable from multiple NettyEventLoopthreads via@Sharableand would otherwise have a torn-read window between two independentvolatilefields.Motivation
JFR heap dump analysis on a running broker shows:
BrokerMetricsManager.incTopicMessageis called once per send. Each call rebuilds anAttributesfor the same(topic, messageType, isSystem)tuple. With ~50 unique combinations under steady-state and 100k QPS, ~99.95% of theseAttributesobjects are exact duplicates.Attributes.builder().put().put().build()allocates ~200–300 bytes (builder, internalArrayList<Object>, sort/dedup temporaries, final immutable backing array). 100k QPS × 250 B ≈ 25 MB/s metrics-only allocation rate.RemotingMetricsManagerbuilds a 4-tagAttributesfor every RPC response, also dominated by repeats.RemotingCodeDistributionHandleralready maintains a per-codeLongAddermap; it would benefit from an inline cache to avoidConcurrentHashMap.geton every report.Describe the Solution You'd Like
Two-level cache structure (used in both broker and remoting):
BrokerMetricsManager.getOrBuildTopicAttributes(topic, messageType, isSystem)— 4-field volatile inline cache + CHM keyed bytopic|messageType|isSystem.RemotingMetricsManager.getOrBuildAttributes(requestCode, responseCode, isLongPolling, result)— long-packed cache key (35 bits across 4 args, no String allocation), volatile inline cache + CHM.RemotingCodeDistributionHandler— replace two independentvolatilefields with an immutableCachedCounter { final int code; final LongAdder adder; }holder published through a singlevolatile CachedCounter. Single volatile read perinc()returns a self-consistent snapshot, eliminating a torn-read betweenlastCodeandlastAdderunder@Sharableconcurrent access.Describe Alternatives You've Considered
topic + "|" + msgType + "|" + isSystem) — rejected because each lookup would itself allocate aString + char[], defeating the purpose of caching.volatile Attributescache without CHM fallback — rejected because in multi-tenant scenarios it would constantly be replaced and degenerate to a per-call build.volatilefields forRemotingCodeDistributionHandler— initially proposed, rejected after review identified a torn-read between writinglastCodeandlastAdder.Additional Context
AttributeKeyinstances #10443 (provides the typedLABEL_*_KEYconstants used in cache builders).AttributeKeyinstances #10441 (parent issue covering all three layers of metrics allocation reduction; this issue covers only the second layer).