perf: avoid context deep-clone and flags serde double-pass #9
Draft
gagantrivedi wants to merge 2 commits into main from
Two independent hot-path wins on the identities and flags endpoints:
1. LocalMemEnvironmentsCache::get_context now returns
Option<Arc<EngineEvaluationContext>> instead of Option<EngineEvaluationContext>.
The context holds every feature, segment, rule, and condition for an
environment; returning it by value caused a full deep clone on every
request. Using Arc makes each read an atomic reference-count increment and
moves the one-time construction cost to the poll path (where it belongs).
Callers in EnvironmentService now dereference the Arc before passing it to
the engine.
2. get_flags previously returned Json<serde_json::Value>. Building that
required serde_json::to_value(flags) which walks the whole structure
once to produce an intermediate Value tree; Axum's Json response then
walks the tree again to produce bytes. This eliminates the first pass:
a new FlagsResponse enum wraps either a single APIFeatureState or
Vec<APIFeatureState>, implements IntoResponse by delegating to
Json<T>, and we serialize directly.
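As a rough illustration of item 1, the sketch below shows a cache that stores an Arc on the poll path and hands out cheap clones on the request path. The struct fields, the set_context method, the RwLock/HashMap layout, and count_features are assumptions made for this example only; the real LocalMemEnvironmentsCache and EngineEvaluationContext may look quite different.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real EngineEvaluationContext.
#[derive(Debug)]
pub struct EngineEvaluationContext {
    pub features: Vec<String>,
}

/// Hypothetical cache layout; the real LocalMemEnvironmentsCache may differ.
#[derive(Default)]
pub struct LocalMemEnvironmentsCache {
    contexts: RwLock<HashMap<String, Arc<EngineEvaluationContext>>>,
}

impl LocalMemEnvironmentsCache {
    /// Poll path: the context is built and wrapped in an Arc once per refresh.
    pub fn set_context(&self, key: &str, ctx: EngineEvaluationContext) {
        self.contexts
            .write()
            .unwrap()
            .insert(key.to_owned(), Arc::new(ctx));
    }

    /// Request path: cloning the Arc increments a reference count and
    /// does not copy the context itself.
    pub fn get_context(&self, key: &str) -> Option<Arc<EngineEvaluationContext>> {
        self.contexts.read().unwrap().get(key).cloned()
    }
}

/// Hypothetical engine entry point that only borrows the context.
pub fn count_features(ctx: &EngineEvaluationContext) -> usize {
    ctx.features.len()
}
```

Two calls to get_context for the same key return handles to the same allocation, which can be checked with Arc::ptr_eq.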
Measured end-to-end (local Docker, 1 vCPU / 2 GB, identities endpoint,
wrk with 3 trait-matching segment conditions, endpoint caches off):
                           baseline   this PR   delta
small   50f/15s/750ov         3,072     4,457   +45% RPS
        p99 @ c=200           77 ms     51 ms
medium  200f/50s/8.7Kov         268       510   +90% RPS
        p99 @ c=200          2.14 s    412 ms   -81%
The medium project benefits disproportionately because the context
clone cost grows with project size; Arc::clone is O(1) either way.
Fixes clippy::explicit_auto_deref. Rust auto-derefs &Arc<T> to &T via the Deref trait when coercing to function args, so the explicit &* was redundant. No behaviour change.
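A minimal sketch of the coercion involved, using a hypothetical Context type and segment_count function rather than the real engine API. Both helpers compile to the same call; the first spells out the dereference that the lint considers redundant.

```rust
use std::sync::Arc;

/// Hypothetical type standing in for the evaluation context.
struct Context {
    segments: Vec<String>,
}

/// Hypothetical callee that borrows the context, like an engine call would.
fn segment_count(ctx: &Context) -> usize {
    ctx.segments.len()
}

/// Explicit form: dereference the reference and the Arc, then reborrow.
/// This is the style clippy::explicit_auto_deref reports.
fn count_explicit(ctx: &Arc<Context>) -> usize {
    segment_count(&**ctx)
}

/// Coerced form: &Arc<Context> is converted to &Context at the call site
/// through Arc's Deref implementation.
fn count_coerced(ctx: &Arc<Context>) -> usize {
    segment_count(ctx)
}
```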
Summary
Two independent hot-path wins on the identities and flags endpoints.

1. LocalMemEnvironmentsCache::get_context returns Arc<EngineEvaluationContext>

The context holds every feature, segment, rule, and condition for an environment. The previous implementation deep-cloned it on every request. Arc::clone is O(1) and moves the allocation to the polling refresh path. Call sites in EnvironmentService dereference the Arc before passing it to the engine, so there is no behaviour change.

2. get_flags serializes directly, skipping serde_json::Value

Json<serde_json::Value> required serde_json::to_value(flags) first, which walks the whole structure once to build an intermediate Value tree; Axum's Json response then walks the tree again to produce bytes. That is two full traversals per response.

The new FlagsResponse enum wraps either a single APIFeatureState or a Vec<APIFeatureState> and implements IntoResponse by delegating to Json<T>. That is one tree-walk to bytes.

Measured impact

Setup: local Docker, 1 vCPU / 2 GB, identities endpoint, wrk with 3 trait-matching segment conditions, endpoint caches disabled.

The medium project benefits disproportionately because the context deep-clone cost grows with project size; Arc::clone is constant-time either way.

Isolated microbenchmarks (criterion, not part of this PR) showed cache.get_context() dropping from 5.78 µs to 32 ns and flags serialization dropping from 13.27 µs to 3.46 µs for a 50-feature response.
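The difference between the two serialization paths in item 2 can be illustrated with a std-only analogy. This is not the PR's code: the PR relies on serde and Axum's Json<T>, whereas the sketch hand-rolls a toy Value tree and a toy writer, and FeatureState, to_value, write_value, and write_json are all hypothetical names. It only shows that building an intermediate tree adds a full extra walk (plus allocations) while producing the same bytes.

```rust
use std::fmt::Write;

/// Hypothetical, heavily simplified stand-in for APIFeatureState.
pub struct FeatureState {
    pub name: String,
    pub enabled: bool,
}

/// Toy intermediate tree playing the role of serde_json::Value.
pub enum Value {
    Bool(bool),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Old path, pass 1: walk the flags and allocate an intermediate tree.
pub fn to_value(flags: &[FeatureState]) -> Value {
    Value::Array(flags.iter().map(|f| {
        Value::Object(vec![
            ("name".to_owned(), Value::Str(f.name.clone())),
            ("enabled".to_owned(), Value::Bool(f.enabled)),
        ])
    }).collect())
}

/// Old path, pass 2: walk the tree again to emit text (no string escaping here).
pub fn write_value(value: &Value, out: &mut String) {
    match value {
        Value::Bool(b) => write!(out, "{}", b).unwrap(),
        Value::Str(s) => write!(out, "\"{}\"", s).unwrap(),
        Value::Array(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 { out.push(','); }
                write_value(item, out);
            }
            out.push(']');
        }
        Value::Object(fields) => {
            out.push('{');
            for (i, (key, field)) in fields.iter().enumerate() {
                if i > 0 { out.push(','); }
                write!(out, "\"{}\":", key).unwrap();
                write_value(field, out);
            }
            out.push('}');
        }
    }
}

/// New path: a response enum that emits text in a single walk.
pub enum FlagsResponse {
    Single(FeatureState),
    List(Vec<FeatureState>),
}

fn write_flag(flag: &FeatureState, out: &mut String) {
    write!(out, "{{\"name\":\"{}\",\"enabled\":{}}}", flag.name, flag.enabled).unwrap();
}

impl FlagsResponse {
    pub fn write_json(&self, out: &mut String) {
        match self {
            FlagsResponse::Single(flag) => write_flag(flag, out),
            FlagsResponse::List(flags) => {
                out.push('[');
                for (i, flag) in flags.iter().enumerate() {
                    if i > 0 { out.push(','); }
                    write_flag(flag, out);
                }
                out.push(']');
            }
        }
    }
}
```

Both paths yield identical output for the same flags; the direct path simply never materializes the tree.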