feat(systemplane): add standardization helpers, catalog, and hardened service layer#404
Conversation
… service layer Adds catalog package with shared key validation and registration for Postgres and Redis backends. Introduces domain coercion helpers for type-safe snapshot value conversion (bool, int, float64, duration, string slice) with overflow/NaN/Inf guards and case-insensitive bool parsing. Adds fail-closed default auth adapters (deny mutations, allow reads) and default identity adapters (system identity for internal operations) with Actor.ID length validation. Refactors service/manager into reads, writes, and helpers files. Adds component diff detection for identifying changed settings between snapshots. Introduces shutdown helpers for ordered systemplane teardown. Extracts supervisor helpers with overflow-safe phase sorting and non-mutating tenant ID merge. Hardens BootstrapConfig.Validate with default case, DelegatingAuthorizer with unconditional empty-action rejection, cloneSnapshot with nil TenantSettings preservation, and Swagger MergeInto with tag deduplication. Includes comprehensive tests. X-Lerian-Ref: 0x1
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds a canonical configuration catalog and validation, introduces coercion and snapshot/setting helpers, tightens KeyDef validation (env var and redact policy), and extends schema/DTOs with EnvVar and RedactPolicy. It makes the bootstrap backend registry concurrency-safe, refactors manager/snapshot/supervisor flows into helpers, adds component-diffing and shutdown sequencing, provides default authorizer/identity implementations, restructures write/read flows in the manager, and adds extensive unit tests and minor formatting fixes. Sequence Diagram(s)mermaid Supervisor->>Builder: prepareReloadBuild(snapshot input) Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
commons/systemplane/bootstrap/backend.go (1)
125-143:⚠️ Potential issue | 🟡 MinorSnapshot is taken but not used for factory lookup.
Line 125 takes a snapshot but discards the factories map. Line 140 then calls
backendRegistry.factory()directly, which reads from the live registry. This creates a subtle inconsistency: init errors are checked from a snapshot, but factory lookup uses live state.If a factory were registered after the snapshot but before the lookup, it would be used despite the snapshot intent. Consider using the snapshot consistently:
♻️ Suggested fix for consistency
func NewBackendFromConfig(ctx context.Context, cfg *BootstrapConfig) (*BackendResources, error) { - _, initErrors := backendRegistry.snapshot() + factories, initErrors := backendRegistry.snapshot() if len(initErrors) > 0 { return nil, fmt.Errorf("bootstrap backend: init registration errors: %w", errors.Join(initErrors...)) } // ... validation ... - factory, ok := backendRegistry.factory(cfg.Backend) + factory, ok := factories[cfg.Backend] if !ok { return nil, fmt.Errorf("%w %q (no factory registered)", errUnsupportedBackend, cfg.Backend) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commons/systemplane/bootstrap/backend.go` around lines 125 - 143, The snapshot call stores initErrors but drops the snapshot of factories, causing a live-registry lookup later; change the call to capture the snapshot (e.g., factories, initErrors := backendRegistry.snapshot()) and then use that captured snapshot for the lookup instead of backendRegistry.factory (e.g., look up cfg.Backend in the factories map or call a method on the snapshot) so the initErrors check and factory selection are consistent; keep the existing error handling with initErrors and errUnsupportedBackend/errNilBackendConfig as-is.commons/systemplane/service/supervisor.go (1)
209-225:⚠️ Potential issue | 🔴 CriticalThe reload path mutates the active bundle during incremental builds while concurrent readers can access it lock-free.
BuildIncrementalis contractually required to mutate thepreviousBundle(the currently-active bundle) by nil-ing out transferred component pointers. AlthoughReload()holdssupervisor.muduringprepareReloadBuildandreconcileCandidateBundle,Current()reads and returns the bundle reference without acquiring the mutex—using onlyatomic.LoadPointer, which atomicizes the pointer itself, not the object's contents. A concurrent thread can therefore callCurrent(), obtain a reference to the bundle, and observe it being partially mutated byBuildIncremental.To fix, either:
- Treat the active bundle as read-only during incremental builds and stage mutations on a detached copy before swap
- Have
BuildIncrementalmutate only the candidate, not the previous bundle- Ensure
Current()returns a defensive snapshot during reload windowsPer coding guidelines, "Keep behavior nil-safe and concurrency-safe by default in exported APIs."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commons/systemplane/service/supervisor.go` around lines 209 - 225, prepareReloadBuild/reconcileCandidateBundle currently pass the live active bundle into BuildIncremental which mutates previousBundle (nil-ing transferred pointers) while Current() may return that same object lock-free; to fix, ensure incremental builds never mutate the live bundle: change prepareReloadBuild (and the value passed into BuildIncremental) to create and use a detached deep copy of the previous bundle (or otherwise ensure BuildIncremental only mutates the candidate copy), so all pointer-nilling happens on the candidate and the active bundle remains read-only; additionally, if making copy here is infeasible, make Current() return a defensive snapshot during reload windows or document/guard supervisor.mu so callers cannot observe in-place mutations (reference symbols: BuildIncremental, prepareReloadBuild, reconcileCandidateBundle, Current, supervisor.mu).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@commons/systemplane/bootstrap/backend_test.go`:
- Around line 152-158: The repeated snapshot/restore pattern around
backendRegistry.snapshot(), backendRegistry.factories and
backendRegistry.initErrors should be extracted into a test helper; add a
t.Helper() function (e.g. withRegistrySnapshot) that captures origFactories,
origErrors := backendRegistry.snapshot() and registers t.Cleanup to restore
backendRegistry.factories and backendRegistry.initErrors, then replace the
duplicated snapshot/cleanup blocks in tests with a call to
withRegistrySnapshot(t).
In `@commons/systemplane/bootstrap/backend.go`:
- Around line 30-79: The registry mutates and reads s.factories without
synchronization; add a sync.RWMutex field to backendRegistryState and use it to
protect all accessors/mutators: use write locks in register(), reset(),
recordInitError() and any code that mutates s.factories or s.initErrors, and use
read locks in factory() and snapshot() (upgrade snapshot to copy under a read
lock before returning). Update the package-global backendRegistry usage
accordingly so RegisterBackendFactory and NewBackendFromConfig call into these
mutex-protected methods, ensuring the exported API is concurrency-safe by
default.
In `@commons/systemplane/bootstrap/config.go`:
- Around line 76-78: Replace the plain fmt.Errorf in the default branch with a
wrapped sentinel error for consistency: add a package-level sentinel error
ErrUnhandledBackend and change the default return to wrap that sentinel with the
backend name (using fmt.Errorf with %w) so callers can compare errors (e.g.,
errors.Is(err, ErrUnhandledBackend)); reference the default case handling
cfg.Backend and the validate function's default branch when making the change.
In `@commons/systemplane/catalog/validate.go`:
- Around line 117-119: The validation only checks env var mismatches when
allowedEnvVars(sk) returns a non-empty slice, so when the canonical key
intentionally defines no env vars a product can still set pd.EnvVar and go
unnoticed. In ValidateKeyDefs, after computing expectedEnvVars :=
allowedEnvVars(sk), add a branch: if len(expectedEnvVars) == 0 && pd.EnvVar !=
"" then append a mismatch to comparisons (use mismatchForString(pd, sk,
"EnvVar", "", pd.EnvVar) or the existing expected-format) so products that set
an env var where the catalog defines none are flagged; keep existing
containsString branch for the non-empty case. Ensure you reference
allowedEnvVars, pd.EnvVar, comparisons, mismatchForString and sk when making the
change.
In `@commons/systemplane/domain/coercion_helpers.go`:
- Around line 109-119: The float-to-int boundary checks fail at the 2^63
floating-point alias; update intFromFloat64 and scaleDurationFloat64 to reject
values that are at-or-beyond the integer bounds by using math.Nextafter to
detect the next representable float beyond the integer limits. Specifically,
after truncating (or before casting), compute maxF := float64(math.MaxInt) and
minF := float64(math.MinInt) and also nextMax := math.Nextafter(maxF,
math.Inf(1)) and nextMin := math.Nextafter(minF, math.Inf(-1)); return false if
value >= nextMax || value <= nextMin (or truncated >= nextMax / <= nextMin) so
numbers that collapse to the same float as the boundary are correctly rejected
before casting in intFromFloat64 and scaleDurationFloat64.
In `@commons/systemplane/domain/key_def.go`:
- Around line 154-156: The new error in domain.KeyDef.Validate() that returns
ErrInvalidRedactPolicy for keyDef.Secret && keyDef.RedactPolicy == RedactMask
introduces a breaking validation failure; instead preserve compatibility by not
returning an error for Secret+RedactMask—either normalize the value (e.g., set
keyDef.RedactPolicy = RedactFull inside Validate() for Secret keys) or simply
skip this check so existing callers remain valid. Update the logic in
KeyDef.Validate() to perform normalization rather than returning
ErrInvalidRedactPolicy (referencing keyDef.Secret, keyDef.RedactPolicy,
RedactMask, RedactFull and ErrInvalidRedactPolicy).
In `@commons/systemplane/ports/identity_defaults_test.go`:
- Around line 99-143: Add a table-driven test case to
TestFuncIdentityResolver_FailsClosedWithoutIdentity that covers a
FuncIdentityResolver whose ActorFunc is nil but DefaultActor is a
whitespace-only string (e.g., " "); call resolver.Actor(ctx) with a valid
context and assert that it returns domain.ErrPermissionDenied just like the
empty/default-nil cases, referencing the FuncIdentityResolver struct, its
DefaultActor field, ActorFunc, and the Actor(ctx) method to locate the code to
modify.
In `@commons/systemplane/ports/identity_defaults.go`:
- Around line 35-63: Add a godoc comment for the exported method
FuncIdentityResolver.Actor that succinctly describes its behavior and
fail-closed semantics: note that it returns domain.ErrPermissionDenied on nil
receiver or nil context, that it trims whitespace and prefers the value returned
by ActorFunc(ctx) over DefaultActor (fallback) when ActorFunc produces an empty
string, and that it enforces maxActorIDLength returning ErrPermissionDenied for
overly long IDs; include mention that it returns domain.Actor{ID: id} on
success. Reference the symbols FuncIdentityResolver.Actor, ActorFunc,
DefaultActor, maxActorIDLength, and domain.ErrPermissionDenied in the comment.
- Around line 65-80: Add a godoc comment for FuncIdentityResolver.TenantID that
documents its fail-closed behavior and nil-safety, and enforce a maximum tenant
ID length by introducing a maxTenantIDLength constant and validating the trimmed
tenantID in TenantID(ctx context.Context) to return domain.ErrPermissionDenied
when tenantID is empty or exceeds the max; update any tests or callers if needed
and keep the existing nil/context checks and empty-string handling intact.
In `@commons/systemplane/service/component_diff.go`:
- Around line 194-204: Add a short comment above effectiveValueChanged
explaining that it uses reflect.DeepEqual to compare EffectiveValue.Value and
documenting DeepEqual semantics (e.g., nil slice != empty slice, functions
always unequal) and the expected/allowed types for EffectiveValue.Value
(primitives, slices, maps) so future maintainers know the comparison caveats; if
values may include nil-vs-empty slices or functions, note that they must be
normalized before comparison or that such types are not supported by this
comparison.
In `@commons/systemplane/service/manager_writes_helpers.go`:
- Around line 103-110: The call to manager.store.Put returns a persisted
revision (revision) but if manager.applyEscalation fails the function currently
returns an empty WriteResult; change the error path so that when
applyEscalation(ctx, plan.target, plan.escalation) returns an error you still
return the persisted revision in the WriteResult (e.g. return
WriteResult{Revision: revision, /* other needed fields */}, fmt.Errorf("apply
escalation: %w", err)) so callers can reconcile/retry safely; update the return
at the applyEscalation error branch to include the revision from
manager.store.Put and any other required fields of WriteResult.
In `@commons/systemplane/service/shutdown_test.go`:
- Around line 99-101: The test currently only asserts the call order when
tt.wantOrder != nil, skipping the "all nil steps" case; change the test to
assert unconditionally by calling assert.Equal(t, tt.wantOrder, order) (or
assert.Empty for clarity) without the surrounding if so the "all nil steps" case
is explicitly verified; update the assertion near tt.wantOrder and order in
shutdown_test.go accordingly.
In `@commons/systemplane/service/snapshot_builder_test.go`:
- Line 225: The test currently asserts values["app.name"].Redacted without
verifying the key exists, which can hide a missing entry; update the test in
snapshot_builder_test.go (where values is populated, e.g., after calling
initDefaults or the test setup that builds values) to first assert the key
exists (for example, check that val, ok := values["app.name"]; ok) and then
assert val.Redacted is false—i.e., replace the direct access with a presence
check followed by the Redacted assertion referencing the retrieved value.
In `@commons/systemplane/service/supervisor_helpers.go`:
- Around line 104-108: The current error-path calls candidate.Close(ctx) which
can tear down resources shared with the active bundle for incremental builds;
instead detect incremental candidates produced by BuildIncremental and call the
incremental discard path discardFailedCandidate(ctx, candidate, previousBundle)
so shared/adopted resources are released safely, otherwise fall back to
candidate.Close(ctx); update the block that currently checks
isNilRuntimeBundle(candidate) to invoke discardFailedCandidate when a
previousBundle is present/when candidate is incremental and only call
candidate.Close for non-incremental candidates.
- Around line 49-60: In discardFailedCandidate(), replace the three silent
ignores of Discard(ctx) and Close(ctx) (the branches testing
candidate.(discarderType) and rollbackDiscarder and the final candidate.Close
call) with best-effort structured logging: call the Discard/Close, capture the
returned error, and if non-nil log it via the package's structured logger/tracer
(include fields like "candidate", "operation" ("discard" or "close"), and the
error) before returning; similarly, in commitReload() where the previous
bundle's Close(ctx) is currently discarded, capture any error and emit a
structured log/trace entry with context (e.g., "previous_bundle",
"operation":"close", error) since that function cannot propagate errors. Ensure
you use the existing logger/tracer object available in the scope and do not
change control flow or error returns from these functions.
In `@commons/systemplane/swagger/swagger.go`:
- Around line 105-107: The exported godoc for MergeInto is out of sync with
mergeTags: update the MergeInto comment to state that systemplane tags are
merged with deduplication by tag name (i.e., tags from the source with a name
that already exists in the destination are skipped) instead of describing a
plain append; reference the mergeTags semantics in the comment and clearly
document that tag order and duplicates from the source will not be preserved by
MergeInto.
- Around line 127-148: The current merge loop only checks dstArr against the
existing map, so duplicate tag names inside srcArr still get appended; update
the loop that iterates srcArr in swagger.go to mark each parsed tag.Name into
the existing map before appending (use the same anonymous struct parsing as
before) and skip append when existing[tag.Name] is true, ensuring srcArr
duplicates are deduplicated; refer to symbols dstArr, srcArr, existing and the
local var tag when making the change.
---
Outside diff comments:
In `@commons/systemplane/bootstrap/backend.go`:
- Around line 125-143: The snapshot call stores initErrors but drops the
snapshot of factories, causing a live-registry lookup later; change the call to
capture the snapshot (e.g., factories, initErrors := backendRegistry.snapshot())
and then use that captured snapshot for the lookup instead of
backendRegistry.factory (e.g., look up cfg.Backend in the factories map or call
a method on the snapshot) so the initErrors check and factory selection are
consistent; keep the existing error handling with initErrors and
errUnsupportedBackend/errNilBackendConfig as-is.
In `@commons/systemplane/service/supervisor.go`:
- Around line 209-225: prepareReloadBuild/reconcileCandidateBundle currently
pass the live active bundle into BuildIncremental which mutates previousBundle
(nil-ing transferred pointers) while Current() may return that same object
lock-free; to fix, ensure incremental builds never mutate the live bundle:
change prepareReloadBuild (and the value passed into BuildIncremental) to create
and use a detached deep copy of the previous bundle (or otherwise ensure
BuildIncremental only mutates the candidate copy), so all pointer-nilling
happens on the candidate and the active bundle remains read-only; additionally,
if making copy here is infeasible, make Current() return a defensive snapshot
during reload windows or document/guard supervisor.mu so callers cannot observe
in-place mutations (reference symbols: BuildIncremental, prepareReloadBuild,
reconcileCandidateBundle, Current, supervisor.mu).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9464126b-b240-4f6b-bcea-e4ab8c67883d
📒 Files selected for processing (46)
commons/systemplane/adapters/http/fiber/dto.gocommons/systemplane/adapters/http/fiber/dto_test.gocommons/systemplane/bootstrap/backend.gocommons/systemplane/bootstrap/backend_test.gocommons/systemplane/bootstrap/config.gocommons/systemplane/catalog/doc.gocommons/systemplane/catalog/keys_postgres.gocommons/systemplane/catalog/keys_redis.gocommons/systemplane/catalog/keys_shared.gocommons/systemplane/catalog/shared_key.gocommons/systemplane/catalog/validate.gocommons/systemplane/catalog/validate_test.gocommons/systemplane/domain/coercion_helpers.gocommons/systemplane/domain/config_helpers.gocommons/systemplane/domain/key_def.gocommons/systemplane/domain/key_def_test.gocommons/systemplane/domain/setting_helpers.gocommons/systemplane/domain/snapshot_config_helpers_test.gocommons/systemplane/domain/snapshot_duration_slice_helpers_test.gocommons/systemplane/domain/snapshot_setting_helpers_test.gocommons/systemplane/domain/snapshot_test_helpers_test.gocommons/systemplane/ports/authorizer_defaults.gocommons/systemplane/ports/authorizer_defaults_test.gocommons/systemplane/ports/identity_defaults.gocommons/systemplane/ports/identity_defaults_test.gocommons/systemplane/service/component_diff.gocommons/systemplane/service/component_diff_test.gocommons/systemplane/service/manager.gocommons/systemplane/service/manager_helpers.gocommons/systemplane/service/manager_helpers_test.gocommons/systemplane/service/manager_read_helpers.gocommons/systemplane/service/manager_reads.gocommons/systemplane/service/manager_reads_test.gocommons/systemplane/service/manager_schema_test.gocommons/systemplane/service/manager_test.gocommons/systemplane/service/manager_writes.gocommons/systemplane/service/manager_writes_helpers.gocommons/systemplane/service/shutdown.gocommons/systemplane/service/shutdown_test.gocommons/systemplane/service/snapshot_builder.gocommons/systemplane/service/snapshot_builder_test.gocommons/systemplane/service/supervisor.gocommons/systemplane/service/supervisor_helpers.gocommons/systemplane/swagger/spec.jsoncommons/systemplane/swagger/swagger.gocommons/systemplane/swagger/swagger_test.go
💤 Files with no reviewable changes (2)
- commons/systemplane/service/manager_test.go
- commons/systemplane/service/manager_reads_test.go
Sort import groups alphabetically, remove trailing blank line in test, and normalize struct field comment alignment in Manager.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@commons/tenant-manager/consumer/multi_tenant_optional_rabbitmq_test.go`:
- Line 13: The import order was changed so the amqp import (amqp alias for
"github.com/rabbitmq/amqp091-go") is placed before the testify imports; reorder
the import block to group external dependencies together by moving the amqp
import so all external packages (e.g., "github.com/rabbitmq/amqp091-go" as amqp,
"github.com/stretchr/testify/assert", "github.com/stretchr/testify/require")
appear together before any internal project imports in
multi_tenant_optional_rabbitmq_test.go; this is a cosmetic fix—adjust the import
list in the file's import block to follow conventional grouping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9292cb9c-6963-4afb-8898-7193b31916a8
📒 Files selected for processing (3)
commons/tenant-manager/consumer/multi_tenant_optional_rabbitmq_test.gocommons/tenant-manager/core/context_test.gocommons/tenant-manager/postgres/manager.go
💤 Files with no reviewable changes (1)
- commons/tenant-manager/core/context_test.go
- Add sync.RWMutex to backend registry for concurrent safety - Use snapshot consistently for factory lookup in NewBackendFromConfig - Extract withRegistrySnapshot test helper to reduce duplication - Add ErrUnhandledBackend sentinel error in config validation - Fix EnvVar validation gap when catalog defines no env vars - Fix float-to-int boundary at 2^63 (>= instead of > for overflow check) - Accept Secret+RedactMask in KeyDef.Validate (runtime normalizes to Full) - Add godocs for Actor/TenantID, maxTenantIDLength, whitespace test case - Add DeepEqual semantics comment on effectiveValueChanged - Return persisted revision on escalation failure in persistAndApplyWrite - Assert unconditionally in shutdown_test (covers all-nil-steps case) - Check key existence before asserting Redacted in snapshot_builder_test - Use discardFailedCandidate for incremental build error path - Document concurrency safety invariant in prepareReloadBuild - Update MergeInto godoc and deduplicate srcArr tags in mergeTags - Auto-formatting: struct field alignment in tenant-manager packages
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (3)
commons/systemplane/service/supervisor_helpers.go (2)
49-60:⚠️ Potential issue | 🟠 MajorHandle cleanup failures instead of discarding them.
Lines 49-60 and 192-194 still drop
Discard/Closeerrors with_ =, so failed rollbacks and superseded bundle shutdowns can leak resources without any diagnostic signal. These are best-effort paths, but they still need structured logging/tracing with operation context before returning.As per coding guidelines, "Do not swallow errors; return or handle errors with context" and "Use the structured log interface
Log(ctx, level, msg, fields...)instead of printf-style methods".Also applies to: 192-194
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commons/systemplane/service/supervisor_helpers.go` around lines 49 - 60, The code is currently swallowing errors from candidate.Discard(ctx) and candidate.Close(ctx); update the cleanup paths (the branch checking rollbackDiscarder and the final Close call) to handle returned errors by calling the structured logger Log(ctx, level, msg, fields...) with contextual fields (e.g., "candidate", "operation":"discard" or "close", and the error) instead of assigning to _; preserve the best-effort semantics by logging at debug/warn level and then returning, and apply the same change to the similar cleanup at the other location (the Close/Discard calls around the 192-194 logic).
132-139:⚠️ Potential issue | 🟠 MajorPublish snapshot and bundle through one atomic state object.
Lines 176-177 are still two independent
Storecalls, so concurrent readers can observe a new snapshot paired with the old bundle, or vice versa. That breaks the documented atomic swap contract and weakens the guarantee described in Lines 132-139 onceAdoptResourcesFromstarts mutating the retired bundle.Run the following to confirm the mixed-state window remains:
#!/bin/bash # Expectation: the implementation still publishes and reads snapshot/bundle separately, # so the pair is not swapped atomically. sed -n '130,195p' commons/systemplane/service/supervisor_helpers.go echo '--- separate snapshot/bundle readers ---' rg -n -C3 'snapshot\.Load\(|bundle\.Load\(' commons/systemplane/serviceAs per coding guidelines, "Preserve v4 public API contracts unless a task explicitly asks for breaking changes" and "Keep behavior nil-safe and concurrency-safe by default in exported APIs".
Also applies to: 176-180
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commons/systemplane/service/supervisor_helpers.go` around lines 132 - 139, The code currently does two separate Store calls (snapshot.Store and bundle.Store) which allows a reader to see a mixed pair; change to publish a single atomic state object that contains both the snapshot and bundle and call Store once so Current() readers always see a consistent pair; update the loader/reader code that uses snapshot.Load and bundle.Load to use the new single state.Load, and ensure AdoptResourcesFrom, BuildIncremental and any callers that expect the previousBundle still work by reading the bundle out of the atomic state (preserving read-only sharing semantics).commons/systemplane/domain/coercion_helpers.go (1)
109-123:⚠️ Potential issue | 🟠 MajorRestore the 32-bit upper bound in
intFromFloat64.Line 119 fixes the
2^63alias on 64-bit, but it now hard-codes a 64-bit ceiling for anintresult. On 32-bit builds, truncated values above the 32-bitintmax still pass andint(truncated)becomes implementation-specific.Suggested fix
func intFromFloat64(value float64) (int, bool) { if math.IsNaN(value) || math.IsInf(value, 0) { return 0, false } truncated := math.Trunc(value) - if truncated >= 1<<63 || truncated < math.MinInt { - return 0, false - } + if strconv.IntSize == 32 { + if truncated > math.MaxInt || truncated < math.MinInt { + return 0, false + } + } else if truncated >= 1<<63 || truncated < math.MinInt { + return 0, false + } return int(truncated), true }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commons/systemplane/domain/coercion_helpers.go` around lines 109 - 123, The function intFromFloat64 currently uses a hard-coded 64-bit ceiling (1<<63) which breaks on 32-bit builds; replace that constant with platform-specific int bounds: compute the platform max and min int (e.g., intMax := int(^uint(0) >> 1) and intMin := -intMax - 1 or their int64 equivalents), convert those to float64 for comparison, and use those values when rejecting out-of-range truncated values in intFromFloat64 (ensure you still guard against float->int boundary aliasing by rejecting values >= float64(intMax)+1 or > float64(intMax) as appropriate and values < float64(intMin)). Ensure you update the comparisons that currently reference 1<<63 and math.MinInt to use these platform-specific bounds and then return int(truncated) when in range.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@commons/systemplane/bootstrap/backend_test.go`:
- Around line 167-168: Test mutating code directly modifies
backendRegistry.factories without locking; add and use a test helper that
acquires backendRegistry.mu before mutating so tests follow the production
synchronization contract. Create helpers (e.g., setBackendFactoryLocked(name
domain.BackendType, f backendFactory) and deleteBackendFactoryLocked(name
domain.BackendType)) that lock backendRegistry.mu, perform the assignment or
delete on backendRegistry.factories, and unlock; replace direct mutations like
delete(backendRegistry.factories, domain.BackendPostgres) (and the other
occurrences cited) with calls to these locked helpers, or call them from
withRegistrySnapshot where appropriate.
In `@commons/systemplane/bootstrap/backend.go`:
- Around line 49-53: The recordInitError implementation appends nil errors,
causing initErrors to be non-empty even when no real error occurred; update
backendRegistryState.recordInitError to ignore nil (wrap existing mutex
lock/unlock and only append when err != nil) so stray RecordInitError(nil) calls
are safe—ensure the exported helper (RecordInitError) that forwards to
recordInitError continues to work unchanged but benefits from the nil guard.
In `@commons/systemplane/domain/coercion_helpers.go`:
- Around line 251-257: The type switch over raw in the duration coercion
function misses the time.Duration concrete type, so a time.Duration value won't
match case int64; update the switch (the raw variable in the function that calls
scaleDurationInt64 and scaleDurationFloat64) to include a case time.Duration
that converts v to int64 and calls scaleDurationInt64(int64(v), unit) so
time.Duration inputs are handled directly and consistently with other integer
types.
In `@commons/systemplane/ports/identity_defaults.go`:
- Around line 38-42: Update the Actor godoc to accurately describe the fallback
behavior: state that FuncIdentityResolver.Actor uses DefaultActor not only when
ActorFunc is nil or returns an empty string but also when it returns a
whitespace-only value (after strings.TrimSpace). Mention the use of ActorFunc
and DefaultActor in the comment so it aligns with the implementation of
FuncIdentityResolver.Actor.
In `@commons/systemplane/service/component_diff.go`:
- Around line 114-115: The function snapshotIsZeroValue currently only checks
Configs, GlobalSettings, and TenantSettings, allowing snapshots with non-zero
Revision or BuiltAt to be treated as zero; update snapshotIsZeroValue (in
commons/systemplane/service/component_diff.go) to also require Revision == 0 and
BuiltAt is the zero time (e.g., BuiltAt.IsZero()) as part of the conjunction so
the fast-path only triggers for a true zero-value snapshot.
In `@commons/systemplane/service/manager_writes_helpers.go`:
- Around line 98-115: The callers of persistAndApplyWrite (in particular the
function handling PatchResource/PatchSettings in manager_writes.go) currently
discard the returned partial WriteResult on error and return WriteResult{};
update those error paths to return the WriteResult value returned by
persistAndApplyWrite (e.g., propagate the returned result variable alongside the
error) so the persisted revision is preserved for callers to reconcile/retry;
apply the same change to the PatchSettings caller path as well, ensuring any
function that calls persistAndApplyWrite forwards the returned WriteResult when
err != nil instead of dropping it.
In `@commons/systemplane/swagger/swagger.go`:
- Around line 149-159: The code currently calls json.Unmarshal(raw, &tag) and
ignores errors and also allows a tag with an empty Name to be appended; change
the block around json.Unmarshal so you capture the error (err :=
json.Unmarshal(raw, &tag)) and propagate or return a contextual error instead of
continuing silently when unmarshal fails, and add a check for tag.Name == ""
that rejects/returns an error (or skips with a logged error) rather than
allowing a nameless tag to be added; update the logic that uses
existing[tag.Name] and dstArr = append(dstArr, raw) (the tag variable and
existing map usage) so the append only happens after successful unmarshal and a
non-empty tag.Name.
---
Duplicate comments:
In `@commons/systemplane/domain/coercion_helpers.go`:
- Around line 109-123: The function intFromFloat64 currently uses a hard-coded
64-bit ceiling (1<<63) which breaks on 32-bit builds; replace that constant with
platform-specific int bounds: compute the platform max and min int (e.g., intMax
:= int(^uint(0) >> 1) and intMin := -intMax - 1 or their int64 equivalents),
convert those to float64 for comparison, and use those values when rejecting
out-of-range truncated values in intFromFloat64 (ensure you still guard against
float->int boundary aliasing by rejecting values >= float64(intMax)+1 or >
float64(intMax) as appropriate and values < float64(intMin)). Ensure you update
the comparisons that currently reference 1<<63 and math.MinInt to use these
platform-specific bounds and then return int(truncated) when in range.
In `@commons/systemplane/service/supervisor_helpers.go`:
- Around line 49-60: The code is currently swallowing errors from
candidate.Discard(ctx) and candidate.Close(ctx); update the cleanup paths (the
branch checking rollbackDiscarder and the final Close call) to handle returned
errors by calling the structured logger Log(ctx, level, msg, fields...) with
contextual fields (e.g., "candidate", "operation":"discard" or "close", and the
error) instead of assigning to _; preserve the best-effort semantics by logging
at debug/warn level and then returning, and apply the same change to the similar
cleanup at the other location (the Close/Discard calls around the 192-194
logic).
- Around line 132-139: The code currently does two separate Store calls
(snapshot.Store and bundle.Store) which allows a reader to see a mixed pair;
change to publish a single atomic state object that contains both the snapshot
and bundle and call Store once so Current() readers always see a consistent
pair; update the loader/reader code that uses snapshot.Load and bundle.Load to
use the new single state.Load, and ensure AdoptResourcesFrom, BuildIncremental
and any callers that expect the previousBundle still work by reading the bundle
out of the atomic state (preserving read-only sharing semantics).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3a6d133a-0c3e-456c-95ef-91080b60bebb
📒 Files selected for processing (22)
commons/systemplane/bootstrap/backend.gocommons/systemplane/bootstrap/backend_test.gocommons/systemplane/bootstrap/config.gocommons/systemplane/catalog/validate.gocommons/systemplane/domain/coercion_helpers.gocommons/systemplane/domain/key_def.gocommons/systemplane/domain/key_def_test.gocommons/systemplane/ports/identity_defaults.gocommons/systemplane/ports/identity_defaults_test.gocommons/systemplane/service/component_diff.gocommons/systemplane/service/manager_writes_helpers.gocommons/systemplane/service/shutdown_test.gocommons/systemplane/service/snapshot_builder_test.gocommons/systemplane/service/supervisor_helpers.gocommons/systemplane/swagger/swagger.gocommons/tenant-manager/client/client.gocommons/tenant-manager/consumer/multi_tenant_optional_rabbitmq_test.gocommons/tenant-manager/mongo/manager.gocommons/tenant-manager/mongo/manager_test.gocommons/tenant-manager/postgres/manager.gocommons/tenant-manager/rabbitmq/manager.gocommons/tenant-manager/rabbitmq/manager_test.go
- Guard nil init errors in RecordInitError to prevent false positives - Propagate WriteResult.Revision on escalation failure in PatchConfigs/PatchSettings - Include Revision and BuiltAt in zero-snapshot check for accurate diff - Return error instead of silently skipping malformed source tags in swagger merge - Reject empty-name tags in swagger merge to prevent spec corruption - Route backend_test.go registry mutations through locked helpers - Align Actor godoc with whitespace-only fallback behavior
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (1)
commons/systemplane/swagger/swagger.go (1)
142-156:⚠️ Potential issue | 🟠 MajorFail fast on empty source tag names instead of silently skipping.
On Line 154, empty source tag names are currently skipped with
continue. That silently accepts invalid source metadata and contradicts the “rejected” behavior described in this block.♻️ Proposed fix
if tag.Name == "" { - continue + return fmt.Errorf("swagger merge: src tag missing name") }As per coding guidelines, "Do not swallow errors; return or handle errors with context".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commons/systemplane/swagger/swagger.go` around lines 142 - 156, The loop over srcArr in swagger merge currently silently skips entries with empty tag.Name; change that behavior to fail fast by returning a descriptive error instead of using continue. In the block that unmarshals into the local tag struct (the json.Unmarshal call and subsequent if tag.Name == "" check), replace the continue with a return that wraps context (e.g., "swagger merge: empty src tag name") so the caller sees invalid source metadata; reference the srcArr iteration, the tag struct, and the json.Unmarshal location when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@commons/systemplane/bootstrap/backend_test.go`:
- Around line 408-409: The test directly reads backendRegistry.factories and
backendRegistry.initErrors without holding the registry lock; change the
assertions to use backendRegistry.snapshot() (or call snapshot() into a local
variable) after ResetBackendFactories() so the reads are performed from the
locked snapshot. Specifically, replace direct accesses to
backendRegistry.factories and backendRegistry.initErrors in the test with values
obtained from backendRegistry.snapshot(), keeping assertions (e.g., assert.Len
and assert.NotEmpty) but operating on the snapshot results.
- Line 359: The test directly accesses backendRegistry.factories[testKind]
without acquiring backendRegistry's lock, which is inconsistent with other
locked helpers; change the access to read under the registry's lock (use
backendRegistry.lock/RLock around the read) or use the registry's snapshot/read
helper (e.g., call a provided snapshot() or GetFactories() method) to obtain a
copy and then index by testKind so the test consistently uses a
locked/read-snapshot approach for backendRegistry.factories.
- Line 379: The map read of backendRegistry.factories using key kind is
performed without holding the registry lock; wrap the lookup with the same read
lock used elsewhere (e.g., use backendRegistry.mu.RLock() before accessing
backendRegistry.factories[kind] and backendRegistry.mu.RUnlock() after—or use
defer to unlock) so the access is consistent with the protected read at the
other occurrence and avoids a race on backendRegistry.
In `@commons/systemplane/ports/identity_defaults.go`:
- Around line 17-25: Comments for ActorFunc, TenantFunc, and DefaultActor are
imprecise: they mention only "" but the Actor implementation trims whitespace
(treating whitespace-only values as empty). Update the exported field godocs for
ActorFunc, TenantFunc and DefaultActor to state they must return a non-empty,
non-whitespace tenant/actor ID (or that whitespace-only strings are treated as
empty) and that DefaultActor is used when ActorFunc is nil or returns an empty
or whitespace-only string; reference ActorFunc, TenantFunc, DefaultActor and the
Actor behavior when making the change.
In `@commons/systemplane/service/component_diff.go`:
- Around line 74-80: The godoc for ChangedComponents is misleading because it
currently implies any differing key marks a component changed, but the
implementation (in ChangedComponents) skips keys whose ApplyBehavior does not
require a bundle rebuild; update the comment to state explicitly that
ChangedComponents only returns components with differences that require a bundle
rebuild, that keys with non-rebuild ApplyBehavior are ignored, and that unknown
keys/invalid apply behaviors or unclassified rebuild-triggering keys force a
full rebuild for safety; reference the ApplyBehavior check in the implementation
and domain.Snapshot to make callers aware this is not a general-purpose
snapshot-diff API.
In `@commons/systemplane/service/manager_writes.go`:
- Around line 31-32: Validation failures from manager.validateConfigOps(req.Ops)
are currently returned directly and skip span error handling and wrapping;
update this branch to call manager.HandleSpanError(span, err) and return a
wrapped error consistent with the other paths (e.g., wrap the original error
with context like "validateConfigOps: %w") before returning WriteResult{} so
observability and error formatting match the other error branches.
- Around line 74-75: The call to manager.validateSettingOps(req.Ops,
subject.Scope) in manager_writes.go does not use HandleSpanError or wrap the
error with context like other error paths (e.g., PatchConfigs); update the error
handling so that if validateSettingOps returns an error you pass it through
HandleSpanError on the current span and return a wrapped error with contextual
text (include function/operation identifiers such as validateSettingOps,
subject.Scope and the surrounding write operation) to match existing patterns
used elsewhere in this function.
---
Duplicate comments:
In `@commons/systemplane/swagger/swagger.go`:
- Around line 142-156: The loop over srcArr in swagger merge currently silently
skips entries with empty tag.Name; change that behavior to fail fast by
returning a descriptive error instead of using continue. In the block that
unmarshals into the local tag struct (the json.Unmarshal call and subsequent if
tag.Name == "" check), replace the continue with a return that wraps context
(e.g., "swagger merge: empty src tag name") so the caller sees invalid source
metadata; reference the srcArr iteration, the tag struct, and the json.Unmarshal
location when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 45e14a08-9f4f-44d7-99b3-f7b9dbae1cb4
📒 Files selected for processing (6)
commons/systemplane/bootstrap/backend.gocommons/systemplane/bootstrap/backend_test.gocommons/systemplane/ports/identity_defaults.gocommons/systemplane/service/component_diff.gocommons/systemplane/service/manager_writes.gocommons/systemplane/swagger/swagger.go
Replace two separate atomic pointers (snapshot + bundle) with a single atomic.Pointer[supervisorState] so lock-free readers always see a consistent (snapshot, bundle) pair. This eliminates the torn-read window where Current() and Snapshot() could return mismatched states. Record cleanup errors from discard/close paths to the active OTEL span instead of silently discarding them with _ assignment. Add 64-bit platform note to intFromFloat64 boundary constant.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@commons/systemplane/domain/coercion_helpers.go`:
- Around line 80-83: Several coercion helpers use raw == nil while others use
IsNilValue(raw), causing inconsistent handling of interface-wrapped nil
pointers; update tryCoerceInt, tryCoerceBool, tryCoerceFloat64, and
tryCoerceDuration to use IsNilValue(raw) for their nil-checks (same
behavior/output as before), so all coercion functions use the uniform IsNilValue
pattern; reference the functions tryCoerceInt, tryCoerceBool, tryCoerceFloat64,
and tryCoerceDuration and ensure IsNilValue is available in the package.
- Around line 109-126: intFromFloat64 (and likewise scaleDurationFloat64)
currently only guards against 64-bit overflow using 1<<63 and will still
overflow on 32-bit runtimes; add a runtime IntSize check using strconv.IntSize
to also reject values outside 32-bit range (e.g. if strconv.IntSize == 32 &&
(truncated >= 1<<31 || truncated < -1<<31) return failure) so casting to
int/int64 is safe, or alternatively add a build constraint at the top of
coercion_helpers.go to restrict the file to 64-bit platforms if 32-bit support
is not intended.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 18226445-dd2d-41d6-af59-284a307dfd13
📒 Files selected for processing (3)
commons/systemplane/domain/coercion_helpers.gocommons/systemplane/service/supervisor.gocommons/systemplane/service/supervisor_helpers.go
Summary
Testing
make cipasses (lint, format, tidy, security, vet, unit tests, integration tests)