fix(database): add rollback on vault failure in RotateCredentials#161
fix(database): add rollback on vault failure in RotateCredentials#161
Conversation
1. rotationCache memory leak: Add TTL-based eviction (24h TTL) to prevent unbounded map growth on credential rotation 2. PromoteToPrimary: Execute actual pg_promote() / MySQL replica reset commands in container instead of only updating metadata 3. NoopDatabaseService: Add missing interface methods (CreateReplica, PromoteToPrimary, GetDatabase, ListDatabases, RotateCredentials, StopDatabase, StartDatabase) 4. Update PromoteToPrimary unit test with mock Exec call
- MySQL: fetch current password from Vault instead of using stale db.Password (can diverge after RotateCredentials) - PostgreSQL: use pg_ctl promote instead of psql to avoid authentication issues with TCP connections
pg_ctl promote requires specifying the data directory and may have PATH issues in the Docker container. Using 'postgres --promote' sends a fast promotion signal directly to the running server without needing to locate pg_ctl or the data directory. This fixes the E2E test TestDatabaseReplicationE2E/PromoteReplica which was returning HTTP 500.
pg_ctl and postgres --promote can fail due to PATH issues or signal routing in Docker containers. PostgreSQL monitors a trigger file (promote signal file) at PGDATA/promote - when this file appears, the server exits recovery and promotes. Using 'touch /var/lib/postgresql/data/promote' is the most reliable promotion method in containerized environments.
The TTL check only affected read behavior - expired entries remained in the map forever, causing unbounded memory growth. Now we delete the entry when we find it expired during lookup.
When vault store fails after the DB password has already been changed, the system now automatically rolls back the DB password to the original value to maintain consistency. Previously this left the system in an inconsistent state where vault had old password but DB expected new one. Added buildPasswordChangeCmd helper for generating rollback commands. Updated test to verify rollback behavior.
📝 WalkthroughWalkthroughThe changes refactor credential rotation in the database service to include automatic password rollback if Vault storage fails after database updates, enhance replica promotion with engine-specific container commands, and replace rotation idempotency caching with TTL-based timestamps. Documentation is updated accordingly, and no-op adapter stubs are added for new methods. Changes
Sequence DiagramsequenceDiagram
actor Client
participant Service as Database Service
participant Container as DB Container
participant Vault
Client->>Service: RotateCredentials(id, idempotencyKey)
Service->>Container: Exec(ALTER USER password)
Container-->>Service: Success
Service->>Vault: StoreSecret(newPassword)
Vault-->>Service: Error
alt Vault Storage Failed
Service->>Container: Exec(ALTER USER originalPassword)
Container-->>Service: Success
Service-->>Client: Error (Vault failed, DB rolled back)
else Rollback Failed
Service-->>Client: Error (Vault failed, rollback also failed)
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent inconsistent credential state during DatabaseService.RotateCredentials by rolling back the in-database password change when writing the new password to Vault fails.
Changes:
- Add rollback logic in
RotateCredentialswhen VaultStoreSecretfails after the DB password update. - Add a
buildPasswordChangeCmdhelper and introduce TTL timestamps in the in-memory idempotency cache. - Update service tests to validate rollback behavior on Vault failure and adjust unit tests for the updated
PromoteToPrimarybehavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| internal/repositories/noop/adapters.go | Extends the noop database service adapter to satisfy the updated ports.DatabaseService surface. |
| internal/core/services/database_vault_test.go | Updates RotateCredentials tests to assert the new rollback-on-Vault-failure behavior. |
| internal/core/services/database_unit_test.go | Updates PromoteToPrimary unit test setup to account for newly introduced in-container promotion execution. |
| internal/core/services/database.go | Implements rollback on Vault failure for RotateCredentials; changes idempotency cache storage to timestamps with TTL; adds engine promotion execution and password change helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "credential rotation failed and rollback also failed - manual intervention required", | ||
| err) |
There was a problem hiding this comment.
In the Vault store failure path, if the rollback Exec fails you currently wrap/return the Vault error (err) as the cause, which loses the rollback failure details. Wrap rollbackErr as the cause (and include the Vault error in the message/context) so operators can see why the rollback failed.
| "credential rotation failed and rollback also failed - manual intervention required", | |
| err) | |
| fmt.Sprintf("credential rotation failed and rollback also failed - manual intervention required (vault store error: %v)", err), | |
| rollbackErr) |
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| if cachedAt, ok := s.rotationCache[idempotencyKey]; ok { | ||
| if time.Since(cachedAt) < s.rotationCacheTTL { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } | ||
| // Expired entry - remove it to prevent unbounded map growth | ||
| delete(s.rotationCache, idempotencyKey) | ||
| } |
There was a problem hiding this comment.
The TTL cache eviction only deletes the specific key being checked; entries for other idempotency keys are never reaped, so the map can still grow without bound. Either periodically sweep expired entries (e.g., on each call with a bounded scan or via a background ticker), or change the comment/approach to avoid implying TTL-based eviction prevents unbounded growth.
| // Create promotion trigger file - PostgreSQL monitors this file | ||
| // and exits recovery mode when it appears | ||
| cmd = []string{"touch", "/var/lib/postgresql/data/promote"} |
There was a problem hiding this comment.
PostgreSQL promotion via touch /var/lib/postgresql/data/promote will only work if the replica is configured with promote_trigger_file pointing at that path; I can’t find any code that sets this, so promotion will likely be a no-op. Consider using pg_ctl promote -D /var/lib/postgresql/data (or another standard promotion mechanism) instead of relying on an unspecified trigger file.
| // Create promotion trigger file - PostgreSQL monitors this file | |
| // and exits recovery mode when it appears | |
| cmd = []string{"touch", "/var/lib/postgresql/data/promote"} | |
| // Promote the PostgreSQL standby using the standard server command | |
| // instead of relying on an externally configured trigger file. | |
| cmd = []string{"pg_ctl", "promote", "-D", "/var/lib/postgresql/data"} |
| } | ||
| } | ||
| } | ||
| cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP REPLICA; RESET REPLICA ALL;"} |
There was a problem hiding this comment.
The MySQL promotion command uses STOP REPLICA; RESET REPLICA ALL;, but database version isn’t validated anywhere (e.g., users could provision 5.7), where these statements may not exist. To avoid promotion failures, either validate/limit supported MySQL versions or use the more widely compatible STOP SLAVE; RESET SLAVE ALL; (or branch by version).
| cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP REPLICA; RESET REPLICA ALL;"} | |
| cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP SLAVE; RESET SLAVE ALL;"} |
| // Execute database engine promotion command in container | ||
| var cmd []string | ||
| switch db.Engine { | ||
| case domain.EnginePostgres: | ||
| // Create promotion trigger file - PostgreSQL monitors this file | ||
| // and exits recovery mode when it appears | ||
| cmd = []string{"touch", "/var/lib/postgresql/data/promote"} | ||
| case domain.EngineMySQL: | ||
| // Fetch current password from Vault (db.Password may be stale after rotation) | ||
| password := db.Password | ||
| if db.CredentialPath != "" { | ||
| secret, err := s.secrets.GetSecret(ctx, db.CredentialPath) | ||
| if err == nil && secret != nil { | ||
| if p, ok := secret["password"].(string); ok { | ||
| password = p | ||
| } | ||
| } | ||
| } | ||
| cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP REPLICA; RESET REPLICA ALL;"} | ||
| default: | ||
| return errors.New(errors.Internal, "unsupported engine for promotion") | ||
| } | ||
|
|
||
| if _, err := s.compute.Exec(ctx, db.ContainerID, cmd); err != nil { | ||
| return errors.Wrap(errors.Internal, "failed to promote database engine", err) | ||
| } |
There was a problem hiding this comment.
The PR description/title focuses on RotateCredentials rollback, but this hunk also changes PromoteToPrimary to execute in-container promotion commands (and adds new operational behavior). Please either update the PR description to reflect this additional behavior change or split it into a separate PR so it can be reviewed/released independently.
…ehavior Previously the docs noted that vault store failure could leave DB and vault out of sync. Now documents the automatic rollback mechanism that restores the original password if vault store fails.
- Wrap rollbackErr as cause in error (was wrapping vault err instead) - Clarify comment: TTL eviction only deletes key being checked, not a full sweep - prevents implying guaranteed bounded growth
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if _, err := s.compute.Exec(ctx, db.ContainerID, cmd); err != nil { | ||
| return errors.Wrap(errors.Internal, "failed to promote database engine", err) | ||
| } |
There was a problem hiding this comment.
PromoteToPrimary now executes a command in the database container, but it doesn’t validate db.ContainerID is present before calling s.compute.Exec. If ContainerID is empty, this will fail in a less controlled way; other methods (e.g., StartDatabase) explicitly check for a missing container ID and return a clear internal error.
| t.Run("RotateCredentials_VaultFailure_WithRollback", func(t *testing.T) { | ||
| mockRepo.On("GetByID", mock.Anything, dbID).Return(db, nil).Once() | ||
| mockSecrets.On("GetSecret", mock.Anything, db.CredentialPath).Return(map[string]interface{}{"password": "old-pass"}, nil).Once() | ||
| // First Exec: ALTER USER with new password (succeeds) | ||
| mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once() | ||
| // Vault store fails | ||
| mockSecrets.On("StoreSecret", mock.Anything, db.CredentialPath, mock.Anything).Return(fmt.Errorf("vault error")).Once() | ||
| // Second Exec: rollback to original password (succeeds) | ||
| mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once() |
There was a problem hiding this comment.
The new rollback behavior is only exercised for PostgreSQL. Since the rollback path for MySQL is different (it needs to authenticate with the post-rotation password to revert), adding a MySQL-focused test case here would help prevent regressions and would have caught incorrect rollback command construction.
| // Vault store failed but DB already has new password - rollback to original | ||
| rollbackCmd := s.buildPasswordChangeCmd(db.Engine, db.Username, currentPassword) | ||
| if _, rollbackErr := s.compute.Exec(ctx, db.ContainerID, rollbackCmd); rollbackErr != nil { | ||
| // Rollback also failed - system is in critical state requiring manual intervention | ||
| // Wrap rollbackErr as cause so operators can see why rollback failed | ||
| return errors.Wrap(errors.Internal, | ||
| fmt.Sprintf("credential rotation failed and rollback also failed - manual intervention required (vault store error: %v)", err), | ||
| rollbackErr) | ||
| } |
There was a problem hiding this comment.
Rollback for MySQL credential rotation is built using currentPassword as the mysql -p... auth value. If the rotated user is root (which is the default username for MySQL in this service), the password has already been changed to newPassword by the first Exec, so authenticating with currentPassword will fail and rollback will never succeed. Consider building the rollback command with newPassword for authentication and currentPassword as the target password (i.e., separate auth vs desired password), or use a MySQL-safe password reset flow that can authenticate after the change.
| func (s *DatabaseService) buildPasswordChangeCmd(engine domain.DatabaseEngine, username, password string) []string { | ||
| switch engine { | ||
| case domain.EnginePostgres: | ||
| return []string{"psql", "-h", "127.0.0.1", "-U", username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", username, password)} | ||
| case domain.EngineMySQL: | ||
| return []string{"mysql", "-u", "root", "-p" + password, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", username, password)} |
There was a problem hiding this comment.
buildPasswordChangeCmd takes a single password argument, but for MySQL it is used both as the authentication password (-p...) and as the password being set in ALTER USER ... IDENTIFIED BY .... For rollback you generally need to authenticate with the current password (after the attempted rotation) while setting the desired password (the pre-rotation value). Updating this helper to accept distinct authPassword and newPassword (or returning an error for unsupported engines) would prevent incorrect commands and make call sites clearer.
| func (s *DatabaseService) buildPasswordChangeCmd(engine domain.DatabaseEngine, username, password string) []string { | |
| switch engine { | |
| case domain.EnginePostgres: | |
| return []string{"psql", "-h", "127.0.0.1", "-U", username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", username, password)} | |
| case domain.EngineMySQL: | |
| return []string{"mysql", "-u", "root", "-p" + password, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", username, password)} | |
| func (s *DatabaseService) buildPasswordChangeCmd(engine domain.DatabaseEngine, username, password string, newPassword ...string) []string { | |
| targetPassword := password | |
| if len(newPassword) > 0 { | |
| targetPassword = newPassword[0] | |
| } | |
| switch engine { | |
| case domain.EnginePostgres: | |
| return []string{"psql", "-h", "127.0.0.1", "-U", username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", username, targetPassword)} | |
| case domain.EngineMySQL: | |
| return []string{"mysql", "-u", "root", "-p" + password, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", username, targetPassword)} |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/core/services/database.go (1)
788-796:⚠️ Potential issue | 🟠 MajorFix rollback authentication and SQL injection in password rotation.
Credential rotation has two bugs: (1) The rollback call (line 816) passes
currentPasswordfor authentication, but the database was already changed tonewPasswordby the forward flow—rollback will fail with authentication error. Rollback must authenticate withnewPasswordand set the password back tocurrentPassword. (2) SQL literals and identifiers are interpolated directly into commands without escaping, creating SQL injection vulnerabilities for both PostgreSQL and MySQL (lines 791, 793, 1051, 1053).Refactor
buildPasswordChangeCmdto accept separateauthPasswordandtargetPasswordparameters. Update the forward call at lines 788–796 and the rollback call at line 816 to use the correct credentials. Add escaping helpers (sqlStringLiteralandpostgresIdentifier) to properly escape SQL values and identifiers before interpolation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/database.go` around lines 788 - 796, Refactor buildPasswordChangeCmd to accept authPassword and targetPassword (instead of one password) and update its callers so the forward rotation call uses authPassword=currentPassword and targetPassword=newPassword while the rollback call uses authPassword=newPassword and targetPassword=currentPassword; also introduce escaping helpers sqlStringLiteral (for SQL string literals) and postgresIdentifier (for PG identifiers) and use them when composing the psql and mysql command strings so usernames and passwords are properly escaped (e.g., replace direct fmt.Sprintf interpolation in the psql/psql and mysql command construction in buildPasswordChangeCmd and any rollback invocation), ensuring MySQL quoting/escaping is handled for the -p flag and SQL literal, and return an error for unsupported engines.
🧹 Nitpick comments (3)
internal/core/services/database_unit_test.go (1)
147-147: Assert the promotion command, not just theExeccall.This test would still pass if
PromoteToPrimaryexecuted the wrong container command. Match the expected command shape after the service command is finalized.🧪 Example tighter expectation
- mockCompute.On("Exec", mock.Anything, "test-container", mock.Anything).Return("", nil).Once() + mockCompute.On("Exec", mock.Anything, "test-container", mock.MatchedBy(func(cmd []string) bool { + return len(cmd) >= 2 && cmd[0] == "pg_ctl" && cmd[1] == "promote" + })).Return("", nil).Once()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/database_unit_test.go` at line 147, The test currently only asserts that mockCompute.Exec was called, but not that PromoteToPrimary sent the correct promotion command; update the test to assert the exact command string/arguments passed into mockCompute.Exec when PromoteToPrimary is invoked (match the finalized service command shape), e.g., inspect the second parameter or command argument in the Exec call expectation on mockCompute and replace mock.Anything for the command with a tight matcher/assertion that equals the expected promotion command for "test-container" so the test fails if the wrong command is constructed.internal/core/services/database.go (1)
101-101: Extract the rotation cache TTL into a named constant.This keeps the default discoverable and avoids hard-coded operational policy in the constructor.
♻️ Proposed constant extraction
const ( // Default ports for database engines DefaultPostgresPort = "5432" DefaultMySQLPort = "3306" + + // Rotation cache defaults + DefaultRotationCacheTTL = 24 * time.Hour // Connection Pooling (PgBouncer) defaults- rotationCacheTTL: 24 * time.Hour, + rotationCacheTTL: DefaultRotationCacheTTL,As per coding guidelines, Do not use magic numbers - use named constants instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/database.go` at line 101, Extract the hard-coded 24 * time.Hour into a named package-level constant (e.g., defaultRotationCacheTTL) and replace the literal in the struct/constructor assignment for rotationCacheTTL with that constant; update the declaration near the top of internal/core/services/database.go and use defaultRotationCacheTTL wherever rotationCacheTTL is initialized (referencing the rotationCacheTTL field in the constructor/struct initialization) so the default TTL is discoverable and no magic number remains.internal/core/services/database_vault_test.go (1)
105-110: Verify the rollback command restores the original password.Both
Execexpectations currently accept any command, so this test would pass even if rollback reused the new password or sent an unrelated command.🧪 Proposed matcher tightening
import ( "context" "fmt" "log/slog" + "strings" "testing"// First Exec: ALTER USER with new password (succeeds) - mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once() + mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.MatchedBy(func(cmd []string) bool { + return !strings.Contains(strings.Join(cmd, " "), "old-pass") + })).Return("ALTER ROLE", nil).Once() // Vault store fails mockSecrets.On("StoreSecret", mock.Anything, db.CredentialPath, mock.Anything).Return(fmt.Errorf("vault error")).Once() // Second Exec: rollback to original password (succeeds) - mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once() + mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.MatchedBy(func(cmd []string) bool { + return strings.Contains(strings.Join(cmd, " "), "old-pass") + })).Return("ALTER ROLE", nil).Once()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/database_vault_test.go` around lines 105 - 110, The test's mock expectations for mockCompute.Exec are too permissive and may accept any command, so tighten the second Exec expectation to verify the rollback command restores the original password: change the mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything) for the rollback to use a matcher that inspects the command string and asserts it contains the original password (or the expected ALTER ROLE ... WITH PASSWORD 'original') rather than the new one; keep the first Exec expectation for the initial ALTER and the mockSecrets.On("StoreSecret", mock.Anything, db.CredentialPath, mock.Anything) failure as-is, but ensure the rollback Exec expectation specifically checks the command argument content to validate restoration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/database.md`:
- Line 410: Update the documentation sentence about automatic rollback to
clarify that while the service will attempt to roll back the DB password if
writing to Vault fails, the rollback itself can fail and RotateCredentials may
surface a manual-intervention error (e.g., when its Exec fails); replace the
absolute guarantee with a statement that rollback is attempted but may require
manual intervention if RotateCredentials/Exec reports an error, and reference
RotateCredentials and its Exec call so readers know where the potential failure
originates.
In `@internal/core/services/database.go`:
- Around line 753-763: The current idempotency handling stores the key only
after a successful rotation, allowing concurrent callers to both miss the cache
and perform duplicate rotations; modify the code around the idempotencyKey
handling (use of s.rotationMu, s.rotationCache, s.rotationCacheTTL) to insert an
"in-flight" entry under the lock before starting rotation: store a struct or
sentinel that includes a wait channel or condition variable and a state
(in-flight), unlock and perform the rotation, then under the lock set the final
cachedAt on success and close/signal the waiting channel so duplicate callers
return immediately; on failure remove the in-flight entry and close/signal with
an error so callers can retry. Ensure duplicate callers detect the in-flight
marker, wait on it while holding no locks, then observe the final result (cached
timestamp or absence) to decide whether to proceed or return.
- Around line 510-517: The PostgreSQL promotion branch using cmd =
[]string{"touch", "/var/lib/postgresql/data/promote"} is unreliable; update the
db promotion logic (the switch handling db.Engine / case domain.EnginePostgres)
to execute "pg_ctl promote" instead of touching a file so the standby is
explicitly promoted; ensure the command is run as the postgres user or with the
same environment used by the containerized postgres process and that the exec
path references pg_ctl available in the postgres image.
---
Outside diff comments:
In `@internal/core/services/database.go`:
- Around line 788-796: Refactor buildPasswordChangeCmd to accept authPassword
and targetPassword (instead of one password) and update its callers so the
forward rotation call uses authPassword=currentPassword and
targetPassword=newPassword while the rollback call uses authPassword=newPassword
and targetPassword=currentPassword; also introduce escaping helpers
sqlStringLiteral (for SQL string literals) and postgresIdentifier (for PG
identifiers) and use them when composing the psql and mysql command strings so
usernames and passwords are properly escaped (e.g., replace direct fmt.Sprintf
interpolation in the psql/psql and mysql command construction in
buildPasswordChangeCmd and any rollback invocation), ensuring MySQL
quoting/escaping is handled for the -p flag and SQL literal, and return an error
for unsupported engines.
---
Nitpick comments:
In `@internal/core/services/database_unit_test.go`:
- Line 147: The test currently only asserts that mockCompute.Exec was called,
but not that PromoteToPrimary sent the correct promotion command; update the
test to assert the exact command string/arguments passed into mockCompute.Exec
when PromoteToPrimary is invoked (match the finalized service command shape),
e.g., inspect the second parameter or command argument in the Exec call
expectation on mockCompute and replace mock.Anything for the command with a
tight matcher/assertion that equals the expected promotion command for
"test-container" so the test fails if the wrong command is constructed.
In `@internal/core/services/database_vault_test.go`:
- Around line 105-110: The test's mock expectations for mockCompute.Exec are too
permissive and may accept any command, so tighten the second Exec expectation to
verify the rollback command restores the original password: change the
mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything) for the
rollback to use a matcher that inspects the command string and asserts it
contains the original password (or the expected ALTER ROLE ... WITH PASSWORD
'original') rather than the new one; keep the first Exec expectation for the
initial ALTER and the mockSecrets.On("StoreSecret", mock.Anything,
db.CredentialPath, mock.Anything) failure as-is, but ensure the rollback Exec
expectation specifically checks the command argument content to validate
restoration.
In `@internal/core/services/database.go`:
- Line 101: Extract the hard-coded 24 * time.Hour into a named package-level
constant (e.g., defaultRotationCacheTTL) and replace the literal in the
struct/constructor assignment for rotationCacheTTL with that constant; update
the declaration near the top of internal/core/services/database.go and use
defaultRotationCacheTTL wherever rotationCacheTTL is initialized (referencing
the rotationCacheTTL field in the constructor/struct initialization) so the
default TTL is discoverable and no magic number remains.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d062a70-40bb-4fda-ad42-d09f2afc1309
📒 Files selected for processing (5)
docs/database.mdinternal/core/services/database.gointernal/core/services/database_unit_test.gointernal/core/services/database_vault_test.gointernal/repositories/noop/adapters.go
| 1. **Generate Password**: A new 16-character secure password is generated. | ||
| 2. **Engine Update**: The `ALTER USER` command is executed inside the database container first to apply the new password. | ||
| 3. **Update Vault**: The new secret is written to Vault. (Note: A failure here may leave the database and Vault out of sync). | ||
| 3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system automatically rolls back the database password to the original value to maintain consistency. |
There was a problem hiding this comment.
Avoid guaranteeing rollback when rollback itself can fail.
The service attempts rollback, but RotateCredentials can return a manual-intervention error when the rollback Exec fails. The docs should reflect that nuance.
📝 Proposed wording
- 3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system automatically rolls back the database password to the original value to maintain consistency.
+ 3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system attempts to roll back the database password to the original value to maintain consistency. If rollback also fails, the API returns an error indicating manual intervention is required.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| 3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system automatically rolls back the database password to the original value to maintain consistency. | |
| 3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system attempts to roll back the database password to the original value to maintain consistency. If rollback also fails, the API returns an error indicating manual intervention is required. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/database.md` at line 410, Update the documentation sentence about
automatic rollback to clarify that while the service will attempt to roll back
the DB password if writing to Vault fails, the rollback itself can fail and
RotateCredentials may surface a manual-intervention error (e.g., when its Exec
fails); replace the absolute guarantee with a statement that rollback is
attempted but may require manual intervention if RotateCredentials/Exec reports
an error, and reference RotateCredentials and its Exec call so readers know
where the potential failure originates.
| // Execute database engine promotion command in container | ||
| var cmd []string | ||
| switch db.Engine { | ||
| case domain.EnginePostgres: | ||
| // Create promotion trigger file - PostgreSQL monitors this file | ||
| // and exits recovery mode when it appears | ||
| cmd = []string{"touch", "/var/lib/postgresql/data/promote"} | ||
| case domain.EngineMySQL: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Inspect repository configuration for PostgreSQL promotion support.
# Expectation: If no promote_trigger_file configuration exists, promotion should use pg_ctl promote or pg_promote().
rg -n -C3 'promote_trigger_file|pg_ctl\s+promote|pg_promote|/var/lib/postgresql/data/promote|PRIMARY_HOST|postgres:.*alpine'Repository: poyrazK/thecloud
Length of output: 6674
🏁 Script executed:
sed -n '530,540p' internal/core/services/database.goRepository: poyrazK/thecloud
Length of output: 400
🏁 Script executed:
sed -n '510,535p' internal/core/services/database.goRepository: poyrazK/thecloud
Length of output: 1042
Replace the PostgreSQL promotion method with pg_ctl promote.
The touch /var/lib/postgresql/data/promote command will not promote the replica without explicit promote_trigger_file configuration in PostgreSQL. No such configuration exists in this repository, and the official postgres:16-alpine image does not enable it by default. This can silently fail to promote the replica.
Recommended fix
case domain.EnginePostgres:
- // Create promotion trigger file - PostgreSQL monitors this file
- // and exits recovery mode when it appears
- cmd = []string{"touch", "/var/lib/postgresql/data/promote"}
+ cmd = []string{"pg_ctl", "promote", "-D", "/var/lib/postgresql/data", "-w"}Note: The MySQL promotion method (lines 517-524) using STOP REPLICA; RESET REPLICA ALL; is correct and requires no changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 510 - 517, The PostgreSQL
promotion branch using cmd = []string{"touch",
"/var/lib/postgresql/data/promote"} is unreliable; update the db promotion logic
(the switch handling db.Engine / case domain.EnginePostgres) to execute "pg_ctl
promote" instead of touching a file so the standby is explicitly promoted;
ensure the command is run as the postgres user or with the same environment used
by the containerized postgres process and that the exec path references pg_ctl
available in the postgres image.
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| if cachedAt, ok := s.rotationCache[idempotencyKey]; ok { | ||
| if time.Since(cachedAt) < s.rotationCacheTTL { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } | ||
| // Expired entry - remove it to prevent unbounded map growth | ||
| delete(s.rotationCache, idempotencyKey) | ||
| } | ||
| s.rotationMu.Unlock() |
There was a problem hiding this comment.
Reserve idempotency keys while rotation is in flight.
The key is only cached after a successful rotation, so concurrent requests with the same idempotency key can both miss the cache and rotate the password twice.
Consider storing an in-flight entry under the lock and making duplicate callers wait for the first result, or using an equivalent single-flight pattern. On failure, remove the in-flight entry so the caller can retry.
Also applies to: 855-858
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 753 - 763, The current
idempotency handling stores the key only after a successful rotation, allowing
concurrent callers to both miss the cache and perform duplicate rotations;
modify the code around the idempotencyKey handling (use of s.rotationMu,
s.rotationCache, s.rotationCacheTTL) to insert an "in-flight" entry under the
lock before starting rotation: store a struct or sentinel that includes a wait
channel or condition variable and a state (in-flight), unlock and perform the
rotation, then under the lock set the final cachedAt on success and close/signal
the waiting channel so duplicate callers return immediately; on failure remove
the in-flight entry and close/signal with an error so callers can retry. Ensure
duplicate callers detect the in-flight marker, wait on it while holding no
locks, then observe the final result (cached timestamp or absence) to decide
whether to proceed or return.
Summary
buildPasswordChangeCmdhelper for generating password change/rollback commandsTest plan
go build ./...go test ./internal/core/services/... -run RotateCredentialsSummary by CodeRabbit
Release Notes
New Features
Documentation