Conversation
…es forced by cnpg
Reshape cluster class crd
…ntroller postgresql logical db controller
…ster-controller Postgres Cluster controller with base functionality.
* addition of connection pooler types * moved ConnectionPooler to CNPGConfig * created connectionPoolerEnabled in cluster config * deleted connectionpooler_types.go
remove semver check, add messages extended description
fix validation logic for storage and postgresversion
…ogic postgres db controller logic
…to fix-infinite-loop-cluster-reconciler
…iler fix infinite loop on reconciliation
…ecrets-management Add secrets management for roles and their use in cluster
…-and-refactor-CM-secret-logic simplify configMap and Secret, fix finalizer logic
kubabuczak left a comment:
Please rebase with develop. For some reason, part of the CI checks was not executed (possibly because of merge conflicts).
Why is this file part of this MR?
Removing; this is an orphan and shouldn't be here.
Should this file be part of this MR?
Removing; this is an orphan and shouldn't be here.
I'm confused, because this is part of current develop:
@@ -28,6 +28,16 @@ spec:
  kind: ClusterMaster
  name: clustermasters.enterprise.splunk.com
  version: v3
- description: DatabaseClass is the Schema for the databaseclasses API.
Hmm, I have no idea what this comment is referring to.
Are those validation tests used anywhere?
I would say they are good candidates for kuttl-based testing.
This will be moved to Ginkgo integration/e2e testing once we have it. We have already started this work.
// Note: CNPG PoolerStatus only tracks scheduled instances, not ready pods.
func (r *PostgresClusterReconciler) isPoolerReady(pooler *cnpgv1.Pooler, err error) bool {
	if err != nil {
		return false
Agree - instead you should check if pooler != nil
if role.PasswordSecretRef != nil {
	cnpgRole.PasswordSecret = &cnpgv1.LocalObjectReference{
		Name: role.PasswordSecretRef.Name,
LocalObjectReference doesn't contain a namespace
endpoints := resolveClusterEndpoints(cluster, cnpgCluster, postgresDB.Namespace)
if err := r.reconcileUserConfigMaps(ctx, postgresDB, endpoints); err != nil {
	if statusErr := updateStatus(configMapsReady, metav1.ConditionFalse, reasonConfigMapsCreationFailed,
		fmt.Sprintf("Failed to reconcile ConfigMaps: %v", err), provisioningDBPhase); statusErr != nil {
In general it's better not to expose internal Go errors in k8s resources (applies everywhere).
func (r *PostgresDatabaseReconciler) verifyRolesReady(
	ctx context.Context,
	expectedUsers []string,
	cnpgCluster *cnpgv1.Cluster,
) ([]string, error) {
[nit] Hmm, if it's only verification, why are we returning a string slice containing the not-ready entries? Maybe it should be named differently.
Same for verifyDatabasesReady
It is verification, but we return the list of not-ready databases so we can update the status of our CR with that list and users can see what is not ready. Aside from that, it could be a bool. Do you think this pattern is not accurate?
// userSecretName gives both secret creation and status wiring a single source of truth
// for naming — eliminating any risk of the two sides drifting out of sync.
func userSecretName(postgresDBName, dbName, role string) string {
	return fmt.Sprintf("%s-%s-%s", postgresDBName, dbName, role)
I would add some prefix/postfix to reduce the possibility of conflict (apply to all derived names)
Also, there is a length limit on the name of each k8s resource - maybe we need to check for it.
We explicitly removed the prefix/postfix to simplify the logic. The secret is named by the Postgres cluster name and db-name, which provides the needed uniqueness within one namespace. The maximum size for resource names is 253 characters (https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names), which should be sufficient.
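If a guard against the 253-character limit is ever wanted, one common pattern is to truncate and append a short hash for uniqueness. A sketch, not the PR's actual behaviour; the truncation policy is an assumption:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Most namespaced resources accept DNS-subdomain names of up to 253
// characters; some kinds (e.g. Services) are stricter (63-char DNS label).
const maxNameLen = 253

// userSecretName mirrors the helper under discussion but guards the
// length limit: an overly long join is truncated and made unique again
// with a short hash suffix.
func userSecretName(postgresDBName, dbName, role string) string {
	name := fmt.Sprintf("%s-%s-%s", postgresDBName, dbName, role)
	if len(name) <= maxNameLen {
		return name
	}
	sum := sha256.Sum256([]byte(name))
	// 244 chars of prefix + "-" + 8 hex chars = 253
	return fmt.Sprintf("%s-%x", name[:maxNameLen-9], sum[:4])
}

func main() {
	fmt.Println(userSecretName("pg", "app", "rw")) // short names pass through unchanged
}
```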
kubabuczak left a comment:
Some review from AI - some of the findings make a lot of sense, some need further discovery.
Secret deletion breaks retained CNPG clusters
When ClusterDeletionPolicy: Retain, the finalizer removes the owner reference from the CNPG Cluster but not from the superuser Secret. The Secret is cascade-deleted with PostgresCluster, leaving the retained CNPG cluster referencing a non-existent secret.
Samples kustomization references non-existent files
config/samples/kustomization.yaml references enterprise_v4_database.yaml, enterprise_v4_databaseclass.yaml, and enterprise_v4_postgrescluster.yaml — none of which exist. This will break kustomize build.
Typo in type name: PosgresClusterClassConfig
The type is named PosgresClusterClassConfig (missing a t in Postgres) in postgresclusterclass_types.go. This propagates into the generated CRD and deepcopy code.
Broken print columns in PostgresClusterClass CRD
Print columns reference .spec.postgresClusterConfig.* but the actual spec field serializes as .spec.config.*. kubectl get postgresclusterclass will show empty columns.
No conflict handling for finalizer updates in PostgresDatabase
PostgresCluster handles apierrors.IsConflict when adding finalizers; PostgresDatabase does not, causing unnecessary reconcile failures on resource version conflicts.
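The conflict-handling pattern can be sketched without the k8s libraries; in real code you would check `apierrors.IsConflict` (or simply use `retry.RetryOnConflict` from `k8s.io/client-go/util/retry`), and the sentinel error below is a stand-in:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a 409 from the API server; real code checks
// apierrors.IsConflict instead of comparing sentinels.
var errConflict = errors.New("the object has been modified")

// updateWithConflictRetry sketches the pattern PostgresCluster already
// applies when adding finalizers: on a resource-version conflict the
// caller refetches (inside update) and tries again, instead of failing
// the whole reconcile.
func updateWithConflictRetry(update func() error, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = update(); err == nil || !errors.Is(err, errConflict) {
			return err
		}
	}
	return fmt.Errorf("still conflicting after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := updateWithConflictRetry(func() error {
		calls++
		if calls < 3 {
			return errConflict // first two attempts race with another writer
		}
		return nil
	}, 5)
	fmt.Println(err, calls)
}
```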
Status updates use stale objects
updateStatus in PostgresCluster mutates the in-memory object and calls r.Status().Update(). Under concurrent updates, this can overwrite changes. Should use Patch with client.MergeFrom.
ConnectionPoolerConfig field is dead code
PostgresCluster spec includes ConnectionPoolerConfig with a comment implying it has effect, but a CEL validation rule (!has(self.connectionPoolerConfig)) prevents it from ever being set. Either remove the field or the validation.
No minLength validation on clusterRef.name and database name
Both fields accept empty strings, which would cause the reconciler to create invalid CNPG resources.
poolerExists swallows errors
The function treats every non-NotFound API error as "pooler does not exist" (returning false), hiding real API issues and potentially causing incorrect behavior.
PostgresClusterClass immutability not enforced
Description says "immutable after creation" but there are no CEL rules preventing mutations to the spec after initial creation.
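Immutability like this is usually enforced with a CEL transition rule on the spec field. A sketch; the type name is taken from the PR description, but the marker placement and message are assumptions:

```go
type PostgresClusterClass struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec is immutable after creation"
	Spec PostgresClusterClassSpec `json:"spec,omitempty"`
}
```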
Sample comment contradicts value
enterprise_v4_postgrescluster_override.yaml has a comment saying clusterDeletionPolicy: Retain but the actual value is Delete.
JSON tag inconsistency: ConfigMapRef serializes as "configMap"
In DatabaseInfo, other refs use the *Ref suffix in JSON tags (databaseRef, adminUserSecretRef, rwUserSecretRef) but ConfigMapRef serializes as configMap.
No pattern validation on database names
DatabaseDefinition.Name has MaxLength: 30 but no pattern restricting to valid PostgreSQL identifiers.
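A pattern marker alongside the existing MaxLength would close this gap. A sketch; the regex is a suggested policy (lower-case unquoted PostgreSQL identifiers), not taken from the PR:

```go
// DatabaseDefinition.Name, restricted to lower-case PostgreSQL identifiers:
// +kubebuilder:validation:MaxLength=30
// +kubebuilder:validation:Pattern=`^[a-z_][a-z0-9_]*$`
Name string `json:"name"`
```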
PostgresClusterClass missing resource path in marker
Should have +kubebuilder:resource:path=postgresclusterclasses,scope=Cluster for consistency with other types.
Database CRDs not included in preserveUnknownFields patch
Existing CRDs are patched with preserveUnknownFields: false but the three new database CRDs are not.
Status updated unconditionally
syncStatus and updateStatus always call r.Status().Update() even when nothing changed, creating unnecessary API calls and potential conflicts.
Role removal not propagated
When a role is removed from spec.managedRoles, it is not patched as Ensure: absent in CNPG. Roles can only be added, not removed (acknowledged via TODO).
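The missing step is a diff between desired and live roles; anything absent from `spec.managedRoles` would be re-emitted with `Ensure: "absent"` so CNPG drops it. A self-contained sketch (the helper name is hypothetical):

```go
package main

import "fmt"

// rolesToRemove returns roles present in the live CNPG spec but absent
// from the desired spec.managedRoles; the reconciler would re-emit these
// with Ensure: "absent" instead of silently keeping them.
func rolesToRemove(desired, current []string) []string {
	want := make(map[string]bool, len(desired))
	for _, r := range desired {
		want[r] = true
	}
	var remove []string
	for _, r := range current {
		if !want[r] {
			remove = append(remove, r)
		}
	}
	return remove
}

func main() {
	fmt.Println(rolesToRemove([]string{"app_rw"}, []string{"app_rw", "app_ro"}))
}
```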
ObservedGeneration early-exit may miss state regression
If external state changes (e.g., the CNPG cluster is deleted) but the CR's generation hasn't changed, the status won't be updated.
Force-pushed from bf24bbb to 6bd518c
…alizer-leaving-resources
…ng-resources Feature/database finalizer leaving resources
…-rw-user-db-access Add support to apply RW role privileges to the database
@@ -0,0 +1,203 @@
/*
Copyright 2026.
[nit] I would double-check what the preferred option is for the copyright year range.
Who might give us this info? What is the policy for the SOK itself?
fix reconciliation storm issue for postgrescluster
…trollers-base-refactor
…e-refactor Move postgres controller code to package, cleanup crds, fix minor issues
Description
Introduces the MVP PostgreSQL controller stack to the Splunk Operator. Rather than
implementing PostgreSQL cluster management from scratch, this PR builds a Kubernetes-native
wrapper on top of CloudNativePG (CNPG) — a production-grade
PostgreSQL operator. The Splunk Operator layer adds three new CRDs and two new controllers
that translate high-level declarative intent (cluster classes, logical databases, managed
roles) into CNPG primitives, while handling the Splunk-specific lifecycle concerns:
credential provisioning, connection metadata publishing, and multi-phase status tracking.
Key Changes
api/v4/postgresclusterclass_types.go (new)
Defines the Go API schema for PostgresClusterClass — a cluster-scoped CRD that standardises PostgreSQL cluster configuration (instance count, storage, Postgres version, resource limits, pg_hba rules, connection pooler) across environments. Operators define dev and prod classes; teams reference them by name.

api/v4/postgrescluster_types.go (new)
Defines the Go API schema for PostgresCluster — the namespaced CRD teams use to request a PostgreSQL cluster. Declares spec fields for merging class defaults with per-cluster overrides, and status fields for provisioner reference, connection pooler state, and managed role tracking.

api/v4/postgresdatabase_types.go (new)
Defines the Go API schema for PostgresDatabase — the CRD that declares one or more logical databases on a PostgresCluster, including reclaim policy and the list of database definitions. Status tracks per-database secret and ConfigMap references.

internal/controller/postgresoperator_common_types.go (new)
Shared constants, phase enums, condition types, and condition reasons used across both controllers. Single source of truth for retry delays, finalizer names, endpoint suffixes, and all status strings.

internal/controller/postgrescluster_controller.go (new)
Reconciles PostgresCluster CRs against CNPG:
1. Translates the PostgresCluster spec into a CNPG Cluster, merging any defaults inherited from the referenced PostgresClusterClass with per-cluster overrides
2. Manages the connection pooler (PgBouncer) lifecycle alongside the cluster, enabling high-throughput workloads to share a bounded pool of database connections, and preventing orphaned infrastructure
3. Surfaces conditions (ClusterReady, PoolerReady) so the PostgresDatabase controller and operators can wait on a confirmed healthy cluster before proceeding

internal/controller/postgresdatabase_controller.go (new)
Reconciles PostgresDatabase CRs against CNPG:
1. Verifies the PostgresCluster is healthy before creating any database resources, so partial provisioning is never left in an inconsistent state
2. Scopes each role to its own database to enforce least-privilege access
3. Provisions credentials in Secrets, ready to be mounted by workloads that need database access
4. Publishes connection endpoints (pooler endpoints, when available) into a ConfigMap per database, giving consuming services a stable reference without hardcoding hostnames
5. Honours a deletion policy that controls whether the database is physically dropped when the CR is deleted
6. Surfaces conditions (ClusterReady, SecretsReady, ConfigMapsReady, UsersReady, DatabasesReady) so operators can track exactly which phase reconciliation is in at any point

cmd/main.go
Both new controllers registered with the manager.

config/rbac/
Nine new RBAC files: admin, editor, and viewer roles for PostgresCluster, PostgresClusterClass, and PostgresDatabase.

config/samples/
Five sample CRs: dev and prod cluster classes, default and override cluster instances, and a PostgresDatabase example.

Testing and Verification
Manual Testing
End-to-end flow verified by applying the full CR chain in order:
1. Verify the PostgresCluster is ready
2. Verify CNPG provisioned the underlying cluster
3. Verify the PostgresDatabase status and conditions
4. Verify CNPG Database CRs were created for each declared database
5. Inspect Secrets and ConfigMaps created per database
6. Connect to the database using credentials and endpoints from the ConfigMap and Secret
Additional scenarios verified:
1. pooler-rw-host and pooler-ro-host keys appear in the ConfigMap only when the cluster has connection pooling enabled
2. Deleting a PostgresDatabase CR triggers clean-up respecting the deletionPolicy declared per database (Retain preserves data, Delete drops the database)

Automated Tests — Planned Next
Unit and integration tests covering the full reconciliation lifecycle, error paths, and
status condition transitions are scheduled as the next deliverable for this feature branch.
Related Issues
Jira tickets, GitHub issues, Support tickets...
PR Checklist