WIP: Test: requeue PerformanceStatus update when write fails #1479

Open
MarSik wants to merge 1 commit into openshift:release-4.20 from MarSik:release-4.20

Conversation

MarSik (Contributor) commented Mar 10, 2026

Test PR, do not merge here. I will resubmit for the main branch once the fix is validated.

A failure to write a PerformanceProfile status can leave the status stuck in a stale state for a long time, because it is only recomputed when a new reconcile event arrives. That can take hours.

coderabbitai bot commented Mar 10, 2026

Walkthrough

Modified error handling in the performance profile controller to include a fixed 30-second requeue delay when status update operations fail, replacing empty reconcile results with results containing the requeue timing.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Performance Profile Controller Error Handling (`pkg/performanceprofile/controller/performanceprofile_controller.go`) | Added `statusUpdateRequeueAfter` constant set to 30 seconds and updated three error paths to include `RequeueAfter` timing when status updates fail, affecting component creation failures, condition updates, and machine config pool validation scenarios. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/RHsyseng/operator-utils@v1.4.13: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/go-systemd@v0.0.0-20191104093116-d3cd4ed1dbcf: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/ignition@v0.35.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/ignition/v2@v2.22.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/docker/go-units@v0.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-logr/stdr@v1.2.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/go-cmp@v0.7.0

... [truncated 18026 characters] ...

io/kubectl: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/kubelet: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/legacy-cloud-providers: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/metrics: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/mount-utils: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/pod-security-admission: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-apiserver: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"



openshift-ci bot added the do-not-merge/work-in-progress label Mar 10, 2026
MarSik (Contributor, Author) commented Mar 10, 2026

@jmencak This could solve the issue we were investigating.

openshift-ci bot requested review from Tal-or and jmencak March 10, 2026 14:49
openshift-ci bot (Contributor) commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Mar 10, 2026
jmencak (Contributor) commented Mar 11, 2026

/test e2e-hypershift

openshift-ci bot (Contributor) commented Mar 11, 2026

@MarSik: all tests passed!

Full PR test history. Your PR dashboard.


jmencak (Contributor) commented Apr 8, 2026

/agentic_review

qodo-code-review bot commented Apr 8, 2026

Code Review by Qodo

🐞 Bugs (1)   📘 Rule violations (0)   📎 Requirement gaps (0)   🎨 UX Issues (0)
🐞 ≡ Correctness (1)

Action required

1. RequeueAfter ignored with error 🐞
Description
Reconcile returns (Result{RequeueAfter: statusUpdateRequeueAfter}, err) on status update
failures, but controller-runtime ignores any non-zero Result when err != nil and requeues using
exponential backoff instead of the intended 30s delay. This breaks the PR’s stated goal and can
cause immediate/repeated retries rather than the fixed retry delay.
Code

pkg/performanceprofile/controller/performanceprofile_controller.go[R506-516]

		if err := r.StatusWriter.Update(ctx, instance, conditions); err != nil {
			klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
-			return reconcile.Result{}, err
+			return reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, err
		}
		return reconcile.Result{}, err
	}
	err = r.StatusWriter.UpdateOwnedConditions(ctx, instance)
	if err != nil {
		klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
+		return ctrl.Result{RequeueAfter: statusUpdateRequeueAfter}, err
	}
Evidence
The reconciler explicitly documents statusUpdateRequeueAfter as a delay for retrying after status
update failures, but it returns that delay together with a non-nil error. Controller-runtime’s
reconcile handler logs a warning that the result is ignored when an error is returned, and it
rate-limits the request instead of honoring RequeueAfter.

pkg/performanceprofile/controller/performanceprofile_controller.go[63-69]
pkg/performanceprofile/controller/performanceprofile_controller.go[502-517]
vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go[333-356]
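
The evidence above can be modeled in a few lines. The sketch below is a simplified paraphrase of the order in which the reconcile handler inspects its inputs, not the library's code: `Result`, `Action`, `decide`, and `errStatusWrite` are hypothetical local names, and terminal errors and metrics are left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a local stand-in for the two fields of reconcile.Result
// that matter here. It is not the controller-runtime type.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// Action labels what happens to the request in the work queue
// (hypothetical names, for illustration only).
type Action string

const (
	ActionBackoff  Action = "requeue with rate-limited backoff"
	ActionAddAfter Action = "requeue after fixed delay"
	ActionForget   Action = "forget"
)

// statusUpdateRequeueAfter matches the 30s constant named in the PR.
const statusUpdateRequeueAfter = 30 * time.Second

// errStatusWrite stands in for a failed status update.
var errStatusWrite = errors.New("status write failed")

// decide models the order of checks: the error is inspected first,
// so a non-nil error wins and the Result is never consulted.
func decide(res Result, err error) Action {
	switch {
	case err != nil:
		return ActionBackoff
	case res.RequeueAfter > 0:
		return ActionAddAfter
	case res.Requeue:
		return ActionBackoff
	default:
		return ActionForget
	}
}

func main() {
	res := Result{RequeueAfter: statusUpdateRequeueAfter}
	fmt.Println("RequeueAfter + err:", decide(res, errStatusWrite))
	fmt.Println("RequeueAfter + nil:", decide(res, nil))
}
```

Under this model, the two calls in `main` differ only in the error value, yet only the second one reaches the fixed-delay branch.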

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The reconciler currently returns `Result{RequeueAfter: ...}` together with a non-nil `err` in status-write failure paths. In controller-runtime, `Result` is ignored when `err != nil`, so the requested 30s delay is not applied.

### Issue Context
Goal is to retry reconciliation after a status update failure with a fixed delay (30s) so status doesn’t remain stale, without triggering immediate rate-limited retries.

### Fix Focus Areas
- pkg/performanceprofile/controller/performanceprofile_controller.go[502-517]

### Suggested change
- In the status update failure branches, change returns from `return ctrl/reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, err` to `return ctrl.Result{RequeueAfter: statusUpdateRequeueAfter}, nil` (keeping the existing log line so the failure is still visible).
- Ensure any other requeue-after paths do not return a non-nil error if the delay must be honored.
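
A minimal sketch of the suggested pattern, assuming stand-in types rather than the controller's real ones: `statusWriter`, `failingWriter`, `okWriter`, and `reconcileStatus` are hypothetical, and `Result` is a local substitute for `ctrl.Result`. The status error is logged under its own name and then dropped, so the fixed delay is the only signal returned.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// Result is a local substitute for ctrl.Result (assumption for this sketch).
type Result struct {
	RequeueAfter time.Duration
}

// statusUpdateRequeueAfter matches the 30s constant named in the PR.
const statusUpdateRequeueAfter = 30 * time.Second

// statusWriter is a hypothetical, reduced version of the controller's
// StatusWriter; the real interface takes a context and a profile object.
type statusWriter interface {
	UpdateOwnedConditions(name string) error
}

// failingWriter always fails, simulating an API write error.
type failingWriter struct{}

func (failingWriter) UpdateOwnedConditions(string) error {
	return errors.New("simulated status write failure")
}

// okWriter always succeeds.
type okWriter struct{}

func (okWriter) UpdateOwnedConditions(string) error { return nil }

// reconcileStatus logs a failed status write and asks for a retry after
// a fixed delay. It returns a nil error so the delay is honored.
func reconcileStatus(w statusWriter, name string) (Result, error) {
	if statusErr := w.UpdateOwnedConditions(name); statusErr != nil {
		log.Printf("failed to update performance profile %q status: %v", name, statusErr)
		return Result{RequeueAfter: statusUpdateRequeueAfter}, nil
	}
	return Result{}, nil
}

func main() {
	res, err := reconcileStatus(failingWriter{}, "example-profile")
	fmt.Println(res.RequeueAfter, err)
}
```

Because the error is no longer returned, the log line is the only place the failure shows up, which is why the suggestion insists on keeping it.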


jmencak (Contributor) commented Apr 8, 2026

/agentic_describe

@qodo-code-review

Review Summary by Qodo

Requeue PerformanceProfile reconciliation on status update failures

🐞 Bug fix

Walkthroughs

Description
• Add requeue delay when PerformanceProfile status update fails
• Prevent status from becoming stale during write failures
• Introduce statusUpdateRequeueAfter constant for retry timing
• Apply requeue logic to three status update failure paths
Diagram
flowchart LR
  A["Status Update Fails"] --> B["Return RequeueAfter Result"]
  B --> C["Retry After 30 Seconds"]
  C --> D["Prevent Stale Status State"]

File Changes

1. pkg/performanceprofile/controller/performanceprofile_controller.go 🐞 Bug fix +7/-3

Add requeue delays for status update failures

• Added statusUpdateRequeueAfter constant set to 30 seconds for retry delays
• Modified three status update failure handlers to return RequeueAfter instead of immediate error
• Updated error paths in Reconcile() method for component creation and owned conditions failures
• Updated error path in getAndValidateMCP() method for MachineConfigPool validation failures


Comment on lines 506 to 516
		if err := r.StatusWriter.Update(ctx, instance, conditions); err != nil {
			klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
-			return reconcile.Result{}, err
+			return reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, err
		}
		return reconcile.Result{}, err
	}
	err = r.StatusWriter.UpdateOwnedConditions(ctx, instance)
	if err != nil {
		klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
+		return ctrl.Result{RequeueAfter: statusUpdateRequeueAfter}, err
	}

Action required

1. RequeueAfter ignored with error 🐞 Bug ≡ Correctness

Reconcile returns (Result{RequeueAfter: statusUpdateRequeueAfter}, err) on status update
failures, but controller-runtime ignores any non-zero Result when err != nil and requeues using
exponential backoff instead of the intended 30s delay. This breaks the PR’s stated goal and can
cause immediate/repeated retries rather than the fixed retry delay.

jmencak (Contributor) Apr 8, 2026

Interesting, CodeRabbit didn't catch this, @MarSik. Looks like there's some value in Qodo.

Edit: CodeRabbit AI found it too.

jmencak (Contributor) commented Apr 8, 2026

@coderabbitai review

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/performanceprofile/controller/performanceprofile_controller.go (1)

543-556: ⚠️ Potential issue | 🟠 Major

Same issue: RequeueAfter ignored due to non-nil error return.

Apply the same fix pattern here—return nil for error when status update fails to honor the 30-second requeue.

Proposed fix
 	if err != nil {
 		conditions := status.GetDegradedConditions(status.ConditionFailedToFindMachineConfigPool, err.Error())
-		if err := r.StatusWriter.Update(ctx, profile, conditions); err != nil {
-			klog.Errorf("failed to update performance profile %q status: %v", profile.GetName(), err)
-			return nil, &reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, err
+		if statusErr := r.StatusWriter.Update(ctx, profile, conditions); statusErr != nil {
+			klog.Errorf("failed to update performance profile %q status: %v", profile.GetName(), statusErr)
+			return nil, &reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, nil
 		}
 		return nil, &reconcile.Result{}, nil
 	}

 	if err := validateProfileMachineConfigPool(profile, profileMCP); err != nil {
 		conditions := status.GetDegradedConditions(status.ConditionBadMachineConfigLabels, err.Error())
-		if err := r.StatusWriter.Update(ctx, profile, conditions); err != nil {
-			klog.Errorf("failed to update performance profile %q status: %v", profile.GetName(), err)
-			return nil, &reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, err
+		if statusErr := r.StatusWriter.Update(ctx, profile, conditions); statusErr != nil {
+			klog.Errorf("failed to update performance profile %q status: %v", profile.GetName(), statusErr)
+			return nil, &reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, nil
 		}
 		return nil, &reconcile.Result{}, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/performanceprofile/controller/performanceprofile_controller.go` around
lines 543 - 556, The status update error handling after
validateProfileMachineConfigPool(profile, profileMCP) incorrectly returns a
non-nil error which prevents the reconcile.Result.RequeueAfter
(statusUpdateRequeueAfter) from being honored; update the block that calls
r.StatusWriter.Update(ctx, profile, conditions) so that when the status update
fails you still log the error (klog.Errorf) but return nil for the error and
&reconcile.Result{RequeueAfter: statusUpdateRequeueAfter} (i.e., mirror the
earlier fixed pattern used above), referencing validateProfileMachineConfigPool,
status.GetDegradedConditions, r.StatusWriter.Update, and
statusUpdateRequeueAfter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4ca6448a-225d-4081-af9c-19ad5a481598

📥 Commits

Reviewing files that changed from the base of the PR and between 601b61f and 7adaaa1.

📒 Files selected for processing (1)
  • pkg/performanceprofile/controller/performanceprofile_controller.go

Comment on lines 506 to +515
		if err := r.StatusWriter.Update(ctx, instance, conditions); err != nil {
			klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
-			return reconcile.Result{}, err
+			return reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, err
		}
		return reconcile.Result{}, err
	}
	err = r.StatusWriter.UpdateOwnedConditions(ctx, instance)
	if err != nil {
		klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
+		return ctrl.Result{RequeueAfter: statusUpdateRequeueAfter}, err

⚠️ Potential issue | 🟠 Major

RequeueAfter is ignored when returning a non-nil error.

Per controller-runtime's reconciliation loop (see vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:334-347), when an error is returned, the framework uses rate-limited exponential backoff and ignores the Result entirely. The log at line 344 confirms: "The result will always be ignored if the error is non-nil."

All four modified locations return both RequeueAfter: statusUpdateRequeueAfter and a non-nil err, so the 30-second fixed delay will never be applied.

To achieve the intended fixed requeue interval, return nil for the error and log it separately:

Proposed fix for lines 506-516
 		conditions := status.GetDegradedConditions(status.ConditionReasonComponentsCreationFailed, err.Error())
-		if err := r.StatusWriter.Update(ctx, instance, conditions); err != nil {
-			klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
-			return reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, err
+		if statusErr := r.StatusWriter.Update(ctx, instance, conditions); statusErr != nil {
+			klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), statusErr)
+			return reconcile.Result{RequeueAfter: statusUpdateRequeueAfter}, nil
 		}
 		return reconcile.Result{}, err
 	}
-	err = r.StatusWriter.UpdateOwnedConditions(ctx, instance)
-	if err != nil {
-		klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), err)
-		return ctrl.Result{RequeueAfter: statusUpdateRequeueAfter}, err
+	if statusErr := r.StatusWriter.UpdateOwnedConditions(ctx, instance); statusErr != nil {
+		klog.Errorf("failed to update performance profile %q status: %v", instance.GetName(), statusErr)
+		return ctrl.Result{RequeueAfter: statusUpdateRequeueAfter}, nil
 	}

The same pattern applies to the status update failures in getAndValidateMCP (lines 543-545 and 551-554).

Alternatively, if exponential backoff is acceptable, remove the RequeueAfter to avoid misleading code.
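
To weigh that alternative, it helps to see roughly what per-item exponential backoff looks like next to a fixed 30-second delay. The sketch below assumes a 5 ms base delay and a 1000 s cap, which are believed to be the defaults of the per-item limiter in client-go's controller rate limiter; verify them against the vendored workqueue package before relying on the numbers. `backoffDelay` and `failuresUntil` are hypothetical helpers, and the overall token-bucket limiter is ignored.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 5 * time.Millisecond // assumed base delay of the per-item limiter
	maxDelay  = 1000 * time.Second   // assumed cap of the per-item limiter

	// statusUpdateRequeueAfter matches the 30s constant named in the PR.
	statusUpdateRequeueAfter = 30 * time.Second
)

// backoffDelay returns baseDelay * 2^failures, capped at maxDelay.
func backoffDelay(failures int) time.Duration {
	d := baseDelay
	for i := 0; i < failures; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// failuresUntil returns the smallest number of prior failures for which
// the backoff delay reaches target (or the cap, whichever comes first).
func failuresUntil(target time.Duration) int {
	n := 0
	for backoffDelay(n) < target && backoffDelay(n) < maxDelay {
		n++
	}
	return n
}

func main() {
	for n := 0; n <= 14; n += 2 {
		fmt.Printf("after %2d failures: %v\n", n, backoffDelay(n))
	}
	fmt.Println("failures before the delay reaches 30s:",
		failuresUntil(statusUpdateRequeueAfter))
}
```

Under these assumed values the first retries arrive within milliseconds, so a persistent write failure is retried many times before the backoff delay grows past the 30 seconds the PR intends.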

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/performanceprofile/controller/performanceprofile_controller.go` around
lines 506 - 515, The status update error handling is returning both a non-nil
error and a Result with RequeueAfter (so the fixed 30s requeue is ignored); in
the Reconcile flow (and in getAndValidateMCP) change the error returns after
r.StatusWriter.Update and r.StatusWriter.UpdateOwnedConditions to log the error
via klog.Errorf (including instance.GetName() and err), then return
ctrl.Result{RequeueAfter: statusUpdateRequeueAfter}, nil (i.e., drop the error)
so the fixed requeue is honored; ensure this change is applied to all four
locations referencing r.StatusWriter.Update,
r.StatusWriter.UpdateOwnedConditions and getAndValidateMCP where
statusUpdateRequeueAfter is currently paired with a non-nil err.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.

2 participants