[rhcos-4.18] tests: add fips.hmac to verify VM will fail to reboot with FIPS and wrong hmac #4473

Open
openshift-cherrypick-robot wants to merge 1 commit into coreos:rhcos-4.18 from openshift-cherrypick-robot:cherry-pick-4437-to-rhcos-4.18

Conversation

@openshift-cherrypick-robot

This is an automated cherry-pick of #4437

/assign HuijingHei

@openshift-ci

openshift-ci bot commented Mar 6, 2026

Hi @openshift-cherrypick-robot. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new test to verify that a FIPS-enabled VM fails to reboot if the kernel HMAC is corrupted. The changes include adding a new test flag and the test implementation itself. My review focuses on improving the robustness of the new test by replacing a fixed-duration sleep with a more idiomatic and reliable waiting mechanism, which should prevent test flakiness and improve clarity.

Comment on lines +72 to +90
```go
// Remount /boot to change HMAC value in /boot/ostree/<hash>/.vmlinuz.*.hmac
c.RunCmdSync(m, "sudo mount -o remount,rw /boot")
c.RunCmdSyncf(m, "sudo sh -c 'echo change > %s'", hmacFile)

// Initiate reboot
if err := platform.StartReboot(m); err != nil {
	c.Fatalf("Failed to initiate reboot: %v", err)
}

// Wait for the boot to fail. Since the HMAC is corrupted, the machine
// will fail FIPS integrity check and never come back online.
// Using a 90 second timeout to allow enough time for boot attempt to fail.
time.Sleep(90 * time.Second)

// Verify the machine did not come back online by attempting SSH
_, _, err = m.SSH("whoami")
if err == nil {
	c.Fatal("Expected machine to fail booting with corrupted HMAC, but it came back online")
}
```

medium

Using a fixed `time.Sleep` to wait for the machine to fail can make the test flaky or needlessly slow, because the outcome depends on how the actual boot timing compares to a hardcoded duration. A more robust approach is to actively check for the expected state.

In this case, you can use `m.WaitForReboot()` and assert that it times out, which shows the machine did not come back online within the timeout. This is more idiomatic within the test framework and avoids a hardcoded sleep.

Here's a suggested implementation that first retrieves the boot ID, then reboots and uses WaitForReboot to confirm the machine does not recover:

```go
	// Get boot ID for reboot check
	bootIDBytes, err := c.SSH(m, "cat /proc/sys/kernel/random/boot_id")
	if err != nil {
		c.Fatalf("Failed to get boot ID: %v", err)
	}
	bootID := strings.TrimSpace(string(bootIDBytes))

	// Remount /boot to change HMAC value in /boot/ostree/<hash>/.vmlinuz.*.hmac
	c.RunCmdSync(m, "sudo mount -o remount,rw /boot")
	c.RunCmdSyncf(m, "sudo sh -c 'echo change > %s'", hmacFile)

	// Initiate reboot
	if err := platform.StartReboot(m); err != nil {
		c.Fatalf("Failed to initiate reboot: %v", err)
	}

	// Wait for the boot to fail. Since the HMAC is corrupted, the machine
	// will fail FIPS integrity check and never come back online.
	// We use WaitForReboot and expect it to time out.
	err = m.WaitForReboot(90*time.Second, bootID)
	if err == nil {
		c.Fatal("Expected machine to fail booting with corrupted HMAC, but it came back online")
	}
	// We expect a timeout error, anything else is a problem.
	if !strings.Contains(err.Error(), "timed out") {
		c.Fatalf("Unexpected error waiting for reboot: %v", err)
	}
```

@HuijingHei
Member

/ok-to-test

@HuijingHei
Member

/test rhcos
