[rhcos-4.18] tests: add fips.hmac to verify VM will fail to reboot with FIPS and wrong hmac #4473
Hi @openshift-cherrypick-robot. Thanks for your PR. I'm waiting for a coreos member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Code Review
This pull request adds a new test to verify that a FIPS-enabled VM fails to reboot if the kernel HMAC is corrupted. The changes include adding a new test flag and the test implementation itself. My review focuses on improving the robustness of the new test by replacing a fixed-duration sleep with a more idiomatic and reliable waiting mechanism, which should prevent test flakiness and improve clarity.
// Remount /boot to change HMAC value in /boot/ostree/<hash>/.vmlinuz.*.hmac
c.RunCmdSync(m, "sudo mount -o remount,rw /boot")
c.RunCmdSyncf(m, "sudo sh -c 'echo change > %s'", hmacFile)

// Initiate reboot
if err := platform.StartReboot(m); err != nil {
	c.Fatalf("Failed to initiate reboot: %v", err)
}

// Wait for the boot to fail. Since the HMAC is corrupted, the machine
// will fail FIPS integrity check and never come back online.
// Using a 90 second timeout to allow enough time for boot attempt to fail.
time.Sleep(90 * time.Second)

// Verify the machine did not come back online by attempting SSH
_, _, err = m.SSH("whoami")
if err == nil {
	c.Fatal("Expected machine to fail booting with corrupted HMAC, but it came back online")
}
Using a fixed time.Sleep to wait for a machine to fail can lead to flaky tests (if the failure takes longer than the sleep duration) or slow test execution. A more robust approach is to actively check for the expected state.
In this case, you can use m.WaitForReboot() and assert that it times out, which proves the machine did not come back online. This is more idiomatic within the test framework and avoids a hardcoded sleep.
Here's a suggested implementation that first retrieves the boot ID, then reboots and uses WaitForReboot to confirm the machine does not recover:
// Get boot ID for reboot check
bootIDBytes, err := c.SSH(m, "cat /proc/sys/kernel/random/boot_id")
if err != nil {
	c.Fatalf("Failed to get boot ID: %v", err)
}
bootID := strings.TrimSpace(string(bootIDBytes))

// Remount /boot to change HMAC value in /boot/ostree/<hash>/.vmlinuz.*.hmac
c.RunCmdSync(m, "sudo mount -o remount,rw /boot")
c.RunCmdSyncf(m, "sudo sh -c 'echo change > %s'", hmacFile)

// Initiate reboot
if err := platform.StartReboot(m); err != nil {
	c.Fatalf("Failed to initiate reboot: %v", err)
}

// Wait for the boot to fail. Since the HMAC is corrupted, the machine
// will fail FIPS integrity check and never come back online.
// We use WaitForReboot and expect it to time out.
err = m.WaitForReboot(90*time.Second, bootID)
if err == nil {
	c.Fatal("Expected machine to fail booting with corrupted HMAC, but it came back online")
}
// We expect a timeout error, anything else is a problem.
if !strings.Contains(err.Error(), "timed out") {
	c.Fatalf("Unexpected error waiting for reboot: %v", err)
}
/ok-to-test
/test rhcos
This is an automated cherry-pick of #4437
/assign HuijingHei