Conversation

@goyalpalak18

Description

I’ve fixed a critical race condition in our container lifecycle where we were telling the runtime (CRI) that the container had started successfully before we actually knew if the VMM was running.

Previously, we were using syscall.Exec() to replace the urunc process with the VMM. The problem with exec is that it doesn't return on success, so we were forced to send the StartSuccess signal optimistically before making the call. This meant that if the VMM failed to launch (missing binary, bad config, etc.), we ended up with a "zombie" state: Kubernetes thought the container was Running, but the process was actually dead.

To fix this, I refactored the startup logic to use a fork+exec pattern. My new flow looks like this (a rough sketch follows the list):

  1. Spawn: I start the VMM as a child process using exec.Command().Start().
  2. Verify: I wait a brief window (100ms) and then check if the process is still alive using kill -0.
  3. Record: I capture the actual child PID and save it to state.json.
  4. Signal: I only send StartSuccess once I've confirmed the VMM is up and running.
  5. Detach: Finally, I exit the reexec process, leaving the VMM correctly orphaned to the shim.
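
For illustration, here is a rough Go sketch of that flow. The function name, saveState, and sendStartSuccess are placeholders for the real state and signalling plumbing, not actual urunc identifiers:

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// spawnAndVerifyVMM sketches the spawn/verify/record/signal/detach flow
// described above (illustrative names only).
func spawnAndVerifyVMM(vmmPath string, vmmArgs []string,
	saveState func(pid int) error, sendStartSuccess func()) error {

	// 1. Spawn: start the VMM as a child instead of replacing ourselves with it.
	cmd := exec.Command(vmmPath, vmmArgs...)
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("spawning VMM: %w", err)
	}

	// 2. Verify: wait a short window, then do the equivalent of `kill -0 <pid>`.
	// Signal 0 only checks that the process exists (it also succeeds for an
	// exited child that has not been reaped yet).
	time.Sleep(100 * time.Millisecond)
	if err := cmd.Process.Signal(syscall.Signal(0)); err != nil {
		return fmt.Errorf("VMM died right after start: %w", err)
	}

	// 3. Record: persist the real child PID (e.g. into state.json).
	if err := saveState(cmd.Process.Pid); err != nil {
		return err
	}

	// 4. Signal: report success only after the VMM was observed alive.
	sendStartSuccess()

	// 5. Detach: the caller exits the reexec process, leaving the VMM to be
	// re-parented to the shim.
	return nil
}
```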

Changes

  • pkg/unikontainers/unikontainers.go:
    • Switched from syscall.Exec to exec.Command().Start().
    • Added the liveness check (wait 100ms + kill -0) to catch immediate crashes.
    • Updated the logic to ensure we write the correct child PID to state.json before exiting.
  • pkg/unikontainers/types/types.go:
    • I updated the VMM interface with a BuildExecCmd method. This lets us build the command arguments separately from executing them (see the interface sketch after this list).
  • Hypervisor Implementations (qemu.go, firecracker.go, hvt.go, spt.go):
    • I implemented BuildExecCmd for all our supported hypervisors.
    • Side fix: While I was in firecracker.go, I also fixed a swallowed json.Marshal error I noticed.
  • pkg/unikontainers/hypervisors/hedge.go:
    • Added a stub for BuildExecCmd to keep the interface satisfied.
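
For reference, a minimal sketch of the interface change is below. BuildExecCmd's exact return type is not visible in this PR, so an argv slice is assumed; ExecArgs and Unikernel already exist in urunc and are stubbed here only so the snippet compiles on its own:

```go
package types

// Stubs standing in for the existing urunc types.
type ExecArgs struct{}

type Unikernel interface{}

type VMM interface {
	// BuildExecCmd builds the monitor command without executing it, so
	// argument construction can be validated separately from the exec call.
	BuildExecCmd(args ExecArgs, ukernel Unikernel) ([]string, error)
}
```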

Testing

I ran a few scenarios to confirm this behaves as expected:

  • Reproduction (The "Before" State): I intentionally configured urunc with a bad path to the QEMU binary. Before this fix, the pod would falsely report as Running with no logs and a bogus PID.
  • Verification (The Fix): Using that same bad configuration, the pod now correctly fails to start. Kubernetes catches the error immediately (CreateContainerError / CrashLoopBackOff), which is exactly what we want.
  • Success Path: I also verified that valid unikernels still boot up fine, network comes up, and the process handoff works correctly.

Impact

  • No More Silent Failures: If the VMM breaks, the user will actually see it fail now.
  • Better Observability: Kubernetes will receive proper exit codes and error messages.
  • Accurate State: The PID inside state.json is finally trustworthy.

@netlify

netlify bot commented Jan 24, 2026

Deploy Preview for urunc ready!

🔨 Latest commit: b728321
🔍 Latest deploy log: https://app.netlify.com/projects/urunc/deploys/69788ab6a073b50008b84ea5
😎 Deploy Preview: https://deploy-preview-397--urunc.netlify.app

@goyalpalak18
Author

Hi @cmainas , this fixes the critical race condition causing silent zombie containers by switching to a fork+exec model with a liveness check.

Please take a look when you have a moment.

Contributor

@cmainas left a comment


Hello @goyalpalak18,

thank you for this contribution. This is indeed a well-known race, also documented in the comments inside the codebase. However, I have some comments:

  • I was not able to reproduce your Testing description. K8s was correctly identifying the container as failed.
  • This proposal is a significant change to urunc's execution model, where the monitor process is the "init" (PID 1) of the host container. It is not easy to make such a design change this way. There are ways to resolve the race issue without changing the design; the suggested change of pre-building the monitor command before sending the started message is one of them.
  • As you can see from the tests, the changes do not work.

Add BuildExecCmd() method to VMM interface to validate command
construction before sending StartSuccess message. This prevents
reporting a container as "Running" when the VMM command cannot
be built (e.g., due to invalid arguments or failed config writes).

The fix preserves the existing PID 1 execution model where the
reexec process uses syscall.Exec to become the VMM.
@goyalpalak18 force-pushed the fix/vmm-startup-verification branch from d40f5ac to f1ee61b on January 26, 2026, 18:03
@goyalpalak18
Author

@cmainas Thanks for the review. I have updated the PR based on your feedback:

Reverted Execution Model: I undid the PID 1 changes, so the original process hierarchy is preserved.

Implemented Suggestion: I added BuildExecCmd to validate the command before sending the StartSuccess message.

This resolves the race condition without changing the design (roughly as sketched below).
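
Roughly, the updated flow looks like the sketch below. Names and stub types are illustrative only; the real definitions live in pkg/unikontainers/types:

```go
package main

import "fmt"

// Stubs standing in for the existing urunc types.
type ExecArgs struct{}

type Unikernel interface{}

type VMM interface {
	BuildExecCmd(args ExecArgs, u Unikernel) ([]string, error)
	Execve(args ExecArgs, u Unikernel) error
}

// startVMM sketches the revised flow: build the command once purely to
// validate it, send StartSuccess, then let the driver perform the exec,
// preserving the original PID 1 model.
func startVMM(vmm VMM, args ExecArgs, u Unikernel, sendStartSuccess func()) error {
	// Invalid arguments or failed config writes are caught here, before the
	// container is ever reported as Running.
	if _, err := vmm.BuildExecCmd(args, u); err != nil {
		return fmt.Errorf("validating VMM command: %w", err)
	}

	// Only a buildable command reaches this point.
	sendStartSuccess()

	// Each hypervisor's Execve builds its command again and performs the
	// actual exec, so the reexec process still becomes the VMM.
	return vmm.Execve(args, u)
}
```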

Contributor

@cmainas left a comment


Thank you @goyalpalak18 for the changes, but why are we building the exec command twice? Also, if we have the build command, then we can simply exec from unikontainers.Exec rather than inside each monitor.

}

func (fc *Firecracker) Execve(args types.ExecArgs, ukernel types.Unikernel) error {
	exArgs, err := fc.BuildExecCmd(args, ukernel)

Why are we building the exec command twice?

@goyalpalak18
Author

@cmainas Thanks for catching that! I've refactored the logic to simplify the flow as you suggested:

  • Removed the double build: BuildExecCmd is now called only once to get the arguments, which we then pass directly to the exec call.
  • Centralized Exec: I moved syscall.Exec up into the common unikontainers.go logic, so we don't need to repeat it inside every driver.
  • Simplified the Interface: I replaced Execve with a lighter PreExec hook in the VMM interface to handle just the driver-specific setup (like seccomp) before the main exec happens (see the sketch after this list).
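
A rough sketch of the resulting shape is below. The PreExec signature is an assumption based on the description above, and the stub types stand in for the real ones in pkg/unikontainers/types:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Stubs standing in for the existing urunc types.
type ExecArgs struct{}

type Unikernel interface{}

type VMM interface {
	// BuildExecCmd builds the monitor argv once, without executing it.
	BuildExecCmd(args ExecArgs, u Unikernel) ([]string, error)
	// PreExec performs driver-specific setup (e.g. seccomp) just before exec.
	PreExec(args ExecArgs, u Unikernel) error
}

// startVMM stands in for the shared logic in unikontainers.go, so the
// syscall.Exec call is no longer repeated inside every hypervisor driver.
func startVMM(vmm VMM, args ExecArgs, u Unikernel, sendStartSuccess func()) error {
	// Single build: the same argv is validated and then executed.
	argv, err := vmm.BuildExecCmd(args, u)
	if err != nil {
		return fmt.Errorf("building VMM command: %w", err)
	}

	sendStartSuccess()

	// Driver-specific hook before the handoff.
	if err := vmm.PreExec(args, u); err != nil {
		return err
	}

	// One exec site: the reexec process replaces itself with the VMM.
	return syscall.Exec(argv[0], argv, os.Environ())
}
```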
