RFC: Initialize vTPM with correct NVRAM size by stunes-ms · Pull Request #2819 · microsoft/openvmm

stunes-ms · 2026-02-19T21:59:02Z

OpenHCL has a bug where the vTPM can try to write past the end of its NVRAM after OpenHCL servicing (the old TpmFail bug). The TPM's serialized state does not contain the NVRAM size, so after servicing, the TPM core thinks it has 32kB available regardless of the actual size.

OpenHCL creates the TPM device by creating a new TPM engine with a blank state, then reinitializing it with the saved state from the VMGS. This reinit makes it see the correct NVRAM size. In servicing, though, we skip that reinitialization and instead restore the serialized state. The TPM will have the size it was first initialized with, which is 32kB by default.

This is a proposed fix:

When creating the TPM device, get the correct NVRAM size from the VMGS.
Create the initial empty TPM with the correct size, instead of the default 32kB.

This fix depends on an ms-tpm-20-ref-rs change: microsoft/ms-tpm-20-ref-rs@32a4da7

The downside of this approach is that we now access the VMGS during servicing init. One way around that would be to save the NVRAM size in the OpenHCL servicing save state, but that doesn't fix the problem for existing VMs. AFAICT, the NVRAM size isn't available anywhere else when the TPM device is created.

The best solution is to fix this in the TPM core code by adding the NVRAM size to its serialized state. However, that would change the format of its save state, which would break UH's ability to live-service the TPM core. Doing this fix correctly would probably require adding versioning to the TPM core save-state, and also figuring out how to handle a serialized TPM state that doesn't have the NVRAM size in the TPM core.

Copilot

Pull request overview

This RFC proposes a fix for a bug in OpenHCL where the vTPM incorrectly uses a default 32KB NVRAM size after servicing, even when the actual NVRAM is a different size (e.g., 16KB). The issue occurs because the TPM's serialized state doesn't include the NVRAM size, and during servicing restore (when is_restoring: true), the TPM device skips the on_first_boot() initialization that would normally load the saved NVRAM and detect the correct size.

Changes:

Modified TPM device initialization to read the existing NVRAM blob before creating the TPM engine and use its size for initialization
Updated the dependency to point to a fork of ms-tpm-20-ref-rs that supports InitKind::ColdInitWithSize
Added the Cargo.lock changes reflecting the dependency update

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

File	Description
vm/devices/tpm/tpm_device/src/lib.rs	Reads NVRAM blob size during TPM initialization and uses it to create the TPM engine with the correct size instead of the default 32KB
Cargo.toml	Updates ms-tpm-20-ref dependency to point to a personal fork with the ColdInitWithSize feature
Cargo.lock	Reflects the dependency change, including transitive updates to windows-sys versions

Copilot · 2026-02-19T22:04:43Z

vm/devices/tpm/tpm_device/src/lib.rs

+        let blob_size = existing_nvmem_blob
+            .map(|v| v.len())
+            .unwrap_or(ms_tpm_20_ref::NV_MEMORY_SIZE);


The blob size is determined directly from the stored NVRAM data without validation. While the expected sizes are 16KB (LEGACY_VTPM_SIZE) or 32KB (STANDARD_VTPM_SIZE), this code will accept any blob size. Consider adding validation to ensure the blob_size is within expected bounds (e.g., at least a minimum size and not exceeding a maximum reasonable size) to prevent issues if the NVRAM store is corrupted or contains unexpected data. This is especially important in the servicing restore path where on_first_boot() is skipped and the blob won't be validated by reset().

Copilot · 2026-02-19T22:04:44Z

vm/devices/tpm/tpm_device/src/lib.rs

+        let existing_nvmem_blob = nvram_store
+            .restore()
+            .await
+            .map_err(TpmErrorKind::ReadNvramState)?;
+        let blob_size = existing_nvmem_blob
+            .map(|v| v.len())
+            .unwrap_or(ms_tpm_20_ref::NV_MEMORY_SIZE);
+
        let tpm_engine = MsTpm20RefPlatform::initialize(
            Box::new(TpmPlatformCallbacks {
                pending_nvram: pending_nvram.clone(),
                monotonic_timer,
            }),
-            ms_tpm_20_ref::InitKind::ColdInit,
+            ms_tpm_20_ref::InitKind::ColdInitWithSize(blob_size),
        )
        .map_err(TpmErrorKind::InstantiateTpm)?;


This change introduces logic to initialize the TPM with the correct NVRAM size from the stored blob, which is critical for the servicing restore path. However, there's no test coverage for the servicing scenario where is_restoring is true. Consider adding a test that verifies:

A TPM can be created with is_restoring: true and an existing NVRAM blob

The TPM correctly uses the blob's size instead of the default 32KB

The TPM can be successfully restored from servicing state without encountering size mismatch errors

This would help ensure the fix addresses the TpmFail bug in servicing scenarios.

Copilot · 2026-02-19T22:04:44Z

vm/devices/tpm/tpm_device/src/lib.rs

+        let existing_nvmem_blob = nvram_store
+            .restore()
+            .await
+            .map_err(TpmErrorKind::ReadNvramState)?;
+        let blob_size = existing_nvmem_blob
+            .map(|v| v.len())
+            .unwrap_or(ms_tpm_20_ref::NV_MEMORY_SIZE);


This PR introduces a second call to nvram_store.restore() during TPM initialization. The first call occurs here to determine the blob size, and a second call happens in on_first_boot() (when is_restoring is false). While this is safe according to the NonVolatileStore trait semantics, it may have performance implications depending on the store implementation, especially if the NVRAM store involves expensive I/O operations (e.g., reading from VMGS). Consider whether the blob could be cached after the first read to avoid the duplicate I/O, or add a comment explaining why the double read is acceptable.

Copilot · 2026-02-19T22:04:44Z

Cargo.toml

 macaddr = "1.0"
 mimalloc = { version = "0.1.39", default-features = false }
-ms-tpm-20-ref = { version = "0.1", git = "https://github.com/microsoft/ms-tpm-20-ref-rs.git", branch = "main" }
+ms-tpm-20-ref = { version = "0.1", git = "https://github.com/stunes-ms/ms-tpm-20-ref-rs.git", branch = "user/mikestunes/init-size" }


This PR points to a personal fork of ms-tpm-20-ref-rs on a user branch rather than the official Microsoft repository. This is appropriate for an RFC but should NOT be merged until the upstream changes are merged to the main branch of the official repository. The dependency change introduces a risk of using unvetted code and makes the build non-reproducible for others. Please ensure the upstream PR is merged and update this dependency to point to the official repository before merging.

Suggested change

ms-tpm-20-ref = { version = "0.1", git = "https://github.com/stunes-ms/ms-tpm-20-ref-rs.git", branch = "user/mikestunes/init-size" }

ms-tpm-20-ref = { version = "0.1", git = "https://github.com/microsoft/ms-tpm-20-ref-rs.git" }

github-actions · 2026-02-20T02:18:12Z

At least one Petri test failed.

chris-oo · 2026-02-20T23:28:39Z

vm/devices/tpm/tpm_device/src/lib.rs

 struct TpmPlatformCallbacks {
    pending_nvram: Arc<Mutex<Vec<u8>>>,
-    monotonic_timer: MonotonicTimer,
+    monotonic_timer: Arc<Mutex<MonotonicTimer>>,


why did this become an arc mutex?

ah, for the unknown case?

chris-oo · 2026-02-20T23:29:56Z

vm/devices/tpm/tpm_device/src/lib.rs

            };
            self.requested_locality = requested_locality;
+
+            // TODO: UNKNOWN: if the NVRAM sizes in the save state and the VMGS don't match,


In what case is this even possible? It seems like this is fatal or we migrated in a way that's not really expected, right?

Yes, I think that's right -- if we get here, we screwed up somewhere, somehow. Maybe I'll just crash here. (That'll let me undo the Arc also.)

openhcl/underhill_core/src/worker.rs

mebersol · 2026-02-23T19:34:17Z

can we add a servicing test here?

mebersol · 2026-02-23T19:35:55Z

Cargo.lock

 dependencies = [
 "libc",
- "windows-sys 0.60.2",
+ "windows-sys 0.59.0",


"windows-sys 0.59.0",

rebase needed?

Deps need to be cleaned up altogether -- I'll take care of this after microsoft/ms-tpm-20-ref-rs#9 merges.

stunes-ms added 4 commits February 18, 2026 14:39

(WIP) logs in TPM servicing

37db352

Add logs

17f4a10

Fixup

94595f9

Fmt

6775794

stunes-ms requested a review from a team as a code owner February 19, 2026 21:59

Copilot AI review requested due to automatic review settings February 19, 2026 21:59

stunes-ms requested a review from a team as a code owner February 19, 2026 21:59

Copilot started reviewing on behalf of stunes-ms February 19, 2026 21:59 View session

Copilot AI reviewed Feb 19, 2026

View reviewed changes

stunes-ms added 2 commits February 19, 2026 17:17

save/restore NVRAM size

aafab23

Fix

d6a897a

stunes-ms force-pushed the user/mikestunes/service-tpm-size branch from abdcb38 to d6a897a Compare February 20, 2026 01:18

stunes-ms marked this pull request as draft February 20, 2026 19:49

stunes-ms added 4 commits February 20, 2026 11:53

Get size from vmgs

d8af59e

Tidy up

79a7211

Use usize

b48f140

Fix usize

599f384

chris-oo reviewed Feb 20, 2026

View reviewed changes

stunes-ms added 2 commits February 23, 2026 10:01

Use only VMGS size

f8ea23d

Remove testing log

6a68058

mebersol reviewed Feb 23, 2026

View reviewed changes

	ms-tpm-20-ref = { version = "0.1", git = "https://github.com/stunes-ms/ms-tpm-20-ref-rs.git", branch = "user/mikestunes/init-size" }
	ms-tpm-20-ref = { version = "0.1", git = "https://github.com/microsoft/ms-tpm-20-ref-rs.git" }

Comments

Conversation

stunes-ms commented Feb 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 20, 2026

Uh oh!

chris-oo Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

chris-oo Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

chris-oo Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

stunes-ms Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mebersol commented Feb 23, 2026

Uh oh!

mebersol Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

stunes-ms Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants