Skip to content

Simulate numa nodes#4428

Merged
ravanelli merged 1 commit intocoreos:mainfrom
angelcerveraroldan:numa-nodes-option
Feb 27, 2026
Merged

Simulate numa nodes#4428
ravanelli merged 1 commit intocoreos:mainfrom
angelcerveraroldan:numa-nodes-option

Conversation

@angelcerveraroldan
Copy link
Member

Add an option for external tests to be executed by a machine with multiple numa nodes.

@openshift-ci
Copy link

openshift-ci bot commented Feb 3, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the capability to simulate NUMA nodes in QEMU for external tests. The changes involve adding a NumaNodes option and plumbing it through various layers of the test harness and platform configuration. The core logic for generating QEMU arguments for NUMA is in mantle/platform/qemu.go.

I've identified a couple of issues in the implementation within mantle/platform/qemu.go that could lead to incorrect resource allocation for the simulated NUMA nodes. My review includes suggestions to fix these bugs.

Copy link
Member

@ravanelli ravanelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, it is looking good, let's squash the commits and follow the suggestion to simply it with using on/off with 2 NUMA nodes

@angelcerveraroldan
Copy link
Member Author

/test all

@angelcerveraroldan angelcerveraroldan force-pushed the numa-nodes-option branch 2 times, most recently from 5590322 to aafd091 Compare February 12, 2026 15:46
Copy link
Member

@ravanelli ravanelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments


ret = append(ret, "-object", fmt.Sprintf("memory-backend-memfd,id=%s,size=%dM,share=on", node0MemoryDevice, node0Memory))
ret = append(ret, "-object", fmt.Sprintf("memory-backend-memfd,id=%s,size=%dM,share=on", node1MemoryDevice, memoryMiB-node0Memory))
ret = append(ret, "-numa", fmt.Sprintf("node,memdev=%s,cpus=%d-%d,nodeid=0", node0MemoryDevice, 0, node0Cpus-1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 0, does here? That the second does not need it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you used this doc? https://www.qemu.org/docs/master/system/invocation.html#cmdoption-smp
Can you add the link for it, so we can understand what this code is doing?
Also, why you did not used the socket option? -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the docs I used, yes. Should I add the link as a comment in the codebase, or just to this PR ?

There was no big reason why I didnt use the socket flag. Would you prefer it be used when defining the nodes ?

The 0 is just the starting CPU for node0. The reason it isn't there for the other node is that they are both being given different CPUs. Right now, if your machine has 4 cpus, node0 would be given cpus=0-1 and node1 would be given cpus=2-3.

Maybe it would simplify the code if we hard coded it to be 2 cpus, one for each node, or even simpler if we gave node 0 one cpu, and node 1 only has memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add the link as a comment in the codebase

Yes, add it as a comment so, everyone can understand what it is doing

The 0 is just the starting CPU for node0. The reason it isn't there for the other node is that they are both being given different CPUs. Right now, if your machine has 4 cpus, node0 would be given cpus=0-1 and node1 would be given cpus=2-3.

And what if we have 128 cpus? Where is set the limit?
For for the approach bellow, we do have the limit. It seems to be a more precise way.
-smp 8,sockets=2,cores=2,threads=2,maxcpus=8
Maybe we do need to set sockets and such, but at least a limit like -smp 4 or even 8, something like:

ret = append(ret, "-smp", "8")
ret = append(ret, "-numa", fmt.Sprintf("node,memdev=%s,cpus=%d-%d,nodeid=0", node0MemoryDevice, 0, node0Cpus-1))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm, to be honestly, that's maybe not worth since we arealdy pass the cpu as part of the qemu test/emulation, yeah, we can skip the -smp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the -smp flag is currently being set to whatver integer builder.Processors is set to. If this has not been defined, then it will be set to 2 by default for NUMA tests, and to 1 if it is not a NUMA test. So by default, its just 2 CPUs.

If builder.Processors has been set to be more than 2 processors, then the nodes will try and split them evenly.

Copy link
Member

@ravanelli ravanelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angelcerveraroldan can you also create the actually Numa test to be added in the kola tests? So we can validate the test?

@angelcerveraroldan angelcerveraroldan force-pushed the numa-nodes-option branch 2 times, most recently from e236364 to 1758cc5 Compare February 25, 2026 10:45
Add a new boolean flag "numaNodes" allowing QEMU simulation of two
NUMA nodes.
Copy link
Member

@ravanelli ravanelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! Thanks for working on it!
/lgtm

@ravanelli ravanelli merged commit 014dc0e into coreos:main Feb 27, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants