Empty NUMA topology in machines with high number of CPUs
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ubuntu Cloud Archive | Invalid | Undecided | Unassigned | | |
Stein | Fix Released | Undecided | Unassigned | | |
Train | Fix Released | Undecided | Unassigned | | |
Ussuri | Fix Released | Undecided | Unassigned | | |
libvirt (Ubuntu) | Fix Released | Undecided | Unassigned | | |
Xenial | Fix Released | Undecided | Unassigned | | |
Bionic | Fix Released | Undecided | Unassigned | | |
Focal | Fix Released | Undecided | Unassigned | | |
Groovy | Fix Released | Undecided | Unassigned | | |
Bug Description
[impact]
libvirt fails to populate its NUMA topology when a single node has a large number of CPUs assigned to it. This happens when the node's CPUs fill the CPU bitmask (every bit set to one), which triggers a workaround that was introduced to build the NUMA topology on machines with non-contiguous node ids. This has already been fixed upstream in the commits listed below.
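The ambiguity described above can be illustrated with a small, simplified model. This is only a sketch and not libvirt's actual code: the function names, the 128-bit mask width, and the "explicit node list" alternative are assumptions made for illustration. It assumes the workaround read an all-ones CPU mask as "this node does not exist", which cannot be told apart from a real node whose CPUs fill the whole mask.

```python
# Simplified model of the failure mode; not libvirt's actual code.
# MASK_BITS is an assumed mask width, chosen to match the 128-CPU example.
MASK_BITS = 128


def full_mask(bits=MASK_BITS):
    """Return an integer bitmask with the lowest `bits` bits set."""
    return (1 << bits) - 1


def node_exists_by_mask(cpu_mask, bits=MASK_BITS):
    """Hypothetical old-style heuristic: an all-ones mask means 'no such node'."""
    return cpu_mask != full_mask(bits)


def node_exists_by_list(node_id, known_nodes):
    """Hypothetical alternative: consult an explicit set of existing node ids."""
    return node_id in known_nodes


# One real node that owns all 128 CPUs produces an all-ones mask,
# so the mask heuristic wrongly reports it as missing.
single_node_mask = full_mask()

# A node that owns only 64 of the CPUs is recognised correctly.
half_node_mask = (1 << 64) - 1
```

Under these assumptions, `node_exists_by_mask(single_node_mask)` is false even though the node is real, while a check against an explicit node set still handles non-contiguous ids such as `{0, 2}`.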
[scope]
The fix is needed for Xenial, Bionic, Focal and Groovy.
It's fixed upstream with commits 24d7d85208 and 551fb778f5, which are included in v6.8, so both are already in Hirsute.
[test case]
On a machine like the EPYC 7702P, after setting the NUMA config to NPS1 (single node per processor), or just a VM with 128 CPUs, "virsh capabilities" does not show the NUMA topology:
# virsh capabilities | xmllint --xpath '/capabilities/
<topology>
<cells num="0">
</cells>
</topology>
When it should show (edited to shorten the description):
<topology>
<cells num="1">
<cell id="0">
<memory unit="KiB"
<pages unit="KiB" size="4"
<pages unit="KiB" size="2048"
<cpus num="128">
<cpu id="0" socket_id="0" core_id="0" siblings="0"/>
....
<cpu id="127" socket_id="127" core_id="0" siblings="127"/>
</cpus>
</cell>
</cells>
</topology>
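The check in this test case could also be scripted. The following is a minimal sketch that parses the capabilities XML and reports the cell count; the helper names `count_numa_cells` and `read_virsh_capabilities` are made up for illustration, and the search expression `.//topology/cells` is an assumption that simply looks for a `cells` element under any `topology` element rather than relying on a full path.

```python
import subprocess
import xml.etree.ElementTree as ET


def count_numa_cells(capabilities_xml):
    """Return (declared, actual) NUMA cell counts from capabilities XML text.

    `declared` is the num attribute of <cells>; `actual` is the number of
    <cell> children. Both are 0 when no <cells> element is found.
    """
    root = ET.fromstring(capabilities_xml)
    cells = root.find(".//topology/cells")
    if cells is None:
        return (0, 0)
    declared = int(cells.get("num", "0"))
    actual = len(cells.findall("cell"))
    return (declared, actual)


def read_virsh_capabilities():
    """Run `virsh capabilities` and return its output (requires libvirt)."""
    result = subprocess.run(
        ["virsh", "capabilities"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On an affected host, `count_numa_cells(read_virsh_capabilities())` would be expected to return `(0, 0)`, matching the empty `<cells num="0">` output shown above.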
[Where problems could occur]
Any regression would likely involve a misconstruction of the NUMA topology, in particular for machines with non-contiguous node ids.
Changed in libvirt (Ubuntu):
status: New → Fix Released
tags: added: server-next
no longer affects: cloud-archive/victoria
Changed in cloud-archive:
status: New → Invalid
Thanks for the report. I've subscribed Christian, who will look into this issue when possible.