CPU hotplug fails in the system with empty numa nodes, "Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument"

Bug #1709877 reported by bugproxy on 2017-08-10
This bug affects 1 person
Affects                            Status        Importance  Assigned to                             Milestone
The Ubuntu-power-systems project   In Progress   High        David Britton
libvirt (Ubuntu)                   Fix Released  Undecided   Ubuntu on IBM Power Systems Bug Triage
  Xenial                           In Progress   Low         ChristianEhrhardt

Bug Description

== Comment: #0 - Satheesh Rajendran <email address hidden> - 2017-07-19 04:13:18 ==
The CPU hotplug operation fails on a host with empty NUMA nodes (nodes with no memory), even though the VM's vCPU placement is static, and regardless of whether numad is running.
..
 <vcpu placement='static' current='4'>32</vcpu>
...

# virsh setvcpus virt-tests-vm1 6 --live
error: Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument

# numactl --hardware
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32 40
node 0 size: 16188 MB
node 0 free: 1119 MB
node 1 cpus: 48 56 64 72 80 88
node 1 size: 32630 MB
node 1 free: 13233 MB
node 16 cpus: 96 104 112 120 128 136
node 16 size: 0 MB
node 16 free: 0 MB
node 17 cpus: 144 152 160 168 176 184
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node 0 1 16 17
  0: 10 20 40 40
  1: 20 10 40 40
 16: 40 40 10 20
 17: 40 40 20 10

# cat /sys/fs/cgroup/cpuset/cpuset.mems
0-1
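
The mismatch between the two outputs above can be checked mechanically. The following is a minimal Python sketch, not libvirt or kernel code: `parse_nodelist` is a hypothetical helper that expands the kernel's list syntax, and the two input strings are copied from the error message and the root cpuset.mems shown in this report. It assumes the write is rejected because the requested nodes are not all contained in the root cpuset's memory nodes.

```python
def parse_nodelist(spec):
    """Expand a kernel list string such as '0-1,16-17' into a set of ints.

    Hypothetical helper for illustration only.
    """
    nodes = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single number like '8' has no upper bound, so reuse the lower one.
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return nodes


requested = parse_nodelist("0-1,16-17")  # value from the virsh error message
allowed = parse_nodelist("0-1")          # root cpuset.mems on this host

# Nodes that were requested but are not memory nodes in the root cpuset.
rejected = sorted(requested - allowed)
print(rejected)  # [16, 17]
```

Nodes 16 and 17 are exactly the ones that numactl reports with a size of 0 MB.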

Host:
# uname -a
Linux powerkvm4-lp1 4.10.0-27-generic #30~16.04.2-Ubuntu SMP Thu Jun 29 16:06:52 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

ii libvirt-bin 1.3.1-1ubuntu10.11
ii numad 0.5+20150602-4
qemu-kvm 1:2.5+dfsg-5ubuntu10.14

bugproxy (bugproxy) on 2017-08-10
tags: added: architecture-ppc64le bugnameltc-156806 severity-high targetmilestone-inin16043
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → libvirt (Ubuntu)

------- Comment From <email address hidden> 2017-08-10 09:35 EDT-------
From Nitesh:

-----------------------------

The following commit resolves the issue:

commit 77cb01bc0fec4d0da02e1d4df75d28870b0ef926
Author: Peter Krempa <email address hidden>
Date: Tue Sep 13 15:55:06 2016 +0200

    numa: Rename virNumaGetHostNodeset and make it return only nodes with memory

    Name it virNumaGetHostMemoryNodeset and return only NUMA nodes which
    have memory installed. This is necessary as the kernel is not very happy
    to set the memory cgroup setting for nodes which do not have any memory.
    This would break vcpu hotplug with following message on such
    configuration:

     Invalid value '0,8' for 'cpuset.mems': Invalid argument

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1375268
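
The idea described in the commit message can be illustrated with a short Python sketch. This is not the libvirt implementation (which is written in C); `nodes_with_memory` and `format_nodeset` are hypothetical helpers, and the sample input is the `numactl --hardware` size information from this report. It only shows how restricting the node set to nodes with installed memory turns '0-1,16-17' into '0-1'.

```python
import re

# Size lines copied from the `numactl --hardware` output in this report.
NUMACTL_SAMPLE = """\
node 0 size: 16188 MB
node 1 size: 32630 MB
node 16 size: 0 MB
node 17 size: 0 MB
"""


def node_sizes(numactl_text):
    """Map node number -> size in MB, parsed from numactl-style lines."""
    pairs = re.findall(r"^node (\d+) size: (\d+) MB$", numactl_text, re.M)
    return {int(node): int(mb) for node, mb in pairs}


def nodes_with_memory(numactl_text):
    """Return the sorted list of nodes whose reported size is non-zero."""
    return sorted(n for n, mb in node_sizes(numactl_text).items() if mb > 0)


def format_nodeset(nodes):
    """Render node numbers in the kernel's list syntax, e.g. '0-1,16-17'."""
    ranges = []
    for n in sorted(nodes):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n          # extend the current run
        else:
            ranges.append([n, n])      # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


all_nodes = format_nodeset(node_sizes(NUMACTL_SAMPLE))
memory_nodes = format_nodeset(nodes_with_memory(NUMACTL_SAMPLE))
print(all_nodes)     # 0-1,16-17
print(memory_nodes)  # 0-1
```

The first value matches the string rejected in the error message; the second matches the root cpuset.mems on the affected host.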

ChristianEhrhardt (paelzer) wrote :

The change is in >= v2.3.0, so this has been Fix Released for a while.
Let's add a Xenial task and consider an SRU from there.

Changed in libvirt (Ubuntu):
status: New → Fix Released
ChristianEhrhardt (paelzer) wrote :

The first release to ship a version >= 2.3 was Zesty, which implies the fix is also present in UCA-Ocata and later. It is therefore available in a supported way to LTS users as well, if they opt into UCA.

ChristianEhrhardt (paelzer) wrote :

There is some noise when applying this to Xenial.
Nothing too big, but I think at least
commit 5555dc0d7fe0267e2ff6e5a9625164f2896f9cc5 (HEAD)
Author: Peter Krempa <email address hidden>
Date: Tue Sep 13 14:28:33 2016 +0200

    util: numa: Remove impossible error handling

This would be needed for the fix to apply more cleanly, yet OTOH the code back in Xenial might not fulfill the condition that commit assumes.

It is not rocket science, but I see some work and regression potential, which means that for the SRU I'd like to have a really good case.
In that sense I wonder how "real" or "artificial" a system with an empty NUMA node is.
Is that a thing that really exists outside of a lab?
If so, great: let's work on the SRU, and please help me add an SRU Template with your arguments, to make a convincing case for the SRU Team.

Changed in ubuntu-power-systems:
status: New → In Progress
Changed in ubuntu-power-systems:
importance: Undecided → High
status: In Progress → Incomplete
Changed in libvirt (Ubuntu Xenial):
status: New → Incomplete
Changed in ubuntu-power-systems:
assignee: nobody → David Britton (davidpbritton)
ChristianEhrhardt (paelzer) wrote :

I am still lacking feedback on how real this case is, which I need in order to make a compelling statement for the SRU Team.
Please see my comment #4 and reply with the details needed to make an SRU possible.

Until that is provided I am setting this from Incomplete to Invalid (no offense, consider it a timeout on "Incomplete" to clear the view for currently actionable items). Please set it back to New once the data has been provided.

Changed in libvirt (Ubuntu Xenial):
status: Incomplete → Invalid
Changed in ubuntu-power-systems:
status: Incomplete → Invalid
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-13 00:51 EDT-------
(In reply to comment #12)
> There is some noise when applying this to Xenial.
> Nothing too big, but I think at least
> commit 5555dc0d7fe0267e2ff6e5a9625164f2896f9cc5 (HEAD)
> Author: Peter Krempa <email address hidden>
> Date: Tue Sep 13 14:28:33 2016 +0200
>
> util: numa: Remove impossible error handling
>
> Would be needed to apply better, yet OTOH the code back in Xenial might not
> fulfill this condition.
>
> Not rocket science but I see some work and regression potential which means
> for the SRU I'd like to have a really good case.
> In that sense I wonder how "real" or "artificial" a system with an empty
> numa node is.
> Is that a thing that really exists outside of a lab?
> If so great - lets work on the SRU and please help me to add a SRU Template
> with your arguments to make a case for it convincing the SRU Team.

A memoryless NUMA node is a real possibility and a valid system configuration whenever the system is not fully populated (that is, it has less than the maximum memory it supports). Such a configuration causes these functional issues, which this fix resolves.

Moreover, it is unacceptable for the host NUMA node configuration to affect guest functionality, so it would be good to have this fix applied. Thanks.

Regards,
-Satheesh

Changed in libvirt (Ubuntu Xenial):
status: Invalid → Triaged
importance: Undecided → Low
Changed in ubuntu-power-systems:
status: Invalid → Triaged
ChristianEhrhardt (paelzer) wrote :

I checked and I don't have such a system, so I'll rely on you to test the code.
I can do the general regression checks, but for this specific case I will need you to confirm that it works.

I'll first provide a PPA with the fix, which you should verify against your case.
If that passes your verification and my regression checks, we will move on to the actual SRU.

There you will then need to verify what we have in -proposed.

At any time, if there is a way to "construct" such a case artificially, please post how to do so.

Changed in libvirt (Ubuntu Xenial):
assignee: nobody → ChristianEhrhardt (paelzer)
ChristianEhrhardt (paelzer) wrote :

Hi,
There is a test build of a backport available at [1].

It passed some basic checks, but a full regression test will take some more time.

Could you please check whether it fixes the issue you have with the empty NUMA node setup?

If it does, we can go on with the SRU. It would be great if you could add as much detail as possible for the SRU Template [2] to this bug's description; I'll then help to add the rest.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2959
[2]: https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

Changed in libvirt (Ubuntu Xenial):
status: Triaged → In Progress
Changed in ubuntu-power-systems:
status: Triaged → In Progress