"numastat" doesn't display correct information for the guest.

Bug #1817258 reported by bugproxy on 2019-02-22
This bug affects 1 person
Affects                           Importance  Assigned to
The Ubuntu-power-systems project  High        Canonical Server Team
numactl (Ubuntu)                  High        Ubuntu on IBM Power Systems Bug Triage
  Bionic                          High        Unassigned
  Cosmic                          High        Unassigned

Bug Description

[Impact]

 * Some less common NUMA topologies (node numbers that are not
   sequential or contiguous) break numastat, which ships in numactl
 * Backport the upstream change that fixes this
   - https://github.com/numactl/numactl/commit/b60868703

[Test Case]

 * You have to create an "affected" NUMA topology.
   E.g. on a full mem stacked P9 create a guest like:
     <vcpu placement='static' current='4'>8</vcpu>
     <cputune>
       <vcpupin vcpu='0' cpuset='64'/>
       <vcpupin vcpu='1' cpuset='65'/>
       <vcpupin vcpu='2' cpuset='66'/>
       <vcpupin vcpu='3' cpuset='67'/>
       <emulatorpin cpuset='64-67'/>
     </cputune>
     <numatune>
       <memory mode='strict' nodeset='8'/>
     </numatune>
 * After starting that guest, numastat fails to show its data correctly,
   for example:
     # numastat -c qemu-system-ppc64
     Per-node process memory usage (in MBs) for PID 5738 (qemu-system-ppc)
              Node 0 Node 8 ...
              ------ ------ ...
     Huge          0      0 ...
     Heap          0      0 ...
     Stack        14      0 ...
     Private     936      0 ...
     -------  ------ ------ ...
     Total       936      0 ...
   (more details on the host used are below)

[Regression Potential]

 * The change is rather small: it just skips nodes that do not match the
   expected numbers (important to avoid a crash). The only regression I
   can think of is that it might skip a NUMA node that it needs to
   access. I have seen no such case in my tests, so I hope it will be
   fine.

[Other Info]

 * n/a

---

== Comment: #0 - SANTWANA SAMANTRAY <email address hidden> - 2019-02-20 23:48:43 ==
---Problem Description---
numastat doesn't display correct information for KVM guests.
The guest is configured with its vCPUs pinned to CPUs of node 8.
Snippet of the guest XML:
<vcpu placement='static' current='4'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='64'/>
    <vcpupin vcpu='1' cpuset='65'/>
    <vcpupin vcpu='2' cpuset='66'/>
    <vcpupin vcpu='3' cpuset='67'/>
    <emulatorpin cpuset='64-67'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='8'/>
  </numatune>

== Host Details ==
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 6
Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):
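
The node list above is what makes this host an "affected" topology: the node numbers are sparse (0, 8, 252-255) and four nodes have no CPUs. A minimal Python sketch that extracts this from the `lscpu` lines follows. The helper names `parse_numa_nodes` and `is_contiguous` are made up for illustration and are not part of numactl.

```python
import re

# Excerpt of the `lscpu` output quoted above.
LSCPU_EXCERPT = """\
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):
"""


def parse_numa_nodes(text):
    """Map each NUMA node number to its CPU list string ('' if CPU-less)."""
    nodes = {}
    for line in text.splitlines():
        match = re.match(r"NUMA node(\d+) CPU\(s\):\s*(.*)$", line)
        if match:
            nodes[int(match.group(1))] = match.group(2).strip()
    return nodes


def is_contiguous(node_ids):
    """True when the node numbers are exactly 0..N-1 with no gaps."""
    return sorted(node_ids) == list(range(len(node_ids)))


nodes = parse_numa_nodes(LSCPU_EXCERPT)
print("node numbers:", sorted(nodes))
print("CPU-less nodes:", [n for n, cpus in nodes.items() if not cpus])
print("contiguous numbering:", is_contiguous(nodes))
```

Any tool that treats a node number as a direct array index would need six slots here but see indices up to 255.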

After starting the guest, numastat does not show the guest process on node 8.
# numastat -c qemu-system-ppc64

Per-node process memory usage (in MBs) for PID 5738 (qemu-system-ppc)
         Node 0 Node 8 Node 252 Node 253 Node 254 Node 255 Total
         ------ ------ -------- -------- -------- -------- -----
Huge          0      0        0        0        0        0     0
Heap          0      0        0        0        0        0    14
Stack        14      0        0        0        0        0     0
Private     936      0        0        0        0        0  3064
-------  ------ ------ -------- -------- -------- -------- -----
Total       936      0        0        0        0        0  3079

# service numad status
* numad.service - numad - The NUMA daemon that manages application locality.
   Loaded: loaded (/lib/systemd/system/numad.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-02-21 00:43:54 EST; 20s ago
     Docs: man:numad
  Process: 8091 ExecStart=/usr/bin/numad $DAEMON_ARGS -i 15 (code=exited, status=0/SUCCESS)
 Main PID: 8092 (numad)
    Tasks: 2 (limit: 19660)
   CGroup: /system.slice/numad.service
           `-8092 /usr/bin/numad -i 15

Feb 21 00:43:54 ltcgen3 systemd[1]: Starting numad - The NUMA daemon that manages application locality....
Feb 21 00:43:54 ltcgen3 systemd[1]: Started numad - The NUMA daemon that manages application locality..

---uname output---
Linux ltcgen3 4.15.0-1016-ibm-gt #18-Ubuntu SMP Thu Feb 7 16:58:31 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Witherspoon

== Versions Installed ==
qemu 1:2.11+dfsg-1ubuntu7.8-1ibm3
qemu-kvm 1:2.11+dfsg-1ubuntu7.9
qemu-system-ppc 1:2.11+dfsg-1ubuntu7.8-1ibm3
libvirt0:ppc64el 4.0.0-1ubuntu8.6
libnuma-dev:ppc64el 2.0.11-2.1
libnuma1:ppc64el 2.0.11-2.1
numactl 2.0.11-2.1
numad 0.5+20150602-5

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Configure the guest with vCPU pinning and memory binding to one of the NUMA nodes.
<cputune>
  <vcpupin vcpu='0' cpuset='64'/>
  <vcpupin vcpu='1' cpuset='65'/>
  <vcpupin vcpu='2' cpuset='66'/>
  <vcpupin vcpu='3' cpuset='67'/>
  <emulatorpin cpuset='64-67'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='8'/>
</numatune>
2. Start the guest.
3. Check the "numastat" output for the guest.
4. The output doesn't display any data for the desired node.
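
To cross-check step 4 independently of numastat, the per-node page counts can be summed from the kernel's `/proc/<pid>/numa_maps` file, where tokens such as `N8=1600` give the number of pages resident on node 8. The following is a minimal Python sketch under that assumption. The function name `per_node_mb` and the sample lines are made up for illustration and were not taken from the affected host.

```python
import re
from collections import defaultdict


def per_node_mb(numa_maps_text):
    """Sum resident memory per NUMA node, in MB, from numa_maps-style text."""
    totals = defaultdict(float)
    for line in numa_maps_text.splitlines():
        # Page size of this mapping; assume 4 kB when the field is absent.
        size = re.search(r"kernelpagesize_kB=(\d+)", line)
        page_kb = int(size.group(1)) if size else 4
        # Tokens such as "N8=1600" mean 1600 pages reside on node 8.
        for node, pages in re.findall(r"\bN(\d+)=(\d+)", line):
            totals[int(node)] += int(pages) * page_kb / 1024
    return dict(totals)


# Made-up sample lines for illustration (not from the affected host).
SAMPLE = """\
7fff80000000 bind:8 anon=1600 dirty=1600 N8=1600 kernelpagesize_kB=64
7fff90000000 default stack anon=16 dirty=16 N0=16 kernelpagesize_kB=64
"""

print(per_node_mb(SAMPLE))
```

On a live system the text would come from reading `/proc/<pid>/numa_maps` for the qemu process, which typically requires root for processes owned by another user.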

Contact Information = Santwana <email address hidden>

== Comment: #1 - SANTWANA SAMANTRAY <email address hidden> - 2019-02-20 23:49:58 ==

== Comment: #4 - SEETEENA THOUFEEK <email address hidden> - 2019-02-22 00:40:37 ==
I am not able to reproduce this with the upstream code.

We need to cherry-pick this patch into numactl at the 2.0.11-2.1 level.

https://github.com/numactl/numactl/commit/b608687037d873ad82d6318f231b3d6612e8601d

root@ltcgen3:~/numactl# ./numastat -c qemu-system-ppc64

Per-node process memory usage (in MBs) for PID 53294 (qemu-system-ppc)
         Node 0 Node 8 Node 252 Node 253 Node 254 Node 255  Total
         ------ ------ -------- -------- -------- -------- ------
Huge          0      0        0        0        0        0      0
Heap          0     28        0        0        0        0     28
Stack         0      0        0        0        0        0      0
Private       0 103727        0        0        0        0 103727
-------  ------ ------ -------- -------- -------- -------- ------
Total         0 103755        0        0        0        0 103755

---------------------------------------------------------------------------------------------------

bugproxy (bugproxy) wrote : Guest XML

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-175674 severity-high targetmilestone-inin---

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → numactl (Ubuntu)
Andrew Cloke (andrew-cloke) wrote :

Updating title to remove genesis, as this issue is associated with the Ubuntu distro.

summary: - genesis : "numastat" doesn't display correct information for the guest.
+ "numastat" doesn't display correct information for the guest.
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Server Team (canonical-server)
Changed in numactl (Ubuntu):
status: New → Fix Released

I just wanted to note that this would have been fixed if the testing/verification on bug 1650493 had not failed for you. All right, let's give this another try then ... since I already evaluated and backported the patch in the past, I just need to re-polish some branches and should have a PPA to test soon.

Changed in numactl (Ubuntu Cosmic):
status: New → Triaged
Changed in numactl (Ubuntu Bionic):
status: New → Triaged
Changed in ubuntu-power-systems:
status: New → In Progress

In the meantime numactl grew an FTBFS error - not due to the change requested here, but due to some other change in the build environment - yay why couldn't this be easy :-)
I'll get back to you once resolved ...

Bionic arm build failure is bug 1711478 - cherry picked and working on Bionic.
Cosmic is something else, looking at that now ...

Changed in numactl (Ubuntu Bionic):
status: Triaged → In Progress
Changed in numactl (Ubuntu Cosmic):
status: Triaged → In Progress
Changed in numactl (Ubuntu Bionic):
assignee: nobody → Christian Ehrhardt  (paelzer)
Changed in numactl (Ubuntu Cosmic):
assignee: nobody → Christian Ehrhardt  (paelzer)

The other one seems clear as well (hope that was all of them) and handled in bug 1818057

Please test the PPA [1] and check whether it suits your needs.
While you are doing so, I have two MPs up for fellow packagers to review [2][3].

Bug status is Incomplete until testing with the PPA is confirmed to be working.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3671
[2]: https://code.launchpad.net/~paelzer/ubuntu/+source/numactl/+git/numactl/+merge/363778
[3]: https://code.launchpad.net/~paelzer/ubuntu/+source/numactl/+git/numactl/+merge/363777

description: updated
description: updated
Changed in numactl (Ubuntu Bionic):
status: In Progress → Incomplete
assignee: Christian Ehrhardt  (paelzer) → nobody
Changed in numactl (Ubuntu Cosmic):
assignee: Christian Ehrhardt  (paelzer) → nobody
status: In Progress → Incomplete

Uploaded to -unapproved to be as ready as possible and so that other uploaders see/know there is something in case they modify numactl.

@SRU Team - please wait with acceptance until we see a confirmation by IBM on the PPA here.
Once that is done I'll set the status accordingly and this is meant to be going on normally.

@IBM - to unblock this is all on you now, please test the PPA at
  https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3671

Changed in numactl (Ubuntu):
importance: Undecided → High
Changed in numactl (Ubuntu Bionic):
importance: Undecided → High
Changed in numactl (Ubuntu Cosmic):
importance: Undecided → High

------- Comment From <email address hidden> 2019-03-05 03:47 EDT-------
(In reply to comment #14)
> Uploaded to -unapproved to be as ready as possible and so that other
> uploaders see/know there is something in case they modify numactl.
>
> @SRU Team - please wait with acceptance until we see a confirmation by IBM
> on the PPA here.
> Once that is done I'll set the status accordingly and this is meant to be
> going on normally.
>
> @IBM - to unblock this is all on you now, please test the PPA at
> https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3671

Hi,

I validated this issue with the PPA at https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3671 and it solves the issue. Now the correct node information is displayed for the guest.
# numastat -c qemu-system-ppc64

Per-node process memory usage (in MBs) for PID 10739 (qemu-system-ppc)
         Node 0 Node 8 Node 252 Node 253 Node 254 Node 255  Total
         ------ ------ -------- -------- -------- -------- ------
Huge          0      0        0        0        0        0      0
Heap          0     28        0        0        0        0     28
Stack         0      0        0        0        0        0      0
Private       0 103722        0        0        0        0 103722
-------  ------ ------ -------- -------- -------- -------- ------
Total         0 103750        0        0        0        0 103750

However, the numactl package version from the PPA is listed as "2.0.11-2.1ubuntu0.2" instead of "2.0.11-2.2ubuntu0.2":
Package: numactl
Versions:
2.0.11-2.1ubuntu0.2 (/var/lib/apt/lists/ppa.launchpad.net_ci-train-ppa-service_3671_ubuntu_dists_bionic_main_binary-ppc64el_Packages)

Thanks,
Santwana

Hi Santwana,
thanks for the check.

Versions currently are:
Bionic: 2.0.11-2.1
Cosmic: 2.0.11-2.2

Versions in the SRU are:
Bionic: 2.0.11-2.1ubuntu0.1
Cosmic: 2.0.11-2.2ubuntu0.1

Versions in the PPA are ending with 0.2 as I respun them once, but that does not affect the SRU in any case - you tested the right version already.

That said - the code is in the SRU queue in Bionic-/Cosmic-unapproved and ready to be reviewed and accepted by the SRU Team.
Once they do they will call for a final verification which I hope you can help with again @Santwana

Changed in numactl (Ubuntu Bionic):
status: Incomplete → Triaged
Changed in numactl (Ubuntu Cosmic):
status: Incomplete → Triaged

Hello bugproxy, or anyone else affected,

Accepted numactl into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/numactl/2.0.11-2.2ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in numactl (Ubuntu Cosmic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Changed in numactl (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed-bionic
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted numactl into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/numactl/2.0.11-2.1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ubuntu-power-systems:
status: In Progress → Fix Committed

@Bugproxy - you have the right setup - could you please verify this on Bionic and Cosmic?

@SRU Team - There are unrelated autopkgtest failures in some kernel tests.
Given the history of those [1][2][3] I could now fall into a retry-mania.
But TBH that would be a waste of my time and test resources - can we ignore those and go on (once IBM has verified that the actual fix works)?

[1]: http://autopkgtest.ubuntu.com/packages/l/linux-oracle/bionic/amd64
[2]: http://autopkgtest.ubuntu.com/packages/l/linux-gcp-edge/bionic/amd64
[3]: http://autopkgtest.ubuntu.com/packages/l/linux/cosmic/ppc64el

------- Comment From <email address hidden> 2019-03-15 01:06 EDT-------
Santwana, please verify

bugproxy (bugproxy) on 2019-03-19
tags: added: targetmilestone-inin18041
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-03-21 00:24 EDT-------
(In reply to comment #24)
> Santwana,
>
> Could you please verify this one?

Hello Leonardo,

I had validated this issue with the PPA at https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3671 and it solves the issue. The correct node information is displayed for the guest.
# numastat -c qemu-system-ppc64

Per-node process memory usage (in MBs) for PID 10739 (qemu-system-ppc)
         Node 0 Node 8 Node 252 Node 253 Node 254 Node 255  Total
         ------ ------ -------- -------- -------- -------- ------
Huge          0      0        0        0        0        0      0
Heap          0     28        0        0        0        0     28
Stack         0      0        0        0        0        0      0
Private       0 103722        0        0        0        0 103722
-------  ------ ------ -------- -------- -------- -------- ------
Total         0 103750        0        0        0        0 103750

Versions:
2.0.11-2.1ubuntu0.2 (/var/lib/apt/lists/ppa.launchpad.net_ci-train-ppa-service_3671_ubuntu_dists_bionic_main_binary-ppc64el_Packages)

Thanks,
Santwana

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-03-22 09:50 EDT-------
*** Bug 175675 has been marked as a duplicate of this bug. ***

Per former comment (almost) tested as requested.

*sigh*
The PPA content really is identical to the content in proposed, given how long this took I'm not going to re-request this test and hope that will also be sufficient in the SRU teams POV.

tags: added: verification-done verification-done-bionic verification-done-cosmic
removed: verification-needed verification-needed-bionic verification-needed-cosmic

Looking at the autopkgtest failures and especially the history of them indicates that the remaining few (most worked well) can be ignored.
Here are references to the remaining issues and their history:
http://autopkgtest.ubuntu.com/packages/l/linux/bionic/i386
http://autopkgtest.ubuntu.com/packages/l/linux-gcp-edge/bionic/amd64
http://autopkgtest.ubuntu.com/packages/l/linux-oracle/bionic/amd64

@SRU Team - do you want me to get in touch with the kernel Team to mask these tests or can/will you ignore them as-is for the SRU release of this change?

Add the following to the same list (for Cosmic)
http://autopkgtest.ubuntu.com/packages/l/linux/cosmic/ppc64el

FYI: I opened [1] for a discussion on the tests with the Kernel Team

[1]: https://code.launchpad.net/~paelzer/britney/hints-ubuntu-bionic-linux-fails-march-2019/+merge/365045

The pending-SRU page lists both bugs as verified (1817258 1818057) and we eliminated the related test issues by retrying and hinting them - anything else blocking that one from being released?

Hmm, in detail it is a bit more complex:
- Bugs are verified in both releases
- in Cosmic the test issues are fully resolved
- in Bionic the MP to hint them is up but did not yet get a response

... I'll ping the kernel team again

Brian Murray (brian-murray) wrote :

Regarding "The PPA content really is identical to the content in proposed, given how long this took I'm not going to re-request this test and hope that will also be sufficient in the SRU teams POV."

I believe we talked about this on IRC, but I don't feel great about releasing this as is. That being said, given the size of the patch and the low regression potential, if another SRU team member thought it was okay then I'd release it.

Łukasz Zemczak (sil2100) wrote :

I don't know the code base, but normally the patched change would feel a bit risky, mostly because if something goes wrong and there is no node_num in node_ix_map, I'd be worried we go out of bounds for the array. That being said, I *guess* this is an impossible scenario (maybe it would be obvious with proper code context), especially since their GitHub master branch still has the same change.
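
The out-of-bounds concern can be illustrated with a small Python analogue. This is a hypothetical sketch, not the C code from the upstream commit: only the names node_ix_map and node_num come from the comment above, and everything else (`build_node_ix_map`, `accumulate`, the sample values) is assumed for illustration. The idea is that sparse node numbers are first mapped to dense column indices, and a node number missing from the map is skipped rather than used as an index.

```python
def build_node_ix_map(node_numbers):
    """Assign each existing node number a dense column index (0, 1, 2, ...)."""
    return {num: ix for ix, num in enumerate(sorted(node_numbers))}


def accumulate(node_ix_map, samples):
    """Add (node_num, megabytes) samples into per-column totals.

    A sample whose node number is missing from the map is skipped instead
    of being used as an index, so no out-of-range access can happen.
    """
    totals = [0.0] * len(node_ix_map)
    skipped = []
    for node_num, mb in samples:
        ix = node_ix_map.get(node_num)
        if ix is None:
            skipped.append(node_num)
            continue
        totals[ix] += mb
    return totals, skipped


# Node numbers of the host in this report; node 9 does not exist there.
node_ix_map = build_node_ix_map([0, 8, 252, 253, 254, 255])
totals, skipped = accumulate(node_ix_map, [(8, 28.0), (8, 103727.0), (9, 1.0)])
print(node_ix_map[8], totals[node_ix_map[8]], skipped)
```

With such a guard the worst case is a sample that goes unreported, which matches the regression potential described in the bug description.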

I'll act as the second SRU member and release it.

The verification of the Stable Release Update for numactl has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package numactl - 2.0.11-2.2ubuntu0.1

---------------
numactl (2.0.11-2.2ubuntu0.1) cosmic; urgency=medium

  * d/p/lp1817258-Segment-fault-when-numa-nodes-not-sequential-or-cont.patch:
    fix segfault on uncommon numa node setups (LP: #1817258)
  * Fix FTBFS in Cosmic (LP: #1818057)
    - d/p/FTBFS-deprecated-use-readdir-3-instead.patch
    - d/p/FTBFS-include-sys-sysmacros.h-for-major-minor.patch

 -- Christian Ehrhardt <email address hidden> Wed, 20 Jun 2018 16:36:20 +0200

Changed in numactl (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package numactl - 2.0.11-2.1ubuntu0.1

---------------
numactl (2.0.11-2.1ubuntu0.1) bionic; urgency=medium

  * d/p/lp1817258-Segment-fault-when-numa-nodes-not-sequential-or-cont.patch:
    fix segfault on uncommon numa node setups (LP: #1817258)
  * debian/patches/Allow-building-on-ARM-systems.patch:
    - add __arm__ to avoid failure due to missing syscalls.
    - return -1 and set errno to ENOSYS on migrate_pages function
      if __NR_migrate_pages is undefined, thanks Uwe Kleine-König
      and Tiago Stürmer Daitx (LP: #1711478).

 -- Christian Ehrhardt <email address hidden> Wed, 20 Jun 2018 16:36:20 +0200

Changed in numactl (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released