numastat <pid> fails with double free or corruption

Bug #1650493 reported by bugproxy on 2016-12-16
Affects                              Importance  Assigned to
The Ubuntu-power-systems project     Medium      Canonical Server Team
numactl (Ubuntu)                     (status tracked in Disco)
  Xenial                             Undecided   Unassigned
  Bionic                             Undecided   Unassigned
  Cosmic                             Medium      Canonical Server Team
  Disco                              Undecided   Unassigned

Bug Description

while trying to get stat of the guest process (configured with hugepages), numastat fails

====================
Environment details
====================
# uname -a
Linux lep8b 4.8.0-30-generic #32-Ubuntu SMP Fri Dec 2 03:43:46 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

=====
Issue
=====
2016-12-14 07:02:56,396 process L0368 INFO | Running 'numastat 61257'
2016-12-14 07:02:56,402 process L0462 DEBUG| [stderr] *** Error in `numastat': double free or corruption (out): 0x00000100265005a0 ***
2016-12-14 07:02:56,403 process L0462 DEBUG| [stdout]
2016-12-14 07:02:56,403 process L0482 INFO | Command 'numastat 61257' finished with -6 after 0.00309896469116s
2016-12-14 07:02:56,403 process L0462 DEBUG| [stdout] Per-node process memory usage (in MBs) for PID 61257 (qemu-system-ppc)
2016-12-14 07:02:56,404 process L0462 DEBUG| [stderr] ======= Backtrace: =========
2016-12-14 07:02:56,404 process L0462 DEBUG| [stderr] /lib/powerpc64le-linux-gnu/libc.so.6(+0x86d54)[0x3fff9a736d54]
2016-12-14 07:02:56,404 process L0462 DEBUG| [stderr] /lib/powerpc64le-linux-gnu/libc.so.6(+0x93c30)[0x3fff9a743c30]
2016-12-14 07:02:56,404 process L0462 DEBUG| [stderr] /lib/powerpc64le-linux-gnu/libc.so.6(cfree+0x68)[0x3fff9a748218]
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] /lib/powerpc64le-linux-gnu/libc.so.6(fclose+0x1c8)[0x3fff9a727d68]
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] numastat(+0x7aa4)[0x401d7aa4]
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] numastat(+0x2388)[0x401d2388]
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] /lib/powerpc64le-linux-gnu/libc.so.6(+0x2291c)[0x3fff9a6d291c]
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] /lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8)[0x3fff9a6d2b18]
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] ======= Memory map: ========
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] 401d0000-401e0000 r-xp 00000000 08:92 40325510 /usr/bin/numastat
2016-12-14 07:02:56,405 process L0462 DEBUG| [stderr] 401e0000-401f0000 r--p 00000000 08:92 40325510 /usr/bin/numastat
2016-12-14 07:02:56,406 process L0462 DEBUG| [stderr] 401f0000-40200000 rw-p 00010000 08:92 40325510 /usr/bin/numastat
2016-12-14 07:02:56,406 process L0462 DEBUG| [stderr] 10026500000-10026530000 rw-p 00000000 00:00 0 [heap]
2016-12-14 07:02:56,406 process L0462 DEBUG| [stderr] 3fff9a6b0000-3fff9a860000 r-xp 00000000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
2016-12-14 07:02:56,406 process L0462 DEBUG| [stderr] 3fff9a860000-3fff9a870000 ---p 001b0000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
2016-12-14 07:02:56,406 process L0462 DEBUG| [stderr] 3fff9a870000-3fff9a880000 r--p 001b0000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
2016-12-14 07:02:56,406 process L0462 DEBUG| [stderr] 3fff9a880000-3fff9a890000 rw-p 001c0000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
2016-12-14 07:02:56,406 process L0462 DEBUG| [stderr] 3fff9a8b0000-3fff9a8c0000 rw-p 00000000 00:00 0
2016-12-14 07:02:56,407 process L0462 DEBUG| [stderr] 3fff9a8c0000-3fff9a8e0000 r-xp 00000000 00:00 0 [vdso]
2016-12-14 07:02:56,407 process L0462 DEBUG| [stderr] 3fff9a8e0000-3fff9a920000 r-xp 00000000 08:92 25745195 /lib/powerpc64le-linux-gnu/ld-2.24.so
2016-12-14 07:02:56,407 process L0462 DEBUG| [stderr] 3fff9a920000-3fff9a930000 r--p 00030000 08:92 25745195 /lib/powerpc64le-linux-gnu/ld-2.24.so
2016-12-14 07:02:56,407 process L0462 DEBUG| [stderr] 3fff9a930000-3fff9a940000 rw-p 00040000 08:92 25745195 /lib/powerpc64le-linux-gnu/ld-2.24.so
2016-12-14 07:02:56,407 process L0462 DEBUG| [stderr] 3fffdd320000-3fffdd350000 rw-p 00000000 00:00 0 [stack]
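
The backtrace above resolves the libc frames by name (cfree, fclose) but leaves the numastat frames as raw offsets. A minimal sketch, in Python, of pulling those offsets out of the log so they can be handed to a symbolizer such as addr2line (which only helps if debug symbols for numastat are installed, something this report does not state). The helper names `parse_frames` and `unresolved_offsets` are made up for illustration.

```python
import re

# Matches glibc backtrace frames such as
#   /lib/powerpc64le-linux-gnu/libc.so.6(fclose+0x1c8)[0x3fff9a727d68]
#   numastat(+0x7aa4)[0x401d7aa4]
FRAME_RE = re.compile(
    r"(?P<module>\S+)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return (module, symbol or None, offset, address) for each frame found."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append((
            match.group("module"),
            match.group("symbol") or None,   # empty symbol means "not resolved"
            int(match.group("offset"), 16),
            int(match.group("address"), 16),
        ))
    return frames


def unresolved_offsets(frames, module="numastat"):
    """Offsets inside `module` that carry no symbol name."""
    return [hex(off) for mod, sym, off, _ in frames if mod == module and sym is None]


# Three frames copied from the log above.
sample = """\
[stderr] /lib/powerpc64le-linux-gnu/libc.so.6(fclose+0x1c8)[0x3fff9a727d68]
[stderr] numastat(+0x7aa4)[0x401d7aa4]
[stderr] numastat(+0x2388)[0x401d2388]
"""
frames = parse_frames(sample)
print(unresolved_offsets(frames))  # ['0x7aa4', '0x2388']
```

Each printed offset is relative to the start of the numastat binary, which is the form a symbolizer expects for a position-independent executable.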

=============
Recreation Steps
=============
1. Configure host with hugepages
2. Start a guest and attach the following memory device XML:
<?xml version='1.0' encoding='UTF-8'?>
<memory model="dimm"><target><size unit="KiB">8388608</size><node>0</node></target><source><pagesize unit="KiB">16384</pagesize><nodemask>0</nodemask></source></memory>
3. Set the rules in guest
4. Execute numastat on the guest PID
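
For anyone repeating step 2 with other sizes, here is a small Python sketch that builds the same memory device XML and works out how many huge pages it needs. The function names and parameters (`dimm_xml`, `hugepages_needed`, `guest_node`, `host_nodemask`) are illustrative and not part of libvirt; only the element and attribute names are taken from the XML above.

```python
import xml.etree.ElementTree as ET


def dimm_xml(size_kib, pagesize_kib, guest_node=0, host_nodemask="0"):
    """Build a <memory model="dimm"> element shaped like the one in step 2."""
    memory = ET.Element("memory", model="dimm")
    target = ET.SubElement(memory, "target")
    ET.SubElement(target, "size", unit="KiB").text = str(size_kib)
    ET.SubElement(target, "node").text = str(guest_node)
    source = ET.SubElement(memory, "source")
    ET.SubElement(source, "pagesize", unit="KiB").text = str(pagesize_kib)
    ET.SubElement(source, "nodemask").text = host_nodemask
    return ET.tostring(memory, encoding="unicode")


def hugepages_needed(size_kib, pagesize_kib):
    """Number of huge pages backing the device, rounded up."""
    return -(-size_kib // pagesize_kib)


xml_text = dimm_xml(8388608, 16384)
print(xml_text)
print(hugepages_needed(8388608, 16384))  # 512
```

With the values from the report, the device is 8 GiB backed by 16 MiB pages, so the host must have at least 512 free huge pages of that size on node 0 before the attach can succeed.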

Expected result:
numastat prints the per-node memory usage for the given PID and exits normally.

Actual result:

# numastat 55119

Per-node process memory usage (in MBs) for PID 55119 (qemu-system-ppc)
                          Node 0          Node 1         Node 16
                 --------------- --------------- ---------------
Huge                        0.00            0.00            0.00
Heap                        2.00            0.38            0.00
Stack                       0.00            0.00            0.00
Private                 31800.12          183.06            0.00
---------------- --------------- --------------- ---------------
Total                   31802.12          183.44            0.00

                         Node 17           Total
                 --------------- ---------------
Huge                        0.00            0.00
Heap                        0.00           15.25
Stack                       0.00            0.06
Private                     0.00        33169.31
---------------- --------------- ---------------
Total                       0.00        34345.00
*** Error in `numastat': free(): invalid next size (fast): 0x000001003f2c0580 ***
======= Backtrace: =========
/lib/powerpc64le-linux-gnu/libc.so.6(+0x86d54)[0x3fff82866d54]
/lib/powerpc64le-linux-gnu/libc.so.6(+0x93c30)[0x3fff82873c30]
/lib/powerpc64le-linux-gnu/libc.so.6(cfree+0x68)[0x3fff82878218]
numastat(+0x4244)[0x5adc4244]
numastat(+0x7d24)[0x5adc7d24]
numastat(+0x2388)[0x5adc2388]
/lib/powerpc64le-linux-gnu/libc.so.6(+0x2291c)[0x3fff8280291c]
/lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8)[0x3fff82802b18]
======= Memory map: ========
5adc0000-5add0000 r-xp 00000000 08:92 40325510 /usr/bin/numastat
5add0000-5ade0000 r--p 00000000 08:92 40325510 /usr/bin/numastat
5ade0000-5adf0000 rw-p 00010000 08:92 40325510 /usr/bin/numastat
1003f2c0000-1003f2f0000 rw-p 00000000 00:00 0 [heap]
3fff827e0000-3fff82990000 r-xp 00000000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
3fff82990000-3fff829a0000 ---p 001b0000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
3fff829a0000-3fff829b0000 r--p 001b0000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
3fff829b0000-3fff829c0000 rw-p 001c0000 08:92 25745199 /lib/powerpc64le-linux-gnu/libc-2.24.so
3fff829e0000-3fff829f0000 rw-p 00000000 00:00 0
3fff829f0000-3fff82a10000 r-xp 00000000 00:00 0 [vdso]
3fff82a10000-3fff82a50000 r-xp 00000000 08:92 25745195 /lib/powerpc64le-linux-gnu/ld-2.24.so
3fff82a50000-3fff82a60000 r--p 00030000 08:92 25745195 /lib/powerpc64le-linux-gnu/ld-2.24.so
3fff82a60000-3fff82a70000 rw-p 00040000 08:92 25745195 /lib/powerpc64le-linux-gnu/ld-2.24.so
3fffc3b90000-3fffc3bc0000 rw-p 00000000 00:00 0 [stack]
Aborted
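
Apart from the abort, the table itself looks inconsistent: in several rows the per-node values do not add up to the Total column. The report does not comment on this, so the following Python check is only an observation on the printed numbers, not an analysis of the cause. The values are copied from the output above; `mismatched_rows` is a name made up for this sketch.

```python
# Values copied from the numastat output above (MB).
# Each entry: per-node values for Node 0, 1, 16, 17, then the reported Total.
ROWS = {
    "Huge":    ([0.00, 0.00, 0.00, 0.00], 0.00),
    "Heap":    ([2.00, 0.38, 0.00, 0.00], 15.25),
    "Stack":   ([0.00, 0.00, 0.00, 0.00], 0.06),
    "Private": ([31800.12, 183.06, 0.00, 0.00], 33169.31),
    "Total":   ([31802.12, 183.44, 0.00, 0.00], 34345.00),
}


def mismatched_rows(rows, tolerance=0.02):
    """Labels of rows whose per-node values do not sum to the Total column.

    The tolerance allows for each of the four values being rounded to
    two decimal places.
    """
    bad = []
    for label, (per_node, total) in rows.items():
        if abs(sum(per_node) - total) > tolerance:
            bad.append(label)
    return bad


print(mismatched_rows(ROWS))  # ['Heap', 'Stack', 'Private', 'Total']
```

Only the Huge row is self-consistent; every other row reports more memory in Total than the four displayed nodes account for.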

== Comment: #2 - SEETEENA THOUFEEK <email address hidden> - 2016-12-15 03:31:48 ==
root@lep8b:/proc# dpkg -l | grep numa
ii libnuma1:ppc64el 2.0.11-1ubuntu2 ppc64el Libraries for controlling NUMA policy
ii numactl 2.0.11-1ubuntu2 ppc64el NUMA scheduling and memory placement tool
root@lep8b:/proc# uname -r
4.8.0-30-generic

Added the numactl and kernel versions.

Mirroring to the Ubuntu team to cherry-pick this patch.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-149964 severity-medium targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → numactl (Ubuntu)
tags: added: patch server-next

Hi,
thanks for providing the patch!
I think we can look at integrating that.

The release cadence of numactl is rather slow, but getting it accepted upstream at https://github.com/numactl/numactl will still help.

If you could also open an issue there and link it here, that would help to justify the SRU processing that follows later.

Changed in numactl (Ubuntu):
status: New → Triaged
importance: Undecided → Medium

Since this seems to have worked all the time: could you also share how common this issue (non-contiguous nodes) is?
That will help to assign a higher priority if needed; for now I set medium.
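
To judge whether a given machine falls into the non-contiguous case, one option is to look at the kernel's list of online NUMA nodes. The Python sketch below parses the range format used by `/sys/devices/system/node/online` and checks for gaps. The function names are illustrative, and the sample string "0-1,16-17" is an assumption derived from the node numbers (0, 1, 16, 17) shown in the numastat output in the description, not a value captured from the affected host.

```python
def parse_node_list(text):
    """Expand a kernel list string such as "0-1,16-17" into [0, 1, 16, 17]."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            nodes.extend(range(int(first), int(last) + 1))
        else:
            nodes.append(int(part))
    return nodes


def is_contiguous(nodes):
    """True when the node IDs run 0..N-1 with no gaps."""
    return nodes == list(range(len(nodes)))


def read_online_nodes(path="/sys/devices/system/node/online"):
    """Read the online node list from sysfs (Linux only)."""
    with open(path) as handle:
        return parse_node_list(handle.read())


nodes = parse_node_list("0-1,16-17")
print(nodes, is_contiguous(nodes))  # [0, 1, 16, 17] False
```

On such a layout the highest node ID (17) is larger than the number of nodes (4), so any code that sizes a table by node count and then indexes it by node ID would write out of bounds. That is a general hazard with sparse node numbering, not a statement about what the numastat source actually does.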

bugproxy (bugproxy) wrote : dmesg
  • dmesg (703.4 KiB, application/octet-stream)

Manoj Iyer (manjo) wrote :

Since this bug is already on Canonical-server's radar, I am replacing taco screen team with canonical-server.

Changed in numactl (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Server Team (canonical-server)

------- Comment on attachment From <email address hidden> 2017-02-14 03:47 EDT-------

With minor changes, I am re-attaching the same patch.

Thanks for updating the patch. Is there any progress or feedback on upstreaming the change yet?

Also, in #3 I asked how common this is so that the priority can be adapted. Reading your patch, I almost assume that this hits "any process with huge pages". Must it really be a guest to cause this?

Changed in numactl (Ubuntu):
status: Triaged → Incomplete

------- Comment From <email address hidden> 2017-03-15 00:53 EDT-------
(In reply to comment #21)
> Thanks for updating the patch, is there any progress/feedback in upstreaming
> the change yet?
>
> Also I asked in #3 on the likeliness to adapt the priority - reading your
> patch I almost assume that this hits "any process with huge pages" - I
> mean must it be a guest to cause this?

I am still waiting to get the patch accepted upstream. Yes, this hits any process with huge pages.

Thanks for the feedback. I still wonder how common the case of non-contiguous nodes actually is.
Or, vice versa: for testing and confirming the fix, what would be a good way to create such a situation?
If no special HW is available, is there a good way to e.g. create a qemu guest with a NUMA config that represents this case, so the fix can be confirmed?
If you could provide that, that would be great!

Until then, I created a ppa for your testing, both to verify that it builds with the fix and for you to test it in properly packaged form.
If we do this properly I'll have to learn how to do comments on cdbs to add proper headers that credit you as the source, but for now the test should be good on zesty.
Check out => https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2704/

bugproxy (bugproxy) on 2017-04-06
tags: removed: bugnameltc-149964 patch server-next severity-medium
bugproxy (bugproxy) on 2017-06-16
tags: added: bugnameltc-149964 severity-medium
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-06-16 13:42 EDT-------
*** Bug 155779 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-26 03:48 EDT-------
Harish, Any update here ?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-16 09:01 EDT-------
https://github.com/numactl/numactl/commit/b608687037d873ad82d6318f231b3d6612e8601d

Hi,
it seems no one else worked on this and I came by late to check dormant bugs.
I apologize and beg your pardon on this.

OTOH the reason I, at least, was waiting is that I had asked for somebody with an affected system to confirm that the fix actually works, since I have zero chance to confirm it on my own - see comment #10.

I found some repro steps in the upstream patch now, but would recommend you ensure that it really works on the real thing.

OTOH of course nobody wants you to test on Zesty (this is what the old ppa has) these days, so I respun it for Cosmic. Same ppa as before [1], please confirm the fix works as expected.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2704/+packages

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-21 03:38 EDT-------
(In reply to comment #33)
> Hi,
> it seems no one else worked on this and I came by late to check dormant bugs.
> I apologize and beg your pardon on this.
>
> OTOH the reason at least I was waiting was that I asked for somebody with an
> affected system to actually confirm the fix to be working since I have zero
> chance to confirm on my own - see comment #10.
>
> I found some repro steps in the upstream patch now, but would recommend you
> ensure that it really works on the real thing.
>
> OTOH of course nobody wants you to test on Zesty (this is what the old ppa
> has) these days, so I respun it for Cosmic. Same ppa as before [1],
> please confirm the fix works as expected.
>
> [1]:
> https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2704/+packages

Harish S, please test and let us know.

Changed in ubuntu-power-systems:
status: New → Incomplete
importance: Undecided → Medium
assignee: nobody → Canonical Server Team (canonical-server)
tags: added: triage-g
Manoj Iyer (manjo) wrote :

IBM, 18.10 feature freeze is on the 23rd of Aug, could you please post the test results on the PPA package to this bug report?

bugproxy (bugproxy) on 2018-09-24
tags: added: targetmilestone-inin1810
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-09-25 00:51 EDT-------
Sorry for not being able to verify for a long time. This was my older ID and I did not receive any notifications about this bug.

I have tried to recreate this issue on an Ubuntu 18.10 host and guest and was not able to reproduce it.

Guest
--------
# uname -a
Linux ubuntu1810 4.18.0-7-generic #8-Ubuntu SMP Tue Aug 28 18:20:56 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Host
------
Linux lep8d 4.18.0-7-generic #8-Ubuntu SMP Tue Aug 28 18:20:56 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

# dpkg -l | grep numac
ii numactl 2.0.11-2.1 ppc64el NUMA scheduling and memory placement tool

Thanks.

Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Andreas Hasenack (ahasenack) wrote :

Did you have to use the PPA that Christian provided and rebuilt for cosmic, or was that a plain 18.10 host?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-10-02 12:26 EDT-------
(In reply to comment #41)
> Did you have to use the PPA that Christian provided and rebuilt for cosmic,
> or was that a plain 18.10 host?

It was a plain 18.10 host.

So to summarize:
- per the latest IBM comment, the version in 18.10 (2.0.11-2.1) works
- This is also the version in 18.04, so that works as well
- But the referenced fix (pointed out in comment #13, and supposedly part of upstream version
  2.12) is NOT in any of those package versions

Because of that I'm marking the bug tasks for 18.04/18.10 as fixed, but clearly there is some confusion going on, so I keep the main task on incomplete to reflect that.

Either the fix mentioned was not the actual fix needed, or the recent tests were not testing the case correctly.

Given this confusion, no SRUs are planned at the current state, so I'll mark Xenial as Won't Fix to make that clear as well.

Changed in numactl (Ubuntu Cosmic):
status: Incomplete → Fix Released
Changed in numactl (Ubuntu Bionic):
status: New → Fix Released
Changed in numactl (Ubuntu Xenial):
status: New → Fix Released
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Andrew Cloke (andrew-cloke) wrote :

Marking the ubuntu-power-systems series as fix released as it is Fix Released in all released versions.

Changed in ubuntu-power-systems:
status: Incomplete → Fix Released