numad crashes while running kvm guest
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| The Ubuntu-power-systems project | Fix Released | Low | Unassigned | |
| numad (Debian) | Fix Released | Unknown | | |
| numad (Ubuntu) | Fix Released | Low | Unassigned | |
| Bionic | Won't Fix | Low | Unassigned | |
| Cosmic | Won't Fix | Low | Unassigned | |
| Disco | Won't Fix | Low | Unassigned | |
| Eoan | Fix Released | Low | Unassigned | |
| Focal | Fix Released | Low | Unassigned | |
Bug Description
[Impact]
* The numad code never considered that node IDs might not be sequential, which leads to an out-of-bounds array access.
* Fix the array index usage so that sparse node IDs no longer trigger it.
[Test Case]
0. The most important and least available ingredient of this issue are sparse NUMA nodes. On a laptop you usually have just one; on a typical x86 server you might have more, but they are usually numbered 0,1,2,...
On ppc64 people commonly disable SMT (as that was a KVM requirement up to POWER8). This (or other CPU offlining) can leave NUMA nodes like 1,16,30 as the only ones remaining. Only with a setup like that can you follow and trigger the case.
1. Installed numad.
2. Started the numad service and verified it runs fine.
3. Spawned two guests with 20 cores and 50G each (since no particular guest config was mentioned, I didn't configure anything special).
I used uvtool to get the latest cloud image.
4. Cloned stressapptest from git [1] in the guests and installed build-essential
(my guests are Bionic, which didn't have stressapptest packaged yet).
Built and installed the tool.
5. Ran the stress test in both guests as mentioned:
$ stressapptest -s 200
=> This will trigger the crash.
[Regression Potential]
* Without the fix, numad is severely broken on systems with sparse NUMA
nodes. I imagine you can (with some effort or bad luck) also create
such a case on x86; it is not ppc64-specific in general.
The code before the fix only works by accident when cpu ~= nodeid.
* Obviously the most likely potential regression would be triggering
issues when parsing these arrays on systems that formerly ran fine
because they were not affected by the sparse-node issue. But for
non-sparse systems not a lot should change: the new code will, for
example, find cpu=1 mapped to node=1 instead of just assuming cpu=1
IS node=1.
Therefore I obviously hope for no regression, but that is the one I'd
expect if any.
[Other Info]
* I have submitted this upstream, but upstream seems somewhat dead :-/
* Do not go crazy when reading the code: nodeid and cpuid are used
somewhat interchangeably, which might make you go nuts at first (it
did for me); but I kept the upstream names as-is to keep the patch
small.
----
== Comment: #0 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:42:23 ==
---Problem Description---
While running KVM guests, we are observing numad crashes on the host.
Contact Information = <email address hidden>
---uname output---
Linux ltcgen6 4.15.0-1016-ibm-gt #18-Ubuntu SMP Thu Feb 7 16:58:31 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = witherspoon
---Debugger---
A debugger is not configured
---Steps to Reproduce---
1. check status of numad, if stopped start it
2. start a kvm guest
3. Run some memory tests inside guest
On the host, after a few minutes, we see numad crashing. I had enabled debug logging for numad and see the messages below in numad.log before it crashes:
8870669: PID 88781: (qemu-system-ppc), Threads 6, MBs_size 15871, MBs_used 11262, CPUs_used 400, Magnitude 4504800, Nodes: 0,8
Thu Feb 21 00:12:10 2019: PICK NODES FOR: PID: 88781, CPUs 470, MBs 18671
Thu Feb 21 00:12:10 2019: PROCESS_MBs[0]: 9201
Thu Feb 21 00:12:10 2019: Node[0]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[1]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[2]: mem: 1878026 cpu: 4666
Thu Feb 21 00:12:10 2019: Node[3]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[4]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[5]: mem: 2194058 cpu: 4728
Thu Feb 21 00:12:10 2019: Totmag[0]: 94112134
Thu Feb 21 00:12:10 2019: Totmag[1]: 109211855
Thu Feb 21 00:12:10 2019: Totmag[2]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[3]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[4]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[5]: 2990058
Thu Feb 21 00:12:10 2019: best_node_ix: 1
Thu Feb 21 00:12:10 2019: Node: 8 Dist: 10 Magnitude: 10373506224
Thu Feb 21 00:12:10 2019: Node: 0 Dist: 40 Magnitude: 8762869316
Thu Feb 21 00:12:10 2019: Node: 253 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 254 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 252 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 255 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: MBs: 18671, CPUs: 470
Thu Feb 21 00:12:10 2019: Assigning resources from node 5
Thu Feb 21 00:12:10 2019: Node[0]: mem: 2007348 cpu: 1908
Thu Feb 21 00:12:10 2019: MBs: 0, CPUs: 0
Thu Feb 21 00:12:10 2019: Assigning resources from node 2
Thu Feb 21 00:12:10 2019: Process 88781 already 100 percent localized to target nodes.
On syslog we see sig 11:
[88726.086144] numad[88879]: unhandled signal 11 at 000000e38fe72688 nip 0000782ce4dcac20 lr 0000782ce4dcf85c code 1
Stack trace output:
no
Oops output:
no
System Dump Info:
The system was configured to capture a dump, however a dump was not produced.
*Additional Instructions for <email address hidden>:
-Attach sysctl -a output to the bug.
== Comment: #2 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:44:38 ==
== Comment: #3 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:48:20 ==
I was using stressapptest to run memory workload inside the guest
`stressapptest -s 200`
== Comment: #5 - Brian J. King <email address hidden> - 2019-03-08 09:17:29 ==
Any update on this?
== Comment: #6 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-08 11:59:16 ==
Yes, I have been working on this for a while.
After a suggestion from @lagarcia, I tested the bug on the same machine, booted with the default kernel (4.15.0-45-generic), and also booted the VM with the same generic kernel.
The result is that the bug also happens with 4.15.0-45-generic. So it may not be a problem with the changes included in kernel 4.15.0-
A few things I noticed that may be interesting for solving this bug:
- I had a very hard time reproducing the bug with the numad that started on boot. If I restart it, or stop/start it, the bug reproduces much more easily.
- I debugged numad using gdb and found out it is segfaulting in _int_malloc(), from glibc.
Attached is an occurrence of the bug while numad was running under gdb.
(systemctl start numad ; gdb /usr/bin/numad $NUMAD_PID)
== Comment: #7 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-08 12:00:00 ==
== Comment: #8 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-11 17:04:25 ==
I reverted the whole system to vanilla Ubuntu Bionic and booted the 4.15.0-45-generic kernel.
Linux ltcgen6 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:27:02 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
Then I booted the guest, also on 4.15.0-45-generic.
Linux ubuntu 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:27:02 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
I tried to reproduce the error, and I was able to.
This probably means the bug was not introduced by the changes to qemu/the kernel, and it is present in the current Ubuntu archive.
The next step should be a deeper debugging session on numad, in order to identify why it segfaults.
Related branches
- Christian Ehrhardt: Approve
- Canonical Server: Pending requested
- Diff: 104 lines (+70/-1), 4 files modified:
  debian/changelog (+7/-0)
  debian/control (+2/-1)
  debian/patches/lp-1832915-fix-sparse-node-ids.patch (+60/-0)
  debian/patches/series (+1/-0)
- Christian Ehrhardt: Approve
- Canonical Server: Pending requested
- Diff: 104 lines (+70/-1), 4 files modified:
  debian/changelog (+7/-0)
  debian/control (+2/-1)
  debian/patches/lp-1832915-fix-sparse-node-ids.patch (+60/-0)
  debian/patches/series (+1/-0)
- Andreas Hasenack (community): Approve
- Canonical Server: Pending requested
- Diff: 104 lines (+70/-1), 4 files modified:
  debian/changelog (+7/-0)
  debian/control (+2/-1)
  debian/patches/lp-1832915-fix-sparse-node-ids.patch (+60/-0)
  debian/patches/series (+1/-0)
Changed in ubuntu-power-systems:
  assignee: nobody → Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
  status: New → Confirmed
tags: added: universe
Changed in numad (Ubuntu):
  assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Server Team (canonical-server)
Changed in ubuntu-power-systems:
  assignee: Manoj Iyer (manjo) → Canonical Server Team (canonical-server)
  status: Confirmed → Incomplete
Changed in numad (Ubuntu):
  status: Confirmed → Incomplete
Changed in ubuntu-power-systems:
  importance: Undecided → High
Changed in numad (Ubuntu):
  importance: Undecided → High
Changed in numad (Debian):
  status: Unknown → New
Changed in ubuntu-power-systems:
  status: Incomplete → In Progress
description: updated
Changed in numad (Ubuntu Bionic):
  status: New → In Progress
Changed in numad (Ubuntu Disco):
  status: New → In Progress
Changed in ubuntu-power-systems:
  status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
  status: Fix Committed → In Progress
tags: added: block-proposed
tags: added: block-proposed-bionic block-proposed-disco removed: block-proposed
Changed in ubuntu-power-systems:
  status: Triaged → Incomplete
Changed in ubuntu-power-systems:
  importance: High → Low
Changed in numad (Ubuntu Eoan):
  status: New → Incomplete
Changed in numad (Ubuntu Disco):
  status: Incomplete → Won't Fix
tags: added: hwe-long-running
Changed in numad (Ubuntu Eoan):
  status: Invalid → Won't Fix
Changed in numad (Ubuntu Bionic):
  importance: Undecided → Low
Changed in numad (Ubuntu Cosmic):
  importance: Undecided → Low
Changed in numad (Ubuntu Disco):
  importance: Undecided → Low
Changed in numad (Ubuntu Eoan):
  importance: Undecided → Low
tags: removed: block-proposed-disco verification-done verification-done-disco
Changed in numad (Debian):
  status: New → Fix Released
Changed in numad (Ubuntu):
  assignee: bugproxy (bugproxy) → nobody
Changed in numad (Ubuntu Focal):
  assignee: bugproxy (bugproxy) → nobody
Changed in ubuntu-power-systems:
  status: Incomplete → Fix Released
Changed in numad (Ubuntu Bionic):
  status: Incomplete → Won't Fix