numad crashes while running kvm guest

Bug #1832915 reported by bugproxy
This bug affects 2 people
Affects                           Status        Importance  Assigned to  Milestone
The Ubuntu-power-systems project  Fix Released  Low         Unassigned
numad (Debian)                    Fix Released  Unknown
numad (Ubuntu)                    Fix Released  Low         Unassigned
  Bionic                          Won't Fix     Low         Unassigned
  Cosmic                          Won't Fix     Low         Unassigned
  Disco                           Won't Fix     Low         Unassigned
  Eoan                            Fix Released  Low         Unassigned
  Focal                           Fix Released  Low         Unassigned

Bug Description

[Impact]

 * The numad code never considered that node IDs might not be sequential,
   and therefore performs an out-of-bounds array access.

 * Fix the array indexing so that each node is looked up by its ID
   instead of using the ID directly as an array index.

[Test Case]

0. The most important, and least commonly available, ingredient for this issue is sparse NUMA nodes. On your laptop you usually have just one; on a typical x86 server you might have more, but they are usually numbered 0,1,2,...
On powerpc, people commonly disable SMT (as that was a KVM requirement up to POWER8). This (or other CPU offlining) can lead to NUMA nodes like
1,16,30 being the only ones left. Only with a setup like that can you follow and trigger the case.
1. installed numad
2. started the numad service and verified it runs fine
3. I spawned two Guests with 20 cores and 50G each (since there was no particular guest config mentioned I didn't configure anything special)
   I used uvtool to get the latest cloud image
4. cloned stressapptest from git [1] in the guests
   and installed build-essential
   (my guests are Bionic, which didn't have stressapptest packaged yet)
   Built and installed the tool
5. ran the stress in both guests as mentioned
     $ stressapptest -s 200
=> This will trigger the crash

[Regression Potential]

 * Without the fix it is severely broken on systems with sparse NUMA
   nodes. I imagine you can (with some effort or bad luck) also create
   such a case on x86; it is not ppc64 specific in general.
   The code before the fix only works by accident when cpu ~= nodeid.

 * Obviously the most likely potential regression would be issues
   triggered when parsing these arrays on systems that formerly ran fine
   because they were not affected by the sparse node issue. But for
   non-sparse systems not much should change: the new code will, for
   example, find cpu=1 mapped to node=1 instead of just assuming cpu=1
   IS node=1. Therefore I hope for no regression, but that is the one
   I'd expect if any.

[Other Info]

 * I have submitted this upstream, but upstream seems somewhat dead :-/
 * Do not go crazy when reading the code: nodeid and cpuid are used
   somewhat interchangeably, which might drive you nuts at first (it
   did for me), but I kept the upstream names as-is to keep the patch small.

----

== Comment: #0 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:42:23 ==
---Problem Description---
While running KVM guests, we are observing numad crashes on the host.

Contact Information = <email address hidden>

---uname output---
Linux ltcgen6 4.15.0-1016-ibm-gt #18-Ubuntu SMP Thu Feb 7 16:58:31 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = witherspoon

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Check the status of numad; if it is stopped, start it
2. Start a KVM guest
3. Run some memory tests inside the guest

On the host, after a few minutes we see numad crashing. I had enabled the debug log for numad and see the following messages in numad.log before it crashes:

8870669: PID 88781: (qemu-system-ppc), Threads 6, MBs_size 15871, MBs_used 11262, CPUs_used 400, Magnitude 4504800, Nodes: 0,8
Thu Feb 21 00:12:10 2019: PICK NODES FOR: PID: 88781, CPUs 470, MBs 18671
Thu Feb 21 00:12:10 2019: PROCESS_MBs[0]: 9201
Thu Feb 21 00:12:10 2019: Node[0]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[1]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[2]: mem: 1878026 cpu: 4666
Thu Feb 21 00:12:10 2019: Node[3]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[4]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[5]: mem: 2194058 cpu: 4728
Thu Feb 21 00:12:10 2019: Totmag[0]: 94112134
Thu Feb 21 00:12:10 2019: Totmag[1]: 109211855
Thu Feb 21 00:12:10 2019: Totmag[2]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[3]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[4]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[5]: 2990058
Thu Feb 21 00:12:10 2019: best_node_ix: 1
Thu Feb 21 00:12:10 2019: Node: 8 Dist: 10 Magnitude: 10373506224
Thu Feb 21 00:12:10 2019: Node: 0 Dist: 40 Magnitude: 8762869316
Thu Feb 21 00:12:10 2019: Node: 253 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 254 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 252 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 255 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: MBs: 18671, CPUs: 470
Thu Feb 21 00:12:10 2019: Assigning resources from node 5
Thu Feb 21 00:12:10 2019: Node[0]: mem: 2007348 cpu: 1908
Thu Feb 21 00:12:10 2019: MBs: 0, CPUs: 0
Thu Feb 21 00:12:10 2019: Assigning resources from node 2
Thu Feb 21 00:12:10 2019: Process 88781 already 100 percent localized to target nodes.

On syslog we see sig 11:
[88726.086144] numad[88879]: unhandled signal 11 at 000000e38fe72688 nip 0000782ce4dcac20 lr 0000782ce4dcf85c code 1

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system was configured to capture a dump, however a dump was not produced.

*Additional Instructions for <email address hidden>:
- Attach the sysctl -a output to the bug.

== Comment: #2 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:44:38 ==

== Comment: #3 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:48:20 ==
I was using stressapptest to run memory workload inside the guest
`stressapptest -s 200`

== Comment: #5 - Brian J. King <email address hidden> - 2019-03-08 09:17:29 ==
Any update on this?

== Comment: #6 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-08 11:59:16 ==
Yes, I have been working on this for a while.

Following a suggestion from @lagarcia, I tested the bug on the same machine booted on the default kernel (4.15.0-45-generic), and also booted the VM with the same generic kernel.
The bug also happens with 4.15.0-45-generic, so it may not be caused by the changes included in kernel 4.15.0-1016.18-fix1-ibm-gt.

A few things I noticed that may help to solve this bug:
- I had a very hard time reproducing the bug with numad started at boot. If I restart it, or stop/start it, the bug reproduces much more easily.
- I debugged numad using gdb and found that it segfaults in _int_malloc(), from glibc.

Attached is an occurrence of the bug while numad was running under gdb.
(systemctl start numad ; gdb /usr/bin/numad $NUMAD_PID)

== Comment: #7 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-08 12:00:00 ==

== Comment: #8 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-11 17:04:25 ==
I reverted the whole system to vanilla Ubuntu Bionic, and booted on 4.15.0-45-generic kernel.
Linux ltcgen6 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:27:02 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

Then I booted the guest, also on 4.15.0-45-generic.
Linux ubuntu 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:27:02 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

I tried to reproduce the error, and I was able to.
This probably means the bug was not introduced by the changes to qemu or the kernel, and that it is present in the current Ubuntu repository.

The next step should be deeper debugging of numad to identify why it segfaults.


Revision history for this message
bugproxy (bugproxy) wrote : sosreport on host

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-175673 severity-high targetmilestone-inin---
bugproxy (bugproxy) wrote : gdb output

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → numad (Ubuntu)
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: nobody → Manoj Iyer (manjo)
Christian Ehrhardt  (paelzer) wrote :

On a fresh Bionic running the latest 4.15.0-51-generic kernel, I did the following to try to reproduce this issue.
Note: My Host has 128G mem and 40 cores (SMT off)

1. installed numad
2. started the numad service and verified it runs fine
3. I spawned two Guests with 20 cores and 50G each (since there was no particular guest config mentioned I didn't configure anything special)
   I used uvtool to get the latest cloud image
4. cloned stressapptest from git [1] in the guests
   and installed build-essential
   (my guests are Bionic, which didn't have stressapptest packaged yet)
   Built and installed the tool
5. ran the stress in both guests as mentioned
     $ stressapptest -s 200

Well, actually I was just about to start that load (it had not happened yet) when I realized my numad process had already died:

● numad.service - numad - The NUMA daemon that manages application locality.
   Loaded: loaded (/lib/systemd/system/numad.service; enabled; vendor preset: enabled)
   Active: failed (Result: core-dump) since Mon 2019-06-17 06:12:31 UTC; 2min 23s ago
     Docs: man:numad
  Process: 119546 ExecStart=/usr/bin/numad $DAEMON_ARGS -i 15 (code=exited, status=0/SUCCESS)
 Main PID: 119547 (code=dumped, signal=SEGV)

Jun 17 06:00:28 dradis systemd[1]: Starting numad - The NUMA daemon that manages application locality....
Jun 17 06:00:28 dradis systemd[1]: Started numad - The NUMA daemon that manages application locality..
Jun 17 06:12:31 dradis systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jun 17 06:12:31 dradis systemd[1]: numad.service: Failed with result 'core-dump'.

So the mem-stress load might help to trigger it, but isn't necessarily required.
After restarting the numad daemon I started the guest load and got the crash again.

While I have no idea yet what exactly is going on, let's set this to Confirmed at least.

[1]: https://github.com/stressapptest/stressapptest

Initially had PID 119547 and no odd entries in the log.

Changed in numad (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

With verbose logging enabled, my numad log file is:

Mon Jun 17 06:22:53 2019: Nodes: 2
Min CPUs free: 1416, Max CPUs: 1423, Avg CPUs: 1419, StdDev: 3.53553
Min MBs free: 12869, Max MBs: 13756, Avg MBs: 13312, StdDev: 443.5
Node 0: MBs_total 65266, MBs_free 12869, CPUs_total 2000, CPUs_free 1416, Distance: 10 40 CPUs: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76
Node 1: MBs_total 65337, MBs_free 13756, CPUs_total 2000, CPUs_free 1423, Distance: 40 10 CPUs: 80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156
Mon Jun 17 06:22:53 2019: Processes: 1563
Mon Jun 17 06:22:53 2019: Candidates: 2
101867853: PID 120072: (qemu-system-ppc), Threads 23, MBs_size 55763, MBs_used 50509, CPUs_used 876, Magnitude 44245884, Nodes: 0,8
101867853: PID 120206: (qemu-system-ppc), Threads 23, MBs_size 55821, MBs_used 23699, CPUs_used 279, Magnitude 6612021, Nodes: 0,8
Mon Jun 17 06:22:53 2019: Advising pid 120072 (qemu-system-ppc) move from nodes (0,8) to nodes (0,8)

With debug logging enabled, the dying messages looked like:

Another run #2:
Mon Jun 17 06:25:08 2019: Nodes: 2
Min CPUs free: 302, Max CPUs: 439, Avg CPUs: 370, StdDev: 68.5018
Min MBs free: 1597, Max MBs: 4548, Avg MBs: 3072, StdDev: 1475.5
Node 0: MBs_total 65266, MBs_free 1597, CPUs_total 2000, CPUs_free 302, Distance: 10 40 CPUs: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76
Node 1: MBs_total 65337, MBs_free 4548, CPUs_total 2000, CPUs_free 439, Distance: 40 10 CPUs: 80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156
Mon Jun 17 06:25:08 2019: Processes: 1572
Mon Jun 17 06:25:08 2019: Candidates: 2
101881395: PID 120072: (qemu-system-ppc), Threads 25, MBs_size 55763, MBs_used 50523, CPUs_used 1995, Magnitude 100793385, Nodes: 0,8
101881395: PID 120206: (qemu-system-ppc), Threads 25, MBs_size 55821, MBs_used 45916, CPUs_used 830, Magnitude 38110280, Nodes: 0,8
Mon Jun 17 06:25:08 2019: PICK NODES FOR: PID: 120072, CPUs 2347, MBs 59438
Mon Jun 17 06:25:08 2019: PROCESS_MBs[0]: 17481
Mon Jun 17 06:25:08 2019: Node[0]: mem: 201700 cpu: 5952
Mon Jun 17 06:25:08 2019: Node[1]: mem: 45480 cpu: 2634
Mon Jun 17 06:25:08 2019: Totmag[0]: 12080055
Mon Jun 17 06:25:08 2019: Totmag[1]: 1948267
Mon Jun 17 06:25:08 2019: best_node_ix: 0
Mon Jun 17 06:25:08 2019: Node: 0 Dist: 10 Magnitude: 1200518400
Mon Jun 17 06:25:08 2019: Node: 8 Dist: 40 Magnitude: 119794320
Mon Jun 17 06:25:08 2019: MBs: 59438, CPUs: 2347
Mon Jun 17 06:25:08 2019: Assigning resources from node 0
Mon Jun 17 06:25:08 2019: Node[0]: mem: 1000 cpu: 0
Mon Jun 17 06:25:08 2019: MBs: 39368, CPUs: 1355
Mon Jun 17 06:25:08 2019: Assigning resources from node 1
Mon Jun 17 06:25:08 2019: Advising pid 120072 (qemu-system-ppc) move from nodes (0,8) to nodes (0,8)

Another run #3:
Mon Jun 17 06:26:46 2019: Nodes: 2
Min CPUs free: 889, Max CPUs: 1048, Avg CPUs: 968, StdDev: 79.5016
Min MBs free: 1291, Max MBs: 3484, Avg MBs: 2387, StdDev: 1096.5
Node 0: MBs_total 65266, MBs_free 1291, CPUs_total 2000, CPUs_free 889, Distance: 10 40 CPUs: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76
Node 1: MBs_total 65337, MBs_free 3484, CPUs_total 2000, CPUs_free 104...


Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

# get debug symbols and gdb
$ sudo apt install numad-dbgsym gdb dpkg-dev
# get source as used in the package
$ apt source numad
# I found that we will also need glibc source, so:
$ apt source glibc

It helps to add paths to gdb
(gdb) directory /home/ubuntu/numad-0.5+20150602:/home/ubuntu/glibc-2.27:/home/ubuntu/glibc-2.27/malloc

I found these backtraces:

Thread 1 "numad" received signal SIGSEGV, Segmentation fault.
tcache_get (tc_idx=<optimized out>) at malloc.c:2943
2943 malloc.c: No such file or directory.
(gdb) bt
#0 tcache_get (tc_idx=<optimized out>) at malloc.c:2943
#1 __GI___libc_malloc (bytes=16) at malloc.c:3050
#2 0x00000d9b7ec780dc in bind_process_and_migrate_memory (p=0xd9b843b0f70) at numad.c:993
#3 0x00000d9b7ec7d148 in manage_loads () at numad.c:2225
#4 0x00000d9b7ec734dc in main (argc=<optimized out>, argv=<optimized out>) at numad.c:2654

(gdb) bt
#0 0x00000fb6cd2779f4 in bind_process_and_migrate_memory (p=0xfb6fc1e0f70) at numad.c:998
#1 0x00000fb6cd27d148 in manage_loads () at numad.c:2225
#2 0x00000fb6cd2734dc in main (argc=<optimized out>, argv=<optimized out>) at numad.c:2654

(gdb) bt
#0 0x000001c757da79f4 in bind_process_and_migrate_memory (p=0x1c758a60f70) at numad.c:998
#1 0x000001c757dad148 in manage_loads () at numad.c:2225
#2 0x000001c757da34dc in main (argc=<optimized out>, argv=<optimized out>) at numad.c:2654

Christian Ehrhardt  (paelzer) wrote :

One failure was at:
CLEAR_CPU_LIST(cpu_bind_list_p);
The next two at:
OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[node_id].cpu_list_p);

The common denominator here is cpu_bind_list_p but that is a static local:
  static id_list_p cpu_bind_list_p;

The macro is defined as:
#define OR_LISTS( or_list_p, list_1_p, list_2_p) CPU_OR_S( or_list_p->bytes, or_list_p->set_p, list_1_p->set_p, list_2_p->set_p)

That translates into
CPU_OR_S( cpu_bind_list_p->bytes, cpu_bind_list_p->set_p, cpu_bind_list_p->set_p, node[node_id].cpu_list_p->set_p)

CPU_OR_S is from sched.h. Per its documentation it:
 - operates on the dynamically allocated CPU set(s) whose size is setsize bytes (due to the _S suffix)
 - stores the union of the sets cpu_bind_list_p->set_p and node[node_id].cpu_list_p->set_p in destset
 - explicitly says destset "may be one of the source sets"

Christian Ehrhardt  (paelzer) wrote :

(gdb) p cpu_bind_list_p->bytes
$5 = 24
(gdb) p *(cpu_bind_list_p->set_p)
$7 = {__bits = {1229782938247303441, 4369, 0, 49, 1955697441360, 274, 303148778372988952, 139284342967816, 48, 337, 1955697440432, 1955697400112, 1955697400080, 1955697400048,
    1955697400016, 1955697400512}}
(gdb) p sizeof(*(cpu_bind_list_p->set_p))
$8 = 128

See the size mismatch?
It will allocate 24 bytes and needs 128.

I think this is a bad initialization.
We have these operations in the code on cpu_bind_list_p.

static id_list_p cpu_bind_list_p;
CLEAR_CPU_LIST(cpu_bind_list_p);
OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[node_id].cpu_list_p);

Now CLEAR_CPU_LIST has some init code, but only if == NULL.
#define CLEAR_CPU_LIST(list_p) \
    if (list_p == NULL) { \
        INIT_ID_LIST(list_p, num_cpus); \
    } \
    CPU_ZERO_S(list_p->bytes, list_p->set_p)

Since we can't rely on the data in that static variable, it might "by accident" hold stale old data.

Note: The other chance of errors is the 40 active CPUs vs the 160 potential CPUs (SMT off) that I have in my system.
The size comes from num_cpus; if that detection is off, it might fail as well.
But at least in all my crashes that was ok.
(gdb) p num_cpus
$10 = 160

So let's assume it is the lack of (re)initialization for now.
Other structures of type "id_list_p" are all initialized with NULL btw.
Like:
  id_list_p all_cpus_list_p = NULL;
  id_list_p all_nodes_list_p = NULL;
  id_list_p reserved_cpu_mask_list_p = NULL;

Christian Ehrhardt  (paelzer) wrote :

Note:
- numad is "only" in universe in all releases
- nothing depends on it
- it is on 0.5+20150602-5 which seems rather old
- But upstream commits [1] since 2015 are minimal

TL;DR: there is no upstream fix available to cherry-pick (upstream is somewhat dead)

Hmm, the above might have been a red herring.
I get 24 bytes size even in cases where the initial value was null.

Maybe the pure size of set_p isn't what matters.
In any case, let's do a non-optimized build to get around debugging issues like:

(gdb) p node[node_id].cpu_list_p->bytes
value has been optimized out

Also I realized that this is a static variable not only to keep it local, but in the common sense of keeping its content across function calls. So initializing it on every call would be wrong :-)

[1] https://pagure.io/numad

Frank Heimes (fheimes)
tags: added: universe
Christian Ehrhardt  (paelzer) wrote :

While I built a proper PPA in [1], this is simple enough that we can rebuild locally with just:

$ cc -std=gnu99 -I. -D__thread="" -c -o numad.o numad.c
$ cc numad.o -lpthread -lrt -lm -o numad
$ mv numad /usr/bin/numad

That should allow quick iterations.

With debug enabled I found that the second set is actually null.
So the red herring assumption above was correct.

996 while (nodes) {
997 if (ID_IS_IN_LIST(node_id, p->node_list_p)) {
998 OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[node_id].cpu_list_p);
(gdb) p node[node_id].cpu_list_p
$4 = (id_list_p) 0x0

The arg is
(gdb) p *(p->node_list_p)
$7 = {set_p = 0x304a3a418d0, bytes = 8}

This delivers "1"
int nodes = NUM_IDS_IN_LIST(p->node_list_p);
(gdb) p nodes
$5 = 1

That is:
#define NUM_IDS_IN_LIST(list_p) CPU_COUNT_S(list_p->bytes, list_p->set_p)

Per [2] this counts the cpus in the cpu_set.

So the TL;DR of this loop
while (nodes) {
  nodes -= 1;
is that it iterates over all CPUs

On each iteration it checks
  if (ID_IS_IN_LIST(node_id, p->node_list_p)) {

node_id starts at zero and is incremented each iteration.

I must admit the usage of the term "node" for cpus here is very misleading.

"node" is a global data structure

typedef struct node_data {
    uint64_t node_id;
    uint64_t MBs_total;
    uint64_t MBs_free;
    uint64_t CPUs_total; // scaled * ONE_HUNDRED
    uint64_t CPUs_free; // scaled * ONE_HUNDRED
    uint64_t magnitude; // hack: MBs * CPUs
    uint8_t *distance;
    id_list_p cpu_list_p;
} node_data_t, *node_data_p;
node_data_p node = NULL;

Due to the misperception of "node" actually being CPUs the indexing here is off IMHO.

(gdb) p node[0]
$13 = {node_id = 0, MBs_total = 65266, MBs_free = 1510, CPUs_total = 2000, CPUs_free = 1144, magnitude = 1727440, distance = 0x304a3a41850 "\n(\032\n\244~", cpu_list_p = 0x304a3a41810}
(gdb) p node[1]
$14 = {node_id = 8, MBs_total = 65337, MBs_free = 1734, CPUs_total = 2000, CPUs_free = 1049, magnitude = 1818966, distance = 0x304a3a418b0 "(\n\032\n\244~", cpu_list_p = 0x304a3a41870}

My CPUs are 0,4,8,... and so is the indexing here: despite the name "node", it is actually based on CPUs.

Summary:
- The code checks for each CPU as counted by NUM_IDS_IN_LIST
- It will increase the ID until it found a hit in ID_IS_IN_LIST(node_id, p->node_list_p)
- that will skip empty CPUs as in my SMT case
- Once it found a cpu that is in the set it will OR_LISTS
  node[node_id].cpu_list_p

Christian Ehrhardt  (paelzer) wrote :
Christian Ehrhardt  (paelzer) wrote :

The problem is that node[node_id].cpu_list_p is wrong.
When you look at the array again it has two real entries and nothing more:

(gdb) p node[0]
$20 = {node_id = 0, MBs_total = 65266, MBs_free = 1510, CPUs_total = 2000, CPUs_free = 1144, magnitude = 1727440, distance = 0x304a3a41850 "\n(\032\n\244~", cpu_list_p = 0x304a3a41810}
(gdb) p node[1]
$21 = {node_id = 8, MBs_total = 65337, MBs_free = 1734, CPUs_total = 2000, CPUs_free = 1049, magnitude = 1818966, distance = 0x304a3a418b0 "(\n\032\n\244~", cpu_list_p = 0x304a3a41870}
(gdb) p node[2]
$22 = {node_id = 1820693536, MBs_total = 33, MBs_free = 3318460192688, CPUs_total = 24, CPUs_free = 1839495593, magnitude = 33,
  distance = 0x1111111111111111 <error: Cannot access memory at address 0x1111111111111111>, cpu_list_p = 0x1111111111111111}
(gdb) p node[3]
$23 = {node_id = 286331153, MBs_total = 33, MBs_free = 3318460192752, CPUs_total = 8, CPUs_free = 1842299472, magnitude = 33,
  distance = 0x101 <error: Cannot access memory at address 0x101>, cpu_list_p = 0x7ea40a1a0e08 <main_arena+96>}
(gdb) p node[4]
$24 = {node_id = 1867659328, MBs_total = 33, MBs_free = 3318460192816, CPUs_total = 24, CPUs_free = 1882184320, magnitude = 33,
  distance = 0x1111111111111111 <error: Cannot access memory at address 0x1111111111111111>, cpu_list_p = 0x1111}
(gdb) p node[5]
$25 = {node_id = 0, MBs_total = 33, MBs_free = 139243009222666, CPUs_total = 139243009216008, CPUs_free = 1898775676, magnitude = 33, distance = 0x304a3a41890 "", cpu_list_p = 0x18}
(gdb) p node[6]
$26 = {node_id = 3689421645304561696, MBs_total = 33, MBs_free = 0, CPUs_total = 1229782938247299072, CPUs_free = 286331153, magnitude = 33,
  distance = 0x7ea40a1a0a28 <_IO_wide_data_2+264> "", cpu_list_p = 0x7ea40a1a0e08 <main_arena+96>}
(gdb) p node[7]
$27 = {node_id = 3546150882158837792, MBs_total = 33, MBs_free = 257, CPUs_total = 267, CPUs_free = 288230377091498008, magnitude = 33, distance = 0x304a3a41910 "\001\001",
  cpu_list_p = 0x8}
(gdb) p node[8]
$28 = {node_id = 303211223003168792, MBs_total = 33, MBs_free = 257, CPUs_total = 265, CPUs_free = 288230377024389144, magnitude = 33,
  distance = 0x2f69 <error: Cannot access memory at address 0x2f69>, cpu_list_p = 0x0}

We essentially do an out-of-bounds access to the array at index [8], where cpu_list_p = 0x0, and that triggers the SEGV.

We actually do NOT want node[node_id]

Instead we need to iterate over the node array entries and pick the entry which has node[x].node_id == node_id.

Christian Ehrhardt  (paelzer) wrote :

Chances are that without the odd SMT=off numbering on ppc things would work.
That might explain why this didn't fail more often or on other architectures so far.

But disabling a subset of CPUs is allowed, so this needs to be fixed for all, no matter how "often" the issue occurs on one of the architectures.

Christian Ehrhardt  (paelzer) wrote :

Rebuild via:
rm numad
cc -g -O0 -fstack-protector-strong -std=gnu99 -I. -D__thread="" -Wdate-time -D_FORTIFY_SOURCE=2 -c -o numad.o numad.c
cc -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now numad.o -lpthread -lrt -lm -o numad
ls -laF numad
sudo mv numad /usr/bin/numad

My current config triggering this has a pretty common CPU list on ppc64el:

CPU(s): 160
On-line CPU(s) list: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156
Off-line CPU(s) list: 1-3,5-7,9-11,13-15,17-19,21-23,25-27,29-31,33-35,37-39,41-43,45-47,49-51,53-55,57-59,61-63,65-67,69-71,73-75,77-79,81-83,85-87,89-91,93-95,97-99,101-103,105-107,109-111,113-115,117-119,121-123,125-127,129-131,133-135,137-139,141-143,145-147,149-151,153-155,157-159

The assumption seems to be correct: it was due to the cpu/node mismatch, which assumes CPUs are always linear with cpu-number == index-in-array.

With the following change the breakage no longer happens in my setup:

--- numad.c.orig 2019-06-17 09:27:49.783712059 +0000
+++ numad.c 2019-06-17 10:11:00.619113441 +0000
@@ -995,7 +995,18 @@
     int node_id = 0;
     while (nodes) {
         if (ID_IS_IN_LIST(node_id, p->node_list_p)) {
-            OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[node_id].cpu_list_p);
+            int id = -1;
+            for (int node_ix = 0; (node_ix < num_nodes); node_ix++) {
+                if (node[node_ix].node_id == node_id) {
+                    id = node_ix;
+                    break;
+                }
+            }
+            if (id == -1) {
+                numad_log(LOG_CRIT, "Node %d is requested, but unknown\n", node_id);
+                exit(EXIT_FAILURE);
+            }
+            OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[id].cpu_list_p);
             nodes -= 1;
         }
         node_id += 1;

Christian Ehrhardt  (paelzer) wrote :

While this numbering is pretty common on Power (all non-SMT systems) and s390x (scaling the number of CPUs with load), it is uncommon on x86. Nevertheless, in theory the issue should exist there as well.
But I tried this for an hour and it didn't trigger (plenty of assignments happened).

Repro (x86)
1. Get a KVM guest with numa memory nodes
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>
2. disable some CPUs in the middle
  $ echo 0 | sudo tee /sys/bus/cpu/devices/cpu1/online
  $ echo 0 | sudo tee /sys/bus/cpu/devices/cpu2/online
  $ lscpu
  CPU(s): 4
  On-line CPU(s) list: 0,3
  Off-line CPU(s) list: 1,2
3. install, start and follow the log of numad
  $ sudo apt install numad
  $ sudo systemctl start numad
  $ journalctl -f -u numad
4. run some memory load that will make numad assign processes
  $ sudo apt install stress-ng
  $ stress-ng --vm 2 --vm-bytes 90% -t 5m

If we follow the numad log with verbose enabled, after a while we will see NUMA assignments like:
Mon Jun 17 10:32:05 2019: Advising pid 3416 (stress-ng-vm) move from nodes (0-1) to nodes (0)
Mon Jun 17 10:32:23 2019: Advising pid 3417 (stress-ng-vm) move from nodes (0-1) to nodes (1)

Maybe on ppc the NUMA node numbering is also non-linear; I remember working on fixes for numactl in that regard, and maybe that is important as well.

Christian Ehrhardt  (paelzer) wrote :

I have made a test build with the fix available at PPA [1]. It resolves the issue for me, but before going further please give that a try with your setups as well.

Further I opened a PR for upstream at [2] to discuss it there as well.
Feel free to chime in and give it a +1 there if it works well for you.

[1]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1832915-numad-debugging
[2]: https://pagure.io/numad/pull-request/3

Christian Ehrhardt  (paelzer) wrote :

@JFH/Manjo - the bug assignment is odd; can you please set it up the way you need to reflect that we are waiting on upstream (ack on the PR) and IBM (testing the PPA)?

Manoj Iyer (manjo)
Changed in numad (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Server Team (canonical-server)
Changed in ubuntu-power-systems:
assignee: Manoj Iyer (manjo) → Canonical Server Team (canonical-server)
status: Confirmed → Incomplete
Changed in numad (Ubuntu):
status: Confirmed → Incomplete
Changed in ubuntu-power-systems:
importance: Undecided → High
Changed in numad (Ubuntu):
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

Reported to Debian (linked above) and prepared an MP for Eoan for team review.

But still waiting for your ok @IBM that this solves your case.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-06-20 07:00 EDT-------
(In reply to comment #22)
> Reported to Debian (linked above) and prepared an MP for Eoan for team
> review.
>
> But still waiting for your ok @IBM that this solves your case.

It does not solve the issue.

> Updated numad from ppa:

# dpkg -l | grep numad
ii numad 0.5+20150602-5ubuntu0.1~ppa2 ppc64el User-level daemon that monitors NUMA topology and usage

# service numad status
? numad.service - numad - The NUMA daemon that manages application locality.
Loaded: loaded (/lib/systemd/system/numad.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-06-20 06:50:13 EDT; 1s ago
Docs: man:numad
Process: 13844 ExecStart=/usr/bin/numad $DAEMON_ARGS -i 15 (code=exited, status=0/SUCCESS)
Main PID: 13845 (numad)
CGroup: /system.slice/numad.service
??13845 /usr/bin/numad -i 15

> now started a KVM guest, ran `stress-ng -c 14 -vm 10` inside guest

> in few minutes numad crashed on host

Jun 20 06:56:15 localhost kernel: [ 2916.371332] numad[13845]: segfault (11) at 1b1ea0308 nip 7fffb56cac20 lr 7fffb56cf85c code 1 in libc-2.27.so[7fffb5620000+210000]
Jun 20 06:56:15 localhost kernel: [ 2916.371352] numad[13845]: code: 3be30058 38c30010 f821ffc1 91230008 38000000 7c2004ac 7d2030a9 7c0031ad
Jun 20 06:56:15 localhost kernel: [ 2916.371354] numad[13845]: code: 40c2fff8 4c00012c 2fa90000 419e0144 <e9090008> 550ae13e 394afffe 794a1f48
Jun 20 06:56:16 localhost systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jun 20 06:56:16 localhost systemd[1]: numad.service: Failed with result 'core-dump'.

Christian Ehrhardt  (paelzer) wrote :

Interesting; for me the issue was no longer reproducible with the fix applied.

Maybe there is another bug in the same code that you hit now.
Could you tell me all the details of the setup that still triggers this crash?

Furthermore, this should have created a crash dump in /var/crash/.
It is probably best to clean that up, let it crash again and then attach the crash here so I can take a look at where and why it still crashes for you.

Changed in numad (Debian):
status: Unknown → New
Christian Ehrhardt  (paelzer) wrote :

Hi,
any updates on this one?

Everything I could reproduce is fixed by the suggested change, but since, according to you, that isn't sufficient, I now need you to debug your case and suggest whatever additional change you need on top.

After fixing the bug that I could identify, I'd hate for this to go into "waiting forever" because of some extra issue that you have with it.

This stays Incomplete until further info is provided; you can
a) provide info on how this might be reproducible for me as well
b) provide patches that fix your issue
c) accept that my fix at least fixes some issues; we can push that, and you can open another bug for the follow-on issue that you have identified.

Please let me know what you need/prefer.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-07-12 04:50 EDT-------
Since the machine I had is being used for other testing, I set up another machine to test this again with all the latest levels + the numad from the PPA.

# dpkg -l | grep numad
ii numad 0.5+20150602-5ubuntu0.1~ppa2 ppc64el User-level daemon that monitors NUMA topology and usage
# uname -r
4.15.0-54-generic

Numad crash issue is fixed. I am not able to hit any crashes now...

Revision history for this message
Frank Heimes (fheimes) wrote :

@bssrikanth many thanks for testing and feedback!

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, thanks bssrikanth!

That means we can go on with the SRU.

I'm still sort of frightened by upstream numad seeming dead, but the fix seems clear and now is confirmed to work for you which allows us to go on.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Uploaded to Eoan ...

Changed in numad (Ubuntu Cosmic):
status: New → Won't Fix
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package numad - 0.5+20150602-5ubuntu1

---------------
numad (0.5+20150602-5ubuntu1) eoan; urgency=medium

  * d/p/lp-1832915-fix-sparse-node-ids.patch: fix a crash on ppc64el
    (LP: #1832915)

 -- Christian Ehrhardt <email address hidden> Wed, 19 Jun 2019 13:05:33 +0200

Changed in numad (Ubuntu Eoan):
status: Incomplete → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Incomplete → In Progress
description: updated
Changed in numad (Ubuntu Bionic):
status: New → In Progress
Changed in numad (Ubuntu Disco):
status: New → In Progress
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

MP reviews complete, uploaded to Bionic/Disco unapproved

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted numad into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/numad/0.5+20150602-5ubuntu0.19.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in numad (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
Changed in numad (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted numad into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/numad/0.5+20150602-5ubuntu0.18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Took a P9 system which has sparse nodes:
$ ll /sys/bus/node/devices/node*
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/node/devices/node0 -> ../../../devices/system/node/node0/
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/node/devices/node8 -> ../../../devices/system/node/node8/
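The same sparse layout is also visible in /sys/devices/system/node/online, which the kernel exposes in list format (on this machine it would presumably read "0,8"). A minimal C sketch of a parser for that format follows; parse_node_list is a hypothetical helper for illustration, not a numad function. It makes the point concrete: the resulting IDs are 0 and 8, while any array built from them has indexes 0 and 1.

```c
#include <stdlib.h>

/*
 * Parse a kernel list-format string such as "0,8" or "0-3,16\n"
 * (the format of /sys/devices/system/node/online) into node IDs.
 * Returns the number of IDs written, or -1 on a parse error or if
 * more than max_ids IDs would be produced.
 * Hypothetical helper, not part of numad.
 */
int parse_node_list(const char *s, int *ids, int max_ids)
{
    int count = 0;
    char *end;

    while (*s != '\0' && *s != '\n') {
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0)
            return -1;                  /* expected a non-negative number */
        long hi = lo;
        s = end;
        if (*s == '-') {                /* a range such as "0-3" */
            s++;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
            s = end;
        }
        for (long id = lo; id <= hi; id++) {
            if (count >= max_ids)
                return -1;              /* caller's buffer is too small */
            ids[count++] = (int)id;
        }
        if (*s == ',')
            s++;                        /* move on to the next element */
        else if (*s != '\0' && *s != '\n')
            return -1;                  /* unexpected character */
    }
    return count;
}
```

For the input "0,8\n" this yields two IDs, 0 and 8, so code that assumes IDs run 0..num_nodes-1 would already be wrong for the second node.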

Install and start numad
$ apt install numad
$ systemctl start numad

Start a KVM guest with 100 CPUs and 64G memory
 $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=ppc64el label=daily release=eoan
 $ uvt-kvm create --memory $((64*1024)) --cpu 100 --password ubuntu eoan arch=ppc64el release=eoan label=daily

Even without putting pressure on the memory we see the expected crash:

Jul 17 08:57:51 dradis kernel: numad[8341]: unhandled signal 11 at 0000712686320e90 nip 000071268451058c lr 00007126845132c0 code 1
Jul 17 08:57:52 dradis systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jul 17 08:57:52 dradis systemd[1]: numad.service: Failed with result 'core-dump'.

Installing from proposed.
numad/bionic-proposed 0.5+20150602-5ubuntu0.18.04.1 ppc64el [upgradable from: 0.5+20150602-5]

Starting the numad service again and tracking the logs.

1. start guest
2. While that is going on putting some memory pressure on the guest with stressapptest

This time I was able to again trigger a crash with this setup despite using proposed.
Maybe I hit what you had when testing the PPA before.
It seems to occur more rarely, but still reliably enough. I'll try to collect debug data - maybe we'll find the further issue that is in here as well.

Let's call this verification failed for now, debug it, potentially respin the fix into an extended one in Eoan, and then reconsider.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The new crash that was found is:
#0 0x000002375f1bd2c4 in pick_numa_nodes (pid=<optimized out>, cpus=<optimized out>, mbs=<optimized out>, assume_enough_cpus=<optimized out>) at numad.c:1796
  1791: numad_log(LOG_DEBUG, "Interleaved MBs: %ld\n", ix, p->process_MBs[ix]);
  1792: } else {
  1793: numad_log(LOG_DEBUG, "PROCESS_MBs[%d]: %ld\n", ix, p->process_MBs[ix]);
  1794: }
  1795: }
  1796: if (ID_IS_IN_LIST(ix, p->node_list_p)) {
  1797: proc_avg_node_CPUs_free += node[ix].CPUs_free;
  1798: }
  1799: }
  1800: proc_avg_node_CPUs_free /= NUM_IDS_IN_LIST(p->node_list_p);
  1801: if ((process_has_interleaved_memory) && (keep_interleaved_memory)) {
#1 0x0000000000000000 in ?? ()

That already smells like a different symptom of the same root cause (sparse node IDs),
most likely the node[ix] access.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Since this is constructed like:
  ADD_ID_TO_LIST(node[0].node_id, target_node_list_p);
I guess this delivers 0 and then 8 on my system,
i.e. the node_id instead of the index.

1796 if (ID_IS_IN_LIST(ix, p->node_list_p)) {
1797 proc_avg_node_CPUs_free += node[ix].CPUs_free;
1798 }

While the indexes are 0, 1

I think we'd want to convert our node_id->array-index mapper into a function and use it both in the place we fixed first and in this one, and then retest.
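As a rough illustration of the mapper proposed above, here is a minimal C sketch. It assumes a trimmed-down node record; node_data_t and node_id_to_index are hypothetical names, and numad's real node array carries more fields than shown here. The point is only that the array is dense (indexes 0..num_nodes-1) while the IDs can be sparse.

```c
/* Hypothetical, trimmed-down stand-in for numad's per-node record. */
typedef struct {
    int node_id;     /* kernel NUMA node ID, e.g. 0 or 8 */
    int CPUs_free;   /* free CPU capacity on that node */
} node_data_t;

/*
 * Map a kernel node ID to its position in the node[] array.
 * Returns -1 if no node with that ID is known, so callers can
 * skip it instead of indexing past the end of the array.
 */
int node_id_to_index(const node_data_t *node, int num_nodes, int node_id)
{
    for (int ix = 0; ix < num_nodes; ix++) {
        if (node[ix].node_id == node_id)
            return ix;
    }
    return -1;
}
```

On the two-node system above, node_id_to_index(node, 2, 8) returns 1, whereas using 8 directly as node[8] would read far past the two-element array. A linear search is adequate here because the number of NUMA nodes is small.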

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hmm, no this must be different.

This is doing:
for (int ix = 0; (ix <= num_nodes); ix++) {

which essentially is 0,1,2
The 2 is odd here, but it seems to break already at
1796 if (ID_IS_IN_LIST(ix, p->node_list_p)) {

and the latter array access would be fine as ix is currently zero
1797 proc_avg_node_CPUs_free += node[ix].CPUs_free;

I need to disable optimization again to make more sense of it ...
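The comment above questions both the `ix <= num_nodes` bound and whether ID_IS_IN_LIST() should be given an array index or a node ID. A minimal C sketch of a loop shape that keeps the two apart follows, assuming a plain 64-bit mask of node IDs in place of numad's id_list_p; avg_cpus_free, id_in_mask and node_data_t are hypothetical names, not numad code. Whether numad's actual loop needs exactly this change is not established in this thread.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for numad's per-node record. */
typedef struct {
    int node_id;     /* kernel NUMA node ID (may be sparse, e.g. 0 and 8) */
    int CPUs_free;   /* free CPU capacity on that node */
} node_data_t;

/* Stand-in for ID_IS_IN_LIST(): the node list is a bitmask of node IDs. */
static int id_in_mask(int id, uint64_t mask)
{
    return id >= 0 && id < 64 && ((mask >> id) & 1u);
}

/*
 * Average CPUs_free over the nodes whose IDs are set in node_mask.
 * The loop walks array indexes 0..num_nodes-1 ("<", not "<="), and
 * membership is tested with node[ix].node_id rather than with ix.
 * Returns 0 if none of the listed IDs is present.
 */
int avg_cpus_free(const node_data_t *node, int num_nodes, uint64_t node_mask)
{
    int sum = 0, hits = 0;

    for (int ix = 0; ix < num_nodes; ix++) {
        if (id_in_mask(node[ix].node_id, node_mask)) {
            sum += node[ix].CPUs_free;
            hits++;
        }
    }
    return hits ? sum / hits : 0;
}
```

With nodes {0, 8} and a list containing only node 8, testing `ix` instead of `node[ix].node_id` would look for IDs 0 and 1 and miss node 8 entirely; the sketch finds it at index 1.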

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hit another crash:
    static id_list_p cpu_bind_list_p;
    CLEAR_CPU_LIST(cpu_bind_list_p);

But this is a crash in malloc.c(16); it seems this system is currently broken in general.
tcache_get really shouldn't fail here.
Also I have seen hang_checks in dmesg.

I'll redeploy and give all of this a new try.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It seems that on the former deployment I hit some memory bug which broke and stalled quite a few allocations. While I haven't found what was causing that (it would make an interesting bug report), the redeployed system seems good.

And in that environment I was able to verify the fix just as expected.
Sorry for the noise before, but I try to take these verifications seriously :-/

With the fix in place I see a correct movement to node 8 for example:
Wed Jul 17 12:33:16 2019: Advising pid 47693 (qemu-system-ppc) move from nodes (0,8) to nodes (8)
Wed Jul 17 12:33:16 2019: PID 47693 moved to node(s) 8 in 0.0 seconds
Wed Jul 17 12:38:21 2019: Advising pid 47693 (qemu-system-ppc) move from nodes (8) to nodes (8)
Wed Jul 17 12:38:21 2019: PID 47693 moved to node(s) 8 in 0.0 seconds
Wed Jul 17 12:43:26 2019: Advising pid 47693 (qemu-system-ppc) move from nodes (8) to nodes (8)
Wed Jul 17 12:43:26 2019: PID 47693 moved to node(s) 8 in 0.0 seconds

Set verification ok for Bionic, upgrading to Disco for the verification there

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The service worked fine even through a full release upgrade from Bionic to Disco; I saw it moving processes just fine.

Once on Disco, I pushed some load in the guest to get more movements, and things still worked fine.

Setting verified for Disco.

P.S. I also think I have found the "other" crash that I have seen; it seems to be triggered by restarting numad while a numad-guided process is active. So in the scenario here, start your guest first and then restart numad. I'm filing bug 1836913 for it so that someone can take a look at it later ...
To be sure that this isn't caused by this update (already quite sure since the place is different) I downgraded to
  sudo apt install numad=0.5+20150602-5
And there this issue is triggered as well on restart.

tags: added: verification-done verification-done-disco
removed: verification-needed verification-needed-disco
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Summarizing the state:
- numad is universe only and IMHO in a rather bad state
  - upstream seems dead for quite some time and does not respond to my patches
- the bug reported here is fixed and verified
- numad seems to have issues on service restart (unrelated to this update)
  -> the upgrades to numad in this SRU will trigger a service restart
  -> this might trigger bug 1836913 in the wild.

@SRU team: is the insight that restart is potentially bad (already before this update) and might be triggered a reason to stop the SRU?

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → In Progress
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

I had to sit down and think about this for a moment. The bug with the service restart seems to only happen on ppc64el, which means the issue the package upgrade might trigger might have limited impact. On the other hand, the main target of this bugfix are ppc64el platforms, as those were the most likely to exhibit the original bug.

Before we release this, I would feel safer if we knew how reproducible bug LP: #1836913 is on ppc64el, e.g. whether it is limited to this one particular device. How frequently does this happen?
Also questions like: how hard would it be to fix?

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

The reason I'm worried is that the original bug only caused issues for numad under certain conditions, but the package upgrade will trigger a restart for *all* instances using numad. So if a numad restart causes trouble in all ppc64el cases, I'm worried we might cause more harm with the update than without it. Of course it all depends on my earlier questions; maybe it's not that bad.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Lukasz:
Thanks Lukasz for your thoughts - you confirm my concerns.

Trying to answer your questions:
- reproducible
  - it failed on our P9 machine at 100%
  - I don't have another P9 to check if it is specific to "that" machine or P9 in general
  - I was deploying a P8 system to have some comparison
    - bug 1839065 blocked me from using the same workload, so the results are unreliable at best
- How frequently
  - in the affected system on every restart of the service (while huge guest was active)
- easy (or not) to fix:
  - from what I saw in the traces it looked like more out-of-bounds accesses.
    If that was right, it would mean (too) many changes and be really complex, as that pattern was in
    many places; but towards the end I became convinced that it might have been a red herring after
    all. Nevertheless, unless it is debugged further, the complexity is somewhere between rather
    complex and unknown.

IBM reported this bug for P9; maybe they are the ones with the P9 machine park (different configurations). If it is important to them, they can assess it and let us know the details for bug 1836913, and we might hold back this SRU for now (being unsure how often we might trigger this).
OTOH, any numad service restart will trigger it; it is not as if we'd add the bug with this proposed update.

For now I'd say leave it in proposed and we wait if there is IBM feedback on bug 1839065 or bug 1836913.

tags: added: block-proposed
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Marking as incomplete while awaiting resolution to bug 1839065 or bug 1836913.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Revision history for this message
Manoj Iyer (manjo) wrote :

Wichita was updated with the latest Power8 firmware from IBM and is ready for your testing needs.

Current firmware version:
P side : FW860.70 (SV860_205)
T side : FW860.70 (SV860_205)
Boot side : FW860.70 (SV860_205)

Changed in ubuntu-power-systems:
status: Incomplete → Triaged
tags: added: block-proposed-bionic block-proposed-disco
removed: block-proposed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Yeah this is still broken on both machines, sometimes faster sometimes slower to reproduce.
So to summarize we have bug 1832915 reported and a fix created.
But we also have bug 1836913 and potentially a whole set of bugs due to the same conceptual mismatches (assumption in code: numa zones would be linearly indexed, but that isn't true on power).
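The conceptual mismatch described here is code written for node IDs 0..N-1 that uses a node ID directly as an array index. One common way to avoid it, sketched below assuming node IDs are small non-negative integers, is a lookup table sized by the highest ID rather than by the node count. build_id_to_index_table is a hypothetical helper for illustration, not something numad provides.

```c
#include <stdlib.h>

/*
 * Build a table indexed directly by node ID: table[id] holds the dense
 * array index for that ID, or -1 if no node has that ID. Sizing it by
 * (highest ID + 1) instead of by num_nodes keeps a lookup such as
 * table[8] in bounds when the IDs are {0, 8}.
 * Assumes all IDs are non-negative. Returns NULL on allocation failure;
 * on success *table_len receives the table size and the caller frees it.
 */
int *build_id_to_index_table(const int *node_ids, int num_nodes, int *table_len)
{
    int max_id = -1;

    for (int ix = 0; ix < num_nodes; ix++)
        if (node_ids[ix] > max_id)
            max_id = node_ids[ix];

    int len = max_id + 1;
    int *table = malloc((size_t)(len > 0 ? len : 1) * sizeof *table);
    if (table == NULL)
        return NULL;

    for (int id = 0; id < len; id++)
        table[id] = -1;                 /* default: no node with this ID */
    for (int ix = 0; ix < num_nodes; ix++)
        table[node_ids[ix]] = ix;       /* record the dense index */

    *table_len = len;
    return table;
}
```

For IDs {0, 8} the table has 9 entries, with table[0] == 0, table[8] == 1 and every other slot -1. This trades a little memory for constant-time lookups; callers must still check for -1 before indexing the node array.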

And all of this on a project that seems sort of dead upstream.
We will keep things as-is, as there are systems not affected by this.
Going forward we will carry the patches for this bug, while knowing that there is more that will affect Power systems with their NUMA setup.

The SRUs will go to Incomplete - as we'd need to really fix the extended issues to make the backport worth anything.

To do so one would need to spend a significant upstream dev effort on bug 1836913.
That would (if anyone) in my POV be the HW-enablement Team of the ppc64 platform.
So that would be inside IBM I guess?

Changed in numad (Ubuntu Bionic):
status: Fix Committed → Incomplete
Changed in numad (Ubuntu Disco):
status: Fix Committed → Incomplete
Changed in numad (Ubuntu Eoan):
status: Fix Released → Incomplete
assignee: Canonical Server Team (canonical-server) → nobody
Changed in ubuntu-power-systems:
assignee: Canonical Server Team (canonical-server) → nobody
no longer affects: numad (Ubuntu Eoan)
Changed in numad (Ubuntu):
assignee: nobody → bugproxy (bugproxy)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Frank - could you make sure in the next calls that the status on these two issues is clear?

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Marking as incomplete while awaiting for numad upstream Power porting work.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

TBH, I'd not mark this as prio high from our POV.
It is high to "know if something will come back on this" but not the actual issue.

For the wider Ubuntu community this is just a rarely used universe package with a somewhat dead upstream - nothing to stress out about IMHO.

It is somewhat important to us, if it is important to IBM.
The reason it might be important is that all of this started with bugs reported against it by IBM.
The reasons could be
a) IBM uses it (or plans to) somewhere for production then it should be important to them
b) some odd testcase was run, maybe on an outdated test definition and actually nobody cares, then I guess everyone is fine to close this as won't fix.

Changed in numad (Ubuntu Focal):
importance: High → Low
Changed in ubuntu-power-systems:
importance: High → Low
Changed in numad (Ubuntu Eoan):
status: New → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-02-24 02:11 EDT-------
root@ws-g48-2d81-host:~# uname -a
Linux ws-g48-2d81-host 5.4.0-14-generic #17-Ubuntu SMP Thu Feb 6 22:47:13 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux

root@ws-g48-2d81-host:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu Focal Fossa (development branch)"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

root@ws-g48-2d81-host:~# service numad status
● numad.service - numad - The NUMA daemon that manages application locality.
Loaded: loaded (/lib/systemd/system/numad.service; enabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Mon 2020-02-24 01:08:37 EST; 51min ago
Docs: man:numad
Process: 458646 ExecStart=/usr/bin/numad $DAEMON_ARGS -i 15 (code=exited, status=0/SUCCESS)
Main PID: 458655 (code=dumped, signal=SEGV)

Feb 23 13:13:09 ws-g48-2d81-host systemd[1]: Starting numad - The NUMA daemon that manages application locality....
Feb 23 13:13:09 ws-g48-2d81-host systemd[1]: Started numad - The NUMA daemon that manages application locality..
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Failed with result 'core-dump'.

Feb 24 01:08:37 ws-g48-2d81-host kernel: [420098.990410] numad[458655]: segfault (11) at 2cd49ad1990 nip 7bafaaf6253c lr 2ccc1308580 code 1 in libc-2.30.so[7bafaaeb0000+210000]
Feb 24 01:08:37 ws-g48-2d81-host kernel: [420098.990424] numad[458655]: code: 60420000 7ba70fa4 7d0a3a2e 2c280000 4182ff14 7ba91f24 3908ffff eba10028
Feb 24 01:08:37 ws-g48-2d81-host kernel: [420098.990427] numad[458655]: code: ebe10038 7d2a4a14 38c00000 ebc90080 <e8be0000> f8a90080 7d0a3b2e f8de0008
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Failed with result 'core-dump'.

Frank Heimes (fheimes)
Changed in numad (Ubuntu Disco):
status: Incomplete → Won't Fix
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-03-20 16:35 EDT-------
Hello Canonical,

So, this is still an issue in Ubuntu 20.04, as the last test results shows. Is this something you would be willing to fix?

tags: added: targetmilestone-inin2004
removed: targetmilestone-inin---
Revision history for this message
Frank Heimes (fheimes) wrote :

Hi, this bug has a 'sister' bug: LP 1836913
The outcome of the numad discussion in the interlock calls with IBM (based on these two bugs) was that proper upstream support and fixing from IBM is needed, especially for Power.
Some structural issues were identified that can't easily be fixed; there is more to do.
Please see: https://bugs.launchpad.net/ubuntu-power-systems/+bug/1836913/comments/8 etc.
So we believed that there is already some upstream work going on by the IBM Power team.
An upstream accepted version or patches can again be considered to be SRUed back to Ubuntu.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-04-06 18:06 EDT-------
Reclassifying as P3/low to match 'numad' classification.

tags: added: severity-low
removed: severity-high
Frank Heimes (fheimes)
tags: added: hwe-long-running
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package numad - 0.5+20150602-6

---------------
numad (0.5+20150602-6) unstable; urgency=medium

  [ Christian Ehrhardt ]
  * d/p/lp-1832915-fix-sparse-node-ids.patch: fix a crash on ppc64el
    (LP: #1832915)(Closes: #930725)

  [ gustavo panizzo ]
  * [0b4115] add patch from upstream repo
  * [d97937] update homepage
  * [1b3223] update vcs-* urls to point to salsa.d.o
  * [767be9] do not require root to build
  * [b6b360] use the latest debhelper-compat
  * [2eee3c] guessing a debian/watch file
  * [cecd01] update the d/gbp.conf file
  * [953d8c] increase standards version to 4.5.0
  * [094901] remove trailing spaces and comments from d/rules
  * [a267dd] use a secure uri for the copyright format
  * [577311] install a logrotate file
  * [47d07d] no longer install upstream changelog

 -- gustavo panizzo <email address hidden> Fri, 20 Nov 2020 22:22:20 +0000

Changed in numad (Ubuntu):
status: Incomplete → Fix Released
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Eoan is now EOL. Marking as "won't fix".

Changed in numad (Ubuntu Eoan):
status: Incomplete → Invalid
Frank Heimes (fheimes)
Changed in numad (Ubuntu Eoan):
status: Invalid → Won't Fix
Mathew Hodson (mhodson)
Changed in numad (Ubuntu Bionic):
importance: Undecided → Low
Changed in numad (Ubuntu Cosmic):
importance: Undecided → Low
Changed in numad (Ubuntu Disco):
importance: Undecided → Low
Changed in numad (Ubuntu Eoan):
importance: Undecided → Low
tags: removed: block-proposed-disco verification-done verification-done-disco
Revision history for this message
Mathew Hodson (mhodson) wrote :

Setting package status based on what was released.
---

numad (0.5+20150602-5ubuntu1) eoan; urgency=medium

  * d/p/lp-1832915-fix-sparse-node-ids.patch: fix a crash on ppc64el
    (LP: #1832915)

 -- Christian Ehrhardt <email address hidden> Wed, 19 Jun 2019 13:05:33 +0200

Changed in numad (Ubuntu Focal):
status: Incomplete → Fix Released
Changed in numad (Ubuntu Eoan):
status: Won't Fix → Fix Released
Changed in numad (Debian):
status: New → Fix Released
Frank Heimes (fheimes)
Changed in numad (Ubuntu):
assignee: bugproxy (bugproxy) → nobody
Changed in numad (Ubuntu Focal):
assignee: bugproxy (bugproxy) → nobody
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Proposed package removed from archive

The version of numad in the proposed pocket of Bionic that was purported to fix this bug report has been removed because the target series has reached its End of Life.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Incomplete → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote : sosreport on host

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote :

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote :

Default Comment by Bridge

Frank Heimes (fheimes)
Changed in numad (Ubuntu Bionic):
status: Incomplete → Won't Fix