numad crashes while running kvm guest
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | | Low | Unassigned |
numad (Debian) | Fix Released | Unknown | |
numad (Ubuntu) | | Low | bugproxy |
Bionic | | Low | Unassigned |
Cosmic | | Low | Unassigned |
Disco | | Low | Unassigned |
Eoan | | Low | Unassigned |
Focal | | Low | bugproxy |
Bug Description
[Impact]
* The numad code never considered that node IDs might not be sequential,
which leads to an out-of-bounds array access.
* Fix the array index usage so that this can no longer happen.
[Test Case]
0. The most important and least available ingredient for this issue is sparse NUMA nodes. Usually on your laptop you have just one; on a typical x86 server you might have more, but they are usually numbered 0,1,2,...
On powerpc people commonly disable SMT (as that was a KVM requirement up to P8). This (or other CPU offlining) can leave NUMA nodes like
1,16,30 as the only ones left. Only with a setup like that can you follow and trigger the case (a quick way to check for sparse node IDs is shown after this list).
1. installed numad
2. started the numad service and verified it runs fine
3. I spawned two Guests with 20 cores and 50G each (since there was no particular guest config mentioned I didn't configure anything special)
I used uvtool to get the latest cloud image
4. cloned stressapptest from git [1] in the guests
and installed build-essential
(my guests are Bionic and that didn't have stressapptest packaged yet)
Built and installed the tool
5. ran the stress in both guests as mentioned
$ stressapptest -s 200
=> This will trigger the crash
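For step 0, a quick way to check whether a host actually has sparse NUMA node IDs (assuming the standard sysfs layout; numactl is optional) is:

$ ls -d /sys/devices/system/node/node*
# e.g. .../node0 .../node8 -> sparse IDs
$ numactl -H | head -n 1
# e.g. "available: 2 nodes (0,8)"

If the node numbers are not a contiguous 0,1,2,... sequence, the setup can trigger this bug.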
[Regression Potential]
* Without the fix it is severely broken on systems with sparse NUMA
nodes. I imagine you can (with some effort or bad luck) also create
such a case on x86; it is not ppc64 specific in general.
The code before the fix just works by accident when cpu ~= node_id.
* Obviously the most likely potential regression would be to trigger
issues when parsing these arrays on systems that formerly ran fine and were
not affected by the sparse node issue. But for non-sparse systems not
a lot should change: the new code will, for example, find
cpu=1 mapped to node=1 instead of just assuming cpu=1 IS node=1.
Therefore I obviously hope for no regression, but that is the one I'd
expect if any.
[Other Info]
* I have submitted this upstream, but upstream seems somewhat dead :-/
* do not go crazy when reading the code: nodeid and cpuid are used
somewhat interchangeably, which might make you go nuts at first (it
did for me); but I kept the upstream names as-is to keep the patch small.
----
== Comment: #0 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:42:23 ==
---Problem Description---
while running KVM guests, we are observing numad crashes on host.
Contact Information = <email address hidden>
---uname output---
Linux ltcgen6 4.15.0-1016-ibm-gt #18-Ubuntu SMP Thu Feb 7 16:58:31 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = witherspoon
---Debugger---
A debugger is not configured
---Steps to Reproduce---
1. check status of numad, if stopped start it
2. start a kvm guest
3. Run some memory tests inside guest
On the host, after a few minutes, we see numad crashing. I had enabled debug logging for numad and see the messages below in numad.log before it crashes:
8870669: PID 88781: (qemu-system-ppc), Threads 6, MBs_size 15871, MBs_used 11262, CPUs_used 400, Magnitude 4504800, Nodes: 0,8
Thu Feb 21 00:12:10 2019: PICK NODES FOR: PID: 88781, CPUs 470, MBs 18671
Thu Feb 21 00:12:10 2019: PROCESS_MBs[0]: 9201
Thu Feb 21 00:12:10 2019: Node[0]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[1]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[2]: mem: 1878026 cpu: 4666
Thu Feb 21 00:12:10 2019: Node[3]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[4]: mem: 0 cpu: 6
Thu Feb 21 00:12:10 2019: Node[5]: mem: 2194058 cpu: 4728
Thu Feb 21 00:12:10 2019: Totmag[0]: 94112134
Thu Feb 21 00:12:10 2019: Totmag[1]: 109211855
Thu Feb 21 00:12:10 2019: Totmag[2]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[3]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[4]: 2990058
Thu Feb 21 00:12:10 2019: Totmag[5]: 2990058
Thu Feb 21 00:12:10 2019: best_node_ix: 1
Thu Feb 21 00:12:10 2019: Node: 8 Dist: 10 Magnitude: 10373506224
Thu Feb 21 00:12:10 2019: Node: 0 Dist: 40 Magnitude: 8762869316
Thu Feb 21 00:12:10 2019: Node: 253 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 254 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 252 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: Node: 255 Dist: 80 Magnitude: 0
Thu Feb 21 00:12:10 2019: MBs: 18671, CPUs: 470
Thu Feb 21 00:12:10 2019: Assigning resources from node 5
Thu Feb 21 00:12:10 2019: Node[0]: mem: 2007348 cpu: 1908
Thu Feb 21 00:12:10 2019: MBs: 0, CPUs: 0
Thu Feb 21 00:12:10 2019: Assigning resources from node 2
Thu Feb 21 00:12:10 2019: Process 88781 already 100 percent localized to target nodes.
On syslog we see sig 11:
[88726.086144] numad[88879]: unhandled signal 11 at 000000e38fe72688 nip 0000782ce4dcac20 lr 0000782ce4dcf85c code 1
Stack trace output:
no
Oops output:
no
System Dump Info:
The system was configured to capture a dump, however a dump was not produced.
*Additional Instructions for <email address hidden>:
-Attach sysctl -a output to the bug.
== Comment: #2 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:44:38 ==
== Comment: #3 - SRIKANTH AITHAL <email address hidden> - 2019-02-20 23:48:20 ==
I was using stressapptest to run memory workload inside the guest
`stressapptest -s 200`
== Comment: #5 - Brian J. King <email address hidden> - 2019-03-08 09:17:29 ==
Any update on this?
== Comment: #6 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-08 11:59:16 ==
Yes, I have been working on this for a while.
After a suggestion from @lagarcia, I tested the bug on the same machine, booted on the default kernel (4.15.0-45-generic), and also booted the VM with the same generic kernel.
The result is that the bug also happens with 4.15.0-45-generic. So it may not be caused by the changes included in kernel 4.15.0-
A few things I noticed, that may be interesting to solve this bug:
- I had a very hard time reproducing the bug with a numad that was started at boot. If I restart it, or stop/start it, the bug reproduces much more easily.
- I debugged numad using gdb and found out it is segfaulting in _int_malloc(), from glibc.
Attached is an occurrence of the bug, while numad was on gdb.
(systemctl start numad ; gdb /usr/bin/numad $NUMAD_PID)
== Comment: #7 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-08 12:00:00 ==
== Comment: #8 - Leonardo Bras Soares Passos <email address hidden> - 2019-03-11 17:04:25 ==
I reverted the whole system to vanilla Ubuntu Bionic, and booted on 4.15.0-45-generic kernel.
Linux ltcgen6 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:27:02 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
Then I booted the guest, also on 4.15.0-45-generic.
Linux ubuntu 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:27:02 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
I tried to reproduce the error, and I was able to.
It probably means this bug was not introduced by the changes to qemu/the kernel, and that it is present in the current Ubuntu archive.
The next step should be deeper debugging of numad, in order to identify why it segfaults.
Related branches
- Christian Ehrhardt: Approve on 2019-07-16
- Canonical Server Team: Pending, requested 2019-07-12
- Diff: 104 lines (+70/-1), 4 files modified:
  debian/changelog (+7/-0)
  debian/control (+2/-1)
  debian/patches/lp-1832915-fix-sparse-node-ids.patch (+60/-0)
  debian/patches/series (+1/-0)
- Christian Ehrhardt: Approve on 2019-07-16
- Canonical Server Team: Pending, requested 2019-07-12
- Diff: 104 lines (+70/-1), 4 files modified:
  debian/changelog (+7/-0)
  debian/control (+2/-1)
  debian/patches/lp-1832915-fix-sparse-node-ids.patch (+60/-0)
  debian/patches/series (+1/-0)
- Andreas Hasenack (community): Approve on 2019-06-19
- Canonical Server Team: Pending, requested 2019-06-19
- Diff: 104 lines (+70/-1), 4 files modified:
  debian/changelog (+7/-0)
  debian/control (+2/-1)
  debian/patches/lp-1832915-fix-sparse-node-ids.patch (+60/-0)
  debian/patches/series (+1/-0)
bugproxy (bugproxy) wrote : sosreport on host | #1 |
tags: | added: architecture-ppc64le bugnameltc-175673 severity-high targetmilestone-inin--- |
bugproxy (bugproxy) wrote : gdb output | #2 |
Default Comment by Bridge
Changed in ubuntu: | |
assignee: | nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
affects: | ubuntu → numad (Ubuntu) |
Changed in ubuntu-power-systems: | |
assignee: | nobody → Manoj Iyer (manjo) |
Christian Ehrhardt (paelzer) wrote : | #3 |
On a fresh Bionic running with the latest 4.15.0-51-generic I did the following trying to reproduce this issue.
Note: My Host has 128G mem and 40 cores (SMT off)
1. installed numad
2. started the numad service and verified it runs fine
3. I spawned two Guests with 20 cores and 50G each (since there was no particular guest config mentioned I didn't configure anything special)
I used uvtool to get the latest cloud image
4. cloned stressapptest from git [1] in the guests
and installed build-essential
(my guests are Bionic and that didn't have stressapptest packaged yet)
Built and installed the tool
5. ran the stress in both guests as mentioned
$ stressapptest -s 200
Well actually I was just about to start that load (not yet happened) when I realized my numad process has already died:
● numad.service - numad - The NUMA daemon that manages application locality.
Loaded: loaded (/lib/systemd/
Active: failed (Result: core-dump) since Mon 2019-06-17 06:12:31 UTC; 2min 23s ago
Docs: man:numad
Process: 119546 ExecStart=
Main PID: 119547 (code=dumped, signal=SEGV)
Jun 17 06:00:28 dradis systemd[1]: Starting numad - The NUMA daemon that manages application locality....
Jun 17 06:00:28 dradis systemd[1]: Started numad - The NUMA daemon that manages application locality..
Jun 17 06:12:31 dradis systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jun 17 06:12:31 dradis systemd[1]: numad.service: Failed with result 'core-dump'.
So the mem-stress load might help to trigger it, but isn't necessarily required.
After restarting the numad daemon I started the guest load and got the crash again.
While I have no idea yet what exactly is going on, let's set this to confirmed at least.
[1]: https:/
Initially had PID 119547 and no odd entries in the log.
Changed in numad (Ubuntu): | |
status: | New → Confirmed |
Christian Ehrhardt (paelzer) wrote : | #4 |
With verbose my numad log file is:
Mon Jun 17 06:22:53 2019: Nodes: 2
Min CPUs free: 1416, Max CPUs: 1423, Avg CPUs: 1419, StdDev: 3.53553
Min MBs free: 12869, Max MBs: 13756, Avg MBs: 13312, StdDev: 443.5
Node 0: MBs_total 65266, MBs_free 12869, CPUs_total 2000, CPUs_free 1416, Distance: 10 40 CPUs: 0,4,8,12,
Node 1: MBs_total 65337, MBs_free 13756, CPUs_total 2000, CPUs_free 1423, Distance: 40 10 CPUs: 80,84,88,
Mon Jun 17 06:22:53 2019: Processes: 1563
Mon Jun 17 06:22:53 2019: Candidates: 2
101867853: PID 120072: (qemu-system-ppc), Threads 23, MBs_size 55763, MBs_used 50509, CPUs_used 876, Magnitude 44245884, Nodes: 0,8
101867853: PID 120206: (qemu-system-ppc), Threads 23, MBs_size 55821, MBs_used 23699, CPUs_used 279, Magnitude 6612021, Nodes: 0,8
Mon Jun 17 06:22:53 2019: Advising pid 120072 (qemu-system-ppc) move from nodes (0,8) to nodes (0,8)
With debug the dying message looked like:
Another run #2:
Mon Jun 17 06:25:08 2019: Nodes: 2
Min CPUs free: 302, Max CPUs: 439, Avg CPUs: 370, StdDev: 68.5018
Min MBs free: 1597, Max MBs: 4548, Avg MBs: 3072, StdDev: 1475.5
Node 0: MBs_total 65266, MBs_free 1597, CPUs_total 2000, CPUs_free 302, Distance: 10 40 CPUs: 0,4,8,12,
Node 1: MBs_total 65337, MBs_free 4548, CPUs_total 2000, CPUs_free 439, Distance: 40 10 CPUs: 80,84,88,
Mon Jun 17 06:25:08 2019: Processes: 1572
Mon Jun 17 06:25:08 2019: Candidates: 2
101881395: PID 120072: (qemu-system-ppc), Threads 25, MBs_size 55763, MBs_used 50523, CPUs_used 1995, Magnitude 100793385, Nodes: 0,8
101881395: PID 120206: (qemu-system-ppc), Threads 25, MBs_size 55821, MBs_used 45916, CPUs_used 830, Magnitude 38110280, Nodes: 0,8
Mon Jun 17 06:25:08 2019: PICK NODES FOR: PID: 120072, CPUs 2347, MBs 59438
Mon Jun 17 06:25:08 2019: PROCESS_MBs[0]: 17481
Mon Jun 17 06:25:08 2019: Node[0]: mem: 201700 cpu: 5952
Mon Jun 17 06:25:08 2019: Node[1]: mem: 45480 cpu: 2634
Mon Jun 17 06:25:08 2019: Totmag[0]: 12080055
Mon Jun 17 06:25:08 2019: Totmag[1]: 1948267
Mon Jun 17 06:25:08 2019: best_node_ix: 0
Mon Jun 17 06:25:08 2019: Node: 0 Dist: 10 Magnitude: 1200518400
Mon Jun 17 06:25:08 2019: Node: 8 Dist: 40 Magnitude: 119794320
Mon Jun 17 06:25:08 2019: MBs: 59438, CPUs: 2347
Mon Jun 17 06:25:08 2019: Assigning resources from node 0
Mon Jun 17 06:25:08 2019: Node[0]: mem: 1000 cpu: 0
Mon Jun 17 06:25:08 2019: MBs: 39368, CPUs: 1355
Mon Jun 17 06:25:08 2019: Assigning resources from node 1
Mon Jun 17 06:25:08 2019: Advising pid 120072 (qemu-system-ppc) move from nodes (0,8) to nodes (0,8)
Another run #3:
Mon Jun 17 06:26:46 2019: Nodes: 2
Min CPUs free: 889, Max CPUs: 1048, Avg CPUs: 968, StdDev: 79.5016
Min MBs free: 1291, Max MBs: 3484, Avg MBs: 2387, StdDev: 1096.5
Node 0: MBs_total 65266, MBs_free 1291, CPUs_total 2000, CPUs_free 889, Distance: 10 40 CPUs: 0,4,8,12,
Node 1: MBs_total 65337, MBs_free 3484, CPUs_total 2000, CPUs_free 104...
Changed in ubuntu-power-systems: | |
status: | New → Confirmed |
Christian Ehrhardt (paelzer) wrote : | #5 |
# get debug symbols and gdb
$ sudo apt install numad-dbgsym gdb dpkg-dev
# get source as used in the package
$ apt source numad
# I found that we will also need glibc source, so:
$ apt source glibc
It helps to add paths to gdb
(gdb) directory /home/ubuntu/
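A minimal sketch of that gdb setup (the paths and versions here are hypothetical examples - adjust them to wherever apt source unpacked things):

$ sudo gdb -p $(pidof numad)
(gdb) directory /home/ubuntu/numad-0.5+20150602
(gdb) directory /home/ubuntu/glibc-2.27/malloc
(gdb) continue
# ... wait for the SIGSEGV to be reported, then:
(gdb) bt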
I found these backtraces:
Thread 1 "numad" received signal SIGSEGV, Segmentation fault.
tcache_get (tc_idx=<optimized out>) at malloc.c:2943
2943 malloc.c: No such file or directory.
(gdb) bt
#0 tcache_get (tc_idx=<optimized out>) at malloc.c:2943
#1 __GI___libc_malloc (bytes=16) at malloc.c:3050
#2 0x00000d9b7ec780dc in bind_process_
#3 0x00000d9b7ec7d148 in manage_loads () at numad.c:2225
#4 0x00000d9b7ec734dc in main (argc=<optimized out>, argv=<optimized out>) at numad.c:2654
(gdb) bt
#0 0x00000fb6cd2779f4 in bind_process_
#1 0x00000fb6cd27d148 in manage_loads () at numad.c:2225
#2 0x00000fb6cd2734dc in main (argc=<optimized out>, argv=<optimized out>) at numad.c:2654
(gdb) bt
#0 0x000001c757da79f4 in bind_process_
#1 0x000001c757dad148 in manage_loads () at numad.c:2225
#2 0x000001c757da34dc in main (argc=<optimized out>, argv=<optimized out>) at numad.c:2654
Christian Ehrhardt (paelzer) wrote : | #6 |
One fail was at:
CLEAR_CPU_
The next two at:
OR_LISTS(
The common denominator here is cpu_bind_list_p but that is a static local:
static id_list_p cpu_bind_list_p;
The function is defined as:
#define OR_LISTS( or_list_p, list_1_p, list_2_p) CPU_OR_S( or_list_p->bytes, or_list_p->set_p, list_1_p->set_p, list_2_p->set_p)
That translates into
CPU_OR_S( cpu_bind_
CPU_OR_S is from sched.h and will:
- operate on the dynamically allocated CPU set(s) whose size is setsize bytes. (due to _S)
- Store the union of the sets cpu_bind_
- explicitly says dest "may be one of the source sets"
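To illustrate the dynamically sized CPU-set API referenced above (a generic, standalone sketch of the glibc sched.h macros, not numad's own wrapper code), note that every _S macro must be given the size the sets were actually allocated for:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    int num_cpus = 160;                         /* e.g. possible CPUs with SMT off */
    size_t setsize = CPU_ALLOC_SIZE(num_cpus);  /* bytes needed for num_cpus bits  */

    cpu_set_t *a = CPU_ALLOC(num_cpus);
    cpu_set_t *b = CPU_ALLOC(num_cpus);
    cpu_set_t *dest = CPU_ALLOC(num_cpus);

    CPU_ZERO_S(setsize, a);
    CPU_ZERO_S(setsize, b);
    CPU_SET_S(0, setsize, a);
    CPU_SET_S(8, setsize, b);

    /* dest = a | b; dest may also be one of the source sets */
    CPU_OR_S(setsize, dest, a, b);
    printf("CPUs in union: %d\n", CPU_COUNT_S(setsize, dest));

    /* If setsize does not match what the sets were really allocated with
       (e.g. a 24-byte allocation used where 128 bytes are needed), the
       macros read/write out of bounds - the kind of mismatch discussed
       in the next comment. */
    CPU_FREE(a); CPU_FREE(b); CPU_FREE(dest);
    return 0;
}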
Christian Ehrhardt (paelzer) wrote : | #7 |
(gdb) p cpu_bind_
$5 = 24
(gdb) p *(cpu_bind_
$7 = {__bits = {12297829382473
1955697400016, 1955697400512}}
(gdb) p sizeof(
$8 = 128
See the size mismatch?
It will allocate 24 bytes and needs 128.
I think this is a bad initialization.
We have these operations in the code on cpu_bind_list_p.
static id_list_p cpu_bind_list_p;
CLEAR_CPU_
OR_LISTS(
Now CLEAR_CPU_LIST has some init code, but only if == NULL.
#define CLEAR_CPU_
if (list_p == NULL) { \
} \
CPU_
Since we can't rely on the data in that static var, it might "by accident" hold stale old data (see the lazy-init sketch at the end of this comment).
Note: The other chance of errors is the 40 active CPUs vs the 160 potential CPUs (SMT off) that I have in my system.
The size is from num_cpu - if that detection is off then it might fail as well.
But at least in all my crashes that was ok.
(gdb) p num_cpus
$10 = 160
So let's assume it is the lack of (re)initialization for now.
Other structures of type "id_list_p" are all initialized with NULL btw.
Like:
id_list_p all_cpus_list_p = NULL;
id_list_p all_nodes_list_p = NULL;
id_list_p reserved_
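As an aside, the pattern under discussion is the classic lazy initialization of a static pointer; a generic sketch (not numad's actual, truncated macro) of why such a variable keeps whatever it last held across calls:

#include <stdlib.h>

struct id_list { size_t bytes; void *set_p; };

static struct id_list *cpu_bind_list_p;    /* static: persists across calls */

static void touch_list(void)
{
    if (cpu_bind_list_p == NULL) {
        /* Allocated exactly once; every later call reuses the same object,
         * including whatever contents earlier calls left behind in it. */
        cpu_bind_list_p = calloc(1, sizeof(*cpu_bind_list_p));
    }
    /* ... clear or OR bits into cpu_bind_list_p here ... */
}

int main(void)
{
    touch_list();   /* allocates */
    touch_list();   /* reuses the very same allocation and contents */
    return 0;
}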
Christian Ehrhardt (paelzer) wrote : | #8 |
Note:
- numad is "only" in universe in all releases
- nothing depends on it
- it is on 0.5+20150602-5 which seems rather old
- But upstream commits [1] since 2015 are minimal
TL;DR: no upstream fix for this is available to cherry-pick (upstream is somewhat dead)
Hmm, the above might have been a red herring.
I get a size of 24 bytes even in cases where the initial value was NULL.
Maybe the pure size of the set_p isn't what matters.
In any case, let's do a non-optimized build to get around debugging issues like:
(gdb) p node[node_
value has been optimized out
Also, I realized that this var is static not only to be function-local but, in the usual sense, to keep its content across function calls. So (re)initializing it on every call would be wrong :-)
tags: | added: universe |
Christian Ehrhardt (paelzer) wrote : | #9 |
While I built a proper PPA in [1] this seems so trivial that we can rebuild locally with just
$ cc -std=gnu99 -I. -D__thread="" -c -o numad.o numad.c
$ cc numad.o -lpthread -lrt -lm -o numad
$ mv numad /usr/bin/numad
That should allow quick iterations.
With debug enabled I found that the second set is actually null.
So the red herring assumption above was correct.
996 while (nodes) {
997 if (ID_IS_
998 OR_LISTS(
(gdb) p node[node_
$4 = (id_list_p) 0x0
The arg is
(gdb) p *(p->node_list_p)
$7 = {set_p = 0x304a3a418d0, bytes = 8}
This delivers "1"
int nodes = NUM_IDS_
(gdb) p nodes
$5 = 1
That is:
#define NUM_IDS_
Per [2] this counts the cpus in the cpu_set.
So the TL;DR of this loop
while (nodes) {
nodes -= 1;
is that it iterates over all CPUs
On each iteration it checks
if (ID_IS_
node_id starts at zero and is incremented each iteration.
I must admit the usage of the term "node" for cpus here is very misleading.
"node" is a global data structure
typedef struct node_data {
uint64_t node_id;
uint64_t MBs_total;
uint64_t MBs_free;
uint64_t CPUs_total; // scaled * ONE_HUNDRED
uint64_t CPUs_free; // scaled * ONE_HUNDRED
uint64_t magnitude; // hack: MBs * CPUs
uint8_t *distance;
id_list_p cpu_list_p;
} node_data_t, *node_data_p;
node_data_p node = NULL;
Due to the misperception of "node" actually being CPUs the indexing here is off IMHO.
(gdb) p node[0]
$13 = {node_id = 0, MBs_total = 65266, MBs_free = 1510, CPUs_total = 2000, CPUs_free = 1144, magnitude = 1727440, distance = 0x304a3a41850 "\n(\032\n\244~", cpu_list_p = 0x304a3a41810}
(gdb) p node[1]
$14 = {node_id = 8, MBs_total = 65337, MBs_free = 1734, CPUs_total = 2000, CPUs_free = 1049, magnitude = 1818966, distance = 0x304a3a418b0 "(\n\032\n\244~", cpu_list_p = 0x304a3a41870}
My CPUs are 0,4,8,... and so is the indexing here as despite the node name it is actually based on CPUs.
Summary:
- The code checks for each CPU as counted by NUM_IDS_IN_LIST
- It will increase the ID until it found a hit in ID_IS_IN_
- that will skip empty CPUs as in my SMT case
- Once it found a cpu that is in the set it will OR_LISTS
node[
Christian Ehrhardt (paelzer) wrote : | #10 |
Christian Ehrhardt (paelzer) wrote : | #11 |
The problem is that node[node_
When you look at the array again it has two real entries and nothing more:
(gdb) p node[0]
$20 = {node_id = 0, MBs_total = 65266, MBs_free = 1510, CPUs_total = 2000, CPUs_free = 1144, magnitude = 1727440, distance = 0x304a3a41850 "\n(\032\n\244~", cpu_list_p = 0x304a3a41810}
(gdb) p node[1]
$21 = {node_id = 8, MBs_total = 65337, MBs_free = 1734, CPUs_total = 2000, CPUs_free = 1049, magnitude = 1818966, distance = 0x304a3a418b0 "(\n\032\n\244~", cpu_list_p = 0x304a3a41870}
(gdb) p node[2]
$22 = {node_id = 1820693536, MBs_total = 33, MBs_free = 3318460192688, CPUs_total = 24, CPUs_free = 1839495593, magnitude = 33,
distance = 0x1111111111111111 <error: Cannot access memory at address 0x1111111111111
(gdb) p node[3]
$23 = {node_id = 286331153, MBs_total = 33, MBs_free = 3318460192752, CPUs_total = 8, CPUs_free = 1842299472, magnitude = 33,
distance = 0x101 <error: Cannot access memory at address 0x101>, cpu_list_p = 0x7ea40a1a0e08 <main_arena+96>}
(gdb) p node[4]
$24 = {node_id = 1867659328, MBs_total = 33, MBs_free = 3318460192816, CPUs_total = 24, CPUs_free = 1882184320, magnitude = 33,
distance = 0x1111111111111111 <error: Cannot access memory at address 0x1111111111111
(gdb) p node[5]
$25 = {node_id = 0, MBs_total = 33, MBs_free = 139243009222666, CPUs_total = 139243009216008, CPUs_free = 1898775676, magnitude = 33, distance = 0x304a3a41890 "", cpu_list_p = 0x18}
(gdb) p node[6]
$26 = {node_id = 368942164530456
distance = 0x7ea40a1a0a28 <_IO_wide_
(gdb) p node[7]
$27 = {node_id = 354615088215883
cpu_list_p = 0x8}
(gdb) p node[8]
$28 = {node_id = 303211223003168792, MBs_total = 33, MBs_free = 257, CPUs_total = 265, CPUs_free = 288230377024389144, magnitude = 33,
distance = 0x2f69 <error: Cannot access memory at address 0x2f69>, cpu_list_p = 0x0}
We essentially do an out-of-bounds access to the array at index [8], where cpu_list_p = 0x0, and that triggers the SEGV.
We actually do NOT want node[node_id].
Instead we'd need to iterate over the node array entries and pick the entry which has node[x].node_id == node_id.
Christian Ehrhardt (paelzer) wrote : | #12 |
Chances are that without the odd SMT=off numbering on ppc things would work.
That might explain why this didn't fail more often or on other architectures so far.
But disabling a subset of CPUs is allowed, so this needs to be fixed for all architectures - no matter how "often" the issue occurs on any one of them.
Christian Ehrhardt (paelzer) wrote : | #13 |
Rebuild via:
rm numad
cc -g -O0 -fstack-
ls -laF numad
sudo mv numad /usr/bin/numad
My current config triggering this has a pretty common CPU list on ppc64el:
CPU(s): 160
On-line CPU(s) list: 0,4,8,12,
Off-line CPU(s) list: 1-3,5-7,
The assumption seems to be correct: it was due to that cpu/node mismatch, which always assumes linear CPUs with cpu-number == index-in-array.
With the following change the breakage no longer happens in my setup (a cleaned-up sketch of the change follows after the diff):
--- numad.c.orig 2019-06-17 09:27:49.783712059 +0000
+++ numad.c 2019-06-17 10:11:00.619113441 +0000
@@ -995,7 +995,18 @@
int node_id = 0;
while (nodes) {
if (ID_IS_
- OR_LISTS(
+ int id = -1;
+ for (int node_ix = 0; (node_ix < num_nodes); node_ix++) {
+ if (node[node_
+ id = node_ix;
+ break;
+ }
+ }
+ if (id == -1) {
+ numad_log(LOG_CRIT, "Node %d is requested, but unknown\n", node_id);
+ exit(EXIT_FAILURE);
+ }
+ OR_LISTS(
nodes -= 1;
}
node_id += 1;
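Since the diff above is truncated in this export, here is a runnable sketch of the shape of the fix as described in comment #11 - look up the array index that belongs to a node ID instead of using the ID itself as the index. This is a reconstruction for illustration only; the exact names in the real patch may differ:

#include <stdio.h>
#include <stdint.h>

/* Reconstruction sketch only - simplified stand-ins for numad's types. */
typedef struct { uint64_t node_id; } node_data_t;

static node_data_t node[] = { { .node_id = 0 }, { .node_id = 8 } };
static int num_nodes = 2;

/* Map a (possibly sparse) NUMA node ID to its index in the node[] array,
 * instead of assuming node_id == array index. */
static int find_node_ix(int node_id)
{
    for (int node_ix = 0; node_ix < num_nodes; node_ix++) {
        if (node[node_ix].node_id == (uint64_t)node_id) {
            return node_ix;
        }
    }
    return -1;  /* requested node is unknown -> caller logs and bails out */
}

int main(void)
{
    printf("node 8 lives at array index %d\n", find_node_ix(8));   /* 1  */
    printf("node 3 lives at array index %d\n", find_node_ix(3));   /* -1 */
    return 0;
}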
Christian Ehrhardt (paelzer) wrote : | #14 |
While this numbering is pretty common on Power (all non-SMT systems) and s390x (scaling #cpus on load), it is uncommon on x86. Nevertheless, in theory the issue should exist there as well
But I tried this for an hour and it didn't trigger (plenty of assigns happened)
Repro (x86)
1. Get a KVM guest with numa memory nodes
<memory unit='KiB'
<currentMemory unit='KiB'
<vcpu placement=
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
<cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
</numa>
</cpu>
2. disable some CPUs in the middle (full example commands are sketched at the end of this comment)
$ echo 0 | sudo tee /sys/bus/
$ echo 0 | sudo tee /sys/bus/
$ lscpu
CPU(s): 4
On-line CPU(s) list: 0,3
Off-line CPU(s) list: 1,2
3. install, start and follow the log of numad
$ sudo apt install numad
$ sudo systemctl start numad
$ journalctl -f -u numad
4. run some memory load that will make numad assign processes
$ sudo apt install stress-ng
$ stress-ng --vm 2 --vm-bytes 90% -t 5m
If we follow the log of numad with verbose enabled we will after a while see numa assignments like:
Mon Jun 17 10:32:05 2019: Advising pid 3416 (stress-ng-vm) move from nodes (0-1) to nodes (0)
Mon Jun 17 10:32:23 2019: Advising pid 3417 (stress-ng-vm) move from nodes (0-1) to nodes (1)
Maybe on ppc the NUMA node numbering is also non-linear; I remember working on fixes for numactl in that regard - and maybe that is important as well.
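For completeness: the CPU-offlining commands in step 2 above are truncated in this export; on a stock kernel the standard sysfs interface for taking the example CPUs 1 and 2 offline looks like this (path assumed from the usual sysfs layout):

$ echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online
$ echo 0 | sudo tee /sys/devices/system/cpu/cpu2/online
$ lscpu | grep -E 'On-line|Off-line'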
Christian Ehrhardt (paelzer) wrote : | #15 |
I have made a test build with the fix available at PPA [1]. It resolves the issue for me, but before going further please give that a try with your setups as well.
Further I opened a PR for upstream at [2] to discuss it there as well.
Feel free to chime in and give it a +1 there if it works well for you.
[1]: https:/
[2]: https:/
Christian Ehrhardt (paelzer) wrote : | #16 |
@JFH/Manjo - the bug assignment is odd; can you please set it up the way you need it to reflect that we are waiting on upstream (ack on the PR) and IBM (testing the PPA)?
Changed in numad (Ubuntu): | |
assignee: | Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Server Team (canonical-server) |
Changed in ubuntu-power-systems: | |
assignee: | Manoj Iyer (manjo) → Canonical Server Team (canonical-server) |
status: | Confirmed → Incomplete |
Changed in numad (Ubuntu): | |
status: | Confirmed → Incomplete |
Changed in ubuntu-power-systems: | |
importance: | Undecided → High |
Changed in numad (Ubuntu): | |
importance: | Undecided → High |
Christian Ehrhardt (paelzer) wrote : | #17 |
Reported to Debian (linked above) and prepared an MP for Eoan for team review.
But still waiting for your ok @IBM that this solves your case.
------- Comment From <email address hidden> 2019-06-20 07:00 EDT-------
(In reply to comment #22)
> Reported to Debian (linked above) and prepared an MP for Eoan for team
> review.
>
> But still waiting for your ok @IBM that this solves your case.
It does not solve the issue.
> Updated numad from ppa:
# dpkg -l | grep numad
ii numad 0.5+20150602-
# service numad status
? numad.service - numad - The NUMA daemon that manages application locality.
Loaded: loaded (/lib/systemd/
Active: active (running) since Thu 2019-06-20 06:50:13 EDT; 1s ago
Docs: man:numad
Process: 13844 ExecStart=
Main PID: 13845 (numad)
CGroup: /system.
??13845 /usr/bin/numad -i 15
> now started a KVM guest, ran `stress-ng -c 14 -vm 10` inside guest
> in few minutes numad crashed on host
Jun 20 06:56:15 localhost kernel: [ 2916.371332] numad[13845]: segfault (11) at 1b1ea0308 nip 7fffb56cac20 lr 7fffb56cf85c code 1 in libc-2.
Jun 20 06:56:15 localhost kernel: [ 2916.371352] numad[13845]: code: 3be30058 38c30010 f821ffc1 91230008 38000000 7c2004ac 7d2030a9 7c0031ad
Jun 20 06:56:15 localhost kernel: [ 2916.371354] numad[13845]: code: 40c2fff8 4c00012c 2fa90000 419e0144 <e9090008> 550ae13e 394afffe 794a1f48
Jun 20 06:56:16 localhost systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jun 20 06:56:16 localhost systemd[1]: numad.service: Failed with result 'core-dump'.
Christian Ehrhardt (paelzer) wrote : | #19 |
Interesting - for me the issue was no longer reproducible with the fix applied.
Maybe there is another bug in the same code that you are hitting now.
Could you tell me all the details about the setup that still triggers this crash?
Furthermore, this should have created a crash dump in /var/crash/.
It's probably best to clean this up, let it crash again and then attach the crash here so I can take a look at where/why it might still crash for you.
Changed in numad (Debian): | |
status: | Unknown → New |
Christian Ehrhardt (paelzer) wrote : | #20 |
Hi,
any updates on this one?
Everything I could reproduce is fixed by the suggested change, but since according to you that isn't sufficient, I now need you to debug your case and suggest/add whatever change on top that you need.
After fixing the bug that I could identify, I'd hate for this to go into "waiting forever" because of some extra issue that you hit.
This is incomplete until further info is provided; you can
a) provide info on how this might be reproducible for me as well
b) provide patches that fix your issue
c) accept the fix that I have to at least address some issues - we can push that and you can open another bug for the follow-on issue that you have identified.
Please let me know what you need/prefer.
bugproxy (bugproxy) wrote : | #21 |
------- Comment From <email address hidden> 2019-07-12 04:50 EDT-------
Since the machine which I had is being used for other testing, I set up another machine to test this again with all the latest levels + the numad from the PPA.
# dpkg -l | grep numad
ii numad 0.5+20150602-
# uname -r
4.15.0-54-generic
Numad crash issue is fixed. I am not able to hit any crashes now...
Frank Heimes (fheimes) wrote : | #22 |
@bssrikanth many thanks for testing and feedback!
Christian Ehrhardt (paelzer) wrote : | #23 |
Ok, thanks bssrikanth!
That means we can go on with the SRU.
I'm still sort of frightened by upstream numad seeming dead, but the fix seems clear and now is confirmed to work for you which allows us to go on.
Christian Ehrhardt (paelzer) wrote : | #24 |
Uploaded to Eoan ...
Changed in numad (Ubuntu Cosmic): | |
status: | New → Won't Fix |
Christian Ehrhardt (paelzer) wrote : | #25 |
Two new MPs for Bionic/Disco uploads:
- https:/
- https:/
Launchpad Janitor (janitor) wrote : | #26 |
This bug was fixed in the package numad - 0.5+20150602-
---------------
numad (0.5+20150602-
* d/p/lp-
(LP: #1832915)
-- Christian Ehrhardt <email address hidden> Wed, 19 Jun 2019 13:05:33 +0200
Changed in numad (Ubuntu Eoan): | |
status: | Incomplete → Fix Released |
Changed in ubuntu-power-systems: | |
status: | Incomplete → In Progress |
description: | updated |
Changed in numad (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in numad (Ubuntu Disco): | |
status: | New → In Progress |
Christian Ehrhardt (paelzer) wrote : | #27 |
MP reviews complete, uploaded to Bionic/Disco unapproved
Hello bugproxy, or anyone else affected,
Accepted numad into disco-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-
Further information regarding the verification process can be found at https:/
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
Changed in numad (Ubuntu Disco): | |
status: | In Progress → Fix Committed |
tags: | added: verification-needed verification-needed-disco |
Changed in numad (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
tags: | added: verification-needed-bionic |
Brian Murray (brian-murray) wrote : | #29 |
Hello bugproxy, or anyone else affected,
Accepted numad into bionic-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-
Further information regarding the verification process can be found at https:/
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
Changed in ubuntu-power-systems: | |
status: | In Progress → Fix Committed |
Christian Ehrhardt (paelzer) wrote : | #30 |
Took a P9 system which has sparse nodes:
$ ll /sys/bus/
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/
Install and start numad
$ apt install numad
$ systemctl start numad
Start a KVM guest with 100 CPUs and 64G memory
$ uvt-simplestrea
$ uvt-kvm create --memory $((64*1024)) --cpu 100 --password ubuntu eoan arch=ppc64el release=eoan label=daily
Even without putting pressure on the memory we see the expected crash:
Jul 17 08:57:51 dradis kernel: numad[8341]: unhandled signal 11 at 0000712686320e90 nip 000071268451058c lr 00007126845132c0 code 1
Jul 17 08:57:52 dradis systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jul 17 08:57:52 dradis systemd[1]: numad.service: Failed with result 'core-dump'.
Installing from proposed.
numad/bionic-
Starting the numad service again and tracking the logs.
1. start guest
2. While that is going on putting some memory pressure on the guest with stressapptest
This time I was able to again trigger a crash with this setup despite using proposed.
Maybe I hit what you had when testing the PPA before.
It seems to occur more rarely but still reliably enough; I'll try to collect debug data - maybe we will find the further issue that is in here as well.
Let's call this verification failed for now, debug, potentially respin the fix into an extended one in Eoan, and then reconsider.
Christian Ehrhardt (paelzer) wrote : | #31 |
The new crash that was found is:
#0 0x000002375f1bd2c4 in pick_numa_nodes (pid=<optimized out>, cpus=<optimized out>, mbs=<optimized out>, assume_
1791: numad_log(
1792: } else {
1793: numad_log(
1794: }
1795: }
1796: if (ID_IS_IN_LIST(ix, p->node_list_p)) {
1797: proc_avg_
1798: }
1799: }
1800: proc_avg_
1801: if ((process_
#1 0x0000000000000000 in ?? ()
That already smells like a different symptom due to the same root cause (sparse node IDs)
Most likely the node[ix] access.
Christian Ehrhardt (paelzer) wrote : | #32 |
Since this is constructed like:
ADD_ID_
I guess this delivers 0 and then 8 in my system
== the node_id instead of the index.
1796 if (ID_IS_IN_LIST(ix, p->node_list_p)) {
1797 proc_avg_
1798 }
While the indexes are 0, 1
I think we'd want to convert our node_id-
Christian Ehrhardt (paelzer) wrote : | #33 |
Hmm, no this must be different.
This is doing:
for (int ix = 0; (ix <= num_nodes); ix++) {
which essentially is 0,1,2
The 2 is odd here, but it seems to break already at
1796 if (ID_IS_IN_LIST(ix, p->node_list_p)) {
and the latter array access would be fine as ix is currently zero
1797 proc_avg_
I need to disable optimization again to make more sense of it ...
Christian Ehrhardt (paelzer) wrote : | #34 |
Hit another crash:
static id_list_p cpu_bind_list_p;
CLEAR_
But this is a malloc.c(16) one; it seems this system is currently broken in general.
tcache_get really shouldn't fail here.
Also I have seen hang_checks in dmesg.
I'll redeploy and give all of this a new try.
Christian Ehrhardt (paelzer) wrote : | #35 |
It seems on the former deployment I hit some memory bug which broke and stalled quite a few allocations. While I haven't found what was causing that (it would be an interesting bug report), the redeployed system seems good.
And in that environment I was able to verify the fix just as expected.
Sorry for the noise before, but I try to take these verifications seriously :-/
With the fix in place I see a correct movement to node 8 for example:
Wed Jul 17 12:33:16 2019: Advising pid 47693 (qemu-system-ppc) move from nodes (0,8) to nodes (8)
Wed Jul 17 12:33:16 2019: PID 47693 moved to node(s) 8 in 0.0 seconds
Wed Jul 17 12:38:21 2019: Advising pid 47693 (qemu-system-ppc) move from nodes (8) to nodes (8)
Wed Jul 17 12:38:21 2019: PID 47693 moved to node(s) 8 in 0.0 seconds
Wed Jul 17 12:43:26 2019: Advising pid 47693 (qemu-system-ppc) move from nodes (8) to nodes (8)
Wed Jul 17 12:43:26 2019: PID 47693 moved to node(s) 8 in 0.0 seconds
Set verification ok for Bionic, upgrading to Disco for the verification there
tags: |
added: verification-done-bionic removed: verification-needed-bionic |
Christian Ehrhardt (paelzer) wrote : | #36 |
The service worked fine even through a full release upgrade from Bionic to Disco; I saw it moving processes just fine.
When on Disco I pushed some load in the guest to get more movements but things worked fine still.
Setting verified for Disco.
P.S. I also think I have found the "other" crash that I have seen; it seems to be triggered by restarting numad while a numad-guided process is active. So in this scenario, get your guest up first and then restart numad. I'm filing bug 1836913 for it so that someone can take a look at that later ...
To be sure that this isn't caused by this update (already quite sure since the place is different) I downgraded to
sudo apt install numad=0.
And there this issue is triggered as well on restart.
tags: |
added: verification-done verification-done-disco removed: verification-needed verification-needed-disco |
Christian Ehrhardt (paelzer) wrote : | #37 |
Summarizing the state:
- numad is universe only and IMHO in a rather bad state
- upstream seems dead for quite some time and does not respond to my patches
- the bug reported here is fixed and verified
- numad seems to have issues on service restart (unrelated to this update)
-> the upgrades to numad in this SRU will trigger a service restart
-> this might trigger bug 1836913 in the wild.
@SRU team: is the insight that a restart is potentially bad (already before this update) and might be triggered by this upgrade a reason to stop the SRU?
Changed in ubuntu-power-systems: | |
status: | Fix Committed → In Progress |
Łukasz Zemczak (sil2100) wrote : | #38 |
I had to sit down and think about this for a moment. The bug with the service restart seems to only happen on ppc64el, which means the issue the package upgrade might trigger could have limited impact. On the other hand, the main target of this bugfix is ppc64el platforms, as those were the most likely to exhibit the original bug.
Before we release this, I would feel safer if we knew how reproducible bug LP: #1836913 is on ppc64el, i.e. whether it is limited to this one particular machine. How frequently does this happen?
Also questions like: how hard would it be to fix it?
Łukasz Zemczak (sil2100) wrote : | #39 |
The reason I'm worried is that the original bug only caused issues for numad under certain conditions, but the package upgrade will trigger a restart for *all* instances using numad. So if a numad restart causes trouble in all ppc64el cases, I'm worried we might cause more harm with the update than without it. Of course it all depends on my earlier questions; maybe it's not that bad.
Christian Ehrhardt (paelzer) wrote : | #40 |
@Lukasz:
Thanks Lukasz for your thoughts - you confirm my concerns.
Trying to answer your questions:
- reproducibility:
  - it failed on our P9 machine 100% of the time
  - I don't have another P9 to check if it is specific to "that" machine or to P9 in general
  - I was deploying a P8 system to have some comparison, but bug 1839065 blocked me from using the same workload, so the results are unreliable at best
- how frequently:
  - on the affected system, on every restart of the service (while a huge guest was active)
- easy (or not) to fix:
  - from what I saw in the traces it looked like more out-of-bounds accesses.
    If that is right then it would require (too) many changes and be really complex, as that kind of access is in
    many places; but towards the end I got convinced that it might have been a red herring after
    all. Nevertheless, unless it is further debugged the complexity is somewhere between rather-complex and unknown
IBM reported this bug here for P9; maybe they are the ones with the P9 machine park (different configurations). If it is important to them they can assess and let us know the details for bug 1836913, and we might hold back this SRU for now (being unsure how often we might trigger this).
OTOH any numad service restart will trigger it; it is not that we'd be adding the bug with this proposed update.
For now I'd say we leave it in proposed and wait for IBM feedback on bug 1839065 or bug 1836913.
tags: | added: block-proposed |
Andrew Cloke (andrew-cloke) wrote : | #41 |
Marking as incomplete while awaiting resolution to bug 1839065 or bug 1836913.
Changed in ubuntu-power-systems: | |
status: | In Progress → Incomplete |
Manoj Iyer (manjo) wrote : | #42 |
Wichita was updated with the latest Power8 firmware from IBM and is ready for your testing needs.
Current firmware version:
P side : FW860.70 (SV860_205)
T side : FW860.70 (SV860_205)
Boot side : FW860.70 (SV860_205)
Changed in ubuntu-power-systems: | |
status: | Incomplete → Triaged |
tags: |
added: block-proposed-bionic block-proposed-disco removed: block-proposed |
Christian Ehrhardt (paelzer) wrote : | #43 |
Yeah this is still broken on both machines, sometimes faster sometimes slower to reproduce.
So to summarize we have bug 1832915 reported and a fix created.
But we also have bug 1836913 and potentially a whole set of bugs due to the same conceptual mismatches (assumption in code: numa zones would be linearly indexed, but that isn't true on power).
And all of this on a project that seems sort of dead upstream.
We will keep things as-is, as there are systems not affected by this.
Going forward we will carry the patches for this bug, knowing that there is more that will affect Power systems with their NUMA setup.
The SRUs will go to Incomplete - as we'd need to really fix the extended issues to make the backport worth anything.
To do so one would need to spend a significant upstream dev effort on bug 1836913.
That would (if anyone) in my POV be the HW-enablement Team of the ppc64 platform.
So that would be inside IBM I guess?
Changed in numad (Ubuntu Bionic): | |
status: | Fix Committed → Incomplete |
Changed in numad (Ubuntu Disco): | |
status: | Fix Committed → Incomplete |
Changed in numad (Ubuntu Eoan): | |
status: | Fix Released → Incomplete |
assignee: | Canonical Server Team (canonical-server) → nobody |
Changed in ubuntu-power-systems: | |
assignee: | Canonical Server Team (canonical-server) → nobody |
no longer affects: | numad (Ubuntu Eoan) |
Changed in numad (Ubuntu): | |
assignee: | nobody → bugproxy (bugproxy) |
Christian Ehrhardt (paelzer) wrote : | #44 |
@Frank - could you make sure in the next calls that the status on these two issues is clear?
Changed in ubuntu-power-systems: | |
status: | Triaged → Incomplete |
Andrew Cloke (andrew-cloke) wrote : | #45 |
Marking as incomplete while awaiting for numad upstream Power porting work.
Christian Ehrhardt (paelzer) wrote : | #46 |
TBH, I'd not mark this as prio high from our POV.
It is high to "know if something will come back on this" but not the actual issue.
For the wider Ubuntu community this is just a rarely used universe package with a somewhat dead upstream - nothing to stress out for IMHO.
It is somewhat important to us, if it is important to IBM.
The reason it might be important is that all of this started with bugs reported against it by IBM.
The reasons could be
a) IBM uses it (or plans to) somewhere in production, in which case it should be important to them
b) some odd test case was run, maybe from an outdated test definition, and actually nobody cares, in which case I guess everyone is fine with closing this as Won't Fix.
Changed in numad (Ubuntu Focal): | |
importance: | High → Low |
Changed in ubuntu-power-systems: | |
importance: | High → Low |
Changed in numad (Ubuntu Eoan): | |
status: | New → Incomplete |
------- Comment From <email address hidden> 2020-02-24 02:11 EDT-------
root@ws-
Linux ws-g48-2d81-host 5.4.0-14-generic #17-Ubuntu SMP Thu Feb 6 22:47:13 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux
root@ws-
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu Focal Fossa (development branch)"
VERSION_ID="20.04"
HOME_URL="https:/
SUPPORT_URL="https:/
BUG_REPORT_URL="https:/
PRIVACY_
VERSION_
UBUNTU_
root@ws-
? numad.service - numad - The NUMA daemon that manages application locality.
Loaded: loaded (/lib/systemd/
Active: failed (Result: core-dump) since Mon 2020-02-24 01:08:37 EST; 51min ago
Docs: man:numad
Process: 458646 ExecStart=
Main PID: 458655 (code=dumped, signal=SEGV)
Feb 23 13:13:09 ws-g48-2d81-host systemd[1]: Starting numad - The NUMA daemon that manages application locality....
Feb 23 13:13:09 ws-g48-2d81-host systemd[1]: Started numad - The NUMA daemon that manages application locality..
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Failed with result 'core-dump'.
Feb 24 01:08:37 ws-g48-2d81-host kernel: [420098.990410] numad[458655]: segfault (11) at 2cd49ad1990 nip 7bafaaf6253c lr 2ccc1308580 code 1 in libc-2.
Feb 24 01:08:37 ws-g48-2d81-host kernel: [420098.990424] numad[458655]: code: 60420000 7ba70fa4 7d0a3a2e 2c280000 4182ff14 7ba91f24 3908ffff eba10028
Feb 24 01:08:37 ws-g48-2d81-host kernel: [420098.990427] numad[458655]: code: ebe10038 7d2a4a14 38c00000 ebc90080 <e8be0000> f8a90080 7d0a3b2e f8de0008
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Feb 24 01:08:37 ws-g48-2d81-host systemd[1]: numad.service: Failed with result 'core-dump'.
Changed in numad (Ubuntu Disco): | |
status: | Incomplete → Won't Fix |
bugproxy (bugproxy) wrote : | #48 |
------- Comment From <email address hidden> 2020-03-20 16:35 EDT-------
Hello Canonical,
So, this is still an issue in Ubuntu 20.04, as the last test results show. Is this something you would be willing to fix?
tags: |
added: targetmilestone-inin2004 removed: targetmilestone-inin--- |
Frank Heimes (fheimes) wrote : | #49 |
Hi, this bug has a 'sister' bug: LP 1836913
The outcome of the numad discussion in the interlock calls with IBM (based on these two bugs) was that proper upstream support and fixing from IBM is needed, especially for Power.
Some structural issues were identified that can't be easily fixed; there is more to do.
Please see: https:/
So we believed that there was already some upstream work going on by the IBM Power team.
An upstream-accepted version or patches can then be considered for SRUing back to Ubuntu.
bugproxy (bugproxy) wrote : | #50 |
------- Comment From <email address hidden> 2020-04-06 18:06 EDT-------
Reclassifying as P3/low to match 'numad' classification.
tags: |
added: severity-low removed: severity-high |
tags: | added: hwe-long-running |
Launchpad Janitor (janitor) wrote : | #51 |
This bug was fixed in the package numad - 0.5+20150602-6
---------------
numad (0.5+20150602-6) unstable; urgency=medium
[ Christian Ehrhardt ]
* d/p/lp-
(LP: #1832915)(Closes: #930725)
[ gustavo panizzo ]
* [0b4115] add patch from upstream repo
* [d97937] update homepage
* [1b3223] update vcs-* urls to point to salsa.d.o
* [767be9] do not require root to build
* [b6b360] use the latest debhelper-compat
* [2eee3c] guessing a debian/watch file
* [cecd01] update the d/gbp.conf file
* [953d8c] increase standards version to 4.5.0
* [094901] remove trailing spaces and comments from d/rules
* [a267dd] use a secure uri for the copyright format
* [577311] install a logrotate file
* [47d07d] no longer install upstream changelog
-- gustavo panizzo <email address hidden> Fri, 20 Nov 2020 22:22:20 +0000
Changed in numad (Ubuntu): | |
status: | Incomplete → Fix Released |
Andrew Cloke (andrew-cloke) wrote : | #52 |
Eoan is now EOL. Marking as "won't fix".
Changed in numad (Ubuntu Eoan): | |
status: | Incomplete → Invalid |
Changed in numad (Ubuntu Eoan): | |
status: | Invalid → Won't Fix |
Changed in numad (Ubuntu Bionic): | |
importance: | Undecided → Low |
Changed in numad (Ubuntu Cosmic): | |
importance: | Undecided → Low |
Changed in numad (Ubuntu Disco): | |
importance: | Undecided → Low |
Changed in numad (Ubuntu Eoan): | |
importance: | Undecided → Low |
tags: | removed: block-proposed-disco verification-done verification-done-disco |
Mathew Hodson (mhodson) wrote : | #53 |
Setting package status based on what was released.
---
numad (0.5+20150602-
* d/p/lp-
(LP: #1832915)
-- Christian Ehrhardt <email address hidden> Wed, 19 Jun 2019 13:05:33 +0200
Changed in numad (Ubuntu Focal): | |
status: | Incomplete → Fix Released |
Changed in numad (Ubuntu Eoan): | |
status: | Won't Fix → Fix Released |
Changed in numad (Debian): | |
status: | New → Fix Released |
Default Comment by Bridge