Comment 30 for bug 1832915

Christian Ehrhardt  (paelzer) wrote :

Took a P9 system which has sparse NUMA nodes:
$ ll /sys/bus/node/devices/node*
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/node/devices/node0 -> ../../../devices/system/node/node0/
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/node/devices/node8 -> ../../../devices/system/node/node8/
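A minimal sketch of how to check for this kind of sparse node numbering from a script. The function names are hypothetical and only for illustration; the default directory is the sysfs path listed above. "Sparse" is taken here to mean that the highest node ID plus one differs from the number of nodes present.

```shell
#!/bin/sh
# Print the numeric NUMA node IDs found under a sysfs-style directory,
# one per line, sorted numerically. Defaults to the path used above.
list_node_ids() {
    dir="${1:-/sys/bus/node/devices}"
    for path in "$dir"/node[0-9]*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$path" ] || continue
        name="${path##*/}"      # strip the directory part, e.g. node8
        echo "${name#node}"     # strip the "node" prefix, e.g. 8
    done | sort -n
}

# Return 0 (true) if the node IDs have gaps, 1 otherwise.
# Assumption: contiguous numbering means IDs 0..count-1.
nodes_are_sparse() {
    ids="$(list_node_ids "$1")"
    [ -n "$ids" ] || return 1
    count=$(( $(echo "$ids" | wc -l) ))
    max="$(echo "$ids" | tail -n 1)"
    [ "$((max + 1))" -ne "$count" ]
}
```

On the system above, `list_node_ids` would be expected to print 0 and 8, and `nodes_are_sparse && echo sparse` would report the gap.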

Install and start numad
$ apt install numad
$ systemctl start numad

Start a KVM guest with 100 CPUs and 64G memory
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=ppc64el label=daily release=eoan
$ uvt-kvm create --memory $((64*1024)) --cpu 100 --password ubuntu eoan arch=ppc64el release=eoan label=daily

Even without putting pressure on the memory we see the expected crash:

Jul 17 08:57:51 dradis kernel: numad[8341]: unhandled signal 11 at 0000712686320e90 nip 000071268451058c lr 00007126845132c0 code 1
Jul 17 08:57:52 dradis systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jul 17 08:57:52 dradis systemd[1]: numad.service: Failed with result 'core-dump'.
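For repeated verification runs it may help to count these crashes instead of reading the log by eye. The following is only a sketch: `count_numad_segv` is a hypothetical helper name, and the pattern is taken from the systemd line quoted above, so it would need adjusting if the message differs on other releases.

```shell
#!/bin/sh
# Read log lines on stdin and print how many times systemd reported
# the numad main process dying with SIGSEGV (status=11/SEGV).
count_numad_segv() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the helper usable under "set -e".
    grep -c 'numad\.service: Main process exited, code=dumped, status=11/SEGV' || true
}
```

It could be fed from the journal, for example with `journalctl -u numad.service | count_numad_segv`.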

Installed the fix from proposed:
numad/bionic-proposed 0.5+20150602-5ubuntu0.18.04.1 ppc64el [upgradable from: 0.5+20150602-5]

Started the numad service again and tracked the logs while doing the following:

1. Start the guest
2. While that is going on, put some memory pressure on the guest with stressapptest

This time I was again able to trigger a crash with this setup, despite using the package from proposed.
Maybe I hit what you saw when testing the PPA before.
It seems to occur more rarely, but still reliably enough. I'll try to collect debug data; maybe we can find the further issue that is in here as well.

Let's call this verification failed for now, debug, potentially respin the fix as an extended one in Eoan, and then reconsider.