Comment 3 for bug 1832915

Christian Ehrhardt (paelzer) wrote:

On a fresh Bionic host running the latest 4.15.0-51-generic kernel, I did the following to try to reproduce this issue.
Note: my host has 128G of memory and 40 cores (SMT off)
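
For completeness, the host NUMA topology can be inspected with:

     $ numactl --hardware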

1. installed numad
2. started the numad service and verified it runs fine
3. spawned two guests with 20 cores and 50G each (since no particular guest config was mentioned, I didn't configure anything special);
   I used uvtool to get the latest cloud image
4. cloned stressapptest from git [1] in the guests
   and installed build-essential
   (my guests are Bionic, which didn't have stressapptest packaged yet),
   then built and installed the tool
5. ran the stress in both guests as mentioned (a sketch of the full command sequence follows this list)
     $ stressapptest -s 200
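
For reference, a rough sketch of the commands behind those steps; the uvt-kvm sizing flags are my reconstruction of the setup above, not a verbatim transcript:

     $ sudo apt install numad
     $ sudo systemctl enable --now numad
     # fetch the latest Bionic cloud image and spawn the two guests
     $ uvt-simplestreams-libvirt sync release=bionic arch=amd64
     $ uvt-kvm create guest1 release=bionic --cpu 20 --memory 51200
     $ uvt-kvm create guest2 release=bionic --cpu 20 --memory 51200
     # inside each guest: build and install stressapptest from source [1]
     $ sudo apt install build-essential git
     $ git clone https://github.com/stressapptest/stressapptest
     $ cd stressapptest && ./configure && make && sudo make install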

Well, actually, I was just about to start that load (it hadn't happened yet) when I realized my numad process had already died:
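
For reference, the status below is plain systemctl output (assuming the unit name shipped by the package):

     $ systemctl status numad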

● numad.service - numad - The NUMA daemon that manages application locality.
   Loaded: loaded (/lib/systemd/system/numad.service; enabled; vendor preset: enabled)
   Active: failed (Result: core-dump) since Mon 2019-06-17 06:12:31 UTC; 2min 23s ago
     Docs: man:numad
  Process: 119546 ExecStart=/usr/bin/numad $DAEMON_ARGS -i 15 (code=exited, status=0/SUCCESS)
 Main PID: 119547 (code=dumped, signal=SEGV)

Jun 17 06:00:28 dradis systemd[1]: Starting numad - The NUMA daemon that manages application locality....
Jun 17 06:00:28 dradis systemd[1]: Started numad - The NUMA daemon that manages application locality..
Jun 17 06:12:31 dradis systemd[1]: numad.service: Main process exited, code=dumped, status=11/SEGV
Jun 17 06:12:31 dradis systemd[1]: numad.service: Failed with result 'core-dump'.

So the mem-stress load might help to trigger it, but isn't necessarily required.
After restarting the numad daemon I started the guest load and got the crash again.
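
For anyone following along, restarting and watching the daemon are the usual systemd invocations:

     $ sudo systemctl restart numad
     $ journalctl -u numad -f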

While I have no idea yet what exactly is going on, let's set this to Confirmed at least.
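
The obvious next step is a backtrace from the core dump; a sketch, assuming apport caught the SEGV (the Bionic default) and numad debug symbols are available:

     $ apport-unpack /var/crash/_usr_bin_numad.*.crash /tmp/numad-crash
     $ gdb /usr/bin/numad /tmp/numad-crash/CoreDump
     (gdb) bt full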

[1]: https://github.com/stressapptest/stressapptest

The daemon initially had PID 119547, and there were no odd entries in the log before the crash.