With verbose my numad log file is:

Mon Jun 17 06:22:53 2019: Nodes: 2
Min CPUs free: 1416, Max CPUs: 1423, Avg CPUs: 1419, StdDev: 3.53553
Min MBs free: 12869, Max MBs: 13756, Avg MBs: 13312, StdDev: 443.5
Node 0: MBs_total 65266, MBs_free 12869, CPUs_total 2000, CPUs_free 1416, Distance: 10 40 CPUs: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76
Node 1: MBs_total 65337, MBs_free 13756, CPUs_total 2000, CPUs_free 1423, Distance: 40 10 CPUs: 80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156
Mon Jun 17 06:22:53 2019: Processes: 1563
Mon Jun 17 06:22:53 2019: Candidates: 2
101867853: PID 120072: (qemu-system-ppc), Threads 23, MBs_size 55763, MBs_used 50509, CPUs_used 876, Magnitude 44245884, Nodes: 0,8
101867853: PID 120206: (qemu-system-ppc), Threads 23, MBs_size 55821, MBs_used 23699, CPUs_used 279, Magnitude 6612021, Nodes: 0,8
Mon Jun 17 06:22:53 2019: Advising pid 120072 (qemu-system-ppc) move from nodes (0,8) to nodes (0,8)

With debug the dying message looked like:

Another run #2:

Mon Jun 17 06:25:08 2019: Nodes: 2
Min CPUs free: 302, Max CPUs: 439, Avg CPUs: 370, StdDev: 68.5018
Min MBs free: 1597, Max MBs: 4548, Avg MBs: 3072, StdDev: 1475.5
Node 0: MBs_total 65266, MBs_free 1597, CPUs_total 2000, CPUs_free 302, Distance: 10 40 CPUs: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76
Node 1: MBs_total 65337, MBs_free 4548, CPUs_total 2000, CPUs_free 439, Distance: 40 10 CPUs: 80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156
Mon Jun 17 06:25:08 2019: Processes: 1572
Mon Jun 17 06:25:08 2019: Candidates: 2
101881395: PID 120072: (qemu-system-ppc), Threads 25, MBs_size 55763, MBs_used 50523, CPUs_used 1995, Magnitude 100793385, Nodes: 0,8
101881395: PID 120206: (qemu-system-ppc), Threads 25, MBs_size 55821, MBs_used 45916, CPUs_used 830, Magnitude 38110280, Nodes: 0,8
Mon Jun 17 06:25:08 2019: PICK NODES FOR: PID: 120072, CPUs 2347, MBs 59438
Mon Jun 17 06:25:08 2019: PROCESS_MBs[0]: 17481
Mon Jun 17 06:25:08 2019: Node[0]: mem: 201700 cpu: 5952
Mon Jun 17 06:25:08 2019: Node[1]: mem: 45480 cpu: 2634
Mon Jun 17 06:25:08 2019: Totmag[0]: 12080055
Mon Jun 17 06:25:08 2019: Totmag[1]: 1948267
Mon Jun 17 06:25:08 2019: best_node_ix: 0
Mon Jun 17 06:25:08 2019: Node: 0 Dist: 10 Magnitude: 1200518400
Mon Jun 17 06:25:08 2019: Node: 8 Dist: 40 Magnitude: 119794320
Mon Jun 17 06:25:08 2019: MBs: 59438, CPUs: 2347
Mon Jun 17 06:25:08 2019: Assigning resources from node 0
Mon Jun 17 06:25:08 2019: Node[0]: mem: 1000 cpu: 0
Mon Jun 17 06:25:08 2019: MBs: 39368, CPUs: 1355
Mon Jun 17 06:25:08 2019: Assigning resources from node 1
Mon Jun 17 06:25:08 2019: Advising pid 120072 (qemu-system-ppc) move from nodes (0,8) to nodes (0,8)

Another run #3:

Mon Jun 17 06:26:46 2019: Nodes: 2
Min CPUs free: 889, Max CPUs: 1048, Avg CPUs: 968, StdDev: 79.5016
Min MBs free: 1291, Max MBs: 3484, Avg MBs: 2387, StdDev: 1096.5
Node 0: MBs_total 65266, MBs_free 1291, CPUs_total 2000, CPUs_free 889, Distance: 10 40 CPUs: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76
Node 1: MBs_total 65337, MBs_free 3484, CPUs_total 2000, CPUs_free 1048, Distance: 40 10 CPUs: 80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156
Mon Jun 17 06:26:46 2019: Processes: 1546
Mon Jun 17 06:26:46 2019: Candidates: 2
101891156: PID 120072: (qemu-system-ppc), Threads 23, MBs_size 55763, MBs_used 50593, CPUs_used 1437, Magnitude 72702141, Nodes: 0,8
101891156: PID 120206: (qemu-system-ppc), Threads 23, MBs_size 55821, MBs_used 48065, CPUs_used 613, Magnitude 29463845, Nodes: 0,8
Mon Jun 17 06:26:46 2019: PICK NODES FOR: PID: 120072, CPUs 1690, MBs 59521
Mon Jun 17 06:26:46 2019: PROCESS_MBs[0]: 17527
Mon Jun 17 06:26:46 2019: Node[0]: mem: 199130 cpu: 8316
Mon Jun 17 06:26:46 2019: Node[1]: mem: 34840 cpu: 6288
Mon Jun 17 06:26:46 2019: Totmag[0]: 16559650
Mon Jun 17 06:26:46 2019: Totmag[1]: 2190739
Mon Jun 17 06:26:46 2019: best_node_ix: 0
Mon Jun 17 06:26:46 2019: Node: 0 Dist: 10 Magnitude: 1655965080
Mon Jun 17 06:26:46 2019: Node: 8 Dist: 40 Magnitude: 219073920
Mon Jun 17 06:26:46 2019: MBs: 59521, CPUs: 1690
Mon Jun 17 06:26:46 2019: Assigning resources from node 0
Mon Jun 17 06:26:46 2019: Node[0]: mem: 1000 cpu: 0
Mon Jun 17 06:26:46 2019: MBs: 39708, CPUs: 304
Mon Jun 17 06:26:46 2019: Assigning resources from node 1
Mon Jun 17 06:26:46 2019: Advising pid 120072 (qemu-system-ppc) move from nodes (0,8) to nodes (0,8)

Your crash was around:

Thu Feb 21 00:12:10 2019: Assigning resources from node 5
Thu Feb 21 00:12:10 2019: Assigning resources from node 2
Thu Feb 21 00:12:10 2019: Process 88781 already 100 percent localized to target nodes.

Mine also seems to happen as soon as it hits "Assigning resources". This is something the daemon will do anyway, but obviously more often under actual memory load. So far it all fits together; let's try to find what it accesses when it fails.
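When comparing the runs it may help to know how the logged Magnitude values relate to the other fields. Going only by the numbers in these logs (not by the numad source), each candidate's Magnitude equals MBs_used × CPUs_used, and each per-node Magnitude in the PICK NODES output equals mem × cpu of the Node[] line printed just before it (Node[0] pairs with "Node: 0", Node[1] with "Node: 8"). The small Python sketch below checks that against values copied from the logs above; the regex and helper names are my own and hypothetical, not anything numad provides.

```python
import re

# Candidate lines copied verbatim from runs #1-#3 above.
CANDIDATE_LINES = [
    "101867853: PID 120072: (qemu-system-ppc), Threads 23, MBs_size 55763, MBs_used 50509, CPUs_used 876, Magnitude 44245884, Nodes: 0,8",
    "101867853: PID 120206: (qemu-system-ppc), Threads 23, MBs_size 55821, MBs_used 23699, CPUs_used 279, Magnitude 6612021, Nodes: 0,8",
    "101881395: PID 120072: (qemu-system-ppc), Threads 25, MBs_size 55763, MBs_used 50523, CPUs_used 1995, Magnitude 100793385, Nodes: 0,8",
    "101881395: PID 120206: (qemu-system-ppc), Threads 25, MBs_size 55821, MBs_used 45916, CPUs_used 830, Magnitude 38110280, Nodes: 0,8",
    "101891156: PID 120072: (qemu-system-ppc), Threads 23, MBs_size 55763, MBs_used 50593, CPUs_used 1437, Magnitude 72702141, Nodes: 0,8",
    "101891156: PID 120206: (qemu-system-ppc), Threads 23, MBs_size 55821, MBs_used 48065, CPUs_used 613, Magnitude 29463845, Nodes: 0,8",
]

# (mem, cpu) from the "Node[i]: mem: ... cpu: ..." debug lines, plus the logged
# Magnitude of the matching "Node: ... Dist: ..." line. In these logs Node[0]
# pairs with "Node: 0" and Node[1] pairs with "Node: 8".
NODE_SAMPLES = [
    (201700, 5952, 1200518400),  # run #2, Node[0] / Node: 0
    (45480, 2634, 119794320),    # run #2, Node[1] / Node: 8
    (199130, 8316, 1655965080),  # run #3, Node[0] / Node: 0
    (34840, 6288, 219073920),    # run #3, Node[1] / Node: 8
]

# Hypothetical pattern for the candidate line format seen in the logs above.
CANDIDATE_RE = re.compile(
    r"PID (?P<pid>\d+): \((?P<comm>[^)]*)\).*?"
    r"MBs_used (?P<mbs>\d+), CPUs_used (?P<cpus>\d+), "
    r"Magnitude (?P<mag>\d+), Nodes: (?P<nodes>[\d,]+)"
)


def parse_candidate(line):
    """Return the interesting fields of a candidate line, or None if it does not match."""
    m = CANDIDATE_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "mbs_used": int(m["mbs"]),
        "cpus_used": int(m["cpus"]),
        "magnitude": int(m["mag"]),
        "nodes": [int(n) for n in m["nodes"].split(",")],
    }


def check_candidates(lines):
    """Return (pid, logged Magnitude, MBs_used * CPUs_used) for every candidate line."""
    results = []
    for line in lines:
        c = parse_candidate(line)
        if c is not None:
            results.append((c["pid"], c["magnitude"], c["mbs_used"] * c["cpus_used"]))
    return results


if __name__ == "__main__":
    for pid, logged, product in check_candidates(CANDIDATE_LINES):
        print(f"PID {pid}: logged {logged}, MBs_used*CPUs_used {product}, match={logged == product}")
    for mem, cpu, logged in NODE_SAMPLES:
        print(f"mem {mem} * cpu {cpu} = {mem * cpu}, logged {logged}, match={mem * cpu == logged}")
```

With the values above every line comes out match=True. That only confirms the arithmetic behind the logged numbers; it does not yet tell us what numad touches when it dies after "Assigning resources".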