While this numbering is fairly common on Power (all non-SMT systems) and s390x (which scales the number of CPUs with load), it is uncommon on x86. Nevertheless, in theory the issue should exist there as well.
But I tried this for an hour and it didn't trigger (plenty of assignments happened).
Repro (x86)
1. Get a KVM guest with NUMA memory nodes
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<vcpu placement='static'>4</vcpu>
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
<cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
</numa>
</cpu>
2. Disable some CPUs in the middle of the range
$ echo 0 | sudo tee /sys/bus/cpu/devices/cpu1/online
$ echo 0 | sudo tee /sys/bus/cpu/devices/cpu2/online
$ lscpu
CPU(s): 4
On-line CPU(s) list: 0,3
Off-line CPU(s) list: 1,2
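The resulting online set is non-contiguous (0,3), which is exactly the kind of cpulist that code assuming linear CPU numbering can mishandle. A minimal sketch of expanding such a kernel cpulist string (the `expand_cpulist` helper is hypothetical, not part of numad; on a live system you could feed it the contents of /sys/devices/system/cpu/online):

```shell
# Expand a kernel cpulist like "0,3" or "0,2-4" into one CPU id per line.
expand_cpulist() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    # A bare element has no "hi" part; treat it as a one-element range.
    seq "$lo" "${hi:-$lo}"
  done
}
expand_cpulist "0,3"
```

With CPUs 1 and 2 offline as above, this yields only 0 and 3, so any consumer iterating `0..N-1` over the CPU count would touch offline IDs.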
3. Install, start, and follow the log of numad
$ sudo apt install numad
$ sudo systemctl start numad
$ journalctl -f -u numad
4. Run some memory load that will make numad assign processes
$ sudo apt install stress-ng
$ stress-ng --vm 2 --vm-bytes 90% -t 5m
If we follow the log of numad with verbose logging enabled, we will after a while see NUMA assignments like:
Mon Jun 17 10:32:05 2019: Advising pid 3416 (stress-ng-vm) move from nodes (0-1) to nodes (0)
Mon Jun 17 10:32:23 2019: Advising pid 3417 (stress-ng-vm) move from nodes (0-1) to nodes (1)
Maybe on ppc the NUMA node numbering is also non-linear; I remember working on fixes for numactl in that regard, and maybe that is important as well.
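To illustrate why non-linear node numbering would matter, here is a hypothetical sketch (the node list is simulated, standing in for `ls /sys/devices/system/node`): code that derives node IDs from the node *count* diverges from the real IDs as soon as the numbering is sparse, e.g. nodes 0 and 8.

```shell
# Simulated sparse topology, as might be seen on some ppc systems.
nodes="node0 node8"   # stand-in for: ls /sys/devices/system/node | grep '^node'
count=$(echo $nodes | wc -w)
# Naive assumption: node ids are 0..N-1.
naive=$(seq 0 $((count - 1)) | tr '\n' ' ')
# Actual ids taken from the directory names themselves.
actual=$(for n in $nodes; do printf '%s ' "${n#node}"; done)
echo "naive:  $naive"
echo "actual: $actual"
```

Here the naive scheme would address node 1, which does not exist, and never touch node 8.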