Regarding the latest comment: CPU #0 crashed at pc = c008000013566eb8; its ctr and r12 match, same as usual, and it was called by __bpf_prog_run_save_cb as the JITed BPF program. Dumping the program from CPU #0's perspective shows traps at that address.
It turns out the JIT fills a whole page with traps and puts the JITed BPF program at a random offset within that page (see bpf_jit_binary_alloc in kernel/bpf/core.c).
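To make the layout concrete, here is a rough userspace sketch of that allocation scheme as I understand it. It is not the kernel code: every sketch_* name is made up for illustration, the 4096-byte page size is an assumption (the affected system may well use 64K pages), and 0x7fe00008 is what I believe to be the powerpc trap encoding used as filler.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u        /* assumed page size, illustration only */
#define SKETCH_TRAP      0x7fe00008u  /* assumed powerpc "trap" encoding used as filler */

/* Hypothetical stand-in for the header the kernel keeps at the start of the area. */
struct sketch_header {
    uint32_t pages;
};

/* Fill the whole area with trap instructions. */
static void sketch_fill_traps(void *area, size_t size)
{
    uint32_t *p = area;

    for (size_t i = 0; i < size / sizeof(*p); i++)
        p[i] = SKETCH_TRAP;
}

/*
 * Allocate a page-rounded area, fill it with traps, and pick a random,
 * aligned offset inside it where the JITed image would later be written.
 * alignment must be a power of two. Returns the area (release with free()),
 * or NULL on allocation failure.
 */
void *sketch_jit_alloc(size_t proglen, size_t alignment,
                       uint8_t **image, size_t *total)
{
    size_t need = proglen + sizeof(struct sketch_header) + 128;
    size_t size = (need + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
    struct sketch_header *hdr = aligned_alloc(SKETCH_PAGE_SIZE, size);
    size_t hole, max_hole, start;

    if (!hdr)
        return NULL;

    sketch_fill_traps(hdr, size);
    hdr->pages = (uint32_t)(size / SKETCH_PAGE_SIZE);

    /* Free space left after header and program, capped to the first page. */
    hole = size - (proglen + sizeof(*hdr));
    max_hole = SKETCH_PAGE_SIZE - sizeof(*hdr);
    if (hole > max_hole)
        hole = max_hole;

    /* Random start offset, rounded down to the requested alignment. */
    start = ((size_t)rand() % hole) & ~(alignment - 1);

    *image = (uint8_t *)(hdr + 1) + start;
    *total = size;
    return hdr;
}
```

Until the JIT output has been copied to *image and is visible to the fetching CPU, every word in the area, including the entry point, still reads as a trap. That matches what CPU #0 sees at the crashing pc.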
On the hotplugged CPU, CPU #9f (159), however, that same page looks different: the code is placed where it was expected.
Still, it looks like fp->aux->jit_data is NULL on both CPUs, which is not what I expected.
I am wondering whether the icache is not being flushed properly or RCU is not operating correctly. Since no other issues are seen, something related to the icache seems more likely. But I don't see any IPIs involved when flushing the icache, so possibly this is firmware or micro-architecture related?
Cascardo.