Doing some code investigation:
http://elixir.free-electrons.com/linux/v4.13.12/source/kernel/smp.c#L401
The two RIP addresses in my 3 dmesg logs go here:
smp_call_function_many
csd_lock_wait
smp_cond_load_acquire
cpu_relax (called in loop from smp_cond_load_acquire)
rep_nop
asm volatile("rep; nop" ::: "memory") <-- smp_call_function_many+0x248

smp_call_function_many
csd_lock_wait
smp_cond_load_acquire
READ_ONCE(x) (called in loop from smp_cond_load_acquire)
__READ_ONCE(x, 1)
__read_once_size
__READ_ONCE_SIZE <-- smp_call_function_many+0x24a
Both are within the tight loop in smp_cond_load_acquire, waiting on the per-cpu csd locks.
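To make the shape of that loop concrete, here is a rough user-space model of it as I read the source: reload the flags, test the lock bit, relax, repeat. This would also fit the two RIPs being two bytes apart, since PAUSE ("rep; nop") encodes as the two bytes F3 90. The model_* names, the max_spins budget and the flag value are assumptions for illustration only. The kernel loop has no such budget and simply keeps spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CSD_FLAG_LOCK 0x01u   /* name from kernel/smp.c; value assumed for this model */

/* Stand-in for cpu_relax(); on x86 the kernel emits "rep; nop" (PAUSE) here. */
static inline void model_cpu_relax(void)
{
    atomic_signal_fence(memory_order_seq_cst);   /* compiler barrier only */
}

/*
 * Model of csd_lock_wait(): spin until the lock bit in *flags clears.
 * Unlike the kernel loop, this gives up after max_spins iterations so a
 * stuck lock can be observed instead of hanging the program.
 * Returns true if the lock was seen released, false if the budget ran out.
 */
static bool model_csd_lock_wait(atomic_uint *flags, unsigned long max_spins)
{
    assert(flags != NULL);
    for (unsigned long spins = 0; spins < max_spins; spins++) {
        /* READ_ONCE() analogue, with acquire ordering */
        unsigned int val = atomic_load_explicit(flags, memory_order_acquire);
        if (!(val & CSD_FLAG_LOCK))
            return true;
        model_cpu_relax();
    }
    return false;
}
```

Calling model_csd_lock_wait() on a flags word that still has CSD_FLAG_LOCK set, and that nobody ever clears, burns the whole budget and returns false. In the kernel the same situation shows up as a cpu stuck at exactly these two addresses.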
Basically, smp_call_function_many() executes a function on each of the other cpus in the given mask via IPI.
When wait=true, it runs synchronously: the cpu that runs smp_call_function_many() waits for each of the other cpus to report that it has run the function, which it indicates by releasing its per-cpu csd lock.
The CSD_FLAG_SYNCHRONOUS flag determines the order of the func and unlock in flush_smp_call_function_queue():
http://elixir.free-electrons.com/linux/v4.13.11/source/kernel/smp.c#L242
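As I read that branch, the ordering can be modelled in plain C like this. It is a simplified sketch, not kernel code: the model_* names and the trace buffer are mine, the flag values are assumed, and there is no real locking or memory ordering here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CSD_FLAG_LOCK        0x01u   /* names from kernel/smp.c; values assumed */
#define CSD_FLAG_SYNCHRONOUS 0x02u

struct model_csd {                   /* hypothetical stand-in for a csd entry */
    unsigned int flags;
    void (*func)(void *info);
    void *info;
};

/* Hypothetical trace so the ordering can be inspected afterwards. */
static char model_trace[8];
static size_t model_trace_len;

static void model_record(char event)
{
    if (model_trace_len < sizeof(model_trace) - 1)
        model_trace[model_trace_len++] = event;
}

static void model_csd_unlock(struct model_csd *csd)
{
    csd->flags &= ~CSD_FLAG_LOCK;
    model_record('U');
}

static void model_func(void *info)
{
    (void)info;
    model_record('F');
}

/* Per-entry handling: synchronous entries run func before unlocking. */
static void model_handle_csd(struct model_csd *csd)
{
    assert(csd != NULL && csd->func != NULL);
    void (*func)(void *) = csd->func;
    void *info = csd->info;

    if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
        func(info);                /* the sender is waiting: run first ... */
        model_csd_unlock(csd);     /* ... then release the waiter */
    } else {
        model_csd_unlock(csd);     /* async: free the csd for reuse first */
        func(info);
    }
}

/* Returns "FU" (func, unlock) or "UF" (unlock, func) for the given flags. */
static const char *model_order(unsigned int extra_flags)
{
    struct model_csd csd = { CSD_FLAG_LOCK | extra_flags, model_func, NULL };

    model_trace_len = 0;
    memset(model_trace, 0, sizeof(model_trace));
    model_handle_csd(&csd);
    return model_trace;
}
```

In both orders the unlock only happens on the cpu that receives the IPI, so a cpu that never runs this handler never releases the lock the sender is spinning on.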
This leads me to believe the issue here is IPIs being dropped: the cpu calling smp_call_function_many() deadlocks waiting on a cpu that has dropped the IPI and therefore will never unlock its csd lock.
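A toy single-threaded model of that hypothesis, to show why one lost IPI is enough to hang the sender. Everything here is an assumption for illustration: MODEL_NR_CPUS, the drop_mask fault injection and the model_* names are made up, and the real code spins forever instead of returning a mask.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4              /* hypothetical machine size */
#define CSD_FLAG_LOCK 0x01u          /* name from kernel/smp.c; value assumed */

struct model_csd { unsigned int flags; };

/* Remote side: what a cpu does when the IPI actually arrives. */
static void model_ipi_handler(struct model_csd *csd, int *ran)
{
    (*ran)++;                        /* the requested function */
    csd->flags &= ~CSD_FLAG_LOCK;    /* csd_unlock() analogue */
}

/*
 * Model of smp_call_function_many(..., wait=true) issued from cpu 0.
 * drop_mask: bit n set means the IPI to cpu n is lost.
 * *ran receives the number of cpus that ran the function.
 * Returns a bitmask of cpus whose csd lock was never released, i.e. the
 * cpus the real sender would wait on forever.
 */
static unsigned int model_call_many(unsigned int drop_mask, int *ran)
{
    struct model_csd csd[MODEL_NR_CPUS];
    unsigned int stuck = 0;

    assert(ran != NULL);
    *ran = 0;

    /* Send phase: lock each remote csd, then "deliver" the IPI unless dropped. */
    for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++) {
        csd[cpu].flags = CSD_FLAG_LOCK;
        if (!(drop_mask & (1u << cpu)))
            model_ipi_handler(&csd[cpu], ran);
    }

    /* Wait phase: the kernel spins here; the model only records who is stuck. */
    for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++) {
        if (csd[cpu].flags & CSD_FLAG_LOCK)
            stuck |= 1u << cpu;
    }
    return stuck;
}
```

With drop_mask = 0 the returned mask is empty. With the IPI to cpu 2 dropped, cpu 2 is reported as still holding its lock even though every other cpu ran the function, which would match a sender parked in csd_lock_wait.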