Comment 156 for bug 1690085

Revision history for this message
In , spartacus06 (spartacus06-linux-kernel-bugs) wrote :

Doing some code investigation

http://elixir.free-electrons.com/linux/v4.13.12/source/kernel/smp.c#L401

The two RIP addresses in my 3 dmesg logs go here

smp_call_function_many
csd_lock_wait
smp_cond_load_acquire
cpu_relax (called in loop from smp_cond_load_acquire)
rep_nop
asm volatile("rep; nop" ::: "memory") <-- smp_call_function_many+0x248

smp_call_function_many
csd_lock_wait
smp_cond_load_acquire
READ_ONCE(x) (called in loop from smp_cond_load_acquire)
__READ_ONCE(x, 1)
__read_once_size
__READ_ONCE_SIZE <-- smp_call_function_many+0x24a

Both are within the tight loop in smp_cond_load_acquire waiting on the per-cpu csd locks

Basically, smp_call_function_many() executes a function on each cpu via IPI.
When wait=true, it runs synchronously, with the cpu that runs smp_call_function_many() waiting for each of the other cpus to report they have run the function as indicated by releasing their per-cpu csd lock.

The CSD_FLAG_SYNCHRONOUS flag determines the order of the func and unlock in flush_smp_call_function_queue()

http://elixir.free-electrons.com/linux/v4.13.11/source/kernel/smp.c#L242

This leads me to believe the issue here deals with IPIs being dropped and thus the CPU calling smp_call_function_many() deadlocks waiting on a cpu that has dropped the IPI and therefore will not unlock the csd lock.