For the last little while I've been working on a special branch [1] (test.2013.01.30a), prepared specifically for this problem by Paul M. Instead of dealing with five different traces (see comment #6), this branch yields only one trace [2] and, once again, the behavior goes away if RCU_FAST_NO_HZ is switched off. Fixing the problem on this branch will very likely fix the problem on mainline as well.
From the trace in [2] and a quick glance at the code, one would be led to believe the running thread has been scheduled on the wrong CPU. But that would be a false deduction, as reading the MPIDR register yields the same processor as expected by 'td->cpu'.
The problem is with 'smp_processor_id()', which really just reads '->cpu' through a pointer to the current 'thread_info'.
'smpboot_thread_fn' is called by 'kthread', where it is disguised under 'threadfn()'. Just before calling 'threadfn()' the current 'thread_info' holds the correct '->cpu' value, but for some reason that value has changed by the time it is read through 'smp_processor_id()'.
A closer inspection of the situation reveals that the address of the 'thread_info' in 'kthread' is different from the one in 'smpboot_thread_fn'.
Since 'kthread' simply calls 'threadfn()', one would expect the address of the current 'thread_info' to be the same in both 'kthread()' and 'smpboot_thread_fn()'.
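To make the lookup concrete, here is a minimal userspace sketch of how the current 'thread_info' is found. It assumes 32-bit ARM kernels of this era, where 'current_thread_info()' masks the stack pointer down to a THREAD_SIZE boundary, and it assumes THREAD_SIZE is 8 KiB. The names 'thread_info_from_sp', 'cpu_from_sp', 'alloc_stacks' and 'cpu_seen' are made up for illustration and are not kernel functions. The sketch only shows that the result depends purely on the stack pointer, so a different 'thread_info' address implies a stack pointer in a different THREAD_SIZE block; it says nothing about why that would happen here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 8 KiB kernel stacks, with thread_info at the lowest address. */
#define THREAD_SIZE 8192u

/* Stripped-down stand-in for the kernel structure; only ->cpu matters here. */
struct thread_info {
    unsigned int cpu;
};

/* Mimics current_thread_info(): round the stack pointer down to THREAD_SIZE. */
static struct thread_info *thread_info_from_sp(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~((uintptr_t)THREAD_SIZE - 1));
}

/* Mimics smp_processor_id(): read ->cpu from whatever thread_info sp selects. */
static unsigned int cpu_from_sp(uintptr_t sp)
{
    return thread_info_from_sp(sp)->cpu;
}

/* Allocate n adjacent THREAD_SIZE-aligned stacks; *raw is what must be freed. */
static unsigned char *alloc_stacks(size_t n, void **raw)
{
    *raw = malloc((n + 1) * THREAD_SIZE);
    if (!*raw)
        return NULL;
    uintptr_t base = ((uintptr_t)*raw + THREAD_SIZE - 1)
                     & ~((uintptr_t)THREAD_SIZE - 1);
    return (unsigned char *)base;
}

/*
 * Two adjacent stacks: stack 0 belongs to a thread on CPU 0, stack 1 to a
 * thread on CPU 1. Returns the CPU number seen when the stack pointer sits
 * in the middle of stack `which`, or -1 if allocation fails.
 */
int cpu_seen(unsigned int which)
{
    void *raw;
    unsigned char *stacks;
    int cpu;

    assert(which < 2);
    stacks = alloc_stacks(2, &raw);
    if (!stacks)
        return -1;

    ((struct thread_info *)stacks)->cpu = 0;
    ((struct thread_info *)(stacks + THREAD_SIZE))->cpu = 1;

    uintptr_t sp = (uintptr_t)(stacks + which * THREAD_SIZE + THREAD_SIZE / 2);
    cpu = (int)cpu_from_sp(sp);

    free(raw);
    return cpu;
}
```

With the stack pointer inside the first stack the lookup lands on the first 'thread_info', and with it inside the second stack it lands on the second one, regardless of which CPU is actually executing the code.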
The investigation continues.
[1]. git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
[2]. https://pastebin.linaro.org/1648/