While testing precise-30.47 on a highbank machine, we found that tasks hang or run much slower than before. The root cause is one patch from the 3.2.27 stable update:
commit 43d4dac961083f374f13cacc48cdb49767d922dd
Author: Will Deacon <email address hidden>
Date: Fri Jul 13 19:15:40 2012 +0100
ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+
BugLink: http://bugs.launchpad.net/bugs/1035435
commit a76d7bd96d65fa5119adba97e1b58d95f2e78829 upstream.
The open-coded mutex implementation for ARMv6+ cores suffers from a
severe lack of barriers, so in the uncontended case we don't actually
protect any accesses performed during the critical section.
Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec
code but optimised to remove a branch instruction, as the mutex fastpath
was previously inlined. Now that this is executed out-of-line, we can
reuse the atomic access code for the locking (in fact, we use the xchg
code as this produces shorter critical sections).
This patch uses the generic xchg based implementation for mutexes on
ARMv6+, which introduces barriers to the lock/unlock operations and also
has the benefit of removing a fair amount of inline assembly code.
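The xchg-based scheme described above can be sketched in userspace with C11 atomics. This is only an illustrative model, not the kernel's code: the `model_mutex` type and the `model_*` function names are hypothetical, and the default sequentially consistent `atomic_exchange` stands in for the kernel's fully ordered `atomic_xchg`. The count encoding (1 unlocked, 0 locked, -1 locked with waiters) follows the lock values used in the trace in this report.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of an xchg-based mutex count:
 * 1 = unlocked, 0 = locked, -1 = locked with possible waiters. */
typedef struct {
    atomic_int count;
} model_mutex;

void model_mutex_init(model_mutex *m)
{
    atomic_init(&m->count, 1);
}

/* Fastpath lock: one atomic exchange, which also acts as the barrier
 * that the open-coded ARMv6+ version was missing.
 * Returns true if the lock was acquired. */
bool model_fastpath_lock(model_mutex *m)
{
    return atomic_exchange(&m->count, 0) == 1;
}

/* Fastpath unlock: returns true if the old value shows the lock was
 * contended, i.e. the caller would have to enter the wake-up slowpath. */
bool model_fastpath_unlock(model_mutex *m)
{
    int old = atomic_exchange(&m->count, 1);

    assert(old != 1); /* unlocking an unlocked mutex is a caller bug */
    return old != 0;
}
```

Note that a failed `model_fastpath_lock()` leaves the count at 0 even if it was -1 before, which is the behaviour the follow-up fix addresses.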
Will Deacon also has a fix for this regression:
commit 0bce9c46bf3b15f485d82d7e81dabed6ebcc24b1
Author: Will Deacon <email address hidden>
Date: Fri Aug 10 15:22:09 2012 +0100
mutex: Place lock in contended state after fastpath_lock failure
ARM recently moved to asm-generic/mutex-xchg.h for its mutex
implementation after the previous implementation was found to be missing
some crucial memory barriers. However, this has revealed some problems
running hackbench on SMP platforms due to the way in which the
MUTEX_SPIN_ON_OWNER code operates.
The symptoms are that a bunch of hackbench tasks are left waiting on an
unlocked mutex and therefore never get woken up to claim it. This boils
down to the following sequence of events:
        Task A          Task B          Task C          Lock value
 0                                                           1
 1      lock()                                               0
 2                      lock()                               0
 3                      spin(A)                              0
 4      unlock()                                             1
 5                                      lock()               0
 6                      cmpxchg(1,0)                         0
 7                      contended()                         -1
 8      lock()                                               0
 9      spin(C)                                              0
10                                      unlock()             1
11      cmpxchg(1,0)                                         0
12      unlock()                                             1
At this point, the lock is unlocked, but Task B is in an uninterruptible
sleep with nobody to wake it up.
This patch fixes the problem by ensuring we put the lock into the
contended state if we fail to acquire it on the fastpath, ensuring that
any blocked waiters are woken up when the mutex is released.
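To make the lost wake-up concrete, the sequence of events above can be replayed deterministically in a small single-threaded model. This is a hedged sketch, not the kernel implementation: `struct model`, `replay_strands_task_b()` and the helpers are hypothetical names, plain ints stand in for atomics because the interleaving is fixed, and the model tracks only whether Task B is left asleep. With `fixed` set, a failed fastpath lock does a second exchange to -1, as the commit message describes; the lock values then differ from those in the trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; count: 1 unlocked, 0 locked, -1 contended. */
struct model {
    int  count;
    bool b_asleep;  /* Task B is blocked waiting for a wake-up */
    bool fixed;     /* mark the lock contended when the fastpath fails */
};

static int xchg(int *p, int v) { int old = *p; *p = v; return old; }

static bool cmpxchg_1_0(int *p)
{
    if (*p != 1)
        return false;
    *p = 0;
    return true;
}

static bool fastpath_lock(struct model *m)
{
    if (xchg(&m->count, 0) == 1)
        return true;
    /* With the fix: put the lock into the contended state on failure. */
    return m->fixed && xchg(&m->count, -1) == 1;
}

static void unlock(struct model *m)
{
    int old = xchg(&m->count, 1);

    assert(old != 1);
    if (old != 0)            /* slowpath: wake a sleeping waiter, if any */
        m->b_asleep = false;
}

/* Replays steps 0-12; returns true if Task B is still asleep at the end. */
bool replay_strands_task_b(bool fixed)
{
    struct model m = { .count = 1, .b_asleep = false, .fixed = fixed };
    bool ok;

    ok = fastpath_lock(&m);      assert(ok);   /* 1: A lock() */
    ok = fastpath_lock(&m);      assert(!ok);  /* 2: B lock() fails, 3: spin(A) */
    unlock(&m);                                /* 4: A unlock() */
    ok = fastpath_lock(&m);      assert(ok);   /* 5: C lock() */
    ok = cmpxchg_1_0(&m.count);  assert(!ok);  /* 6: B cmpxchg(1,0) fails */
    (void)xchg(&m.count, -1);                  /* 7: B contended() ... */
    m.b_asleep = true;                         /*    ... and goes to sleep */
    ok = fastpath_lock(&m);      assert(!ok);  /* 8: A lock() fails, 9: spin(C) */
    unlock(&m);                                /* 10: C unlock() */
    ok = cmpxchg_1_0(&m.count);  assert(ok);   /* 11: A cmpxchg(1,0) */
    unlock(&m);                                /* 12: A unlock() */
    (void)ok;
    return m.b_asleep;
}
```

In this model, step 8 is where the unfixed fastpath overwrites B's -1 with 0, so the unlock at step 10 sees 0 and skips the wake-up; with the fix, step 8 restores -1 and step 10 wakes B.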
This fix is already in the mainline kernel and has been CC'ed to stable.
Either reverting the first patch or applying the second patch resolves the regression.