Ubuntu
linux package

Bug #1036581
Comment #3

Comment 3 for bug 1036581

Ike Panhc (ikepanhc) wrote on 2012-08-24:

While testing precise-30.47 on a highbank machine, we found that tasks hang or run much slower than before. The root cause is one patch from the 3.2.27 stable update:

commit 43d4dac961083f374f13cacc48cdb49767d922dd
Author: Will Deacon <email address hidden>
Date: Fri Jul 13 19:15:40 2012 +0100

ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+

    BugLink: http://bugs.launchpad.net/bugs/1035435

    commit a76d7bd96d65fa5119adba97e1b58d95f2e78829 upstream.

    The open-coded mutex implementation for ARMv6+ cores suffers from a
    severe lack of barriers, so in the uncontended case we don't actually
    protect any accesses performed during the critical section.

    Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec
    code but optimised to remove a branch instruction, as the mutex fastpath
    was previously inlined. Now that this is executed out-of-line, we can
    reuse the atomic access code for the locking (in fact, we use the xchg
    code as this produces shorter critical sections).

    This patch uses the generic xchg based implementation for mutexes on
    ARMv6+, which introduces barriers to the lock/unlock operations and also
    has the benefit of removing a fair amount of inline assembly code.

Will Deacon also has a fix for this regression:

commit 0bce9c46bf3b15f485d82d7e81dabed6ebcc24b1
Author: Will Deacon <email address hidden>
Date: Fri Aug 10 15:22:09 2012 +0100

mutex: Place lock in contended state after fastpath_lock failure

    ARM recently moved to asm-generic/mutex-xchg.h for its mutex
    implementation after the previous implementation was found to be missing
    some crucial memory barriers. However, this has revealed some problems
    running hackbench on SMP platforms due to the way in which the
    MUTEX_SPIN_ON_OWNER code operates.

    The symptoms are that a bunch of hackbench tasks are left waiting on an
    unlocked mutex and therefore never get woken up to claim it. This boils
    down to the following sequence of events:

            Task A        Task B        Task C        Lock value
    0                                                     1
    1       lock()                                        0
    2                     lock()                          0
    3                     spin(A)                         0
    4       unlock()                                      1
    5                                   lock()            0
    6                     cmpxchg(1,0)                    0
    7                     contended()                    -1
    8       lock()                                        0
    9       spin(C)                                       0
    10                                  unlock()          1
    11      cmpxchg(1,0)                                  0
    12      unlock()                                      1

    At this point, the lock is unlocked, but Task B is in an uninterruptible
    sleep with nobody to wake it up.

    This patch fixes the problem by ensuring we put the lock into the
    contended state if we fail to acquire it on the fastpath, ensuring that
    any blocked waiters are woken up when the mutex is released.

This fix is already in the mainline kernel and has been CC'ed to stable.

Either reverting the first patch or applying the second patch resolves the regression.
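
The lost wakeup in the second commit message can be replayed with a small single-threaded model. This is only an illustrative sketch, not kernel code: the class `ModelMutex`, its methods, and the `fixed` flag are hypothetical names, and the "fixed" branch is my reading of "put the lock into the contended state if we fail to acquire it on the fastpath". The lock values follow the table: 1 is unlocked, 0 is locked, -1 is locked with waiters.

```python
class ModelMutex:
    """Toy model of an xchg-based mutex count (1 free, 0 locked, -1 contended)."""

    def __init__(self, fixed):
        self.count = 1
        self.fixed = fixed      # hypothetical switch: model the fix or not
        self.sleepers = []      # tasks asleep waiting for a wakeup
        self.woken = []         # tasks woken by the unlock slow path

    def _xchg(self, new):
        old, self.count = self.count, new
        return old

    def _cmpxchg(self, expected, new):
        old = self.count
        if old == expected:
            self.count = new
        return old

    def fastpath_lock(self):
        """Return True if the lock was taken without entering the slow path."""
        if self._xchg(0) == 1:
            return True
        # Assumed fix: a failed fast path marks the lock contended (-1)
        # instead of leaving 0 behind, so a later unlock sees waiters.
        if self.fixed and self._xchg(-1) == 1:
            return True
        return False

    def spin_acquire(self):
        """The cmpxchg(1,0) attempt made while spinning on the owner."""
        return self._cmpxchg(1, 0) == 1

    def contended(self, task):
        """Mark the lock contended; go to sleep unless it was free."""
        if self._xchg(-1) == 1:
            return True
        self.sleepers.append(task)
        return False

    def unlock(self):
        # Only a non-zero old value sends unlock down the wakeup path.
        if self._xchg(1) != 0 and self.sleepers:
            self.woken.append(self.sleepers.pop(0))


def replay(fixed):
    """Replay steps 0-12 of the table; return (lock values, sleepers, woken)."""
    m = ModelMutex(fixed)
    trace = [m.count]                        # step 0

    def step(action, *args):
        action(*args)
        trace.append(m.count)

    step(m.fastpath_lock)                    # 1:  A lock()
    step(m.fastpath_lock)                    # 2:  B lock(), fails
    trace.append(m.count)                    # 3:  B spin(A)
    step(m.unlock)                           # 4:  A unlock()
    step(m.fastpath_lock)                    # 5:  C lock()
    step(m.spin_acquire)                     # 6:  B cmpxchg(1,0), fails
    step(m.contended, "B")                   # 7:  B contended(), sleeps
    step(m.fastpath_lock)                    # 8:  A lock(), fails
    trace.append(m.count)                    # 9:  A spin(C)
    step(m.unlock)                           # 10: C unlock()
    step(m.spin_acquire)                     # 11: A cmpxchg(1,0)
    step(m.unlock)                           # 12: A unlock()
    return trace, m.sleepers, m.woken


if __name__ == "__main__":
    for fixed in (False, True):
        trace, sleepers, woken = replay(fixed)
        print("fixed" if fixed else "unfixed", trace, sleepers, woken)
```

Without the fix, step 8 overwrites the -1 that Task B left behind, so both later unlocks see 0 and skip the wakeup, leaving B asleep. With the assumed fix, the value at step 8 stays -1 and the unlock at step 10 wakes B. The model stops there; it does not follow what B does after being woken.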
