Atomic operations not safe for ARMv7,Thumb-2 and multicore

Bug #490371 reported by Dave Martin
42
This bug affects 3 people
Affects Status Importance Assigned to Milestone
qt4-x11 (Ubuntu)
Fix Released
Medium
Jani Monoses
Natty
Fix Released
Medium
Jani Monoses

Bug Description

Problem noted in: qt4-x11_4.6.0~rc1-1ubuntu1 (lucid)

There are two problems here:

1) in configure, the build architecture baseline is detected using `uname -r`: this just gives "armel", which does not work for determining whether the baseline is armv6 or armv7 or whatever. There are some specific files for armv6, but these are not used because configure defines QT_ARCH_ARM instead of QT_ARCH_ARMV6.

It may make sense for the ubuntu build rules to pass an explicit -arch option to configure.

2) The "generic arm" atomics in src/arch/corelib/qatomic_arm.h use the swp and swpb instructions. This is incompatible with Thumb-2 and will not be multicore-safe, especially for armv7.

3) The armv6 atomics in src/arch/corelib/qatomic_armv6.h (not currently used because of the behaviour of configure; see (1)) use the new ldrex/strex instructions appropriately, but the required memory barriers for safe operation on multicore platforms are missing.

Possibly we should add a new src/arch/corelib/qatomic_armv7.h, using a copy of qatomic_armv6.h or qatomic_avr32.h (see below) as a starting point; -arch armv7 could be passed to configure by the build rules.

If at all possible, this new header should be ported to use the GCC atomic intrinsics: see
https://wiki.ubuntu.com/ARM/Thumb2 and http://refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf section 7.4 (Synchronization Primitives)

The qatomic_avr32.h header seems to be based on the intrinsics and may be useful as a template.
The qatomic_ia64.h header's calls to __memory_barrier() may indicate the appropriate places where an additional explicit barrier is needed.

If it is not possible to use the GCC intrinsics, a dmb instruction should be added before and after each ldrex...strex sequence where the API semantics of the atomic operations qt is trying to implement require it.

Searching for other possibly arm/armv6 specific files, if would be a good idea to review the following for implications.
See https://wiki.ubuntu.com/ARM/Thumb2 for details of things to check for.

Apart from the atomics, these look like optimisation opportunities and some relatively harmless minor things, but I haven't investigated in much detail yet.

  * include/Qt/qatomic_arm.h
  * include/QtCore/qatomic_arm.h
  * src/corelib/arch/arm/qatomic_arm.cpp
  * src/corelib/arch/qatomic_arm.h
  * include/Qt/qatomic_armv6.h
  * include/Qt/private/qdrawhelper_armv6_p.h
  * include/QtGui/private/qdrawhelper_armv6_p.h
  * include/QtCore/qatomic_armv6.h
  * src/corelib/arch/qatomic_armv6.h
  * src/corelib/arch/armv6/
  * src/gui/painting/qdrawhelper_armv6_rvct.s
  * src/gui/painting/qdrawhelper_armv6_p.h
  * src/gui/painting/qdrawhelper_armv6_rvct.inc
  * src/gui/painting/qblendfunctions_armv6_rvct.s

Revision history for this message
Paul Larson (pwlars) wrote :

Setting this at medium for now, but this might be worth revisiting when multicore chips become available if this proves to be causing big problems.

Changed in qt4-x11 (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Colin Watson (cjwatson) wrote :

We fixed the armv6 configuration part of this bug a little while ago. Changelog:

qt4-x11 (4:4.6.0-1ubuntu2) lucid; urgency=low

  * Make libqt4-dev depend on libx11-dev
  * In debian/rules Set DEB_HOST_ARCH and DEB_HOST_ARCH_OS. Configure with "-arch armv6" option on ARM

 -- Jonathan Riddell <email address hidden> Tue, 08 Dec 2009 13:30:39 +0000

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

The affected code still needs to be reviewed for SMP-safety, but this is not high priority for lucid.

Revision history for this message
Alexander Sack (asac) wrote :

from what i understand we need to make build system aware of v7 to backout the armv6 workaround.

Changed in qt4-x11 (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Michael Casadevall (mcasadevall)
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

This patch adds memory barriers to the existing armv6-based atomics implementation.

This is a quicker interim fix than porting the code to use the GCC atomics, but that could still be addressed later.

The change is conservative: a barrier is added both before and after the body of each atomic. This is suboptimal: Qt's generic atomics API (see src/corelib/thread/qatomic.cpp) defines 4 flavours of each atomic with differing barrier requirements, but they are not all implemented for ARM. The code is in need of a substantional factoring before this is attempted: it would raise the number of duplicates of each code snippet in the source from ~10 to ~40 rendering reliable maintenance all but impossible. I'm not sure whether the preprocessor is up to the job, but the amount of hard coding could be massively reduced by generating the affected files using a script.

The existing code implements the "Relaxed" (no barrier needed), "Acquire" (requires barrier after) and "Release" (requires barrier before) flavours in terms of the "Ordered" implementation (barriers required before and after).

The attached patch is required to fix the "Ordered" implementation so that it really does implement the Ordered semantics: without the barriers, this obtained behaviour is actually the "Relaxed" behaviour with respect to the memory system, which may lead to problems in SMP environments.

The proposed patch applies and builds, but I haven't been able to test it yet (huge lead time on building the qt4 packages...) I will post a follow-up when I have a full build of the affected packages.

Paul Larson (pwlars)
tags: added: patch
Changed in qt4-x11 (Ubuntu):
assignee: Michael Casadevall (mcasadevall) → nobody
Loïc Minier (lool)
tags: added: thumb
Matthias Klose (doko)
Changed in qt4-x11 (Ubuntu):
milestone: none → natty-alpha-1
Revision history for this message
Thiago Macieira (thiago-macieira) wrote : Re: [Bug 490371] [NEW] Atomic operations not safe for ARMv7, Thumb-2 and multicore

Em Quarta-feira, 17 de Novembro de 2010, às 16:18:59, ext Launchpad Bug
Tracker escreveu:
> 1) in configure, the build architecture baseline is detected using
> `uname -r`: this just gives "armel", which does not work for determining
> whether the baseline is armv6 or armv7 or whatever. There are some
> specific files for armv6, but these are not used because configure
> defines QT_ARCH_ARM instead of QT_ARCH_ARMV6.
>
> It may make sense for the ubuntu build rules to pass an explicit -arch
> option to configure.

Fixed in Qt 4.8. See patchset:

f293b98 Silence preprocessor warnings about __TARGET_ARCH_ARM not being
defined.
c7bf2d7 Fixed compile for symbian.
dc8d096 Compile when detecting ARMv5
2c73f55 Compile when detecting ARMv5
82b4fff Add support for ARMv7 atomic operations
7be2c58 Merge the armv6 and arm architectures
1c0b3237 Copy src/corelib/arch/qatomic_arm.h to
src/corelib/arch/qatomic_armv5.h
a0f69c0 Move symbian specific qatomic_generic_armv6.cpp

Qt now has one "ARM" architecture and determines which instructions to use
depending on the compiler arguments and #defines.

> 2) The "generic arm" atomics in src/arch/corelib/qatomic_arm.h use the
> swp and swpb instructions. This is incompatible with Thumb-2 and will
> not be multicore-safe, especially for armv7.

These instructions shouldn't be used at all. They were used for OABI only. In
EABI, code should be generated to kernel-provided atomics. Also, with the
patchset above, this shouldn't be used in v6 or v7 builds at all.

> 3) The armv6 atomics in src/arch/corelib/qatomic_armv6.h (not currently
> used because of the behaviour of configure; see (1)) use the new
> ldrex/strex instructions appropriately, but the required memory barriers
> for safe operation on multicore platforms are missing.

Fixed in Qt 4.8 already:
82b4fff Add support for ARMv7 atomic operations

> If at all possible, this new header should be ported to use the GCC atomic
> intrinsics: see https://wiki.ubuntu.com/ARM/Thumb2 and
> http://refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf section 7.4
> (Synchronization Primitives)

Aside from __sync_lock_release and __sync_lock_test_and_set, all barriers
provided by this psABI are full barriers (ordered, in Qt terms).

I don't know why Intel didn't define the full set, since that was created for
the IA-64, which is one of the few platforms where different barriers are
possible. Qt does implement the full set for both IA-64 and ARMv7, so we're
not going to switch to __sync_* until they support the full set.

> Searching for other possibly arm/armv6 specific files, if would be a good
> idea to review the following for implications. See
> https://wiki.ubuntu.com/ARM/Thumb2 for details of things to check for.

These files should be:

src/3rdparty/pixman/pixman-arm-neon-asm.S
src/3rdparty/pixman/pixman-arm-neon-asm.h

src/corelib/tools/qstring.cpp
src/gui/image/qimage_neon.cpp
src/gui/painting/qdrawhelper_neon.cpp
src/gui/painting/qdrawhelper_neon_asm.S
src/gui/painting/qdrawhelper_neon_p.h

--
Thiago Macieira - thiago.macieira (AT) nokia.com
  Senior Product Manager - Nokia, Qt Development Frameworks
     Sandakerveien 116, NO-0402 Oslo, Norway

Revision history for this message
Jammy Zhou (jammy-zhou) wrote :

Hi Thiago,

I tried to compile Qt4.8 on armv7 platform locally, but meet following new error message, do you have any suggestions?

make[1]: Entering directory `/home/jammy/qt/src/corelib'
g++ -c -include .pch/release-shared/QtCore -pipe -pthread -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -O2 -fvisibility=hidden -fvisibility-inlines-hidden -Wall -W -D_REENTRANT -fPIC -DQT_SHARED -DQT_BUILD_CORE_LIB -DQT_NO_USING_NAMESPACE -DQT_NO_CAST_TO_ASCII -DQT_ASCII_CAST_WARNINGS -DQT3_SUPPORT -DQT_MOC_COMPAT -DQT_USE_FAST_OPERATOR_PLUS -DQT_USE_FAST_CONCATENATION -DELF_INTERPRETER=\"/lib/ld-linux.so.3\" -DQLIBRARYINFO_EPOCROOT -DHB_EXPORT=Q_CORE_EXPORT -DQT_NO_DEBUG -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -I../../mkspecs/linux-g++ -I. -I../../include -I../../include/QtCore -I.rcc/release-shared -Iglobal -I../../tools/shared -I../3rdparty/harfbuzz/src -I../3rdparty/md5 -I../3rdparty/md4 -I.moc/release-shared -o .obj/release-shared/qabstractanimation.o animation/qabstractanimation.cpp
animation/qabstractanimation.cpp:491:6: warning: unused parameter ?@~Xe?@~Y
{standard input}: Assembler messages:
{standard input}:527: Error: thumb conditional instruction should be in IT block -- `strexeq r2,r5,[r6]'
{standard input}:528: Error: thumb conditional instruction should be in IT block -- `teqeq r2,#1'
make[1]: *** [.obj/release-shared/qabstractanimation.o] Error 1
make[1]: Leaving directory `/home/jammy/qt/src/corelib'
make: *** [sub-corelib-make_default-ordered] Error 2

Revision history for this message
Scott Kitterman (kitterman) wrote : Re: [Bug 490371] Re: Atomic operations not safe for ARMv7, Thumb-2 and multicore

There's a patch for that in the Ubuntu qt4-x11 package in Natty. I don't
believe it's upstream yet.

Revision history for this message
Jammy Zhou (jammy-zhou) wrote :

It seems that "-Wa,-mimplicit-it=thumb" should be added to CXXFLAGS for armv7

Revision history for this message
Scott Kitterman (kitterman) wrote :

On Wednesday, December 08, 2010 01:57:49 am you wrote:
> It seems that "-Wa,-mimplicit-it=thumb" should be added to CXXFLAGS for
> armv7

No. We've a patch to add the explicit IT instructions so this isn't needed.

Revision history for this message
Thiago Macieira (thiago-kde) wrote :

Please submit the patch upstream: http://bugreports.qt.nokia.com, product Qt, component Threading.

Revision history for this message
Jammy Zhou (jammy-zhou) wrote :

Scott,
Could you please also post the patch here?

Revision history for this message
Scott Kitterman (kitterman) wrote :

Provide Thumb2 support on armel - See LP Bug #673085 for details
 Index: qt-everywhere-opensource-src-4.7.1/src/corelib/arch/qatomic_armv6.h
 ===================================================================
 --- qt-everywhere-opensource-src-4.7.1.orig/src/corelib/arch/qatomic_armv6.h
2010-11-06 01:55:18.000000000 +0000
+++ qt-everywhere-opensource-src-4.7.1/src/corelib/arch/qatomic_armv6.h
2010-11-16 17:58:27.831286420 +0000
@@ -144,6 +144,7 @@
     asm volatile("0:\n"
                  "ldrex %[result], [%[_q_value]]\n"
                  "eors %[result], %[result], %[expectedValue]\n"
+ "itt eq\n"
                  "strexeq %[result], %[newValue], [%[_q_value]]\n"
                  "teqeq %[result], #1\n"
                  "beq 0b\n"
@@ -202,6 +203,7 @@
     asm volatile("0:\n"
                  "ldrex %[result], [%[_q_value]]\n"
                  "eors %[result], %[result], %[expectedValue]\n"
+ "itt eq\n"
                  "strexeq %[result], %[newValue], [%[_q_value]]\n"
                  "teqeq %[result], #1\n"
                  "beq 0b\n"

I'm neither the author nor the uploader of the patch, so I'm not sure who is
sending this upstream, but I agree someone should.

Changed in qt4-x11 (Ubuntu Natty):
milestone: natty-alpha-1 → natty-alpha-2
Revision history for this message
Jani Monoses (jani) wrote :

Filed in the Qt issue tracker, but since LP does not recognize it here's the link
http://bugreports.qt.nokia.com/browse/QTBUG-16402

Martin Pitt (pitti)
Changed in qt4-x11 (Ubuntu Natty):
milestone: natty-alpha-2 → natty-alpha-3
Steve Langasek (vorlon)
tags: added: arm-porting-queue
Martin Pitt (pitti)
Changed in qt4-x11 (Ubuntu Natty):
milestone: natty-alpha-3 → ubuntu-11.04-beta-1
assignee: nobody → Canonical ARM (canonical-arm)
Jani Monoses (jani)
Changed in qt4-x11 (Ubuntu Natty):
assignee: Canonical ARM (canonical-arm) → Jani Monoses (jani)
Revision history for this message
Jani Monoses (jani) wrote :

Backported memory barrier changes from Qt master branch.

Changed in qt4-x11 (Ubuntu Natty):
status: In Progress → Fix Released
Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

Alternately, one could use the GCC intrinsics. They are more conservative about memory barriers, which I believe to be more correct in any case (it is not safe to let the compiler or the instruction scheduler move memory accesses into the ldrex/strex critical region). Other than memory barrier differences, the intrinsic-based implementation should be equally fast. (Although there is no equivalent of fetchAndStore, the only actual use cases store 0, and I've special-cased that in this implementation using __sync_fetch_and_and().)

Revision history for this message
Scott Kitterman (kitterman) wrote : Re: [Bug 490371] Re: Atomic operations not safe for ARMv7, Thumb-2 and multicore

Please see https://bugs.launchpad.net/ubuntu/+source/qt4-x11/+bug/490371/comments/6 - I 'm not sure if it relevant or not, but it sounds like upstream has some concerns.

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

I believe that upstream is mistaken. It is not safe to allow other memory accesses to be speculated into the ldrex/strex region, and (at least on Cortex A9/A15) the only way to prevent that is to issue full memory barriers before and after. Arguably the compiler should be allowed to move memory accesses across the whole block, but that seems like a small optimization (compared to the memory barrier overhead), and I have seen no benchmark data to support it. I don't know whether GCC's later optimization passes can be relied upon not to move opcodes into the critical region if the "memory" attribute on the assembly block were omitted.

See also http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48126, which suggests a fix to GCC's compare-and-swap intrinsics to prevent them from skipping the trailing memory barrier.

Revision history for this message
Scott Kitterman (kitterman) wrote :

If upstream is mistaken, please educate them, but we should get agreement in principle before we commit to the change.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

The upstream commit looks at least semi-sane -- it looks like the memory barrier stuff is merged in lp:ubuntu/qt4-x11, is my understanding correct?

I also generally agree with upstream's view that they don't want to port to the GCC primitives: since Qt already has a pretty decent atomics API, and since the GCC atomics are crude by comparison, it probably doesn't make sense to re-port Qt to the GCC atomics for the sake of it, providing the code works.

I'm uncertain whether some of the "Ordered" primitives really are ordered though:

Compare:

src/corelib/arch/qatomic_armv6.h:429:
inline bool QBasicAtomicInt::testAndSetOrdered(int expectedValue, int newValue)
{
    Q_DATA_MEMORY_BARRIER;
    bool returnValue = testAndSetRelaxed(expectedValue, newValue);
    Q_COMPILER_MEMORY_BARRIER;
    return returnValue;
}

src/corelib/arch/qatomic_armv6.h:495:
template <typename T>
Q_INLINE_TEMPLATE bool QBasicAtomicPointer<T>::testAndSetOrdered(T *expectedValue, T *newValue)
{
    Q_DATA_MEMORY_BARRIER;
    bool returnValue = testAndSetAcquire(expectedValue, newValue);
    Q_COMPILER_MEMORY_BARRIER;
    return returnValue;
}

In QBasicAtomicInt::testAndSetOrdered(int,int), I can't see why we don't need Q_DATA_MEMORY_BARRIER after calling testAndSetRelaxed(); or otherwise why we don't need to call testAndSetAcquire() to get that barrier implicitly (as is done for QBasicAtomicPointer<T>::testAndSetOrdered(T *,T *)).

Revision history for this message
Jani Monoses (jani) wrote :

The patch in Ubuntu 11.04 is backported from upstream Qt4.8. If upstream is mistaken, one of the persons who actually have a clue about the fine details of ARMv7 memory barriers (not me) should educate upstream as was suggested in a previous comment.
The Ubuntu bug tracker may be watched but unless the discussion happens upstream there is no guarantee a consensus will be reached.

Revision history for this message
Michael K. Edwards (m-k-edwards) wrote :

It seems that the ordered primitives do need fixing, as dmart suggests; for confirmation see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48126 . I would also think that the ref()/deref() methods should have ordered semantics, since they function like locks (a caller which sees deref() return 0 is entitled to tear down the associated instance).

My personal preference is to use the GCC intrinsics, since they're more likely to get full review, and I'm skeptical of the performance benefits of the "relaxed" primitives in practice (on any system that actually has atomics and doesn't need kernel help). Particularly in the ordered cases, I see no reason whatsoever to embed inline assembly. But I suppose it is not wrong for Qt's "relaxed" primitives to contain equivalent assembly, without the memory barriers -- as long as the compiler won't insert memory operations in between the ldrex/strex pair. (The ARM folks' reading of the ARM ARM is that hardware speculation of memory accesses into the critical region should not break the local monitor.)

Another advantage of using the GCC intrinsics is that they will work for 64-bit operands once this support is in GCC, without embedding yet more complex assembly into Qt. David Gilbert has implemented 64-bit atomics for ARMv7, and I'm helping test; I intend to implement them also for 32-bit x86 (using cmpxchg8b).

Would you entertain a patch that makes the use of inline assembly for the non-ordered primitives optional at compile time? (Via a template specialization, so it still works for double-word operands if the compiler can handle them.)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.