mono apps crash on omap4 due to no smp support for armel

Bug #619981 reported by Tobin Davis on 2010-08-18
This bug affects 1 person
Affects Status Importance Assigned to Milestone
mono (Ubuntu)
High
Michael Casadevall
Nominated for Oneiric by Michael Casadevall
Maverick
High
Unassigned
Natty
High
Michael Casadevall

Bug Description

Binary package hint: banshee

Image: Alpha 3 + current banshee from universe.

Launched banshee, then accessed gnome filemanager to copy music files from external usb drive to ~/Music.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: banshee 1.7.3-2ubuntu2
Uname: Linux 2.6.34-902-omap4 armv7l
Architecture: armel
Date: Wed Aug 18 17:55:55 2010
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_GB.utf8
SourcePackage: banshee

Tobin Davis (gruemaster) wrote :

Added log from "banshee --debug" output.

Oliver Grawert (ogra) on 2010-08-20
Changed in banshee (Ubuntu Maverick):
milestone: none → ubuntu-10.10-beta
Micheal Harker (mh0) wrote :

Thanks for reporting this bug and any supporting documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!

Changed in banshee (Ubuntu Maverick):
status: New → Confirmed
C de-Avillez (hggdh2) wrote :

Hi Tobin,

I cannot reproduce your issue. But I see, in the debug log, this:

[1 Warn 18:32:38.518] Service `Banshee.Gui.InterfaceActionService' not started: Extension node not found in path: /Banshee/PlaybackController/ShuffleModes
[1 Warn 18:32:38.519] Caught an exception - System.InvalidOperationException: Extension node not found in path: /Banshee/PlaybackController/ShuffleModes (in `Mono.Addins')

Can you check this addin, and try again?

Changed in banshee (Ubuntu Maverick):
status: Confirmed → Incomplete
Alexander Sack (asac) wrote :

This is a bug that might only happen on ARMEL. C de-Avillez, are you trying to reproduce on ARMEL?

Alexander Sack (asac) wrote :

Assigned to the Linaro Foundations team. Loïc said that for mono the process would be that Foundations comes up with a reduced test case and then hands the bug over to the toolchain WG, which would then fix the mono JIT (if needed).

Changed in banshee (Ubuntu Maverick):
assignee: nobody → Linaro Foundations (arm-foundations)
importance: Undecided → High
assignee: Linaro Foundations (arm-foundations) → nobody
assignee: nobody → Linaro Foundations (arm-foundations)
Oliver Grawert (ogra) on 2010-08-27
Changed in banshee (Ubuntu Maverick):
milestone: ubuntu-10.10-beta → ubuntu-10.10
Tobin Davis (gruemaster) wrote :

Changing back to confirmed, as this appears to affect armel only. I am unable to reproduce this on x86 either.

Changed in banshee (Ubuntu Maverick):
status: Incomplete → Confirmed
Oliver Grawert (ogra) on 2010-09-17
Changed in banshee (Ubuntu Maverick):
milestone: ubuntu-10.10 → none
Gabriel Burt (gabaug) wrote :

Tobin, that log doesn't show the crash. If you got the log from ~/.config/banshee-1/log, note that it is overwritten every time you launch Banshee. The InterfaceActionService not starting is a huge issue -- there's no chance the app will work with that failing. Can you try running banshee-1 again with --debug and --debug-addins ? Thanks.

Been looking into this, and discussed with Linaro. Reassigning from Linaro to myself.

This seems to be a general issue with mono in ARM (or at least Thumb2) mode. I've been running through the entire battery of tests for mono and we're getting failures, so I suspect it's just a matter of correcting these failures, as they are in similar areas to where I'm seeing crashes (specifically PInvoke).

From the mono test suite:
Testing pinvoke2.exe... failed 256 (1) signal (0).
Testing pinvoke3.exe... failed 256 (1) signal (0).
Testing pinvoke11.exe... failed 768 (3) signal (0).

Will continue to look into this in more depth.
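
The numbers in the result lines above appear to follow the usual wait-status layout. Assuming the first number is the raw status and the values in parentheses are the exit code and the signal, 256 decodes to exit code 1 and 768 to exit code 3. A minimal C sketch of that decoding follows; the helper names are hypothetical and are not taken from the mono test driver.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: the two fields packed into a raw wait status. */
struct test_result {
    int exit_code;   /* bits 8-15 of the status, as laid out on Linux */
    int signal_num;  /* bits 0-6 of the status */
};

/* Split a raw status such as 256 or 768 into exit code and signal. */
struct test_result decode_status(int status)
{
    struct test_result r;
    assert(status >= 0);                 /* raw statuses are non-negative */
    r.exit_code = (status >> 8) & 0xff;
    r.signal_num = status & 0x7f;
    return r;
}

/* Print one line in a form similar to the test suite output. */
void print_result(const char *name, int status)
{
    struct test_result r = decode_status(status);
    printf("Testing %s... failed %d (%d) signal (%d).\n",
           name, status, r.exit_code, r.signal_num);
}
```

Read this way, the three PInvoke tests exited with a non-zero code rather than being killed by a signal such as SIGSEGV.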

affects: banshee (Ubuntu) → mono (Ubuntu)
Changed in mono (Ubuntu):
assignee: Linaro Foundations (linaro-foundations) → Michael Casadevall (mcasadevall)
Changed in mono (Ubuntu Maverick):
assignee: Linaro Foundations (linaro-foundations) → Michael Casadevall (mcasadevall)
Alexander Sack (asac) on 2011-01-03
Changed in mono (Ubuntu Natty):
milestone: none → ubuntu-11.04-beta
tags: added: omap4
Jani Monoses (jani) wrote :

I find it is not crashing with banshee installed on an ubuntu-minimal image.
Once I install gtk2-engines-murrine and rerun it crashes.
So may be a theming related issue.

Jani Monoses (jani) wrote :

although there may be more than one bug here. I can only reproduce a crash on startup, this is what the arm team has seen recently. So no opportunity for banshee to get to stay idle for a while, as it does not start.

On Fri, 2011-01-14 at 19:59 +0000, Jani Monoses wrote:
> I find it is not crashing with banshee installed on an ubuntu-minimal image.
> Once I install gtk2-engines-murrine and rerun it crashes.
> So may be a theming related issue.

I can believe this - IIRC, not all of the GTK# examples work correctly
on ARM (apt:gtk-sharp2-examples). It would also explain why I haven't
reproduced the issue - my ARM box runs XFCE4, complete with an XFCE4
theme which uses the gtk2-engines-xfce theme engine

Jani, can you please attach a log that shows the crash?

`banshee-1 --debug > log.txt` should suffice.

Gabriel Burt (gabaug) wrote :

Jani, actually you should file a new bug, since this issue is about Banshee crashing while sitting idle.

Speaking of this bug, we still don't have a full log showing the problem. Anybody able to reproduce it still?

Gabriel Burt (gabaug) wrote :

Jani, looks like your bug is already filed: bug #391588 -- unless your crash stack trace is different.

Steve Langasek (vorlon) on 2011-02-15
tags: added: arm-porting-queue
Ricardo Salveti (rsalveti) wrote :

I tested today and was able to reproduce this issue, but it is more closely related to bug 391588 than I thought. Most of the time when testing on Panda (with 2 CPUs) I was unable to even start it; it crashed in many different ways. After I was finally able to open it without crashing, it crashed when I added music to ~/Music, as described in the attached log.

It seems to me that Banshee and/or Mono is not thread-safe in this case.

Tobin Davis (gruemaster) on 2011-02-25
summary: - Banshee crashed while sitting idle on omap4
+ mono apps crash on omap4 due to no smp support for armel

This has been fixed in git for mono 2.10 with the implementation of OP_MEMORY_BARRIER in the ARM JIT. With this patch, most of the mono test suite passes, with only two regressions. However, due to issues with the Ubuntu toolchain, mono upstream is requesting that we test against Debian sid to rule out the Ubuntu toolchain as the source of these regressions. Since Ubuntu is shipping 2.6, as are all the support libraries, there is no sane way to test banshee against this version of mono to determine whether the crash is truly gone, but the massive improvement in test suite results is encouraging.

Currently, the in-archive mono is still affected by https://bugs.launchpad.net/ubuntu/+source/gcc-4.5/+bug/721531 - I have not been able to isolate the workaround committed to our version of mono.
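
For context, here is a minimal C11 sketch of the kind of ordering problem a memory barrier addresses on a multi-core, weakly ordered CPU such as the dual-core omap4. This is not Mono's code: `payload`, `ready`, `publish` and `try_consume` are hypothetical names, and C11 fences stand in for the barrier instruction a JIT would emit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state: a payload and a flag announcing it is ready. */
static int payload;
static atomic_int ready;

/* Writer: store the data, then issue a barrier before raising the flag,
 * so another core cannot observe the flag without also seeing the data. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if the payload was published, else 0. */
int try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    /* Barrier on the reading side: the payload read may not be
     * reordered ahead of the flag check. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Without the two fences, a second core may see `ready` set while still reading a stale `payload`. On a single core the problem cannot show up, which fits crashes that appear only on SMP hardware.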

Jani Monoses (jani) wrote :

Would building mono 2.6 on Ubuntu with gcc 4.4 and testing that with banshee shed some new light on the matter?

Tobin Davis (gruemaster) wrote :

Testing Michael's ppa with the mono smp fix backported to 2.6, f-spot & banshee still fail at roughly the same point, but the failure output is different now. See below:

f-spot:
ubuntu@panda21:~$ f-spot
[Info 17:52:15.666] Initializing Mono.Addins
Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message with a timestamp of 0 for 0xe00083 (F-Spot)
Window manager warning: meta_window_activate called by a pager with a 0 timestamp; the pager needs to be fixed.

(f-spot:10958): Gtk-CRITICAL **: IA__gtk_style_detach: assertion `style->attach_count > 0' failed

(f-spot:10958): GdkPixbuf-WARNING **: GdkPixbufLoader finalized without calling gdk_pixbuf_loader_close() - this is not allowed. You must explicitly end the data stream to the loader before dropping the last reference.

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-spot:10958): GLib-GObject-CRITICAL **: g_value_get_float: assertion `G_VALUE_HOLDS_FLOAT (value)' failed

(f-s...


Update on this bug:

After much debugging work with Jani and others on the ARM team, it looks like we're dealing with a race condition somewhere in the ARM JIT code (likely a missing mutex). Mono upstream has not been able to reproduce it, nor have I been able to isolate the bug. I do, however, have a workaround patch which forces mono on ARM to run on a single processor, which avoids the race condition exposed by banshee and f-spot. While obviously not a true fix, this stabilizes mono on multicore ARM systems.

The current patch is as follows:

mcasadevall@daybreak:~/tmp/mono/natty-mono$ bzr diff
=== modified file 'debian/changelog'
--- debian/changelog 2011-02-19 16:59:03 +0000
+++ debian/changelog 2011-03-25 05:39:07 +0000
@@ -1,3 +1,16 @@
+mono (2.6.7-5ubuntu2) natty; urgency=high
+
+ * Backport ARM OP_MEMORY_BARRIER support and GCC workaround from
+ mono 2.10.1. Partial fix for LP: #619981
+ - Following revisions from git:
+ - 7bd422cfeee3622c4ebfe75ba450ca0d664fedbe - Implement
+ mono_memory_barrier () and OP_MEMORY_BARRIER for ARM.
+ - 9c868e2ee43178c8d05161c92489cc9191cc29c7 - Set cfg->uses_rgctx_reg in
+ another code path too on arm, to fix --regression generics.exe.
+ * Disable SMP on ARM by default (Works around LP: #619981)
+
+ -- Michael Casadevall <email address hidden> Tue, 08 Mar 2011 09:56:36 -0800
+
 mono (2.6.7-5ubuntu1) natty; urgency=low

   * Build packages for ppc64.

=== modified file 'debian/control'
--- debian/control 2011-02-19 16:59:03 +0000
+++ debian/control 2011-03-08 18:08:25 +0000
@@ -1,7 +1,8 @@
 Source: mono
 Section: cli-mono
 Priority: optional
-Maintainer: Debian Mono Group <email address hidden>
+Maintainer: Ubuntu Developers <email address hidden>
+XSBC-Original-Maintainer: Debian Mono Group <email address hidden>
 Uploaders: Mirco Bauer <email address hidden>, Sebastian Dröge <email address hidden>, Jo Shields <email address hidden>
 Build-Depends: debhelper (>= 7),
  dpkg-dev (>= 1.13.19),

=== modified file 'mono/arch/arm/arm-codegen.h'
--- mono/arch/arm/arm-codegen.h 2010-06-06 17:45:35 +0000
+++ mono/arch/arm/arm-codegen.h 2011-03-08 17:40:37 +0000
@@ -1084,6 +1084,16 @@
 #define ARM_MOVT_REG_IMM_COND(p, rd, imm16, cond) ARM_EMIT(p, (((cond) << 28) | (3 << 24) | (4 << 20) | ((((guint32)(imm16)) >> 12) << 16) | ((rd) << 12) | (((guint32)(imm16)) & 0xfff)))
 #define ARM_MOVT_REG_IMM(p, rd, imm16) ARM_MOVT_REG_IMM_COND ((p), (rd), (imm16), ARMCOND_AL)

+/* MCR */
+#define ARM_DEF_MCR_COND(coproc, opc1, rt, crn, crm, opc2, cond) \
+ ARM_DEF_COND ((cond)) | ((0xe << 24) | (((opc1) & 0x7) << 21) | (0 << 20) | (((crn) & 0xf) << 16) | (((rt) & 0xf) << 12) | (((coproc) & 0xf) << 8) | (((opc2) & 0x7) << 5) | (1 << 4) | (((crm) & 0xf) << 0))
+
+#define ARM_MCR_COND(p, coproc, opc1, rt, crn, crm, opc2, cond) \
+ ARM_EMIT(p, ARM_DEF_MCR_COND ((coproc), (opc1), (rt), (crn), (crm), (opc2), (cond)))
+
+#define ARM_MCR(p, coproc, opc1, rt, crn, crm, opc2) \
+ ARM_MCR_COND ((p), (coproc), (opc1), (rt), (crn), (crm), (opc2), ARMCOND_AL)
+
 #ifdef __cplusplus
 }
 #endif

=== modified file 'mono/mini/cpu-arm.md'
--- mono/mini/cpu-a...

Jani confirmed this works properly on a clean system. Requesting a beta freeze exception so we get more widespread testing of mono on ARM.
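
To show what the ARM_DEF_MCR_COND macro in the patch computes, here is a standalone C sketch of the same bit layout. The function name `encode_mcr` is hypothetical. The example operands (`mcr p15, 0, r0, c7, c10, 5`, the CP15 data memory barrier operation on ARMv6) are an assumption chosen for illustration; the truncated diff does not show which operands the patch actually emits.

```c
#include <assert.h>
#include <stdint.h>

#define COND_AL 0xEu  /* "always" condition code, bits 31:28 */

/* Assemble a 32-bit ARM MCR instruction word field by field,
 * following the layout used by ARM_DEF_MCR_COND in the diff. */
uint32_t encode_mcr(uint32_t coproc, uint32_t opc1, uint32_t rt,
                    uint32_t crn, uint32_t crm, uint32_t opc2,
                    uint32_t cond)
{
    assert(coproc <= 0xf && rt <= 0xf && crn <= 0xf && crm <= 0xf);
    assert(opc1 <= 0x7 && opc2 <= 0x7 && cond <= 0xf);
    return ((cond & 0xfu) << 28)     /* condition field */
         | (0xeu << 24)              /* coprocessor register transfer */
         | ((opc1 & 0x7u) << 21)
         | (0u << 20)                /* bit 20 clear: MCR rather than MRC */
         | ((crn & 0xfu) << 16)
         | ((rt & 0xfu) << 12)
         | ((coproc & 0xfu) << 8)
         | ((opc2 & 0x7u) << 5)
         | (1u << 4)
         | (crm & 0xfu);
}
```

With these assumed operands, `encode_mcr(15, 0, 0, 7, 10, 5, COND_AL)` yields the instruction word 0xEE070FBA.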

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mono - 2.6.7-5ubuntu2

---------------
mono (2.6.7-5ubuntu2) natty; urgency=high

  * Backport ARM OP_MEMORY_BARRIER support and GCC workaround from
    mono 2.10.1. Partial fix for LP: #619981
    - Following revisions from git:
      - 7bd422cfeee3622c4ebfe75ba450ca0d664fedbe - Implement
        mono_memory_barrier () and OP_MEMORY_BARRIER for ARM.
      - 9c868e2ee43178c8d05161c92489cc9191cc29c7 - Set cfg->uses_rgctx_reg in
        another code path too on arm, to fix --regression generics.exe.
  * Disable SMP on ARM by default (Works around LP: #619981)
 -- Michael Casadevall <email address hidden> Tue, 08 Mar 2011 09:56:36 -0800

Changed in mono (Ubuntu Natty):
status: Confirmed → Fix Released

On Mon, Mar 28, 2011, Michael Casadevall wrote:
> --- mono/mini/driver.c 2010-06-06 17:45:35 +0000
> +++ mono/mini/driver.c 2011-03-25 05:33:36 +0000
> @@ -1291,8 +1291,20 @@
> setlocale (LC_ALL, "");
>
> #if HAVE_SCHED_SETAFFINITY
> +
> +/**
> + * FIXME: The Mono JIT (mini) is non-SMP safe on ARM currently.
> + * Force us to be non-SMP unless a we have MONO_FORCE_SMP
> + * environmental variable set (to allow us to continue to
> + * debugging efforts
> + **/
> +#if defined(__ARM_EABI__)
> + if (!getenv ("MONO_FORCE_SMP")) {
> +#else
> if (getenv ("MONO_NO_SMP")) {
> +#endif // __ARM_EABI__
> unsigned long proc_mask = 1;
> +

 I didn't find the closing braces for the above new if () { constructs?

--
Loïc Minier

Jani Monoses (jani) wrote :

There was an if (getenv ("MONO_NO_SMP")) { there already. The patch only adds another #if branch, so the closing } is likely shared by both.
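
A minimal, self-contained sketch of that structure, to show why one closing brace is enough. The function name `maybe_disable_smp` and its return values are hypothetical, and the body uses the `cpu_set_t` macros rather than the `proc_mask` variable from the quoted hunk, so this is an illustration and not the code in driver.c.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_setaffinity and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Returns 1 if the process was pinned to CPU 0, 0 if it was left alone,
 * and -1 if the affinity call failed. */
int maybe_disable_smp(void)
{
#if defined(__ARM_EABI__)
    /* On ARM EABI: pin by default unless MONO_FORCE_SMP is set. */
    if (!getenv("MONO_FORCE_SMP")) {
#else
    /* Elsewhere: pin only when MONO_NO_SMP is set. */
    if (getenv("MONO_NO_SMP")) {
#endif
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        assert(CPU_ISSET(0, &mask));
        /* pid 0 means the calling process */
        return sched_setaffinity(0, sizeof(mask), &mask) == 0 ? 1 : -1;
    }   /* one closing brace serves whichever `if` the preprocessor kept */
    return 0;
}
```

The preprocessor keeps exactly one of the two `if (...) {` lines, so the compiler only ever sees a single opening brace to match.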

Changed in mono (Ubuntu Natty):
status: Fix Released → Confirmed
status: Confirmed → Fix Released
Tobin Davis (gruemaster) on 2012-02-25
Changed in mono (Ubuntu Maverick):
assignee: Michael Casadevall (mcasadevall) → nobody
status: Confirmed → Won't Fix