mono segfaults on ARM

Bug #390591 reported by Dave Martin
This bug affects 6 people
Affects          Status        Importance  Assigned to  Milestone
mono (Ubuntu)    Fix Released  Medium      Unassigned
  Jaunty         Invalid       Medium      Unassigned
  Karmic         Fix Released  Medium      Unassigned

Bug Description

Mono on ARM currently appears to be unusable. Installing any *-cli package causes a large number of segfaults, as well as running any installed mono application (such as f-spot). At the very least, installation of *-cli packages should not succeed if the installation of the assembly fails.

Test Case:
1. Install f-spot on ARM (it is not a part of ubuntu-desktop on ARM)
2. Observe the series of segfaults while installing the mono stack
3. Run f-spot

Original Bug Report:
Binary package hint: f-spot

[ dave-martin-arm removed his original incorrect description text; see the original description for details. ]

Tags: armel f-spot
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Note: the above issue applies to the following source package versions, on Jaunty:

f-spot 0.5.0.3-1ubuntu6
mono 2.0.1-4

I haven't investigated whether Karmic is affected.

Loïc Minier (lool)
Changed in f-spot (Ubuntu):
assignee: nobody → Canonical Mobile Team (canonical-mobile)
Changed in f-spot (Ubuntu):
assignee: Canonical Mobile Team (canonical-mobile) → nobody
importance: Undecided → Medium
assignee: nobody → Canonical Mobile Team (canonical-mobile)
Revision history for this message
Michael Casadevall (mcasadevall) wrote :

I'm confirming this bug; I was able to reproduce the behavior described in the report.

David: It's not clear whether you've used mono or C# before, but mono uses the PE format for its executables; instead of i386 or other assembly code, they contain CIL byte code. Wikipedia has a fairly good writeup of how .NET (and thus mono) applications work: http://en.wikipedia.org/wiki/.NET_Framework. I realize it's confusing to see .exe and .dll files on a Linux system, but rest assured that this is proper and expected behavior.
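To make the PE point concrete, here is a minimal Python sketch that checks whether a file is a PE image carrying a CLR runtime header (data directory entry 14), which is what marks a .exe or .dll as a CIL assembly rather than native code. The function name `describe_pe` is hypothetical, and the parsing is deliberately simplified: it trusts the header offsets and does not look at sections or metadata.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # "CLR Runtime Header" slot in the PE data directories


def describe_pe(data: bytes):
    """Return (machine, is_managed) for a PE image, or None if it is not PE.

    Hypothetical helper for illustration; it does not validate the image.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    coff = pe_offset + 4
    if len(data) < coff + 20:
        return None
    (machine,) = struct.unpack_from("<H", data, coff)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    opt = coff + 20
    managed = False
    if opt_size >= 2 and len(data) >= opt + opt_size:
        (magic,) = struct.unpack_from("<H", data, opt)
        # Data directories start at offset 96 (PE32) or 112 (PE32+).
        dirs = {0x10B: 96, 0x20B: 112}.get(magic)
        if dirs is not None and opt_size >= dirs + 8 * (CLR_DIRECTORY_INDEX + 1):
            (count,) = struct.unpack_from("<I", data, opt + dirs - 4)
            rva, size = struct.unpack_from(
                "<II", data, opt + dirs + 8 * CLR_DIRECTORY_INDEX
            )
            managed = count > CLR_DIRECTORY_INDEX and rva != 0 and size != 0
    return machine, managed
```

Run against an installed assembly (for example the f-spot.exe path used in the GDB session later in this thread), this would be expected to report a managed image. The Machine field is often 0x14c (i386) even for architecture-neutral assemblies, which is part of why the files look like Windows binaries.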

As an added test, I installed f-spot on my ia64 machine, and it works, so there are no architecture-specific bits. I had apport file https://bugs.edge.launchpad.net/ubuntu/+source/f-spot/+bug/390701, which should give us some idea of where the segfault occurs. However, this seems to be a general mono issue: I saw segfaults while installing and removing mono assemblies, yet dpkg completed the configuration regardless.
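That dpkg completed even though helper processes segfaulted suggests the helpers' exit status was not being checked. As a hedged illustration only (this is not how the mono packaging scripts are written), a wrapper can tell a signal death from a normal exit, because Python's `subprocess` reports a child killed by signal N as return code -N on POSIX systems. The name `run_checked` is hypothetical.

```python
import signal
import subprocess
import sys


def run_checked(argv):
    """Run argv and raise RuntimeError unless it exits with status 0."""
    returncode = subprocess.run(argv).returncode
    if returncode < 0:
        # A negative return code means the child died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        raise RuntimeError(f"{argv[0]} was killed by {name}")
    if returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with status {returncode}")


if __name__ == "__main__":
    # Stand-in for a crashing helper: a child that sends itself SIGSEGV.
    crasher = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
    ]
    try:
        run_checked(crasher)
    except RuntimeError as err:
        print(err)
```

A maintainer script that propagated such a failure would make the package installation fail visibly instead of leaving unregistered assemblies behind.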

Changed in f-spot (Ubuntu):
status: New → Confirmed
Revision history for this message
Dave Martin (dave-martin-arm) wrote : RE: [Bug 390591] Re: f-spot seems wrongly packaged / unusable for armel

> David: It's not clear whether you've used mono or C# before,

I haven't.

> but mono uses the PE format for its executables, but instead
> of using i386 or other assembly code, it's CIL byte code.
> Wikipedia has a fairly good writeup of how .NET (and thus
> mono) applications work
> http://en.wikipedia.org/wiki/.NET_Framework. I realize it's
> confusing to see .exe and .dlls on a Linux system, but rest
> assured that this is proper and expected behavior.

Thanks for the clarification --- this explains a lot. I'll take a look!

I will try to post some more detailed symptom information when I have
it.

--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Revision history for this message
Michael Casadevall (mcasadevall) wrote :

Based on the available evidence, this is a general mono issue rather than an issue with f-spot itself. Reassigning to mono.

summary: - f-spot seems wrongly packaged / unusable for armel
+ mono segfaults on ARM
affects: f-spot (Ubuntu) → mono (Ubuntu)
description: updated
tags: added: f-spot
Revision history for this message
Michael Casadevall (mcasadevall) wrote :

I did a hello world test on my desktop, and on my ARM board:

mcasadevall@titan:~/src/mono-test$ uname -a
Linux titan 2.6.30-2-ia64 #5~ppa1 SMP Mon Jun 8 13:00:59 EDT 2009 ia64 GNU/Linux

mcasadevall@titan:~/src/mono-test$ lsb_release -d
Description: Ubuntu karmic (development branch)

mcasadevall@titan:~/src/mono-test$ gmcs2 test.cs
mcasadevall@titan:~/src/mono-test$ cat test.cs
public class Hello1
{
   public static void Main()
   {
      System.Console.WriteLine("Hello, World!");
   }
}

mcasadevall@titan:~/src/mono-test$ ./test.exe
Hello, World!

mcasadevall@titan:~/src/mono-test$ mono --version
Mono JIT compiler version 2.4 (Debian 2.4+dfsg-4)
Copyright (C) 2002-2008 Novell, Inc and Contributors. www.mono-project.com
 TLS: __thread
 GC: Included Boehm (with typed GC)
 SIGSEGV: normal
 Notifications: epoll
 Architecture: ia64
 Disabled: none

On ARM:
mcasadevall@dawn:~/src$ uname -a
Linux dawn 2.6.28-11-imx51 #42-Ubuntu Fri Apr 17 05:50:13 UTC 2009 armv7l GNU/Linux

mcasadevall@dawn:~/src$ lsb_release -d
Description: Ubuntu karmic (development branch)

mcasadevall@dawn:~/src/thunderbird-2.0.0.22+build1+nobinonly$ mono --version
Mono JIT compiler version 2.0.1 (tarball)
Copyright (C) 2002-2008 Novell, Inc and Contributors. www.mono-project.com
 TLS: normal
 GC: Included Boehm (with typed GC)
 SIGSEGV: normal
 Notifications: epoll
 Architecture: armel,soft-float
 Disabled: none

mcasadevall@dawn:~$ gmcs2 test.cs
Segmentation fault (core dumped)

mcasadevall@dawn:~/src$ ./test.exe
Segmentation fault (core dumped)

mcasadevall@dawn:~/src$ mono ./test.exe
Segmentation fault (core dumped)

mcasadevall@dawn:~/src$ cat /proc/cpu/alignment
User: 0
System: 15122
Skipped: 0
Half: 0
Word: 0
DWord: 0
Multi: 15122
User faults: 3 (fixup+warn)

Running mono itself didn't create a crash that apport could see, but apport did get the compiler (gmcs.exe) crash, which is Bug #390802. Hopefully the retrace will be useful this time.

Revision history for this message
Michael Casadevall (mcasadevall) wrote :

Rerunning the above test with mono 2.4 from the archive passes on karmic, and I was able to run f-spot. Talking with pusling from #debian-arm on IRC confirms that this issue was resolved in 2.4; the underlying cause was a segfault in the codebase that just happened to manifest almost always on ARM, unlike on other architectures.

Now all that remains is to find what changed in 2.4 to fix this and, hopefully, backport it to 2.0.

Revision history for this message
Michael Casadevall (mcasadevall) wrote :

So on the topic of making this bug more interesting, I have been talking with directhex (Jo Shields), who says he hasn't seen any segfaults in the buildd logs for jaunty. This led me to repeat the Hello World test on the porting box and, lo and behold:

(jaunty)mcasadevall@rimu:~$ mono --version
Mono JIT compiler version 2.0.1 (tarball)
Copyright (C) 2002-2008 Novell, Inc and Contributors. www.mono-project.com
 TLS: normal
 GC: Included Boehm (with typed GC)
 SIGSEGV: normal
 Notifications: epoll
 Architecture: armel,soft-float
 Disabled: none

(jaunty)mcasadevall@rimu:~$ mono test.exe
Hello, World!

(jaunty)mcasadevall@rimu:~/src$ cat /proc/cpuinfo
Processor : Feroceon rev 0 (v5l)
BogoMIPS : 999.42
Features : swp half thumb fastmult vfp edsp
CPU implementer : 0x41
CPU architecture: 5TE
CPU variant : 0x1
CPU part : 0x926
CPU revision : 0

Hardware : Marvell DB-78x00-BP Development Board
Revision : 0000
Serial : 0000000000000000

It seems the issue depends on what hardware (and possibly what kernel) is being used on the ARM board.

Changed in mono (Ubuntu Jaunty):
assignee: nobody → Canonical Mobile Team (canonical-mobile)
importance: Undecided → Medium
milestone: none → jaunty-updates
status: New → Confirmed
Changed in mono (Ubuntu Karmic):
status: Confirmed → Fix Released
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

This is interesting --- I definitely get SIGILL, not SIGSEGV, in the mono binary. I did try installing f-spot-dbgsym, but this didn't seem to give me any debug symbols even when explicitly attempting to load them with "symbol-file /usr/lib/debug/usr/bin/mono" in GDB.

The instruction at PC is pop {r4} (0xE8BD0010), which definitely should not cause SIGILL. However, this instruction is immediately preceded by an old-style ARM syscall which probably doesn't work on the imx51 kernel. I couldn't see any explicit hard-coded syscalls in the mono code, but does the JIT maybe insert them into its output?

I'm not sure why the SIGILL is happening, but it may be that the JIT tried to execute a cache flush syscall which failed to execute, so the CPU may have executed stale garbage from the I-cache causing the SIGILL. But that's just speculation on my part :P

Is CONFIG_OABI_COMPAT enabled in the kernel on the Marvell board? (Try zgrep OABI /proc/config.gz.) If it is, this would allow the old-style syscalls to work and could explain the difference between the two boards' behaviour: CONFIG_OABI_COMPAT is not enabled in the imx51 kernel right now.

Ideally, mono would be ported to use new-style syscalls, but CONFIG_OABI_COMPAT may provide an easier fix (if it works). I don't know how this is handled by other JIT implementations. Putting the syscall number in r7 as well as in the SVC (SWI) instruction comment field will generally work with both ABI variants.
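The difference between the two syscall conventions can be read straight off the instruction word. The sketch below is a rough Python illustration, not mono or kernel code: it classifies an ARM-mode SVC word by its 24-bit comment field. The constants are the ones I believe the Linux ARM headers use (`__NR_OABI_SYSCALL_BASE` = 0x900000, `__ARM_NR_BASE` = 0x0f0000, cacheflush = base + 2); the function name `decode_svc` is hypothetical. Under these assumptions, the `svc 0x009f0002` that GDB reports in this comment decodes as an old-ABI cacheflush call, which fits the cache-flush speculation above.

```python
OABI_SYSCALL_BASE = 0x900000  # assumed: __NR_OABI_SYSCALL_BASE
ARM_PRIVATE_BASE = 0x0F0000   # assumed: __ARM_NR_BASE (ARM-private syscalls)
ARM_NR_CACHEFLUSH = ARM_PRIVATE_BASE + 2


def decode_svc(word: int):
    """Classify a 32-bit ARM-mode instruction word.

    Returns None if the word is not SVC/SWI, otherwise a (kind, number) pair:
    ("eabi", None)  - comment field is 0, syscall number is passed in r7
    ("oabi", nr)    - old ABI, number encoded as 0x900000 + nr
    ("other", imm)  - SVC with some other comment field
    """
    if (word >> 24) & 0x0F != 0x0F:  # bits 27..24 must be 1111
        return None
    imm24 = word & 0x00FFFFFF
    if imm24 == 0:
        return ("eabi", None)
    if imm24 & 0xF00000 == OABI_SYSCALL_BASE:
        return ("oabi", imm24 - OABI_SYSCALL_BASE)
    return ("other", imm24)


if __name__ == "__main__":
    # 0xEF9F0002 is "svc 0x009f0002" with the always-execute condition.
    kind, number = decode_svc(0xEF9F0002)
    print(kind, hex(number), number == ARM_NR_CACHEFLUSH)
```

An EABI-only kernel takes the syscall number from r7 rather than from the comment field, so an old-style call site that never loads r7 would not be expected to work there.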

$ gdb --args /usr/bin/mono /usr/lib/f-spot/f-spot.exe
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi"...
(no debugging symbols found)
(no debugging symbols found)
(gdb) r
Starting program: /usr/bin/mono /usr/lib/f-spot/f-spot.exe
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[New Thread 0x40020050 (LWP 13079)]

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0x40020050 (LWP 13079)]
0x0002eb60 in ?? ()
(gdb) x/4i $pc-4
0x2eb5c: svc 0x009f0002
0x2eb60: pop {r4}
0x2eb64: bx lr
0x2eb68: cmp r3, #0 ; 0x0
(gdb) i r
r0 0x40358000 1077248000
r1 0x40358098 1077248152
r2 0x0 0
r3 0x40358098 1077248152
r4 0x40358098 1077248152
r5 0x40358080 1077248128
r6 0x40358084 1077248132
r7 0x40358088 1077248136
r8 0x40358068 1077248104
r9 0xe1a0f00c 3785420812
r10 0x4035804c 1077248076
r11 0x4035804c 1077248076
r12 0x0 0
sp 0xbeeb229c 0xbeeb229c
lr 0x387f4 231412
pc 0x2eb60 0x2eb60
fps 0x0 0
cpsr 0x...


Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Note: for the reasons given above, this issue should not be assumed to be fixed for Karmic unless someone has tried to reproduce it on a Babbage board.

Revision history for this message
Oliver Grawert (ogra) wrote :

confirming fixed on karmic on a babbage2 board (still running with the vendor kernel though, will re-confirm if our kernel and images are ready with alpha3)

f-spot runs as expected, all mono assemblies get installed without errors with the 2.4 version of mono

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Can you check whether CONFIG_OABI_COMPAT is enabled in the vendor kernel? The vendor kernel configs have historically not been the same as the stock Ubuntu kernel.

Revision history for this message
Oliver Grawert (ogra) wrote :

doesnt look like ...

ogra@babbage2:~$ uname -r
2.6.28-191-ga2f78a4
ogra@babbage2:~$ zcat /proc/config.gz |grep OABI
# CONFIG_OABI_COMPAT is not set
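For comparing several boards, the same check can be scripted. This is a minimal Python sketch that reports the state of a kernel option from config text in the format shown above; `kconfig_state` and `read_running_config` are hypothetical helper names, and /proc/config.gz only exists when the kernel was built to expose its configuration.

```python
import gzip


def kconfig_state(config_text: str, option: str) -> str:
    """Return the option's value ("y", "m", ...), "not set", or "absent"."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "not set"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "absent"


def read_running_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config (requires the file to exist)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    sample = "CONFIG_AEABI=y\n# CONFIG_OABI_COMPAT is not set\n"
    print(kconfig_state(sample, "CONFIG_OABI_COMPAT"))
```

On a board, `kconfig_state(read_running_config(), "CONFIG_OABI_COMPAT")` would give the same answer as the zcat and grep pipeline, in a form that is easy to collect from several machines.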

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

I rebuilt the Jaunty imx51 kernel with CONFIG_OABI_COMPAT enabled: _now_ I think I may be seeing the same bug that everyone else is seeing.

Now, I get the following behaviour:

  * f-spot and mono appear to install without segfaults or other problems.
  * Running f-spot now causes a segfault (without CONFIG_OABI_COMPAT, it failed with SIGILL instead).

See the attached files for output and backtrace information.

If the Karmic kernel configs are based on the Jaunty configs, then the CONFIG_OABI_COMPAT issue is likely to affect Karmic.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Oliver Grawert wrote 4 minutes ago

> doesnt look like ...
>
> ogra@babbage2:~$ uname -r
> 2.6.28-191-ga2f78a4
> ogra@babbage2:~$ zcat /proc/config.gz |grep OABI
> # CONFIG_OABI_COMPAT is not set

Weird... now I'm confused.

f-spot definitely works fine on this kernel? If so, perhaps the OABI support thing is not the problem after all.

description: updated
Revision history for this message
Oliver Grawert (ogra) wrote :

are you sure you run mono 2.4 over there ?

i'm also not so sure the way you run gdb is the right one for debugging mono; it's probably better to install the mono-debugger package and run f-spot with the --mdb arg instead, so you get detailed debug output from the interpreter

(looking at the /usr/bin/f-spot script it seems also to accept --gdb as argument btw)

Revision history for this message
Oliver Grawert (ogra) wrote :

whoops, i just found out that mono-debugger only gets built on x86 compatible architectures, thats bad indeed

Revision history for this message
Dave Martin (dave-martin-arm) wrote : RE: [Bug 390591] Re: mono segfaults on ARM

Looking in /usr/bin/f-spot, there seems to be a --gdb option which I could
try.

I was using mono-2.0.x (Jaunty) though --- sorry for the confusion there; it
may not be useful to debug this after all.


Revision history for this message
Loïc Minier (lool) wrote :

We're mixing jaunty and karmic here; under karmic we now have mono 2.4 which seems to work fine without OABI_COMPAT (as reported by ogra who's running the vendor's kernel) and f-spot can be launched. Both the SIGILL and SEGV are gone.

Under jaunty, you confirmed that OABI_COMPAT fixes the SIGILL and allows you to move on to the SEGV. This seemed to be a known bug in mono < 2.4. We could probably provide PPA packages for mono if you care to demo f-spot and the like on a jaunty base, but we can't include mono 2.4 in jaunty. We could consider adding a non-intrusive patch to mono in jaunty.

Dave, I think you wanted to check whether OABI_COMPAT has a performance penalty; if it doesn't, could you please file a bug to request setting this config on armel kernel configs in karmic? Thanks!

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

I can't see any mono-jit binary package on ports.ubuntu.com for 2.4; this might be due to packaging changes between the versions.

Can someone who has a build of 2.4 please do
$ objdump -d /usr/bin/mono | grep svc

It would be interesting to see whether the 'svc 0x009f0002' instructions present in 2.0 have either gone away, or have r7 loaded with a syscall number as required by the new syscall ABI. This is not done in 2.0.

Thanks

Revision history for this message
Michael Casadevall (mcasadevall) wrote :

Dave, here's the output on Karmic:

mcasadevall@dawn:~$ objdump -d /usr/bin/mono | grep svc
   60ba0: bfe00000 svclt 0x00e00000
   737d8: 3ff80000 svccc 0x00f80000
   7889c: 3ff80000 svccc 0x00f80000
   9dd38: af286bcb svcge 0x00286bcb
   ea4ac: 7ff80000 svcvc 0x00f80000
   ea678: 3ff00000 svccc 0x00f00000 ; IMB
   ea67c: bff00000 svclt 0x00f00000 ; IMB
   ea684: 7ff00000 svcvc 0x00f00000 ; IMB
   ea688: 7fefffff svcvc 0x00efffff
   ea68c: 7ff80000 svcvc 0x00f80000
   ea6f0: 7ff80000 svcvc 0x00f80000
   ea754: 7ff80000 svcvc 0x00f80000
   ea864: 7fefffff svcvc 0x00efffff
   ea868: 7ff80000 svcvc 0x00f80000
   ea8c8: bff00000 svclt 0x00f00000 ; IMB
   ea8cc: 3ff00000 svccc 0x00f00000 ; IMB
   ea8d0: 7ff80000 svcvc 0x00f80000
   ea92c: bff00000 svclt 0x00f00000 ; IMB
   ea930: 3ff00000 svccc 0x00f00000 ; IMB
   ea934: 7ff80000 svcvc 0x00f80000
   eaa50: 3fe00000 svccc 0x00e00000
   eaa54: 3ff00000 svccc 0x00f00000 ; IMB
   eab40: 7fefffff svcvc 0x00efffff
   eab44: 7ff00000 svcvc 0x00f00000 ; IMB
  1646f4: 0f0f0f0f svceq 0x000f0f0f
  1662cc: 8f1bbcdc svchi 0x001bbcdc
  1782ec: 7ffffb83 svcvc 0x00fffb83
  179c58: 7ffffbff svcvc 0x00fffbff
  180cf4: ef000000 svc 0x00000000

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Thanks.

Hmmm, here's mine from mono-jit 2.0.1-4 (Jaunty). It does look like there has been a change to the way mono makes syscalls: the svc 0x9f0002 calls have gone, and svc 0 has appeared: this looks like a transition from the old syscall ABI to the new one.
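Grepping objdump output for "svc" also matches conditional forms such as svccc and svcvc, most of which are plausibly data that objdump disassembled as code (0x3ff00000, for example, is the high word of the IEEE-754 double 1.0). As a rough aid, this Python sketch keeps only lines whose mnemonic is exactly `svc`. The function name `unconditional_svcs` is hypothetical and the regular expression assumes the line layout shown in the listing above.

```python
import re

# address: raw-word mnemonic immediate   (layout assumed from the listing)
_LINE = re.compile(
    r"^\s*([0-9a-f]+):\s+([0-9a-f]{8})\s+(\S+)\s+(0x[0-9a-f]+)", re.IGNORECASE
)


def unconditional_svcs(objdump_text: str):
    """Return (address, comment_field) pairs for plain `svc` instructions."""
    hits = []
    for line in objdump_text.splitlines():
        match = _LINE.match(line)
        if match and match.group(3) == "svc":
            hits.append((int(match.group(1), 16), int(match.group(4), 16)))
    return hits


if __name__ == "__main__":
    sample = (
        "   ea678: 3ff00000 svccc 0x00f00000 ; IMB\n"
        "  179c58: 7ffffbff svcvc 0x00fffbff\n"
        "  180cf4: ef000000 svc 0x00000000\n"
    )
    for address, field in unconditional_svcs(sample):
        print(hex(address), hex(field))
```

Applied to the Karmic listing, this filter would leave only the entry at 180cf4 with a zero comment field, consistent with the move to new-style syscalls described here.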

This still doesn't explain the problems I still got after enabling CONFIG_OABI_COMPAT. (See https://bugs.launchpad.net/ubuntu/+source/mono/+bug/390591/comments/13)

This may be explained by / related to the other bugs listed in this thread.

https://bugs.launchpad.net/bugs/390802
https://bugs.edge.launchpad.net/ubuntu/+source/f-spot/+bug/390701

However, those are private so I have not been able to check.

Revision history for this message
Loïc Minier (lool) wrote :

Unembargoed bug #390802 and bug #390701

Revision history for this message
riki (keilormirandas) wrote :

ubuntu has no audio with my cell videos, something about ARM

Curtis Hovey (sinzui)
Changed in mono (Ubuntu):
assignee: Registry Administrators (registry) → nobody
Changed in mono (Ubuntu Jaunty):
assignee: Registry Administrators (registry) → nobody
Changed in mono (Ubuntu Karmic):
assignee: Registry Administrators (registry) → nobody
Revision history for this message
JC Hulce (soaringsky) wrote :

Thank you for taking the time to report this bug. This issue has been fixed in newer versions of Ubuntu, and Jaunty is EOL, so I am closing this bug task.

Changed in mono (Ubuntu Jaunty):
status: Confirmed → Invalid