golang calls to exec crash user emulation

Bug #1696773 reported by Will Newton
This bug affects 3 people
Affects: QEMU
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

An example program can be found here:

https://github.com/willnewton/qemucrash

This code starts a goroutine (thread) and calls exec repeatedly. It works fine natively, but under ARM user emulation it usually segfaults; occasionally it fails in other ways.
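
For readers who cannot reach the linked repository, the shape of such a test can be sketched as below. This is not the contents of the qemucrash repository; it is a minimal sketch that assumes "exec" means the os/exec package and that a `true` binary is on PATH. The names `spin` and `runRepeatedly` are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// spin keeps a second goroutine alive while the main goroutine
// forks and execs, so the runtime has more than one thread in play.
func spin(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
			time.Sleep(time.Millisecond)
		}
	}
}

// runRepeatedly runs the named program n times and returns the first error.
func runRepeatedly(name string, n int) error {
	for i := 0; i < n; i++ {
		if err := exec.Command(name).Run(); err != nil {
			return fmt.Errorf("iteration %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	stop := make(chan struct{})
	go spin(stop)

	// Assumption: "true" exists on PATH; any trivial program would do.
	err := runRepeatedly("true", 100)
	close(stop)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Built for ARM and run under the user-mode emulator, a program of this shape exercises thread creation, fork/exec and signal delivery together, which is the combination the report describes.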

Tags: arm linux-user
Will Newton (will-newton) wrote :

You will need to apply the patch from https://bugs.launchpad.net/qemu/+bug/1696353 to run this sample app on current master.

Peter Maydell (pmaydell)
tags: added: arm linux-user
Edward Vielmetti (edward-vielmetti) wrote :

This bug is mentioned in this account from Cloudflare of porting their software stack to arm64:

https://blog.cloudflare.com/porting-our-software-to-arm64/

The relevant section from that blog reads as follows:

# Intermittent Go Failures

> With a decent amount of Go code running through our CI system, it was easy to spot a trend of intermittent segfaults.

> Going on a hunch, we confirmed a hypothesis that non-deterministic failures are generally due to threading issues. Unfortunately, opinion on the issue tracker showed that Go / QEMU incompatibilities aren’t a priority, so we were left without an upstream fix.

> The workaround we came up with is simple: if the problem is threading-related, limit where the threads can run! When we package our internal go binaries, we add a .deb post-install script to detect if we’re running under ARM64 emulation, and if so, reduce the number of CPUs the go binary can run under to one. We lose performance by pinning to one CPU, but this slowdown is negligible when we’re already running under emulation, and slow code is better than non-working code.

> With the workaround in place, reports of intermittent crashes dropped to zero. Onto the next problem!
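
The blog post does not show the post-install script, and how it detects emulation is not stated. The pinning half of the workaround can be sketched as a small wrapper that launches a command under `taskset -c <cpu>` from util-linux. The helper name `pinnedCommand` and the choice of CPU 0 are assumptions for illustration, and the wrapper is meant to run natively on the host.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// pinnedCommand wraps name and args in "taskset -c <cpu>" so that the
// child process and all of its threads are restricted to one host CPU.
func pinnedCommand(cpu int, name string, args ...string) *exec.Cmd {
	full := append([]string{"-c", strconv.Itoa(cpu), name}, args...)
	return exec.Command("taskset", full...)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: pin <command> [args...]")
		os.Exit(2)
	}
	// Assumption: CPU 0 is online and allowed for this process.
	cmd := pinnedCommand(0, os.Args[1], os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Pinning at launch matters because affinity is inherited by threads created afterwards; changing it from inside an already running Go program would only affect the calling thread.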

Paweł Moll (pawel-moll) wrote :

Observed the same here... (details of the environment at the end of the post)

Just building a hello world is enough to crash go 3 times out of 4:

--8<---------------------------------------------------------------------
ubuntu@qemu:~$ cat <<EOF > hello.go
> package main
>
> import "fmt"
>
> func main() {
> fmt.Println("Hello")
> }
> EOF

--8<---------------------------------------------------------------------

If I build it with affinity set to a single CPU, all is fine:

ubuntu@qemu:~$ taskset -c 1 /usr/bin/qemu-aarch64-static /usr/lib/go-1.10/bin/go build
ubuntu@qemu:~$ taskset -c 1 /usr/bin/qemu-aarch64-static /usr/lib/go-1.10/bin/go build
ubuntu@qemu:~$ taskset -c 1 /usr/bin/qemu-aarch64-static /usr/lib/go-1.10/bin/go build
ubuntu@qemu:~$ taskset -c 1 /usr/bin/qemu-aarch64-static /usr/lib/go-1.10/bin/go build

But when the go build runs multithreaded, all hell breaks loose:

--8<---------------------------------------------------------------------
ubuntu@qemu:~$ /usr/bin/qemu-aarch64-static /usr/lib/go-1.10/bin/go build
fatal error: exitsyscall: syscall frame is no longer valid
fatal error: malloc deadlock
panic during panic

runtime stack:
runtime.startpanic_m()
 /usr/lib/go-1.10/src/runtime/panic.go:690 +fatal error: unexpected signal during runtime execution
0x178stack trace unavailable
--8<---------------------------------------------------------------------

or this:

--8<---------------------------------------------------------------------
ubuntu@qemu:~$ /usr/bin/qemu-aarch64-static /usr/lib/go-1.10/bin/go build
fatal error: unexpected signal value
panic during panic

goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0x14420009ff0
stack: frame={sp:0x14420007900, fp:0x14420007920} stack=[0x14420002000,0x1442000a000)
0000014420007800: 0000000000000000 0000000000000084
0000014420007810: 000000000043d694 <runtime.sigtrampgo+44> 0000000000000000
0000014420007820: 0000000000000000 0000000000000000
0000014420007830: 00000144200ae000 0000000000000000
0000014420007840: 0000014420007920 00000144200079a0
0000014420007850: 000000000045266c <runtime.sigtramp+52> 0000014400000004
0000014420007860: 0000014420007920 00000144200079a0
0000014420007870: 00000144200ae180 0000000000000000
0000014420007880: 0000000000000000 0000000000000000
0000014420007890: 0000000000000000 0000000000000000
00000144200078a0: 0000000000000000 0000000000000000
00000144200078b0: 0000000000000000 00000144200ae180
00000144200078c0: 0000000000000000 0000000000000000
00000144200078d0: 0000000000000000 0000000000000000
00000144200078e0: 0000000000000000 0000000000000000
00000144200078f0: 0000000000000000 0000000000000000
0000014420007900: <0000014420009ff0 0000000000000004
0000014420007910: 0000014420007920 00000144200079a0
0000014420007920: >0000000000000004 0000000000000002
0000014420007930: 00000144201a4180 0000000000000000
0000014420007940: 000000000000000c 0000000000000001
0000014420007950: 0000000000000000 0000000000000000
0000014420007960: 0000000000000000 0000000000000000
0000014420007970: 0000000000000000 0000000000000000
0000014420007980: 0000...

Peter Maydell (pmaydell) wrote :

The 'qemucrash' test case from this bug still crashes as of current head-of-git (4.1 rc1).

Changed in qemu:
status: New → Confirmed
Peter Maydell (pmaydell) wrote :

The 'qemucrash' test case problem seems to be because we were incorrectly implementing 'sigaltstack' as setting a process-wide signal stack. This is incorrect, as sigaltstack stacks are supposed to be per-thread, and the Go runtime relies on this. I've just sent a patch which seems to me to fix the qemucrash test case, at least:

https://<email address hidden>/
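
The per-thread behaviour described above can be observed from Go itself. The sketch below is not part of the patch. It locks several goroutines to their own OS threads and asks the kernel for each thread's alternate signal stack via sigaltstack(NULL, &old). It is Linux-only and assumes the common stack_t field order (ss_sp, ss_flags, ss_size), which does not hold on MIPS. The names `stackT`, `currentAltStack` and `altStacks` are made up for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"syscall"
	"unsafe"
)

// stackT mirrors the C stack_t layout on most Linux architectures.
// Assumption: field order ss_sp, ss_flags, ss_size (not true on MIPS).
type stackT struct {
	sp    uintptr
	flags int32
	size  uintptr
}

// currentAltStack queries the calling thread's alternate signal stack
// without changing it, by passing a NULL new stack to sigaltstack.
func currentAltStack() (stackT, error) {
	var old stackT
	_, _, errno := syscall.RawSyscall(syscall.SYS_SIGALTSTACK,
		0, uintptr(unsafe.Pointer(&old)), 0)
	if errno != 0 {
		return old, errno
	}
	return old, nil
}

// altStacks runs n goroutines, each locked to its own OS thread, and
// records the alternate stack that each thread reports.
func altStacks(n int) ([]stackT, error) {
	stacks := make([]stackT, n)
	errs := make([]error, n)
	var queried, done sync.WaitGroup
	queried.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer done.Done()
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			stacks[i], errs[i] = currentAltStack()
			queried.Done()
			queried.Wait() // hold this thread until all have queried
		}(i)
	}
	done.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return stacks, nil
}

func main() {
	stacks, err := altStacks(4)
	if err != nil {
		fmt.Println("sigaltstack failed:", err)
		return
	}
	for i, s := range stacks {
		fmt.Printf("thread %d: sp=%#x size=%d flags=%d\n", i, s.sp, s.size, s.flags)
	}
}
```

On a kernel, or an emulator, that keeps the alternate stack per thread, each thread should report the stack the Go runtime installed for it, so the addresses differ. An implementation that stores a single process-wide value would instead hand every thread the most recently installed stack.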

Changed in qemu:
status: Confirmed → Won't Fix
status: Won't Fix → In Progress
Peter Maydell (pmaydell) wrote :

The sigaltstack fix is now in master (commit 5bfce0b74fbd5d530) and at least in my test environment this also fixes the "can't build hello.go reliably" example. So I'm marking this as 'fix committed'. If there are still problems with running Go binaries, these are likely to be independent bugs, so please open fresh LP issues for them.

Changed in qemu:
status: In Progress → Fix Committed
Peter Maydell (pmaydell) wrote :

Note that this fix will be in the upcoming 4.1 release.

Thomas Huth (th-huth)
Changed in qemu:
status: Fix Committed → Fix Released