Comment 11 for bug 1304754

Dave Cheney (dave-cheney) wrote : Re: [Bug 1304754] Re: gccgo on ppc64el using split stacks when not supported

On Wed, Apr 16, 2014 at 4:26 PM, Anton Blanchard <email address hidden> wrote:
> I've made some progress with these failures. A lot of the confusion is
> around the way gccgo hooks the SEGV handler and attempts to backtrace
> all goroutines (the code is in runtime_tracebackothers()).
>
> It does this by calling runtime_gogo() which temporarily switches to the
> goroutine using setcontext(). If the context is bad in any way, this
> will cause us to SEGV again. I printed out the stack pointer (r1) and
> the NIA during this stack backtracing, and we see where things go south
> just as we are about to dump goroutine 0:
>
> goroutine 0 [idle]:
> DEBUG: runtime_gogo r1 0 nia 0
>
> r1 = 0, nia = 0. When we call setcontext on this invalid context we die
> with:
>
> jujud[5258]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000
>
> Perhaps we aren't saving away the context for goroutine 0 correctly.

Hmm, could be. It looks like the process was crashing anyway.
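
For anyone following along, here is a minimal sketch of the failure mode Anton describes. It is not gccgo runtime code: `struct task`, `task_init`, `task_visit` and the `ctx_saved` flag are hypothetical names made up for illustration, and it uses swapcontext() rather than setcontext() so that control comes back to the caller. The idea is only that a walker which visits every context should skip one that was never saved, instead of jumping to r1 = 0, nia = 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <ucontext.h>

/* Hypothetical stand-in for a goroutine record; NOT the gccgo G struct. */
struct task {
    ucontext_t ctx;
    bool ctx_saved;          /* true only once ctx has been filled in */
    int runs;                /* how many times the task body has run */
    char stack[64 * 1024];   /* private stack for the task */
};

static ucontext_t scheduler_ctx;   /* where control returns after a visit */
static struct task *current;       /* task being visited */

static void task_entry(void)
{
    assert(current != NULL);
    current->runs++;
    /* Returning here resumes uc_link, i.e. scheduler_ctx. */
}

/* Fill in a valid context for t. Returns 0 on success, -1 on failure. */
int task_init(struct task *t)
{
    if (getcontext(&t->ctx) != 0)
        return -1;
    t->ctx.uc_stack.ss_sp = t->stack;
    t->ctx.uc_stack.ss_size = sizeof t->stack;
    t->ctx.uc_link = &scheduler_ctx;
    makecontext(&t->ctx, task_entry, 0);
    t->runs = 0;
    t->ctx_saved = true;
    return 0;
}

/*
 * Temporarily switch to t, as a traceback walker might.
 * Returns 0 if the switch happened, -1 if t was skipped because
 * its context was never saved (the guard this sketch is about).
 */
int task_visit(struct task *t)
{
    if (!t->ctx_saved)
        return -1;
    current = t;
    return swapcontext(&scheduler_ctx, &t->ctx) == 0 ? 0 : -1;
}
```

A zero-initialised `struct task` (for example one with static storage) is rejected by task_visit(), while one prepared with task_init() runs once and returns. Whether the real fix belongs in the walker or in whatever should have saved goroutine 0's context is a separate question.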

>
> Title:
> gccgo on ppc64el using split stacks when not supported
>
> Status in “gccgo-4.9” package in Ubuntu:
> Confirmed
>
> Bug description:
> On kernels 3.13-18 and 3.13-23 (there may be others) the kernel is
> killing gccgo-compiled binaries:
>
> [18519.444748] jujud[19277]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000
> [18519.673632] init: juju-agent-ubuntu-local main process (19220) killed by SEGV signal
> [18519.673651] init: juju-agent-ubuntu-local main process ended, respawning
>
> In powerpc/kernel/signal_64.c:
>
> sys_rt_sigreturn is jumping to the badframe: label and executing an
> unconditional force_sigsegv which is delivered to the userland
> process. Like C++, gccgo tries to decode SIGSEGV as a nil pointer
> access and blame some random function that happened to be the top
> stack frame.
>
> Reverting to the 3.13-08 kernel appears to resolve the issue, which
> (weakly) points the finger at the recent switch to 64k pages.
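
On the point in the bug description that the SIGSEGV is forced by the kernel rather than caused by a bad memory access: a handler installed with SA_SIGINFO can look at si_code before assuming a nil pointer dereference. The sketch below is an illustration only, not what gccgo does. `segv_origin` and `install_segv_probe` are hypothetical names, and the expectation that a kernel-forced SIGSEGV arrives with si_code SI_KERNEL is my assumption, not something checked on the affected kernels.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>

/* Hypothetical helper: describe the likely origin of a SIGSEGV. */
const char *segv_origin(int si_code)
{
    switch (si_code) {
    case SEGV_MAPERR: return "fault: address not mapped";
    case SEGV_ACCERR: return "fault: bad permissions";
    case SI_KERNEL:   return "sent by the kernel";
    case SI_USER:     return "sent by kill()";
    case SI_TKILL:    return "sent by tkill()/raise()";
    default:          return "unknown";
    }
}

/* Last si_code seen by the handler; -1000 means "none yet". */
static volatile sig_atomic_t last_si_code = -1000;

static void on_segv(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    /* Only record the code. A real handler must not simply return
     * after a genuine fault, or the faulting instruction reruns. */
    last_si_code = info->si_code;
}

/* Install the probe handler. Returns 0 on success, -1 on failure. */
int install_segv_probe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Only SEGV_MAPERR and SEGV_ACCERR indicate an actual memory fault. Anything else suggests the signal was sent deliberately, so blaming the top stack frame for a nil pointer access would be misleading.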