Comment 6 for bug 1304754

Dave Cheney (dave-cheney) wrote : Re: [Bug 1304754] Re: gccgo compiled binaries are killed by SEGV on 64k ppc64el kernels

Thanks Anton, this is great debugging.

I tried the peano experiment on my -8 (4k) kernel and it failed as expected.

I talked to upstream, who said that ./configure should detect that
-fsplit-stack isn't supported on PPC and fall back to giving each
goroutine a full stack.

I will investigate this today.

That said, should this bug be reassigned to gccgo (trusty)?

On Thu, Apr 10, 2014 at 8:44 PM, Anton Blanchard <email address hidden> wrote:
> Based on the fail, I took a look at how gccgo handles stacks. It relies
> on the split stack feature in gold, which doesn't appear to be
> implemented for ppc64.
>
> Running one of the go recursion testcases (attached) shows what happens
> when we run out of stack and don't have the split stack feature to save
> us:
>
> # gccgo -g -O2 -o peano peano.go
> # ./peano
> Segmentation fault
>
> And we get the setup_rt_frame error in dmesg:
>
> peano[4538]: bad frame in setup_rt_frame: 000000c20ff7f000 nip
> 0000000010001018 lr 0000000010001024
>
> As expected, we just continually recurse without checking our stack
> pointer for overflow:
>
> 0x0000000010001008 <+8>: cmpdi r3,0
> 0x000000001000100c <+12>: beq 0x10001040 <main.count+64>
> 0x0000000010001010 <+16>: mflr r0
> 0x0000000010001014 <+20>: std r0,16(r1)
> 0x0000000010001018 <+24>: stdu r1,-32(r1)
> 0x000000001000101c <+28>: ld r3,0(r3)
> 0x0000000010001020 <+32>: bl 0x10001008 <main.count+8>
>
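For readers without the attachment, here is a minimal sketch of the kind of recursion that produces a loop like the one above (load the next pointer, branch back into the function). It is not the attached peano.go; the names `node`, `build` and `count` are hypothetical and only illustrate the pattern. Each call to `count` adds one stack frame, so without a split-stack check or a large enough goroutine stack, a long enough chain runs off the end of the stack.

```go
package main

import "fmt"

// node is a hypothetical singly linked cell; the chain length encodes a number.
type node struct {
	next *node
}

// build returns a chain of n nodes, constructed iteratively so that
// building it does not itself recurse.
func build(n int) *node {
	var head *node
	for i := 0; i < n; i++ {
		head = &node{next: head}
	}
	return head
}

// count walks the chain recursively. The "+ 1" after the call keeps it
// from being a tail call, so every node costs one stack frame.
func count(p *node) int {
	if p == nil {
		return 0
	}
	return count(p.next) + 1
}

func main() {
	// depth is an arbitrary illustrative value; raise it to stress the stack.
	const depth = 100000
	fmt.Println(count(build(depth)))
}
```

On a toolchain that grows goroutine stacks on demand this simply prints the depth; on the affected gccgo build the same shape of program is what would be expected to fault once the fixed stack is exhausted.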
>
> ** Attachment added: "peano.go"
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1304754/+attachment/4079310/+files/peano.go
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1304754
>
> Title:
> gccgo compiled binaries are killed by SEGV on 64k ppc64el kernels
>
> Status in "linux" package in Ubuntu:
> Incomplete
>
> Bug description:
> On kernels 3.13-18 and 3.13-23 (there may be others), the kernel is
> killing gccgo-compiled binaries:
>
> [18519.444748] jujud[19277]: bad frame in setup_rt_frame:
> 0000000000000000 nip 0000000000000000 lr 0000000000000000
> [18519.673632] init: juju-agent-ubuntu-local main process (19220)
> killed by SEGV signal
> [18519.673651] init: juju-agent-ubuntu-local main process ended, respawning
>
> In powerpc/kernel/signal_64.c:
>
> sys_rt_sigreturn is jumping to the badframe: label and executing an
> unconditional force_sigsegv which is delivered to the userland
> process. Like C++, gccgo tries to decode SIGSEGV as a nil pointer
> access and blame some random function that happened to be the top
> stack frame.
>
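As background for the paragraph above, this is a small sketch of the ordinary case that the runtime's SIGSEGV decoding is meant for: a genuine nil dereference surfacing as a recoverable run-time panic. The helper name `deref` is hypothetical. In the failure described in this bug the signal comes from the kernel's badframe path rather than from a nil access, which is why the reported location is misleading.

```go
package main

import "fmt"

// deref reads *p and converts a run-time panic (such as a nil pointer
// dereference) into an ordinary error value. Hypothetical helper for
// illustration only.
func deref(p *int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return *p, nil
}

func main() {
	x := 42
	fmt.Println(deref(&x))  // valid pointer: value and nil error
	fmt.Println(deref(nil)) // nil pointer: zero value and a non-nil error
}
```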
> Reverting to the 3.13-08 kernel appears to resolve the issue, which
> (weakly) points the finger at the recent switch to 64k pages.