gdb reports 'corrupt stack' on armhf without symbols

Bug #1325503 reported by Jean-Baptiste Lallement
This bug affects 6 people
Affects        Status    Importance  Assigned to                 Milestone
Linaro GDB     New       Undecided   Unassigned
gdb (Ubuntu)   Triaged   High        Canonical Foundations Team

Bug Description

[Test Case]
sleep 120 &
kill -SEGV %1

Observe a corrupt stack in the generated crash file.
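
The test case above relies on the interactive job spec %1. As a minimal sketch for running the same steps from a script, the version below uses $! instead and checks that the process really died from SIGSEGV. It assumes a Linux system, where SIGSEGV is signal 11; with apport enabled, the crash file would typically appear under /var/crash.

```shell
#!/bin/sh
# Start a long-running process in the background and remember its PID.
sleep 120 &
pid=$!

# Deliver SIGSEGV to it. $! is used instead of %1 so that the script
# does not depend on interactive job control.
kill -SEGV "$pid"

# A process killed by a signal is reported as 128 + signal number.
# On Linux SIGSEGV is 11, so the expected status is 139.
status=0
wait "$pid" || status=$?
echo "sleep exited with status $status"
```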

Original Report
---------------
On armhf crash files fail to retrace and gdb reports 'corrupt stack' errors

For example bug 1323241
Thread 1 (Thread 0xb0b3b450 (LWP 2243)):
#0 0x00000030 in ?? ()
No symbol table info available.
#1 0xa9990cbe in ?? () from /usr/lib/arm-linux-gnueabihf/unity8/qml/Unity/Launcher/libUnityLauncher-qml.so
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I also tried directly on the device to run an unstripped and stripped build of cat and gdb fails to unwind the stack when the binary is stripped. The result of this test is:

== unstripped ==
Reading symbols from ./cat...done.
(gdb) run
Starting program: /home/phablet/tmp/coreutils-8.21/src/cat
^C
Program received signal SIGINT, Interrupt.
0xb6f6e914 in read () from /lib/arm-linux-gnueabihf/libc.so.6
(gdb) bt
#0 0xb6f6e914 in read () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0x0000b648 in read (__nbytes=65536, __buf=0x19000, __fd=0) at /usr/include/arm-linux-gnueabihf/bits/unistd.h:44
#2 safe_read (fd=0, buf=buf@entry=0x19000, count=count@entry=65536) at lib/safe-read.c:66
#3 0x00009ace in simple_cat (bufsize=65536, buf=0x19000 "") at src/cat.c:168
#4 main (argc=1, argv=<optimized out>) at src/cat.c:730
(gdb) quit

== stripped ==
Reading symbols from ./cat...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/phablet/tmp/coreutils-8.21/src/cat
^C
Program received signal SIGINT, Interrupt.
0xb6f6e914 in read () from /lib/arm-linux-gnueabihf/libc.so.6
(gdb) bt
#0 0xb6f6e914 in read () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0x0000b648 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I'd expect the same number of frames and addresses when the binary is stripped or not with '??' instead of names when it is stripped.
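
To make the frame-count comparison concrete, here is a small sketch that counts frames in the two backtraces pasted above. The helper name count_frames is made up for illustration; it only counts lines that start with "#<number>", which is how gdb prefixes each frame.

```shell
#!/bin/sh
# Count the frames in a gdb backtrace read from stdin.
# Frame lines begin with "#" followed by the frame number.
count_frames() {
    grep -c '^#[0-9]'
}

# Backtrace of the unstripped cat, copied from the session above.
unstripped_frames=$(count_frames <<'EOF'
#0 0xb6f6e914 in read () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0x0000b648 in read (__nbytes=65536, __buf=0x19000, __fd=0) at /usr/include/arm-linux-gnueabihf/bits/unistd.h:44
#2 safe_read (fd=0, buf=buf@entry=0x19000, count=count@entry=65536) at lib/safe-read.c:66
#3 0x00009ace in simple_cat (bufsize=65536, buf=0x19000 "") at src/cat.c:168
#4 main (argc=1, argv=<optimized out>) at src/cat.c:730
EOF
)

# Backtrace of the stripped cat, copied from the session above.
stripped_frames=$(count_frames <<'EOF'
#0 0xb6f6e914 in read () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0x0000b648 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
EOF
)

echo "unstripped: $unstripped_frames frames, stripped: $stripped_frames frames"
```

On a live system the same helper could be fed from something like `gdb --batch -ex bt ./cat core`, assuming a core file is available.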

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: gdb (not installed)
Uname: Linux 3.4.0-5-mako armv7l
ApportVersion: 2.14.3-0ubuntu1
Architecture: armhf
Date: Mon Jun 2 11:07:07 2014
InstallationDate: Installed on 2014-06-02 (0 days ago)
InstallationMedia: Ubuntu Utopic Unicorn (development branch) - armhf (20140602)
SourcePackage: gdb
UpgradeStatus: No upgrade log present (probably fresh install)

Jean-Baptiste Lallement (jibel) wrote :
description: updated
summary: - gdb reports 'corrupt stack' on armhf
+ gdb reports 'corrupt stack' on armhf without symbols
Changed in gdb (Ubuntu):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gdb (Ubuntu):
status: New → Confirmed
tags: added: qa-touch
Brian Murray (brian-murray) wrote :

This is true with both gdb and gdb-minimal versions 7.7.1-0ubuntu3 in utopic.

Brian Murray (brian-murray) wrote :

I installed the trusty version of gdb 7.7-0ubuntu3.1 on a utopic system with the same results. A binary that had strip run against it results in a corrupt stack.

Brian Murray (brian-murray) wrote :

I installed the saucy version of gdb-minimal 7.6.1-0ubuntu3 on a utopic system and got the same corrupt stack.

Brian Murray (brian-murray) wrote :

I installed the precise version of gdb (there was no gdb-minimal) on a utopic system and that also ended in a corrupt stack.

Brian Murray (brian-murray) wrote :

I built coreutils on utopic with DEB_BUILD_OPTIONS=nocheck,nostrip and did receive a good stacktrace from gdb.

description: updated
Matthias Klose (doko) wrote :

the frame unwinder on arm needs work.

Jean-Baptiste Lallement (jibel) wrote :

Setting to critical because crashes from the phone fail to retrace.

Changed in gdb (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
importance: High → Critical
Matthias Klose (doko) wrote :

You should be able to get these call stacks by using the DWARF debug info, which is usually found in the .ddeb packages. Are these packages installed when trying to run crash?

Julien Funk (jaboing) wrote :

So my team is telling me this defect is blocking the usefulness of the long term test automation and crashes we're detecting there. Those tests are a key player in Ubuntu Engineering goals for RTM so I +1 the 'critical' status of this defect and will be checking it regularly for progress.

Brian Murray (brian-murray) wrote :

Installing coreutils-dbgsym package 8.21-1ubuntu5 did not produce a more useful Stacktrace.

Stacktrace:
 #0 0xb6e90fa6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 No locals.
 #1 0x0000a91e in ?? ()
 No symbol table info available.
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

How were "sleep" and system glibc compiled?

To get reliable stack traces GDB has to have access to either frame pointer (-fno-omit-frame-pointer compiler flag) or to unwind tables (-funwind-tables). In the absence of either of these, GDB has to guess where stack frame boundaries are. In this case GDB guesses wrong.

The recommended way to get reliable stack traces is to use -funwind-tables, which, unlike -fno-omit-frame-pointer, has no performance penalty and increases the on-disk size of binaries by only several percent.
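
As a rough way to see which kind of unwind information a binary carries, the sketch below filters the section list printed by `readelf -S -W`. The helper name unwind_sections is hypothetical. On ARM EABI the unwind tables live in .ARM.exidx/.ARM.extab, while .eh_frame and .debug_frame hold DWARF call frame information. Note that the mere presence of .ARM.exidx does not prove every function has a usable entry; `readelf -u` can be used to dump the individual entries.

```shell
#!/bin/sh
# List the unwind-related sections named in `readelf -S -W` output on stdin.
unwind_sections() {
    grep -oE '\.(ARM\.exidx|ARM\.extab|eh_frame|debug_frame)' | sort -u
}

# Optional demonstration: build a trivial program with -funwind-tables and
# list its unwind sections. Skipped when gcc or readelf is not installed.
if command -v gcc >/dev/null 2>&1 && command -v readelf >/dev/null 2>&1; then
    dir=$(mktemp -d)
    printf 'int main(void) { return 0; }\n' > "$dir/t.c"
    if gcc -O2 -funwind-tables -o "$dir/t" "$dir/t.c"; then
        readelf -S -W "$dir/t" | unwind_sections
    fi
    rm -rf "$dir"
fi
```

The same filter can be pointed at an installed binary, for example `readelf -S -W /bin/sleep | unwind_sections`.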

Steve Langasek (vorlon) wrote : Re: [Bug 1325503] Re: gdb reports 'corrupt stack' on armhf without symbols

Hi Maxim,

On Wed, Jun 25, 2014 at 11:51:51PM -0000, Maxim Kuvyrkov wrote:
> How were "sleep" and system glibc compiled?

> To get reliable stack traces GDB has to have access to either frame
> pointer (-fno-omit-frame-pointer compiler flag) or to unwind tables
> (-funwind-tables). In the absence of either of these, GDB has to guess
> where stack frame boundaries are. In this case GDB guesses wrong.

These binaries are built using the stock compiler flags in Ubuntu.

[...]
arm-linux-gnueabihf-gcc -std=gnu99 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -DSYSLOG_SUCCESS -DSYSLOG_FAILURE -DSYSLOG_NON_ROOT -Wl,--as-needed -Wl,-Bsymbolic-functions -Wl,-z,relro -o src/sleep src/sleep.o src/libver.a lib/libcoreutils.a lib/libcoreutils.a
[...]

  https://launchpad.net/ubuntu/+source/coreutils/8.21-1ubuntu5/+build/5843130
  https://launchpad.net/ubuntu/+source/coreutils/8.21-1ubuntu5/+build/5843130/+files/buildlog_ubuntu-trusty-armhf.coreutils_8.21-1ubuntu5_UPLOADING.txt.gz

> The recommended way to get reliable stack traces is to use
> -funwind-tables, which, unlike -fno-omit-frame-pointer, has no
> performance penalty and increases the on-disk size of binaries by
> only several percent.

If this is recommended, should it be turned on by default in gcc upstream?

Barring that, should we turn it on by default in our gcc build in Ubuntu, or
in our common distro compiler flags?

From Michael, I understand that "several percent" here is "on the order of
5%".

Thanks,
--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/

Brian Murray (brian-murray) wrote :

This may be partially resolved by the reintroduction of the patch in bug 1233185. The initial Stacktrace still ends in a corrupt stack but the non-multiarch version of gdb is able to produce a more useful Stacktrace and a StacktraceAddressSignature when retracing the crash. For example, with the gnome calculator crash from apport test crashes we can see the following differences.

Stacktrace:
 #0 0x4081ed22 in poll () from /lib/arm-linux-gnueabihf/libc.so.6
 No symbol table info available.
 #1 0x4067c4e6 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
 No symbol table info available.
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)

After retracing:

Stacktrace:
 #0 0x4081ed22 in poll () at ../sysdeps/unix/syscall-template.S:81
 No locals.
 #1 0x4067c4e6 in g_main_context_poll (priority=2147483647, n_fds=1, fds=0x41400c68, timeout=-1, context=0xd00d0) at /build/buildd/glib2.0-2.40.0/./glib/gmain.c:4028
         poll_func = 0x406862a5 <g_poll>
 #2 g_main_context_iterate (context=context@entry=0xd00d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /build/buildd/glib2.0-2.40.0/./glib/gmain.c:3729
         max_priority = 2147483647
         timeout = -1
         some_ready = <optimized out>
         nfds = 1
         allocated_nfds = <optimized out>
         fds = 0x41400c68
 #3 0x4067c588 in g_main_context_iteration (context=context@entry=0xd00d0, may_block=may_block@entry=1) at /build/buildd/glib2.0-2.40.0/./glib/gmain.c:3795
         retval = <optimized out>
 #4 0x410c1cd0 in dconf_gdbus_worker_thread (user_data=0xd00d0) at dconf-gdbus-thread.c:82
         context = 0xd00d0
 #5 0x40695eea in g_thread_proxy (data=0x9db80) at /build/buildd/glib2.0-2.40.0/./glib/gthread.c:764
         thread = 0x9db80
 #6 0x4077efbc in start_thread (arg=0x413ff2d0) at pthread_create.c:314
         pd = 0x413ff2d0
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {1094710504, 1094709968, 1, 1094708424, 1094708752, 1081684380, 1094710532, -1090523248, 220780268, 207672765, 0 <repeats 54 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = 0
         pagesize_m1 = <optimized out>
         sp = <optimized out>
         freesize = <optimized out>
         __PRETTY_FUNCTION__ = "start_thread"
 #7 0x40827b3c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /srv/daisy.staging.ubuntu.com/production/cache/Ubuntu 14.04/cache-DhmXbj/sandbox/lib/arm-linux-gnueabihf/libc.so.6
 No locals.
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Changed in gdb (Ubuntu):
status: Confirmed → Triaged
importance: Critical → High
Thomas Karl Pietrowski (thopiekar) wrote :

I also see a lot of these errors here on Wily at KDE Plasma:

Application: Plasma (plasmashell), signal: Segmentation fault
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
__libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
[Current thread is 1 (Thread 0xb2ad9000 (LWP 4363))]

Thread 14 (Thread 0xb076a3e0 (LWP 4365)):
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1 0xb55706b0 in __pthread_cond_wait (cond=0xc0560, mutex=0xc0548) at pthread_cond_wait.c:186
#2 0xb2296592 in ?? () from /usr/lib/arm-linux-gnueabihf/dri/swrast_dri.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 13 (Thread 0xaff6a3e0 (LWP 4366)):
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1 0xb55706b0 in __pthread_cond_wait (cond=0xc0668, mutex=0xc0650) at pthread_cond_wait.c:186
#2 0xb2296592 in ?? () from /usr/lib/arm-linux-gnueabihf/dri/swrast_dri.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 12 (Thread 0xaf76a3e0 (LWP 4367)):
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1 0xb55706b0 in __pthread_cond_wait (cond=0xc0770, mutex=0xc0758) at pthread_cond_wait.c:186
#2 0xb2296592 in ?? () from /usr/lib/arm-linux-gnueabihf/dri/swrast_dri.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 11 (Thread 0xaef6a3e0 (LWP 4368)):
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1 0xb55706b0 in __pthread_cond_wait (cond=0xc0878, mutex=0xc0860) at pthread_cond_wait.c:186
#2 0xb2296592 in ?? () from /usr/lib/arm-linux-gnueabihf/dri/swrast_dri.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 10 (Thread 0xae7143e0 (LWP 4369)):
#0 0xb56904e0 in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0xb6bb4168 in ?? () from /usr/lib/arm-linux-gnueabihf/libxcb.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 9 (Thread 0xad1c43e0 (LWP 4377)):
#0 0xb5a03d80 in QTimerInfoList::timerWait(timespec&) () from /usr/lib/arm-linux-gnueabihf/libQt5Core.so.5
#1 0xb5a04c52 in ?? () from /usr/lib/arm-linux-gnueabihf/libQt5Core.so.5
#2 0xb4fb0c54 in g_main_context_prepare () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#3 0xb4fb12ee in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 8 (Thread 0xaad3d3e0 (LWP 4378)):
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
#1 0xb56a011a in __GI___clock_gettime (clock_id=0, tp=0xaad3cbec) at ../sysdeps/unix/clock_gettime.c:99
#2 0xb59095b2 in ?? () from /usr/lib/arm-linux-gnueabihf/libQt5Core.so.5
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 7 (Thread 0xa8fe23e0 (LWP 4386)):
#0 0xb4fb13c2 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (Thread 0xa7dff3e0 (LWP 4387)):
#0 0xb56904e2 in poll () at ../sysdeps/unix/syscall-template.S:81
#1 ...


Nonny Moose (moosenonny10) wrote :

I am running Ubuntu Mate Xenial and gdb always reports the following bt, symbols or not:
```
Program received signal SIGSEGV, Segmentation fault.
0x76fd9822 in ?? () from /lib/ld-linux-armhf.so.3
(gdb) bt
#0 0x76fd9822 in ?? () from /lib/ld-linux-armhf.so.3
#1 0x76fd983a in ?? () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```
Is this related?

matteo (matteoids) wrote :

Nonny Moose, I have the same problem as you (running Ubuntu Mate and gdb v7.11.1).
I can't debug any application because of the same segmentation fault!
Did you solve the problem?
Thanks
Matteo

Brian Makin (merimus) wrote :

ubuntu-mate (Xenial Xerus) on raspberry pi 3.

having similar issue.

$ gcc -g main.c -o main
$ gdb main
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main...done.
(gdb) run
Starting program: /home/merimus/foo/main

Program received signal SIGSEGV, Segmentation fault.
0x76fd9822 in ?? () from /lib/ld-linux-armhf.so.3

alamaral (alamaral) wrote :

Don't know if anyone is still working on this problem (i.e. corrupt stack on arm in gdb), but I've found a solution. Any code that is compiled with -g seems to work fine with gdb, as far as generating a backtrace. The problem is that most system library code is built without -g, so gdb doesn't have whatever information is necessary to unwind the stack properly.

It seems that gcc, with the -g option, adds .cfi directives into the assembler code, and gdb needs that info. Remove the .cfi directives and you get the "Backtrace stopped: previous frame identical to this frame (corrupt stack?)" error.

Even a very simple program with subroutine calls (similar to below) will exhibit this problem:

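#include <stdio.h>

/* Recurse 100 levels deep, then print i as each call returns. */
void foo(int i)
{
    if (i < 100) foo(i+1);
    printf("i=%d\n", i);
}

int main(void)
{
    foo(0);
    return 0;
}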

When compiled without -g each time the program calls foo the stack looks to gdb like it's corrupted, and only the topmost level is shown, along with the error. Compile with -g and everything works, at least until you step into printf, which wasn't compiled with -g.

Once you step out of printf you'll get your stack back.

This feels like a compiler bug to me, i.e. gcc __SHOULD__ generate at least the minimal set of .cfi directives that are needed for gdb to generate a backtrace, regardless of whether -g is specified or not.
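
A quick way to check the claim about .cfi directives is to look at the assembly gcc emits with and without -g. The sketch below is only illustrative: count_cfi is a made-up helper, and the result depends on the target, since some targets (x86-64, for instance) emit .cfi directives by default even without -g.

```shell
#!/bin/sh
# Count the DWARF call-frame (.cfi_*) directives in assembly read from stdin.
count_cfi() {
    grep -c '^[[:space:]]*\.cfi_'
}

# Optional demonstration: compare gcc's assembly output with and without -g.
# Skipped when gcc is not installed.
if command -v gcc >/dev/null 2>&1; then
    dir=$(mktemp -d)
    printf 'int foo(int i) { return i + 1; }\n' > "$dir/foo.c"
    echo "without -g: $(gcc -S -O2 -o - "$dir/foo.c" | count_cfi) .cfi directives"
    echo "with -g:    $(gcc -S -O2 -g -o - "$dir/foo.c" | count_cfi) .cfi directives"
    rm -rf "$dir"
fi
```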

Matthias Klose (doko) wrote :

On 17.05.2018 16:03, alamaral wrote:
> The problem is that most system library code is built
> without -g, so gdb doesn't have whatever information is necessary to
> unwind the stack properly.

this is wrong. every package is built with -g, however the debug symbols are
split out into separate -dbg or dbgsym packages.

> It seems that gcc, with the -g option, adds .cfi directives into the
> assembler code, and gdb needs that info. Remove the .cfi directives and
> you get the "Backtrace stopped: previous frame identical to this frame
> (corrupt stack?)" error.
>
> Even a very simple program with subroutine calls (similar to below) will
> exhibit this problem:
>
> void foo(int i)
> {
> if (i < 100) foo(i+1);
> printf("i=%d\n", i);
> }
>
> main()
> {
> foo(0);
> }
>
> When compiled without -g each time the program calls foo the stack looks
> to gdb like it's corrupted, and only the topmost level is shown, along
> with the error. Compile with -g and everything works, at least until
> you step into printf, which wasn't compiled with -g.
>
> Once you step out of printf you'll get your stack back.
>
> This feels like a compiler bug to me, i.e. gcc __SHOULD__ generate at
> least the minimal set of .cfi directives that are needed for gdb to
> generate a backtrace, regardless of whether -g is specified or not.

is gdb using the split out debug symbols for your use case?
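
One way to check whether gdb can find the split-out debug symbols is to compare the binary's build ID with the files under gdb's debug directory. The sketch below assumes the default debug-file-directory of /usr/lib/debug (gdb's `show debug-file-directory` prints the actual value); the helper name and the sample build ID are hypothetical.

```shell
#!/bin/sh
# Map a GNU build ID (hex string) to the path where gdb looks for the
# separate debug file: the first two hex digits form a subdirectory and
# the remaining digits form the file name.
debug_path_for_build_id() {
    id=$1
    prefix=$(printf '%s' "$id" | cut -c1-2)
    rest=$(printf '%s' "$id" | cut -c3-)
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "$prefix" "$rest"
}

# Hypothetical build ID, for illustration only.
debug_path_for_build_id 0123456789abcdef0123456789abcdef01234567
```

The real build ID of a binary can be read with `readelf -n <binary>`, which prints a "Build ID:" line. If the computed path does not exist, the matching -dbg or -dbgsym package is probably not installed.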

Benjamin Drung (bdrung) wrote :

This behavior breaks apport's autopkgtest. Simple steps to reproduce:

```
armhf-jammy$ printf '#!/bin/bash\nkill -SEGV $$\n' > self-killing-script
armhf-jammy$ gdb --batch --ex "run self-killing-script" --ex bt /bin/bash
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0xb6e8c9e8 in kill () at ../sysdeps/unix/syscall-template.S:120
120 ../sysdeps/unix/syscall-template.S: No such file or directory.
#0 0xb6e8c9e8 in kill () at ../sysdeps/unix/syscall-template.S:120
#1 0x00474416 in kill_builtin ()
#2 0x00430ffc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```

To create a proper backtrace, you have to install bash-dbgsym.
