fix StacktraceAddressSignatures for frames without addresses

Bug #1533349 reported by Brian Murray on 2016-01-12
This bug affects 1 person
Affects: Apport
Status: Confirmed
Importance: Medium
Assigned to: Brian Murray

Bug Description

When adding gdb information to a report that is being retraced, the function crash_signature_addresses is used to create the StacktraceAddressSignature. This is suboptimal, given that the crash_signature function is more accurate according to its documentation:

For signal crashes this is the concatenation of ExecutablePath, Signal
number, and StacktraceTop function names, separated by a colon. If
StacktraceTop has unknown functions or the report lacks any of those
fields, return None. In this case, you can use
crash_signature_addresses() to get a less precise duplicate signature
based on addresses instead of symbol names.
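
To make the docstring concrete, here is a minimal sketch of how such a symbol-based signature could be assembled. It is not apport's actual implementation: the function name sketch_crash_signature and the assumed StacktraceTop layout (one "function (args) at file:line" entry per line, with "??" marking an unknown function) are illustrative assumptions.

```python
def sketch_crash_signature(executable_path, signal, stacktrace_top):
    """Hypothetical helper: join path, signal number and function names.

    Returns None when a field is missing or a function is unknown,
    mirroring the behaviour the docstring above describes.
    """
    if not executable_path or not signal or not stacktrace_top:
        return None

    functions = []
    for line in stacktrace_top.splitlines():
        # Assumed layout: the function name precedes the argument list.
        name = line.split('(', 1)[0].strip()
        if not name or name == '??':
            return None
        functions.append(name)

    return ':'.join([executable_path, signal] + functions)


# Shortened, illustrative StacktraceTop for the ifup crash in this report.
top = '\n'.join([
    '__GI_strncpy (s1=0xbe86aa6f "") at strncpy.c:41',
    'strncpy (__len=80) at string3.h:126',
    'do_interface (target_iface=<optimized out>) at main.c:846',
    'main (argc=<optimized out>) at main.c:1146',
])
signature = sketch_crash_signature('/sbin/ifup', '11', top)
```

Under these assumptions the result has the same colon-separated shape as a real crash_signature() value: executable path, signal number, then the function names from top to bottom.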

I discovered this when trying to figure out why bug 1532722 didn't appear to have retraced. From the retracer log files:

./e-t-retracer-app-10.42.32.22/production-logs/retracer-armhf.log-20160112.gz:2016-01-11 14:48:51,401:15371:139705901741824:INFO:root:5d013668-b872-11e5-862f-fa163e78b027:swift:Could not retrace.
./e-t-retracer-app-10.42.32.22/production-logs/retracer-armhf.log-20160112.gz:2016-01-11 14:48:51,401:15371:139705901741824:INFO:root:5d013668-b872-11e5-862f-fa163e78b027:swift:Retraced report missing stacktrace_addr_sig.
./e-t-retracer-app-10.42.32.22/production-logs/retracer-armhf.log-20160112.gz:2016-01-11 14:48:51,402:15371:139705901741824:INFO:root:5d013668-b872-11e5-862f-fa163e78b027:swift:Saved OOPS 5d013668-b872-11e5-862f-fa163e78b027 for manual investigation.


Martin Pitt (pitti) wrote :

What is the exact issue here?

> the function crash_signature_addresses is used to create the StacktraceAddressSignature.

Yes, but that's by design, not a bug. This is used for client-side (whoopsie) duplicate detection, and we can't rely on a "real" crash signature on the client side. The retracer, on the other hand, should stop looking at the StacktraceAddressSignature (SAS) once it has figured out a crash_signature(). We should not muddy the semantics of SAS by sometimes putting different information in it.

Can you please explain what the real issue is here? I figure we need to change something in apport-retrace or crashdb.py, but I don't think this should change the behaviour of user boxes that report crashes.

Thanks!

Changed in apport:
status: New → Incomplete
summary: - StacktraceAddressSignature is generated using suboptimal function
+ crashes sometimes do not get retraces
summary: - crashes sometimes do not get retraces
+ crashes sometimes do not get retraced

Brian Murray (brian-murray) wrote :

As I tried to describe in the description, the exact issue is that apport-retrace does not create a StacktraceAddressSignature for the ifupdown crash seen in bug 1532722. I'll attach a sample crash report for you. Here is pdb output when retracing the crash:

ipdb> self.crash_signature_addresses()
ipdb> self.crash_signature()
'/sbin/ifup:11:__GI_strncpy:strncpy:do_interface:main'

Brian Murray (brian-murray) wrote :

Could we discuss this next week?

Martin Pitt (pitti) wrote :

Ah, so the problem is that gdb is not showing most of the addresses in the frames on "bt", so the address signature cannot be computed. I tried to retrace this manually on xenial. I can reproduce this stack trace with missing addresses both with the current ifupdown version (i.e. with a mismatch between the report and the current xenial version) and with the actual 0.8.6-1ubuntu1 version.

Interestingly, "info f" does show a PC for the ones missing an address in "bt":

(gdb) bt
#0 __GI_strncpy (s1=0xbe86aa6f "", s1@entry=0xbe86aa70 "lo", s2=0x5 <error: Cannot access memory at address 0x5>, n=n@entry=80)
    at strncpy.c:41
#1 0x00013032 in strncpy (__len=80, __src=<optimized out>, __dest=0xbe86aa70 "lo")
    at /usr/include/arm-linux-gnueabihf/bits/string3.h:126
#2 do_interface (target_iface=<optimized out>) at main.c:846
#3 0x00011994 in main (argc=<optimized out>, argv=0xbe86ade8) at main.c:1146
(gdb) info f 0
Stack frame at 0xbe86a9b0:
 pc = 0xb6e9a124 in __GI_strncpy (strncpy.c:41); saved pc = 0x13032
 called by frame at 0xbe86ac40
 source language c.
 Arglist at 0xbe86a9a0, args: s1=0xbe86aa6f "", s1@entry=0xbe86aa70 "lo", s2=0x5 <error: Cannot access memory at address 0x5>,
    n=n@entry=80
 Locals at 0xbe86a9a0, Previous frame's sp is 0xbe86a9b0
 Saved registers:
  r4 at 0xbe86a9a0, r5 at 0xbe86a9a4, r6 at 0xbe86a9a8, lr at 0xbe86a9ac
(gdb) info f 1
Stack frame at 0xbe86ac40:
 pc = 0x13032 in strncpy (/usr/include/arm-linux-gnueabihf/bits/string3.h:126); saved pc = 0x11994
 inlined into frame 2, caller of frame at 0xbe86a9b0
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp is 0xbe86a9b0
 Saved registers:
  r4 at 0xbe86a9a0, r5 at 0xbe86a9a4, r6 at 0xbe86a9a8, lr at 0xbe86a9ac
(gdb) info f 2
Stack frame at 0xbe86ac40:
 pc = 0x13032 in do_interface (main.c:846); saved pc = 0x11994
 called by frame at 0xbe86ac90, caller of frame at 0xbe86ac40
 source language c.
 Arglist at 0xbe86a9b0, args: target_iface=<optimized out>
 Locals at 0xbe86a9b0, Previous frame's sp is 0xbe86ac40
 Saved registers:
  r4 at 0xbe86ac1c, r5 at 0xbe86ac20, r6 at 0xbe86ac24, r7 at 0xbe86ac28, r8 at 0xbe86ac2c, r9 at 0xbe86ac30, r10 at 0xbe86ac34,
  r11 at 0xbe86ac38, lr at 0xbe86ac3c
(gdb) info f 3
Stack frame at 0xbe86ac90:
 pc = 0x11994 in main (main.c:1146); saved pc = 0xb6e59772
 caller of frame at 0xbe86ac40
 source language c.
 Arglist at 0xbe86ac40, args: argc=<optimized out>, argv=0xbe86ade8
 Locals at 0xbe86ac40, Previous frame's sp is 0xbe86ac90
 Saved registers:
  r4 at 0xbe86ac6c, r5 at 0xbe86ac70, r6 at 0xbe86ac74, r7 at 0xbe86ac78, r8 at 0xbe86ac7c, r9 at 0xbe86ac80, r10 at 0xbe86ac84,
  r11 at 0xbe86ac88, lr at 0xbe86ac8c

Reading ftp://ftp.gnu.org/old-gnu/Manuals/gdb/html_chapter/gdb_7.html suggests that -fomit-frame-pointer could be responsible for this. It also mentions a missing address for the topmost frame. Following the suggestion there, I think we should make crash_signature_addresses() get some fallbacks. In particular, here:

                addr = line.split()[1]
                if not addr.startswith('0x'):
                    continue

instead of ignoring that frame, we should check if the frame ha...
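
As a rough illustration of the check quoted above, the following sketch applies the same "second field must start with 0x" test to the "bt" output from this comment and records which frames would be skipped. The helper name frame_addresses is hypothetical and this is not the code in crash_signature_addresses(); it only shows why frames #0 and #2 contribute no address.

```python
# Backtrace copied from the gdb session in this comment.
BT = '''\
#0 __GI_strncpy (s1=0xbe86aa6f "", s1@entry=0xbe86aa70 "lo", s2=0x5 <error: Cannot access memory at address 0x5>, n=n@entry=80)
    at strncpy.c:41
#1 0x00013032 in strncpy (__len=80, __src=<optimized out>, __dest=0xbe86aa70 "lo")
    at /usr/include/arm-linux-gnueabihf/bits/string3.h:126
#2 do_interface (target_iface=<optimized out>) at main.c:846
#3 0x00011994 in main (argc=<optimized out>, argv=0xbe86ade8) at main.c:1146
'''


def frame_addresses(bt):
    """Return (frame, address-or-None) pairs for each frame line in bt."""
    result = []
    for line in bt.splitlines():
        if not line.startswith('#'):
            continue  # continuation line such as "    at strncpy.c:41"
        fields = line.split()
        if len(fields) < 2:
            continue
        addr = fields[1]
        # Same test as the quoted snippet: no 0x prefix means no address.
        result.append((fields[0], addr if addr.startswith('0x') else None))
    return result


frames = frame_addresses(BT)
missing = [frame for frame, addr in frames if addr is None]
```

With this input, missing holds the frames that the existing check silently drops; any fallback would have to supply a value for exactly those entries.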


summary: - crashes sometimes do not get retraced
+ fix StacktraceAddressSignatures for frames without addresses

On Wed, Feb 17, 2016 at 08:26:20AM -0000, Martin Pitt wrote:
> [...]


Martin Pitt (pitti) wrote :

Brian Murray [2016-02-17 23:37 -0000]:
> Given that 'info f' for a frame returns a pc, is there a reason not to
> use that?

Two reasons:

 - At least in the crash I looked at, the PC was exactly the same as
   in the previous frame. I suppose due to -fomit-frame-pointer gdb
   could not figure out the new PC and thus didn't write it into the
   bt frame. So if we'd do that, we would merely duplicate the
   previous address.

 - We'd need to add this logic to add_gdb_info(), as only at that time
   we have gdb actually running and could get PC for each frame. Not a
   big deal, but I wonder if it's actually worth it.
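
The first point can be checked against the "info f" output quoted earlier in this thread. The sketch below is only an illustration: the regular expression and the helper names frame_pcs and repeats_previous_pc are assumptions, not part of apport or gdb. It extracts the pc value from each frame's "pc = ..." line and reports frames whose PC equals that of the preceding frame.

```python
import re

# Matches the leading "pc = 0x... in function" part of an "info f" line.
PC_RE = re.compile(r'^\s*pc = (0x[0-9a-f]+) in (\S+)')

# "pc =" lines copied from the "info f 0" .. "info f 3" output above.
INFO_FRAME_LINES = {
    0: ' pc = 0xb6e9a124 in __GI_strncpy (strncpy.c:41); saved pc = 0x13032',
    1: ' pc = 0x13032 in strncpy (/usr/include/arm-linux-gnueabihf/bits/string3.h:126); saved pc = 0x11994',
    2: ' pc = 0x13032 in do_interface (main.c:846); saved pc = 0x11994',
    3: ' pc = 0x11994 in main (main.c:1146); saved pc = 0xb6e59772',
}


def frame_pcs(lines):
    """Map frame number to its PC as an integer."""
    pcs = {}
    for number, line in lines.items():
        match = PC_RE.match(line)
        if match:
            pcs[number] = int(match.group(1), 16)
    return pcs


def repeats_previous_pc(pcs):
    """Frames whose PC is identical to the PC of the frame before them."""
    return [n for n in sorted(pcs) if n - 1 in pcs and pcs[n] == pcs[n - 1]]


pcs = frame_pcs(INFO_FRAME_LINES)
duplicates = repeats_previous_pc(pcs)
```

In this crash, frame 2 (do_interface, which has no address in "bt") reports the same PC as the inlined frame 1, so using it would repeat 0x13032 in the signature. Frame 0 does have a distinct PC in this output.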

Changed in apport:
status: Incomplete → Confirmed
Changed in apport:
importance: Undecided → Medium
assignee: nobody → Brian Murray (brian-murray)