Segmentation fault (core dumped) while re-drawing traces

Bug #1553804 reported by Hermann Kleier
This bug affects 6 people
Affects: KiCad
Status: Fix Released
Importance: Critical
Assigned to: Maciej Suminski

Bug Description

I am using

kicad 0.201603051447+6608~43~ubuntu15.10.1
kicad-common 0.201511181331+6321~30~ubuntu15.10.1

When trying to clean up traces by re-drawing I observed a “segmentation fault”. Actually, this happens from time to time (when tidying up layouts) but I am unable to determine the precise context. Meaning, I cannot reproduce the failure intentionally. But nevertheless, it happens from time to time.

All I have is a core dump and —from that— a backtrace:

-------------- snip ---------------------------------

h@b4:~/cvs/Xmit433/board$ gdb /usr/bin/kicad core
GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/kicad...(no debugging symbols found)...done.
[New LWP 15468]
[New LWP 15488]
[New LWP 15489]
[New LWP 15486]
[New LWP 15485]
[New LWP 15490]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `kicad'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007ff903f49b98 in ?? () from /usr/bin/_pcbnew.kiface
[Current thread is 1 (Thread 0x7ff91684ca00 (LWP 15468))]
(gdb) where
#0 0x00007ff903f49b98 in ?? () from /usr/bin/_pcbnew.kiface
#1 0x00007ff90376ac8a in ?? () from /usr/bin/_pcbnew.kiface
#2 0x00007ff90376b2c6 in ?? () from /usr/bin/_pcbnew.kiface
#3 0x00007ff903754a4a in ROUTER_TOOL::performRouting() () from /usr/bin/_pcbnew.kiface
#4 0x00007ff903755a56 in ROUTER_TOOL::mainLoop(PNS_ROUTER_MODE) () from /usr/bin/_pcbnew.kiface
#5 0x00007ff903755f4a in ROUTER_TOOL::RouteSingleTrace(TOOL_EVENT const&) () from /usr/bin/_pcbnew.kiface
#6 0x00007ff90388e471 in ?? () from /usr/bin/_pcbnew.kiface
#7 0x00007ff908cb8d61 in make_fcontext () at libs/context/src/asm/make_x86_64_sysv_elf_gas.S:64
#8 0x0000000000000000 in ?? ()
(gdb) info proc all

      0x7ff902e28000 0x7ff903d47000 0xf1f000 0x0 /usr/bin/_pcbnew.kiface
      0x7ff903d47000 0x7ff903f46000 0x1ff000 0xf1f000 /usr/bin/_pcbnew.kiface
      0x7ff903f46000 0x7ff903fcc000 0x86000 0xf1e000 /usr/bin/_pcbnew.kiface
      0x7ff903fcc000 0x7ff903fee000 0x22000 0xfa4000 /usr/bin/_pcbnew.kiface

---------------------- snap ----------------------------------------------

The output of “info proc all” should enable you to find out the names of the three functions at “#0”, “#1”, and “#2”.
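
A minimal sketch of that calculation, using only the addresses and mappings printed above. The function name `locate` is made up for illustration, and the sketch assumes the first listed mapping starts at the library's load base.

```python
# Mappings copied from the "info proc all" output above:
# (start address, end address, file offset) for /usr/bin/_pcbnew.kiface
MAPPINGS = [
    (0x7ff902e28000, 0x7ff903d47000, 0x0),
    (0x7ff903d47000, 0x7ff903f46000, 0xf1f000),
    (0x7ff903f46000, 0x7ff903fcc000, 0xf1e000),
    (0x7ff903fcc000, 0x7ff903fee000, 0xfa4000),
]

# Addresses of frames #0, #1 and #2 from the backtrace above.
FRAMES = [0x7ff903f49b98, 0x7ff90376ac8a, 0x7ff90376b2c6]


def locate(address, mappings=MAPPINGS):
    """Return (offset from load base, offset in file) for an address.

    Assumption: the first mapping in the list starts at the load base.
    """
    load_base = mappings[0][0]
    for start, end, file_offset in mappings:
        if start <= address < end:
            return address - load_base, address - start + file_offset
    raise ValueError(f"{address:#x} is not inside any listed mapping")


for number, address in enumerate(FRAMES):
    rel, in_file = locate(address)
    print(f"#{number}: base-relative {rel:#x}, file offset {in_file:#x}")
```

The base-relative values could then be looked up in a build of the same revision that has symbols, for instance with a tool such as `addr2line -f -e`. That only gives meaningful names if the binary matches the one that produced the core.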

I reckon that I will be observing the very same bug again (in a while).

OT: The new interactive router is a great tool which simplifies routing a lot. Thanks for that.

Tags: cern pcbnew pns
Revision history for this message
Tomasz Wlostowski (twlostow) wrote :

Hi Hermann,

Many thanks for your report. Would you be able to run a debug build of Kicad to check where exactly the problem occurs?

Also, could you send us (in private) the PCB that causes the crash?

Cheers,
Tom

Revision history for this message
martin layley (martin-scoutingsquirrel) wrote :

It is not Ubuntu specific. Also occurs in OSX in OpenGL mode using the interactive router. Has existed for me since I started in late September. I'm using the nightly builds. I haven't got around to filing a bug report as either (1) the crashes were occurring too close to deadlines, or (2) changing my sequence of actions would affect the repeatability of the crash and I wanted something easily crashable to file here.

/martin

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

@Martin: I forgot to mention that the problem came up when I started to use the OpenGL mode.

@Tom: I will create a debug build within the next few days and continue to work with that. I have not found any pattern in when the problem arises, and the faults seem irreproducible. I therefore reckon that analysing the core file of a debug build will be the only way to localise the problem. I do not think my board is in any way unusual. Anyway, I’ll attach it. The exception was thrown when I tried to re-draw the MPU-track at the very bottom of the board. The board has been modified since then and there is still a lot to do. Hopefully I’ll get another core.

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

Current status: I am running the debug build (git clone of kicad-source-mirror). On start-up, a debug message told me that the i965 graphics driver was inaccessible. The reason was that I did not belong to the group “video”. Adding myself to this group gave me access to the video driver. There has been no core dump since then. Alas, the time is too short to draw safe conclusions yet.

I.e., my observations were for a de-activated i965 driver. To my knowledge this means that I have been using the VESA (fallback) drivers.

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

Today I have got another core dump. Unfortunately, I was using an “old” version [KICAD_BUILD_VERSION "(2016-04-11 BZR 6687, Git f239aee)", BOOST_LIB_VERSION "1_58"] but it had debugging messages compiled in. Therefore I got some messages on the console (file: console_messages.txt) and I could fire up gdb with the core.

The fault happened in PNS_ROUTER::updateView() when I was shoving a trace in a very simple low-density layout.

How can I provide more information?

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

One more event. It always happens in the same context: shoving (and pushing?) traces.

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

Another “Segmentation fault” at the very same location in a similar context. This time I tried to shove a via. All this is not reproducible in the sense of “shove the via at R13 north-west”. I guess it depends more on the timing (mouse speed, frequency of clicks, …) but I have not found a pattern yet.

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

Another “Segmentation fault”. Same location. Similar context: Shoving traces.

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

One more event. As it happens very often now, it may make sense to save the kicad_pcb file. “Very often” means that I got just 184 lines of log messages between the SEGVs. The board is a simple evaluation board with an Arduino type header. Nothing special (see attachment).

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

Now I have observed the SEGV three times in a row(!!!) in a very similar context:

 - Fire up Pcbnew with the kicad_pcb file (attached in #9 of this thread).
 - Locate U7. Two traces (nets: SDA and SCL) go down from this chip using two via pairs.
 - Take the lower via pair and shove it down to make room for another horizontal trace.
 - Try to move the horizontal trace (net: DEBUG) down.
 - See that it is made up of three segments. Move the segments to the same level.
 - Click Edit → Cleanup Tracks and Vias to merge them.
 - Try to move the DEBUG-trace to verify that it has been fused.
 - Bang! SEGV!

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

Three SEGVs later… There seems to be a pattern. Same board, other trace. A “Shove” IMMEDIATELY following a SUCCESSFUL “Cleanup Tracks and Vias” seems to do the trick. Very speculative: it looks like “Cleanup Tracks and Vias” corrupts the track list.

Revision history for this message
Nick Østergaard (nickoe) wrote :

What build are you testing with now? Is this the same issue as originally reported? Is this in the legacy or GAL canvases?

tags: added: cern pcbnew pns
Revision history for this message
Nick Østergaard (nickoe) wrote :

OK, looking at some of the traces, it seems you are using GAL with the PNS.

Changed in kicad:
importance: Undecided → Critical
Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

The version is as reported in #5 of this thread [KICAD_BUILD_VERSION "(2016-04-11 BZR 6687, Git f239aee)", BOOST_LIB_VERSION "1_58"]. At this stage of board development I am using OpenGL (because of its GREAT shoving function).

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

I could upgrade to the most recent revision (from git) and see if the behaviour is still reproducible. Would you like me to do so?

Revision history for this message
Nick Østergaard (nickoe) wrote :

You could try it. There have been some GAL changes very recently, but unfortunately also something that has made things crash a bit in rev 6912 and 6913. You might want to experiment a bit.

Revision history for this message
Hermann Kleier (hermann-kleier) wrote :

Meanwhile I have upgraded to [KICAD_BUILD_VERSION "(2016-06-11 BZR 6917, Git d49ecb1)"]. The procedure described in #10 still works. Meaning that I get a SEGV after some seconds reliably.

Revision history for this message
Nick Østergaard (nickoe) wrote :

Hi Hermann, could you try to see if you can reproduce it in 4.0.4, otherwise we might want to see if this can be debugged further before the 4.0.5 release.

Changed in kicad:
assignee: nobody → Maciej Sumiński (orsonmmz)
status: New → In Progress
Revision history for this message
KiCad Janitor (kicad-janitor) wrote :

Fixed in revision a95df8463d707959e4fd59d090712f700da5a7d8
https://git.launchpad.net/kicad/patch/?id=a95df8463d707959e4fd59d090712f700da5a7d8

Changed in kicad:
status: In Progress → Fix Committed
Revision history for this message
Maciej Suminski (orsonmmz) wrote :

Now I wonder how to backport the fix to the 4.0 branch. The PNS router creates its own internal data model when it is launched. If the data is modified somewhere else (e.g. track clean up process), the PNS model becomes stale and inevitably leads to a crash.

The current fix relies on the BOARD_COMMIT mechanism which sends a message notifying all potentially interested tools that the model has changed. It is done every time there is a change occurring, so the fix ought to be universal. (BTW. to be replaced with observer in the near future).

The 4.0 branch does not have BOARD_COMMIT, hence the most trivial solution is to add the notification in every place where the model may be changed in the background. In the legacy canvas, I see the routing tool is simply deactivated when the Track/Via clean up dialog is invoked, perhaps that is the way to go. Still, I guess it fixes only this particular case and not the root cause.
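
A minimal sketch of the stale-model problem and the notification idea described above. All class and method names here (`Board`, `Router`, `commit`, `subscribe`) are hypothetical stand-ins for illustration and do not mirror the actual KiCad or BOARD_COMMIT code.

```python
# Hypothetical stand-ins for illustration only; these are not KiCad classes.
class Board:
    def __init__(self, tracks):
        self.tracks = list(tracks)
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def commit(self, new_tracks):
        """Apply a change and notify every interested tool."""
        self.tracks = list(new_tracks)
        for callback in self._listeners:
            callback(self)


class Router:
    """Keeps a private copy of the board data, built when it is launched."""

    def __init__(self, board, listen):
        self.model = list(board.tracks)
        if listen:
            board.subscribe(self.sync)

    def sync(self, board):
        # Rebuild the private model from the current board contents.
        self.model = list(board.tracks)

    def is_stale(self, board):
        return self.model != board.tracks


board = Board(["seg1", "seg2", "seg3"])
naive = Router(board, listen=False)
notified = Router(board, listen=True)

# A clean-up step merges three segments into one while the routers exist.
board.commit(["merged"])

print(naive.is_stale(board))     # True: model still refers to removed segments
print(notified.is_stale(board))  # False: model was rebuilt on notification
```

The router that never hears about the change keeps working on segments that no longer exist, which is the kind of state that can end in a crash. The subscribed router is resynchronised on every commit.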

Revision history for this message
Wayne Stambaugh (stambaughw) wrote : Re: [Bug 1553804] Re: Segmentation fault (core dumped) while re-drawing traces

On 11/28/2016 10:03 AM, Maciej Sumiński wrote:
> Now I wonder how to backport the fix to the 4.0 branch. The PNS router
> creates its own internal data model when it is launched. If the data is
> modified somewhere else (e.g. track clean up process), the PNS model
> becomes stale and inevitably leads to a crash.

Does the PNS router actually have to create its own separate internal
data model? Can this data model be moved into the BOARD object? This
seems to be the primary design flaw. Using message passing or callbacks
seems more complicated than necessary.

>
> The current fix relies on the BOARD_COMMIT mechanism which sends a
> message notifying all potentially interested tools that the model has
> changed. It is done every time there is a change occurring, so the fix
> ought to be universal. (BTW. to be replaced with observer in the near
> future).
>
> The 4.0 branch does not have BOARD_COMMIT, hence the most trivial
> solution is to add the notification in every place where the model may
> be changed in the background. In the legacy canvas, I see the routing
> tool is simply deactivated when the Track/Via clean up dialog is
> invoked, perhaps that is the way to go. Still, I guess it fixes only
> this particular case and not the root cause.
>

Disabling every tool that can corrupt the PNS router data is going to be
ugly. Every time someone adds a new tool they will have to know if they
could potentially corrupt the PNS data. This will create lots of
opportunities to get it wrong. I'm OK with disabling tools in the
stable 4 branch. In the development branch, we should make sure we have
a better solution.

Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

@Orson, are you currently working on fixing this in the stable 4 branch? If so, I will hold off on the 4.0.5 release until you commit the fix. If not, when do you think you will have it fixed, so I can make a decision on whether or not to release 4.0.5?

Revision history for this message
Maciej Suminski (orsonmmz) wrote :

Wayne,

Excuse the late response. Ultimately, we would like BOARD to use the model currently used by the PNS, meaning r-tree for spatial indexing and geometry library for collision detection (e.g. connectivity, DRC). I guess it would be easier to introduce once we get rid of the legacy canvas.

I think it is an important issue, so let me have one day to fix it. I think a workaround (disabling the active tool upon dialog appearance) should not be hard to implement.
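
The workaround could look roughly like the following sketch. The names `ToolManager` and `run_dialog` are invented for illustration and are not the actual KiCad code. The point is only the ordering: the active tool is dropped before the dialog is allowed to touch the board.

```python
# Hypothetical names for illustration; not the actual KiCad code.
class ToolManager:
    def __init__(self):
        self.active_tool = None

    def activate(self, name):
        self.active_tool = name

    def deactivate_current(self):
        self.active_tool = None


def run_dialog(tool_manager, apply_changes):
    """Drop the active tool first, then let the dialog modify the board."""
    tool_manager.deactivate_current()
    apply_changes()


tools = ToolManager()
tools.activate("router")
board = {"tracks": ["seg1", "seg2", "seg3"]}

# The dialog's effect is modelled as a callback that rewrites the track list.
run_dialog(tools, lambda: board.update(tracks=["merged"]))

print(tools.active_tool)  # None: the router has to be started again
print(board["tracks"])
```

Because the router is no longer active when the board changes, it has to be launched again afterwards and builds its internal model from the modified board.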

Revision history for this message
Tomasz Wlostowski (twlostow) wrote :

Workaround: disable PNS when onmodify is called ;)


Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

@Orson, I'll hold off on the 4.0.5 release until you commit the fix for
this in the stable 4 branch. Thanks.

On 11/30/2016 2:53 PM, Maciej Sumiński wrote:
> Wayne,
>
> Excuse the late response. Ultimately, we would like BOARD to use the
> model currently used by the PNS, meaning r-tree for spatial indexing and
> geometry library for collision detection (e.g. connectivity, DRC). I
> guess it would be easier to introduce once we get rid of the legacy
> canvas.
>
> I think it is an important issue, so let me have one day to fix it. I
> think a workaround (disabling the active tool upon dialog appearance)
> should not be hard to implement.
>

Revision history for this message
Maciej Suminski (orsonmmz) wrote :

Wayne,

I have just pushed a workaround which disables the active tool when a dialog pops up. It is the easiest way to avoid shooting oneself in the foot, though not the most elegant solution.

I really hope we manage to release 5.0 soon, or at least that there are not many bugs left in 4.0. It is becoming harder and harder to maintain. Currently I am not able to build the 4.0 branch with the Github plugin enabled (even after downgrading boost). The problem seems to be caused by disabled SSLv3 in newer OpenSSL versions, but I have not really checked.

Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

On 12/1/2016 9:22 AM, Maciej Sumiński wrote:
> Wayne,
>
> I have just pushed a workaround which disables the active tool when a
> dialog pops up. It is the easiest way to avoid shooting oneself in the
> foot, though not the most elegant solution.

I took a look at your commit. Are you sure this is the only path that
could open a dialog that changes the board state? I'm pretty sure that
Process_Special_Functions() is not the only path but I could be wrong.

>
> I really hope we manage to release 5.0 soon, or at least that there are
> not many bugs left in 4.0. It is becoming harder and harder to maintain.
> Currently I am not able to build the 4.0 branch with Github plugin
> enabled (even after downgrading boost). The problem seems to be caused
> by disabled SSLv3 in newer OpenSSL versions, but I have not really
> checked.
>

Now you understand why we were reluctant to provide stable releases in
the past. It becomes increasingly difficult as time goes by and
requires more and more time to maintain.

Revision history for this message
Maciej Suminski (orsonmmz) wrote :

Wayne,

I have just checked and it seems the fix should be sufficient. Most of the dialogs capable of introducing changes to a BOARD object actually deactivate the current tool. The exceptions are:
- Set Footprint Field Sizes, which modifies only texts that PNS does not care about
- Layers Setup, which does not remove tracks if the number of layers is decreased but hides them

I have also reviewed the hotkey list, but I could not find one that could modify BOARD, while keeping the current tool enabled.

To sum up, I think we are safe with the current patch.

Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

On 12/2/2016 7:41 PM, Maciej Sumiński wrote:
> Wayne,
>
> I have just checked and it seems the fix should be sufficient. Most of the dialogs capable of introducing changes to a BOARD object actually deactivate the current tool. The exceptions are:
> - Set Footprint Field Sizes, which modifies only texts that PNS does not care about
> - Layers Setup, which does not remove tracks if the number of layers is decreased but hides them
>
> I have also reviewed the hotkey list, but I could not find one that
> could modify BOARD, while keeping the current tool enabled.
>
> To sum up, I think we are safe with the current patch.
>

Thanks for the update Orson. I'll tag 4.0.5 and push it to launchpad.

Revision history for this message
Luis (luisfhm007) wrote :

For the record, I am still having this issue (I guess it is the same or closely related) with KiCad 4.0.5 + dfsg1-4~bpo8+1, wxWidgets 3.0.2 Unicode and Boost 1.55.0 on Linux 4.9.0, 64-bit, Debian, as installed by the package manager.

In OpenGL mode, when the "Mouse drag behavior" is set to "interactive drag", a segmentation fault occurs when clicking and dragging any net. As a workaround, the "M" key allows moving a net interactively without a segfault, but if you forget and click and drag, a segfault occurs.

The behavior is very consistent and always happens.

Revision history for this message
Lutz.H (lutz-heinisch) wrote :

Please cross-check:

#1673940 Pcbnew: Unconnected traces lose netcode on reload/connectivity rebuild

Jeff Young (jeyjey)
Changed in kicad:
status: Fix Committed → Fix Released