losing all tracks of a board

Bug #1767826 reported by Novak Tamas
This bug affects 1 person
Affects: KiCad
Status: Fix Released
Importance: Critical
Assigned to: Maciej Suminski

Bug Description

Sometimes when pcbnew crashes, it loses all the tracks. That is, pcbnew crashes with an almost fully routed board, and when I run pcbnew again and open the .kicad_pcb file I see only the ratsnest.

I can recover my work by deleting .kicad_pcb and renaming .kicad_pcb-bak to .kicad_pcb.
It happens randomly, approximately once every couple of days of heavy KiCad use.

The latest "void" .kicad_pcb file is much smaller than the previous good one (from the -bak version). I attach both example files; maybe something will turn up.

I guess it is not enough info to do anything with it, so the real question is:
is there any way to investigate a case like this? Some log or journal file, or running in a debugger?

Tags: pcbnew
Novak Tamas (novak-7) wrote :
Nick Østergaard (nickoe) wrote :

Please add version information. If you are on Linux, you can enable coredumpctl. I think it can be enabled as a service with systemd. Google should be your friend here. With the core dump it should be possible to get a backtrace after it crashes.

tags: added: pcbnew
Maciej Suminski (orsonmmz) wrote :

Running in a debugger may get you a backtrace, which is frequently a great debugging help. I do not understand what your expectations are regarding the attached files.

Is there any pattern to reproduce a lose-all-tracks kind of crash? It sounds scary and I have never experienced such a problem myself.

Changed in kicad:
milestone: none → 5.0.0-rc2
Novak Tamas (novak-7) wrote :

I'm on Win7/64, and this happened lately with nightly r10086, but it has happened to me a couple of times in recent months with different nightly versions.

Application: kicad
Version: (5.0.0-rc2-dev-493-gd776eaca8), release build
Libraries:
    wxWidgets 3.0.3
    libcurl/7.54.1 OpenSSL/1.0.2l zlib/1.2.11 libssh2/1.8.0 nghttp2/1.23.1 librtmp/2.3
Platform: Windows 7 (build 7601, Service Pack 1), 64-bit edition, 64 bit, Little endian, wxMSW
Build Info:
    wxWidgets: 3.0.3 (wchar_t,wx containers,compatible with 2.8)
    Boost: 1.60.0
    Curl: 7.54.1
    Compiler: GCC 7.1.0 with C++ ABI 1011

Build settings:
    USE_WX_GRAPHICS_CONTEXT=OFF
    USE_WX_OVERLAY=OFF
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_ACTION_MENU=ON
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_SPICE=ON

Wayne Stambaugh (stambaughw) wrote :

It happened to me as well about a month ago, but I thought it was just some weird anomaly. It hasn't happened since, but I haven't done anything significant since then. The really frustrating part about this is that when you open your board file the next time, it overwrites the backup file with the board with no traces and you lose everything. I lost about 4 hours of work when it happened to me. I was not amused. It happened when I was using the modern canvas. What is strange about this crash is that there is no auto save file, so it looks like the board was saved as expected with no traces. I would recommend that if you get a crash while routing using the modern canvas, you make a copy of the board backup file before opening the board again, or you will lose all of your previous work up to that point.

Changed in kicad:
status: New → Triaged
importance: Undecided → Critical
Maciej Suminski (orsonmmz) wrote :

Is it a crash on save? I do not see how else all tracks might be lost in the file. Is the board file truncated or otherwise malformed after such a crash?

Wayne Stambaugh (stambaughw) wrote :

I'm not sure when the crash happened. I have Ctrl+S on auto pilot just out of decades of habit so I may have saved right before the crash happened. I will run a debug build of kicad from gdb until this issue is resolved. I recommend that everyone else do the same whenever routing a board using the modern canvas.

Wayne Stambaugh (stambaughw) wrote :

The board is not malformed. It opens correctly minus *all* of the traces and vias.

Novak Tamas (novak-7) wrote :

Yes, as far as I remember it always happened when I tried to save. I didn't really lose any data: when I opened the board again after the crash I saw the empty board, so I exited without overwriting the backup and could recover my work.
It always happened with an almost finished board, at final trimming. My (good or bad?) design practice is to wire the board with the thinnest tracks, then finally replace them with thicker tracks where required. The handling of transitions between different track widths is quite lame (you can't really drag these corners with D the way you can normal same-width corners), so weird things usually happen at that point.

What debugger do you suggest on Win7 for continuous background checking of KiCad?

Aleksandr Sh (dsa-t) wrote :

You can compile a debug build from source using MSYS2 and work in KiCad without a debugger (it runs slowly under a debugger). Then, after KiCad crashes and the Windows "KiCad had stopped" dialog pops up, you can attach to the KiCad process with mingw's gdb and get backtraces (but do not close the dialog).
GDB commands:

attach <KiCad process id>
set pagination off
set logging on
thread apply all bt
detach

Backtraces will be in gdb.txt in mingw working directory.

You can get "<KiCad process id>" from the task manager.
"set pagination off" makes it so you don't have to press return to get all of the backtraces.
"set logging on" will enable logging.
"thread apply all bt" will get backtraces of all threads.

That works for me on Windows 10, should work on Windows 7.

Michael Geselbracht (mgeselbracht) wrote :

Sometimes, after I have dragged some tracks/vias in routing mode and pressed ESC, a "Clarify Selection" dialog pops up. I *think* it always contains two entries. When I move the mouse pointer over an entry, a "ghost track/via" appears. These are the tracks/vias from before the drag operation.
When I click on an entry in order to select the highlighted track/via and remove the elements by pressing "DEL", pcbnew crashes.

Maybe six months ago this clarify dialog appeared quite often, and removing selected "ghost elements" did not lead to a crash, but all routed tracks disappeared. Not immediately: the displayed tracks could no longer be edited, and after reopening the board file the tracks were gone if the file had been saved after the remove operation. This bug has been (partly?) fixed.

Now I could reproduce this behavior while pcbnew was running within a gdb session. The crash is apparently a set of failed assertions. Pcbnew did not crash within gdb (at least not until I closed it) and after I saved the current board and reopened it all tracks/vias were gone.

Maybe this is the same bug as described here and Linux/Windows behave differently.

Console output:

/data/src/kicad-source-mirror/common/view/view.cpp(374): assert "viewData->m_view == this" failed in Remove().
[Thread 0x7fffcedd1700 (LWP 24032) exited]
/data/src/kicad-source-mirror/common/dlist.cpp(163): assert "aElement->GetList() == this" failed in remove().
/data/src/kicad-source-mirror/common/dlist.cpp(171): assert "last == aElement" failed in remove().
/data/src/kicad-source-mirror/common/dlist.cpp(181): assert "first == aElement" failed in remove().
[New Thread 0x7fffcedd1700 (LWP 24117)]

GDB output after I closed pcbnew:

Thread 1 "kicad" received signal SIGSEGV, Segmentation fault.
0x00000000000000a1 in ?? ()
(gdb) bt
#0 0x00000000000000a1 in ?? ()
#1 0x00007fffe58a7e17 in PICKED_ITEMS_LIST::ClearListAndDeleteItems (this=0x3bbe6b0) at /data/src/kicad-source-mirror/common/undo_redo_container.cpp:135
#2 0x00007fffe51eedff in PCB_SCREEN::ClearUndoORRedoList (this=0x402b160, aList=..., aItemCount=-1) at /data/src/kicad-source-mirror/pcbnew/undo_redo.cpp:614
#3 0x00007fffe561086b in BASE_SCREEN::ClearUndoRedoList (this=0x402b160) at /data/src/kicad-source-mirror/common/base_screen.cpp:409
#4 0x00007fffe5138315 in PCB_EDIT_FRAME::Clear_Pcb (this=0x27bf4c0, aQuery=false) at /data/src/kicad-source-mirror/pcbnew/initpcb.cpp:52
#5 0x00007fffe51803a0 in PCB_EDIT_FRAME::OnCloseWindow (this=0x27bf4c0, Event=...) at /data/src/kicad-source-mirror/pcbnew/pcb_edit_frame.cpp:719
#6 0x00007ffff648736e in wxAppConsoleBase::CallEventHandler (this=0x822890, handler=0x27bf4c0, functor=..., event=...) at ../src/common/appbase.cpp:623
#7 0x00007ffff660d4d7 in wxEvtHandler::ProcessEventIfMatchesId (entry=..., handler=<optimized out>, event=...) at ../src/common/event.cpp:1390
#8 0x00007ffff660d5cb in wxEventHashTable::HandleEvent (this=<optimized out>, event=..., self=self@entry=0x27bf4c0) at ../src/common/event.cpp:996
#9 0x00007ffff660d97b in wxEvtHandler::TryHereOnly (this=0x27bf4c0, event=...) at ../src/common/event.cpp:1587
#10 0x00007fffe586be9c in EDA_BASE_FRAME::ProcessEvent (this=0x27bf4c0, aEvent=...


Maciej Suminski (orsonmmz) wrote :

One interesting thing regarding the files Novak attached in #1 is that both have non zero track count in the "general" section of the header, but only one of them actually contains any segment entries.

The number of tracks for the "general" section is obtained via DLIST::GetCount(), which returns the value of an integer 'count' variable, maintained as objects are added or removed. The fact that no tracks are saved despite a non-zero 'count' value may indicate a corrupted list (e.g. nullptr as the list head).
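
To make the suspected failure mode concrete, here is a minimal sketch of an intrusive list that keeps a cached element counter next to its head pointer, plus a check that compares the counter with the number of nodes actually reachable. `Node`, `IntrusiveList` and `IsCoherent` are hypothetical names invented for this illustration; this is not KiCad's DLIST code, only an assumption about the general shape of such a container.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a list element (not KiCad's TRACK).
struct Node
{
    Node* next = nullptr;
    Node* prev = nullptr;
};

// Hypothetical stand-in for an intrusive list with a cached counter.
struct IntrusiveList
{
    Node*       first = nullptr;
    Node*       last  = nullptr;
    std::size_t count = 0;   // maintained on insert, never recomputed

    void PushBack( Node* n )
    {
        assert( n != nullptr );
        n->prev = last;
        n->next = nullptr;

        if( last )
            last->next = n;
        else
            first = n;

        last = n;
        ++count;
    }

    // Returns the cached counter, whatever the pointers look like.
    std::size_t GetCount() const { return count; }
};

// Walks the chain from the head and compares it with the cached counter.
inline bool IsCoherent( const IntrusiveList& list )
{
    std::size_t walked = 0;
    const Node* prev = nullptr;

    for( const Node* n = list.first; n; n = n->next )
    {
        if( n->prev != prev )
            return false;            // broken back link

        prev = n;

        if( ++walked > list.count )
            return false;            // more nodes than counted, or a cycle
    }

    return walked == list.count && prev == list.last;
}
```

With a null head and a non-zero counter, `GetCount()` in this sketch still reports the old value while a save loop that walks from `first` writes nothing, which matches the symptom seen in the attached files.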

Tomasz Wlostowski (twlostow) wrote :

Hi guys,

Can anyone send a full core dump file next time this happens? I'm unable to recreate it here no matter how hard I try...

Tom

Maciej Suminski (orsonmmz) wrote :

I am trying to understand how to bring the track DLIST to this particular corrupted state (list head == nullptr, count > 0). I suppose this would be a hint informative enough to let us search for particular patterns in the code.

One simple way would be to prepend a nullptr (e.g. board->m_Track.PushFront( nullptr )), but this instantly crashes when DHEAD::insert() tries to set next and previous elements, so no bonus.

Another way would be to add tracks to another DLIST instance and clear it:

  board->m_Track.PushFront( nullptr );
  DLIST<TRACK> tracks;

  for( TRACK* t = board->m_Track; t; t = t->Next() )
      tracks.PushBack( t );

  while( tracks.PopFront() );

The snippet above results in similar symptoms, as DLIST::PopFront() resets the next/previous pointers for each element, effectively disconnecting them on both lists, but the counter value is only maintained for the local 'tracks' variable (IMHO this is bad enough to replace DLIST with an STL container). The only difference is that the original list head still points to a track, which is saved in the file, so it is not exactly the problem we are trying to reproduce.

Any other ideas?
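
The hazard described above (one node linked into two intrusive lists) can be sketched in isolation. The types below are hypothetical stand-ins written for this illustration, not KiCad's DLIST or TRACK, and the behaviour of `PushBack`/`PopFront` is an assumption modelled on the description in this comment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list element carrying its own link pointers.
struct Node
{
    Node* next = nullptr;
    Node* prev = nullptr;
};

// Hypothetical intrusive list with a cached counter.
struct List
{
    Node*       first = nullptr;
    Node*       last  = nullptr;
    std::size_t count = 0;

    void PushBack( Node* n )
    {
        assert( n != nullptr );
        n->prev = last;      // overwrites links the node had in any other list
        n->next = nullptr;

        if( last )
            last->next = n;
        else
            first = n;

        last = n;
        ++count;
    }

    Node* PopFront()
    {
        Node* n = first;

        if( !n )
            return nullptr;

        first = n->next;

        if( first )
            first->prev = nullptr;
        else
            last = nullptr;

        n->next = n->prev = nullptr;   // node is fully disconnected
        --count;                       // only this list's counter changes
        return n;
    }
};

// Counts the nodes that can still be reached from the head.
inline std::size_t CountReachable( const List& list )
{
    std::size_t reachable = 0;

    for( const Node* n = list.first; n; n = n->next )
        ++reachable;

    return reachable;
}
```

In this sketch, pushing the head node of one list into a second list and popping it again leaves the first list with its old counter but only one reachable node, so the head still points to an element, as noted above for the real snippet.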

Wayne Stambaugh (stambaughw) wrote : Re: [Bug 1767826] Re: losing all tracks of a board

On 05/04/2018 11:30 AM, Maciej Suminski wrote:
> I am trying to understand how to bring the track DLIST to this
> particular corrupted state (list head == nullptr, count > 0). I suppose
> this would be a hint informative enough to let us search for particular
> patterns in the code.

This is a bit scary. It would seem to agree with the stack trace and
assertion outputs that were provided in this bug report. It does
appear that this condition occurs somewhere in the undo/redo list code,
whether it is the undo/redo code itself or some place where the
undo/redo buffers are being set to a state that causes the issue. I do
know that I've only had this happen with the p&s router. Given that
I've routed far more boards with the legacy router and have never seen
this issue, this may be a useful data point.

>
>
> One simple way would be to prepend a nullptr (e.g. board->m_Track.PushFront( nullptr )), but this instantly crashes when DHEAD::insert() tries to set next and previous elements, so no bonus.
>
>
> Another way would be to add tracks to another DLIST instance and clear it:
>
> board->m_Track.PushFront( nullptr );
> DLIST<TRACK> tracks;
>
> for( TRACK* t = board->m_Track; t; t = t->Next() )
> tracks.PushBack( t );
>
> while( tracks.PopFront() );
>
> The snippet above results in similar symptoms as DLIST::PopFront()
> resets next/previous pointers for each element, effectively
> disconnecting them on both list, but the counter value is only
> maintained for the local 'tracks' variable (IMHO this is bad enough to
> replace DLIST with an STL container). The only difference is that the
> original list head still points to a track, which is saved in the file,
> so it is not exactly the problem we are trying to reproduce.

At some point, we should move everything over to stl or boost pointer
containers but I am reluctant this late in the v5 development cycle. I
don't know what new issues this will potentially introduce although I
will not completely rule it out.

>
> Any other ideas?
>

I can't think of any and without a more detailed stack trace or some
known way of duplicating the bug, it's going to be difficult to fix this.
I do not want to release rc2 given the severity of this bug. I will try
my best tomorrow to see if I can trigger this bug and get a better back
trace.

Maciej Suminski (orsonmmz) wrote :

Michael, are you able to reproduce the described issue at will? If so, would you list the steps?

Novak Tamas (novak-7) wrote :

It happened to me again today! I have MS Visual Studio and Qt with MinGW and gdb installed on my Windows box, so I tried to attach their debugger, with little success. I do use development tools, but I have never used a standalone debugger. I think that if KiCad hasn't been built on my machine, the necessary debug info is not available. I must take a deep breath and build KiCad from source.

Still, I attach the .kicad_pcb and its backup file (strangely, the empty version is bigger than the previous "last good" version), although I think it's of little help.

Maciej Suminski (orsonmmz) wrote :

Novak,

I am adding extra debug information to report the problem as soon as it appears, even in release builds. The most valuable information is how to trigger the bug, and with the current code no one will find out the list is corrupted until the 'save' command is issued.

Maciej Suminski (orsonmmz) wrote :

I have added extra asserts, but they are active only in debug builds. I may add dumping of stack traces whenever an anomaly in a DLIST object is detected, but most likely it will require a change to the symbol visibility settings (-fvisibility and -fvisibility-inlines-hidden) that may result in larger binary files. I am not sure this is the right thing to do.

It would be highly appreciated if you could build a debug version, so you could observe the asserts. If not, I can prepare a debug build for you and upload it somewhere. I may add an extra patch that continuously checks the tracks list coherency, so any issues will be detected instantly.

By the way, do you use any other tools modifying the tracks? E.g. clean-up tracks or global delete? Are you able to recall the actions you have performed before the crash?

Novak Tamas (novak-7) wrote :

@Orson, how to get that "extra debug" version? I may download from somewhere, or you can send me with https://wetransfer.com/ --> <email address hidden>

I have several other development tools installed (SystemWorkbench STM32 in Eclipse, Qt Creator, Keil uVision, and MS VisualStudio), and I fear screwing something up in their setups if I try to build KiCad from source.

My memories of the "incident" are still fairly fresh: I was a couple of hours into pcbnew work, doing final trimming, dragging tracks and vias with D. Suddenly, when I was moving the mouse over a corner point of two tracks and pressing D, the machine seemed to stop responding. I was not sure what was happening, so I pressed Esc and D a couple of times. The mouse cursor kept moving, but dragging the track didn't start. Then I clicked Save on the top toolbar, and the Windows crash message box appeared at that point.

I tried to attach gdb (I have Qt Creator installed and gdb is its debugger), but somehow VisualStudio started instead (I don't know exactly what I was doing). It generated only unusable files like:

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 14
VisualStudioVersion = 14.0.25420.1
MinimumVisualStudioVersion = 10.0.40219.1
Project("{911E67C6-3D85-4FCE-B560-20A9C3E3FF48}") = "kicad", "..\..\Program Files\KiCad-10202\bin\kicad.exe", "{7A3AE587-51CF-4C49-B26A-D0BA951F356C}"
 ProjectSection(DebuggerProjectSystem) = preProject
  PortSupplier = 00000000-0000-0000-0000-000000000000
  Executable = C:\Program Files\KiCad-10202\bin\kicad.exe
  RemoteMachine = BENTI
  StartingDirectory = C:\Program Files\KiCad-10202\bin
  Environment = Default
  LaunchingEngine = 00000000-0000-0000-0000-000000000000
  UseLegacyDebugEngines = No
  LaunchSQLEngine = No
  AttachLaunchAction = No
 EndProjectSection
EndProject
Global
 GlobalSection(SolutionConfigurationPlatforms) = preSolution
  Release|x64 = Release|x64
 EndGlobalSection
 GlobalSection(ProjectConfigurationPlatforms) = postSolution
  {7A3AE587-51CF-4C49-B26A-D0BA951F356C}.Release|x64.ActiveCfg = Release|x64
 EndGlobalSection
 GlobalSection(SolutionProperties) = preSolution
  HideSolutionNode = FALSE
 EndGlobalSection
EndGlobal

Novak Tamas (novak-7) wrote :

New observations; I don't know if they help. See the attached video (sorry for 4k fullscreen, I didn't dare to change the settings).
- The number of tracks and vias on the lower status bar is 0.
- Tracks cannot be selected, while footprints can.
- Highlight net still highlights the "semi-existing" tracks.
- At this point (still before the crash!!) .kicad_pcb-bak is 837kB, but .kicad_pcb is 637kB (tracks missing), so the previous save wrote invalid data without crashing!!
- The UI seems to be responsive. The Save button is disabled now, as the previous "fake save" was successful.

As I cannot drag tracks or vias, I move a footprint (with M). The Save button becomes enabled. I click Save again. Saving is successful, without a crash(!). Now both kicad_pcb and kicad_pcb-bak are 637kB (the "empty size"), so full data loss! I can still keep on working... without tracks, of course.

Maciej Suminski (orsonmmz) wrote :

Thank you for the video. This is consistent with what I have managed to reproduce by breaking DLIST in the way described in #14, so I take it as a confirmation of DLIST corruption. Now we need to find out how and when that happens.

I have uploaded an extra debug pcbnew build [1]. The two files inside the archive need to replace existing ones in a KiCad installation, you may want to backup the original ones. The debug version will show asserts when the list of tracks becomes corrupted, and additionally it verifies its integrity on virtually any operation. If you find it unusably slow, I will disable the latter - I think the asserts may suffice.

I appreciate your help with debugging. It is very hard to fix a problem that cannot be observed on a local machine.

1. https://orson.net.pl/pub/pcbnew_debug.zip

Maciej Suminski (orsonmmz) wrote :

BTW. do you recall if at any point during the editing session you have used clean-up tracks/global deletion/append a board or another tool that modifies multiple tracks at once?

Novak Tamas (novak-7) wrote :

#23: In the last failed session I definitely did not use Clean-up tracks, Global deletion, or Append board. I opened an already complete board and started to widen the tracks and do "trimming": dragging tracks a little, sometimes re-routing connections along a tidier track. It took about 2 hours to crash it.
I'm just downloading your fat debug code...coming back later.

Tomasz Wlostowski (twlostow) wrote :

On 08/05/18 10:25, Novak Tamas wrote:
> #23: In the last failed session I definitely did not use Clean-up tracks, Global deletion, or Append board. I opened an board already complete, started to widen the tracks and make "trimming": small dragging tracks, sometimes re-route connections to a more tidy different track. It took about 2 hours to crash it.
> I'm just downloading your fat debug code...coming back later.
>

Did you switch between legacy/modern toolkits?

Tom

Novak Tamas (novak-7) wrote :

No, I'm always on OpenGL (F11), never switch to legacy or Cairo.

Novak Tamas (novak-7) wrote :

I copied your two files to my C:\Program Files\KiCad-10202\bin.
KiCad launcher and eeschema starts normally, but launching pcbnew gives "pcbnew failed to load: failed to load kiface library" error 127.
I used my latest nightly r10202's bin subdir. Should I download another nightly to be replaced by debug .exe and .kiface?

Maciej Suminski (orsonmmz) wrote :

I will try to resolve the problem on my side. In the worst case I will pack the whole build, but I wanted to avoid a large archive.

jean-pierre charras (jp-charras) wrote :

Perhaps this lib: libkicad_3dsg.dll (loaded by pcbnew) is missing or incorrect.

Maciej Suminski (orsonmmz) wrote :

I have built KiCad without scripting support, so the debug build is linked against a different version of wxWidgets than is normally used for scripting builds. A new build is on the way.

Michael Geselbracht (mgeselbracht) wrote :

I have not found a way to reproduce this bug other than dragging some tracks. I made a video that shows the phenomenon I sometimes observe. But I never lost any tracks, nor did KiCad crash, as long as I did not remove a ghost track. So usually I simply dismiss the clarify dialog and everything is fine.

About the video: The "clarify selection" dialog appeared after I pressed ESC to exit the routing tool. The assert dialog appeared after I pressed DEL.

After the assert dialog appeared I saved the board, and this time the tracks were still there after I reopened it. There were fewer assert messages than before, but none of the additional asserts were triggered.

I have a core dump with lost tracks but I don't think it's useful. I cannot use it with gdb on another machine.

This is the console output (commit 5c5b74b29):

OpenGL WARNING: Buffer performance warning: Buffer object 2 (bound to GL_ARRAY_BUFFER_ARB, usage hint is GL_DYNAMIC_DRAW) is being copied/moved from VIDEO memory to DMA CACHED memory.
[Thread 0x7fffe73ef700 (LWP 23640) exited]

/data/src/kicad-source-mirror/common/view/view.cpp(374): assert "viewData->m_view == this" failed in Remove().
/data/src/kicad-source-mirror/common/dlist.cpp(175): assert "aElement && aElement->GetList() == this" failed in remove().
[New Thread 0x7fffe73ef700 (LWP 23659)]
[Thread 0x7fffe73ef700 (LWP 23659) exited]

Segfault after exiting:

Thread 1 "kicad" received signal SIGSEGV, Segmentation fault.
0x00000000000000a1 in ?? ()
(gdb)
(gdb) bt
#0 0x00000000000000a1 in ?? ()
#1 0x00007fffd5796337 in PICKED_ITEMS_LIST::ClearListAndDeleteItems (this=0x6a66910) at /data/src/kicad-source-mirror/common/undo_redo_container.cpp:135
#2 0x00007fffd50dc0a1 in PCB_SCREEN::ClearUndoORRedoList (this=0x408c530, aList=..., aItemCount=-1) at /data/src/kicad-source-mirror/pcbnew/undo_redo.cpp:614
#3 0x00007fffd54fe09f in BASE_SCREEN::ClearUndoRedoList (this=0x408c530) at /data/src/kicad-source-mirror/common/base_screen.cpp:409
#4 0x00007fffd5025bf5 in PCB_EDIT_FRAME::Clear_Pcb (this=0x27f28d0, aQuery=false) at /data/src/kicad-source-mirror/pcbnew/initpcb.cpp:52
#5 0x00007fffd506d1a8 in PCB_EDIT_FRAME::OnCloseWindow (this=0x27f28d0, Event=...) at /data/src/kicad-source-mirror/pcbnew/pcb_edit_frame.cpp:717
#6 0x00007ffff648736e in wxAppConsoleBase::CallEventHandler (this=0x822890, handler=0x27f28d0, functor=..., event=...) at ../src/common/appbase.cpp:623
#7 0x00007ffff660d4d7 in wxEvtHandler::ProcessEventIfMatchesId (entry=..., handler=<optimized out>, event=...) at ../src/common/event.cpp:1390
#8 0x00007ffff660d5cb in wxEventHashTable::HandleEvent (this=<optimized out>, event=..., self=self@entry=0x27f28d0) at ../src/common/event.cpp:996
#9 0x00007ffff660d97b in wxEvtHandler::TryHereOnly (this=0x27f28d0, event=...) at ../src/common/event.cpp:1587
#10 0x00007fffd575a2f4 in EDA_BASE_FRAME::ProcessEvent (this=0x27f28d0, aEvent=...) at /data/src/kicad-source-mirror/common/eda_base_frame.cpp:194
#11 0x00007ffff660d783 in wxEvtHandler::DoTryChain (this=<optimized out>, event=...) at ../src/common/event.cpp:1552
#12 0x00007ffff660da65 in wxEvtHandler::ProcessEvent (this=0x27f2d38, event=...) at ../src/common/e...


Tomasz Wlostowski (twlostow) wrote :

@mgeselbracht Hi Michael, could you send me the PCB you're editing in your movie? Tom

Michael Geselbracht (mgeselbracht) wrote :

Sure. It is an old eagle design. I used it to test the eagle import filter.

Maciej Suminski (orsonmmz) wrote :

@Novak,

It turns out that the simplest way is to replace the wxWidgets libraries [1]. I have tested it with one of the recent nightlies and it turned out to be ok. I have also included a script (kicad_debug.bat) that will run pcbnew under a debugger, so it should let you obtain a backtrace when things go wrong. When you run the script and pcbnew crashes, please type 'bt' into the console window and post the output here.
BTW. Is your first name Novak or Tamas? I do not want to sound rude by calling you with your surname.

1. https://orson.net.pl/pub/wx_dll.zip

Maciej Suminski (orsonmmz) wrote :

@Michael,

You might not have lost the tracks this time, as the assert you triggered (dlist.cpp:175) has recently been changed to wxCHECK. The difference between wxASSERT and wxCHECK is that the latter returns from the function when the tested condition is not fulfilled, so it might have prevented the DLIST corruption.
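
The difference can be illustrated without wxWidgets. The two macros below are hypothetical stand-ins written for this sketch: they only mimic the control flow described above (report and continue versus report and return) and are not the real wxASSERT/wxCHECK definitions. `Counter`, `RemoveWithAssert` and `RemoveWithCheck` are likewise invented names.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for an assert that reports the failure and lets execution continue.
#define SKETCH_ASSERT( cond )                                              \
    do {                                                                   \
        if( !( cond ) )                                                    \
            std::fprintf( stderr, "assert \"%s\" failed\n", #cond );       \
    } while( 0 )

// Stand-in for a check that reports the failure and leaves the function.
#define SKETCH_CHECK_RET( cond )                                           \
    do {                                                                   \
        if( !( cond ) )                                                    \
        {                                                                  \
            std::fprintf( stderr, "check \"%s\" failed\n", #cond );        \
            return;                                                        \
        }                                                                  \
    } while( 0 )

// Hypothetical container state: just a cached element counter.
struct Counter
{
    int count = 0;
};

// Assert-style guard: the counter is modified even for a foreign element.
inline void RemoveWithAssert( Counter& c, bool belongsToList )
{
    SKETCH_ASSERT( belongsToList );
    --c.count;
}

// Check-style guard: a foreign element leaves the counter untouched.
inline void RemoveWithCheck( Counter& c, bool belongsToList )
{
    SKETCH_CHECK_RET( belongsToList );
    --c.count;
}
```

In this sketch only the check-style guard keeps the container state intact when the precondition fails, which is the sense in which the change may have prevented the corruption without removing its cause.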

Maciej Suminski (orsonmmz) wrote :

Stacktrace from a crash after dragging some of the tracks shown on Michael's video: https://pastebin.com/raw/h70VLiAY

We are slowly getting there..

Novak Tamas (novak-7) wrote :

@Orson Good for you:)) I installed the newest nightly r10278 and successfully replaced the DLLs, exe, and .kiface with your stuff. Pcbnew works well, and I can't see a significant speed drop. I've been struggling with it for 3 hours now, but can't provoke a crash. It may have healed:))))

About my name: we have upside-down names in Hungary: Novak, my family name, comes first, and Tamas (=Thomas), my first name, comes last. I think "Novak" is a common family name in Poland, Slovakia, and Hungary. (The only thing that confuses me is the name of the tennis player Novak Djokovic. I really don't know which is his first name.)

Anyway, no hard feelings if you call me "Novak", but I prefer "Tommy":)

Maciej Suminski (orsonmmz) wrote :

Unfortunately, the bug is still there; it is just circumvented. The DLIST corruption is prevented, but it is hard to assess other possible problems. I need to find out how to reliably trigger the assert visible in Michael's video and then fix the cause. With the debug build you should be able to see the assert in case it happens, and if so, please try to describe what you did to trigger it.

Thanks for the explanation regarding the names. "Novak" sounded to me more like a surname, but all other names on Launchpad have first name in the beginning, so I was slightly confused;)

Tomasz Wlostowski (twlostow) wrote :

@Michael

Any chance you could make a movie that includes the status bar of pcbnew (so that we can see the coordinates where you placed the dragged tracks)?

Tom

Michael Geselbracht (mgeselbracht) wrote :

I just tried and I was lucky. The assert dialog appeared after only a few drag operations.
The video shows the entire session. Again, the "Clarify selection" dialog appeared after I pressed 'ESC' and the assert dialog after I pressed 'DEL'.

Maciej Suminski (orsonmmz) wrote :

Michael, thank you for the video, it is a valuable data point. You seem to be very lucky at triggering the bug; could you tell me which revision you use? I doubt there is any recent change that could affect the issue, but I want to replicate your setup as closely as possible.

Michael Geselbracht (mgeselbracht) wrote :

It was a build of commit 538ab0eb3

But I found a way to force the clarify dialog to appear, on both Linux and Windows.
On both OSes KiCad crashes when pcbnew is closed after a ghost track has been deleted.

The first drag of the +3V3 track as shown in the video does not combine the three segments. An additional drag will do this. Now if the combined segment is dragged the clarify dialog appears after pressing 'ESC'. So three drag operations are required.
It is also important that the router settings dialog is invoked after the route tool has been started. It does not matter if it is closed with 'ESC', 'Cancel' or 'Ok'.

This is not the only way to trigger this bug but at least it is reproducible.
The Windows version is a nightly build of commit 787ee62db.

Tomasz Wlostowski (twlostow) wrote :

@Michael

Many thanks for figuring this out, I've been trying for 2 hours and only managed to reproduce this issue once...

I hope this will help us squash this nasty bug!

Tom

Seth Hillbrand (sethh) wrote :

Orson and Michael, can you try the attached patch? This looks like my bug!

Maciej Suminski (orsonmmz) wrote :

@Michael, you are the man! Now we are finally able to reproduce the bug reliably.

@Seth, I think it is reasonable to prevent track removal while the router is active, but it seems to me the problem reported by Michael and Tamas has a different root cause.

Changed in kicad:
assignee: nobody → Maciej Suminski (orsonmmz)
status: Triaged → In Progress
Michael Geselbracht (mgeselbracht) wrote :

Well, it took me long enough to come up with something like this. I was starting to believe that someone put this slippery thing in just to annoy me.

@Seth
Your patch seems to address a commit that is only a few weeks old. This bug is much older.

KiCad Janitor (kicad-janitor) wrote :

Fixed in revision dfcdfe91fa6434661d66efe9661edd3c02fce6d4
https://git.launchpad.net/kicad/patch/?id=dfcdfe91fa6434661d66efe9661edd3c02fce6d4

Changed in kicad:
status: In Progress → Fix Committed
Seth Hillbrand (sethh) wrote :

That makes sense. Nice work Orson!

Wayne Stambaugh (stambaughw) wrote :

Awesome work everyone! Hopefully we can start working towards the stable release.

Novak Tamas (novak-7) wrote :

Orson, Wayne, do you think it is the same bug I reported originally? I've never seen this phantom segment phenomenon.

I've been using the fat-debug version Orson compiled for me for 8 days now, and it hasn't crashed lately. Anyhow, I experienced the bug about 5 times in a year, so it's quite rare.

I'll file a new bug report if it happens to me again, as this report got closed.

Jeff Young (jeyjey) wrote :

@Orson, nice work!

@Tommy, keep an eye out for it. You can also re-open this bug report if you catch it crashing again.

Changed in kicad:
status: Fix Committed → Fix Released