Segfault closing pcbnew

Bug #1847532 reported by Seth Hillbrand on 2019-10-09
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
KiCad
Critical
Ian McInerney

Bug Description

Steps to reproduce:

1) Open pcbnew, draw an outline on Edge.Cuts using polygon tool
2) Close pcbnew, don't save

This segfaults with a double-free but no useful backtrace. Valgrind shows invalid memory access

Application: KiCad
Version: (5.99.0-215-g89e9857f3), release build
Libraries:
    wxWidgets 3.0.4
    libcurl/7.64.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
Platform: Linux 4.19.0-6-amd64 x86_64, 64 bit, Little endian, wxGTK
Build Info:
    Build date: Oct 3 2019 09:12:49
    wxWidgets: 3.0.4 (wchar_t,wx containers,compatible with 2.8) GTK+ 3.24
    Boost: 1.67.0
    OpenCASCADE Community Edition: 6.9.1
    Curl: 7.64.0
    Compiler: GCC 8.3.0 with C++ ABI 1013

Build settings:
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_PYTHON3=ON
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_WXPYTHON_PHOENIX=ON
    KICAD_SCRIPTING_ACTION_MENU=ON
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_USE_OCC=OFF
    KICAD_SPICE=ON

Seth Hillbrand (sethh) wrote :
Seth Hillbrand (sethh) wrote :

Does not affect 5.1

Ian McInerney (imcinerney) wrote :

I can only recreate this when the tool has not been cancelled. If I cancel the tool first, then no crash happens.

Changed in kicad:
status: New → Triaged
Changed in kicad:
assignee: nobody → Ian McInerney (imcinerney)
status: Triaged → In Progress
Ian McInerney (imcinerney) wrote :
Download full text (4.8 KiB)

Ok, ASAN managed to give a better stack trace and analysis. What seems to be happening is that wxWidgets is trying to destroy a child object, but that child object lives in the stack frame of a tool routine instead of on the heap. The easiest way to fix this is to simply ensure all tools are cancelled before we destroy the frame, that way all stack items will have been destroyed on their own. The attached patch does that for all the tool-based windows I could think of (basically if any window uses a tool manager, we should ensure the stack is empty before destroying the frame).

=================================================================
==29942==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7fbe3f2c15d0 in thread T0
    #0 0x40965f in operator delete(void*) (/master/build/debug/pcbnew/pcbnew+0x40965f)
    #1 0x7fbe3ce3a421 in STATUS_TEXT_POPUP::~STATUS_TEXT_POPUP() /master/include/status_popup.h:83:34
    #2 0x7fbe459cc39b in wxWindowBase::Destroy() ../src/common/wincmn.cpp:576:12
    #3 0x7fbe459cc3d3 in wxWindowBase::DestroyChildren() ../src/common/wincmn.cpp:608:37
    #4 0x7fbe457fb146 in wxWindow::~wxWindow() ../src/gtk/window.cpp:2519:20
    #5 0x7fbe3cccfbfc in EDA_BASE_FRAME::~EDA_BASE_FRAME() /master/common/eda_base_frame.cpp:161:1
    #6 0x7fbe3cdc9b77 in KIWAY_PLAYER::~KIWAY_PLAYER() /master/common/kiway_player.cpp:66:40
    #7 0x7fbe3cd317af in EDA_DRAW_FRAME::~EDA_DRAW_FRAME() /master/common/eda_draw_frame.cpp:191:1
    #8 0x7fbe3c505318 in PCB_BASE_FRAME::~PCB_BASE_FRAME() /master/pcbnew/pcb_base_frame.cpp:109:1
    #9 0x7fbe3b545316 in PCB_BASE_EDIT_FRAME::~PCB_BASE_EDIT_FRAME() /master/pcbnew/pcb_base_edit_frame.cpp:55:1
    #10 0x7fbe3b565542 in PCB_EDIT_FRAME::~PCB_EDIT_FRAME() /master/pcbnew/pcb_edit_frame.cpp:342:1
    #11 0x7fbe3b5655d8 in PCB_EDIT_FRAME::~PCB_EDIT_FRAME() /master/pcbnew/pcb_edit_frame.cpp:338:1
    #12 0x7fbe452939c6 in wxAppConsoleBase::DeletePendingObjects() ../src/common/appbase.cpp:591:16
    #13 0x7fbe45293a48 in wxAppConsoleBase::ProcessIdle() ../src/common/appbase.cpp:397:25
    #14 0x7fbe458944a7 in wxAppBase::ProcessIdle() ../src/common/appcmn.cpp:366:50
    #15 0x7fbe457be094 in wxApp::DoIdle() ../src/gtk/app.cpp:159:31
    #16 0x7fbe457be1b6 ../src/gtk/app.cpp:107:28
    #17 0x7fbe437d07da (/lib64/libglib-2.0.so.0+0x4c7da)
    #18 0x7fbe437d3edc in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4fedc)
    #19 0x7fbe437d426f (/lib64/libglib-2.0.so.0+0x5026f)
    #20 0x7fbe437d45a2 in g_main_loop_run (/lib64/libglib-2.0.so.0+0x505a2)
    #21 0x7fbe43e12b3c in gtk_main (/lib64/libgtk-3.so.0+0x24db3c)
    #22 0x7fbe457ddbc4 in wxGUIEventLoop::DoRun() ../src/gtk/evtloop.cpp:65:17
    #23 0x7fbe452d6170 in wxEventLoopBase::Run() ../src/common/evtloopcmn.cpp:78:17
    #24 0x7fbe45296c69 in wxAppConsoleBase::MainLoop() ../src/common/appbase.cpp:334:40
    #25 0x417f49 in APP_SINGLE_TOP::OnRun() /master/common/single_top.cpp:197:26
    #26 0x7fbe4532aabb in wxEntry(int&, wchar_t**) ../src/common/init.cpp:506:31
    #27 0x40be4e in main /master/common/single_top.cpp:271:1
    #28 0x7fbe442fcf32 in __libc_start_main (/lib64/libc.so.6+0x23f32)
    #29 0x2...

Read more...

Seth Hillbrand (sethh) wrote :

@Ian- If that is the root issue, could we change the variable ordering to ensure m_toolmanager gets the dtor before the window?

Ian McInerney (imcinerney) wrote :

@Seth, I had first tried to put the tool stack clearance into the dtor for the tool manager, but the problem is it is called explicitly from the EDA_BASE_FRAME dtor. What then happened was if a tool was active (such as the point editor), it would crash because the tool was trying to access the board in Pcbnew, but that board is destroyed in one of the subclasses of EDA_BASE_FRAME, so it was destroyed first. That is why I had to go the more drastic route of putting the stack emptying at the beginning of the frame dtor.

Ian McInerney (imcinerney) wrote :

So, I was thinking more about this and this actually is a result of some more insidious behavior in the tool framework/destructor methods.

The underlying issue is one of variable lifetime inside the tool stack. The cause of the crash was that we stack allocated an object that was passed to wxWidgets as a child element. Normally this will not be a problem because functions return and it gets destroyed then. But with our tool stack, the function never returned because it was waiting for a tool event and was not active. So essentially when the tool manager was deleted all its variables just hung around and no destructors were run and no stacks were cleared. This caused the crash since the object was not destroyed before we destroyed the window, so wxWidgets tried to destroy it. Since it was stack allocated, the call to delete was invalid and it crashed.

The fix I put together for this is to ensure that all active tools that have registered themselves on the tool stack of the frame get cancelled before the window deletion process really begins. This way the tools will exit gracefully and any stack allocated variables can be destroyed properly (this also will prevent memory leaks from these tools if they were active). I wanted to have the tool manager destructor do the stack clearance, but there is a lifetime issue with that as well. Since it is explicitly called in our highest-parent class (EDA_BASE_FRAME) all the destructors of the child classes have already run, which delete things like the board, schematic objects, etc. So any tools that are interacting with those objects would crash when we send the cancel event because they are trying to interact with already deleted objects. This meant I needed to put the stack clearance before we actually destroy any objects in the lowest-child class.

The other half of this issue that I just realized is our selection tool framework that is always running. Currently, there is no way to stop it, and so anything that is allocated on the stack before the while loop begins will never be destroyed. This shouldn't pose a problem currently because a quick glance into the tool main functions shows that they don't allocate any problematic items before they begin looping. We would get a problem though if we were to add any graphics objects (such as a status text popup like in the zone tool) to those tools in the same way as the zone tool. Then we would have the same variable lifetime issue as here.

The only good way to fix that would be to introduce a new tool event to end a tool (such as TA_END). This event could be sent in the destructor after the tool stack is cleared to ensure that the main tool loops are also stopped. Then we would have to include a break in the loop to handle that event (and modify the current assert( 0 ) in the function).

I don't know how likely we are to add anything more to the selection tool main function, so I don't know if that modification is going to be worth doing, but it is hard to predict what people will want to add to the functions in the future.

I think the main thing we want to avoid is someone else having to do all this thinking/research over again next time it happens. To that aim, whatever we do needs to be visible. The TA_END event certainly meets that criteria.

Wayne Stambaugh (stambaughw) wrote :

I would prefer a more robust solution. The fix should take into account that someone might add graphics objects to the selection tool so it should be fixed to exit gracefully. Otherwise, we will be right back here again. I'm assuming that fixing the issue this way would make your previous patch unnecessary. We could temporarily merge your patch to fix the immediate issue if you cannot get to a more robust fix in the near future.

Seth Hillbrand (sethh) wrote :

@Ian- I agree with your analysis. My concern here is that we are building a fragile structure to repair the underlying issue of ownership. If we can't unroll the destructor safely without doing a preparation call, then I'd like to think about where m_toolmanager belongs in the inheritance. I don't know of a specific requirement for putting it in EDA_BASE_FRAME (although there may be one).

Jeff Young (jeyjey) wrote :

@Seth, while we might be able to fix it by moving things in the inheritance structure, that doesn't make it visible, and we'll still end up right back here again at some future date.

Having a "shutdown" call for the tool manager makes sense, especially in light of the fact that it's spinning up virtual event loops. We already have a "shutdown" call for the GAL canvas, so it's not foreign to our control flow.

Seth Hillbrand (sethh) wrote :

@Jeff- I like the shutdown call. I'd just prefer that it exist in the dtor.

I agree that we already have the required shutdown call structure in the GAL canvas but that's bit us as well. Even now, we don't have a clean canvas shutdown that saves some state variables (e.g. canvas type). The timing of when and where the shutdown call happens makes this a bit touchy.

Ian McInerney (imcinerney) wrote :

@Seth, which dtor do you think the tool shutdown should go into? Right now I put the shutdown call into the child class destructor as the first thing run. I think the reason the tool manager is inside the EDA_BASE_FRAME is because that placement removed duplication of code in the child classes for each window. If we want to, we could shift it into the child classes, we would just have the duplication then. I do think I prefer a shutdown call in the child class destructor though (such as m_toolManager->StopTools()) that makes it obvious we are stopping all tools.

Looking through it some more, we have another option for terminating the tool loop: passing a null event when the tool is reactivated. This will automatically break it out of the while loop that waits for the events, and doesn't require the developer to explicitly check for an event type to end the tool. It requires a bit more implementation work in the tool manager itself, but it is probably cleaner (we could implement it to work on a single tool and on all active tools as well). This will still run any code after the end of the loop though, so the tools should be stopped at as the first step in the destructor.

Seth Hillbrand (sethh) wrote :

@Ian- I'd prefer it in the tool manager dtor. In other words, I think that the safe implementation allows the tool manager to be destroyed without requiring a preparation call.

One option might be placing the duplicated code in a base toolmanager class and abstracting the parts that reference board items or child windows in the frames that generate those items.

What I'd like to avoid is the pointer loop.

Ian McInerney (imcinerney) wrote :

@Seth, the main issue is not with the tool manager itself but with the tools that are running. I agree that the pointer loop is not ideal, and I was actually thinking that maybe turning the tool manager into something that the child frames inherit from instead of instantiate could be a fix for that (that is how wxWidgets handles their event handler routines). Then the child frames inherit from both the EDA_BASE_FRAME chain and the TOOL_MANAGER class. We then document that the tool manager class is for inheritance only and is never to be directly instantiated. This should allow us to just use "this" as the frame pointer inside the tool manager routines.

Removing the pointer loop between the frame and the tool manager doesn't solve the issue with a tool referencing an object from the frame. For instance, look at the pcbnew drawing tool's DrawDimension action. After the while loop runs, the tool calls into the view controls and also into the frame (and passes the board object). This means that we will need to be careful when stopping this tool, because if the part after the loop is run after we delete the board, there will be an issue.

An alternative would be to introduce the TA_END/STOP action and have the loop immediately check for it and return from inside the loop so that no other code can be run. I don't know if that will fix everything though, because what if the developer instantiates an object on the stack whose destructor references a frame element? I think that would still get us into trouble. While I agree it would be best if the tool manager destructor could shutdown the tools, I don't know if that is really feasible given that a tool can do basically anything (and that we don't know what kind of tools we might want in the future). It seems the safest option is to shutdown all the tools when we first begin the destuction of the frames (similar to how an OS stops all user programs before it stops the kernel). Yes it will require an explicit call in each frame destructor, but then I think we can guarantee we don't get into possible lifetime issues.

Jeff Young (jeyjey) wrote :

Each frame has explicit statements to start up its tools, so it seems logical to me that the frame d'tor would include statements to shut them down.

Seth Hillbrand (sethh) wrote :

OK. Maybe I'm tilting at windmills here with matching destructor idea. @Ian if you'd like to go ahead with the patch you were discussing in #7, it sounds like the best approach discussed so far.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments