crash on vrml export in Fedora

Bug #1838448 reported by Rene Poeschl on 2019-07-30
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
KiCad
Fix Released
High
Seth Hillbrand

Bug Description

I made a simple board with two mounting pads and a solder wire pad and tried to export it as vrml. KiCad crashes without any message shown.

Attached the file that causes the crash for me.

---

Application: kicad
Version: 5.1.2-1.fc30, release build
Libraries:
    wxWidgets 3.0.4
    libcurl/7.65.3 OpenSSL/1.1.1c-fips zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.38.0
Platform: Linux 5.1.19-300.fc30.x86_64 x86_64, 64 bit, Little endian, wxGTK
Build Info:
    wxWidgets: 3.0.4 (wchar_t,wx containers,compatible with 2.8) GTK+ 3.24
    Boost: 1.69.0
    OpenCASCADE Community Edition: 6.9.1
    Curl: 7.64.0
    Compiler: GCC 9.0.1 with C++ ABI 1013

Build settings:
    USE_WX_GRAPHICS_CONTEXT=OFF
    USE_WX_OVERLAY=ON
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_PYTHON3=ON
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_WXPYTHON_PHOENIX=ON
    KICAD_SCRIPTING_ACTION_MENU=ON
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_USE_OCC=OFF
    KICAD_SPICE=ON

Revision history for this message
Rene Poeschl (poeschlr) wrote :
Revision history for this message
Michael Kavanagh (michaelkavanagh) wrote :

@Rene, what are the VRML export settings you're using? I cannot reproduce on macOS using 5.1.2 or master.

Changed in kicad:
status: New → Incomplete
tags: added: export pcbnew vrml
Revision history for this message
Rene Poeschl (poeschlr) wrote :

The only thing i changed from default is using mm instead of meter as the output unit. But meter also crashes so it is the same.

It must have something to do with the solder wire footprint. It seems any through hole footprint crashes my system. (I can easily export with only the mounting hole footprint or with a soic one. But the solder wire footprint and also a dip footprint do crash kicad on export.)

Revision history for this message
eelik (eelik) wrote :

It doesn't crash on Windows, either.

Application: Pcbnew
Version: (5.1.3-7-g8373e91a1)-1, release build
Libraries:
    wxWidgets 3.0.4
    libcurl/7.61.1 OpenSSL/1.1.1 (WinSSL) zlib/1.2.11 brotli/1.0.6 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) nghttp2/1.34.0
Platform: Windows 8 (build 9200), 64-bit edition, 64 bit, Little endian, wxMSW
Build Info:
    wxWidgets: 3.0.4 (wchar_t,wx containers,compatible with 2.8)
    Boost: 1.68.0
    OpenCASCADE Community Edition: 6.9.1
    Curl: 7.61.1
    Compiler: GCC 8.2.0 with C++ ABI 1013

Build settings:
    USE_WX_GRAPHICS_CONTEXT=OFF
    USE_WX_OVERLAY=OFF
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_PYTHON3=OFF
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_WXPYTHON_PHOENIX=OFF
    KICAD_SCRIPTING_ACTION_MENU=ON
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_USE_OCC=OFF
    KICAD_SPICE=ON

Revision history for this message
Rene Poeschl (poeschlr) wrote :

Our build settings differ in WX_OVERLAY and PYTHON version. Could any of these two be responsible?

Otherwise it might be a dependency mismatch in the fedora packaging.

Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

@Rene, I wouldn't think so as the vrml exported doesn't use python or wxOverlay but I could be wrong. I cannot get this to crash on Debian testing.

Application: Pcbnew
Version: (5.1.3-15-g090073cc3), debug build
Libraries:
    wxWidgets 3.0.4
    libcurl/7.65.1 OpenSSL/1.1.1c zlib/1.2.11 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.37.0 librtmp/2.3
Platform: Linux 4.19.0-5-amd64 x86_64, 64 bit, Little endian, wxGTK
Build Info:
    wxWidgets: 3.0.4 (wchar_t,wx containers,compatible with 2.8) GTK+ 3.24
    Boost: 1.62.0
    OpenCASCADE Community Edition: 6.9.1
    Curl: 7.65.1
    Compiler: Clang 7.0.1 with C++ ABI 1002

Build settings:
    USE_WX_GRAPHICS_CONTEXT=OFF
    USE_WX_OVERLAY=ON
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_PYTHON3=ON
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_WXPYTHON_PHOENIX=ON
    KICAD_SCRIPTING_ACTION_MENU=ON
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_USE_OCC=OFF
    KICAD_SPICE=ON

Revision history for this message
Nick Østergaard (nickoe) wrote :

Is this something you built yourself or is it from fedora official repos?

Revision history for this message
Rene Poeschl (poeschlr) wrote :

It is the official fedora package

output of dnf list kicad:
sudo dnf list kicad
Last metadata expiration check: 0:17:05 ago on Wed 31 Jul 2019 17:14:45 CEST.
Installed Packages
kicad.x86_64 1:5.1.2-1.fc30 @updates
Available Packages
kicad.i686 1:5.1.2-1.fc30 updates

Revision history for this message
Nick Østergaard (nickoe) wrote :

Can you provide a backtrace?

Revision history for this message
Ian McInerney (imcinerney) wrote :
Download full text (3.9 KiB)

I was just able to reproduce this on the Fedora release build for 5.1.2. I had to uncheck all the options, and it was set to meters. When I had the options checked it wouldn't crash. I tried briefly on the HEAD of the 5.1 branch, but it didn't crash. I was running that as a debug build though, so it could be this is release build specific.

It crashes on:
/usr/include/c++/9/bits/stl_vector.h:1042: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.

The relevant backtrace is:
#0 0x00007ffff6cafe75 in raise () at /lib64/libc.so.6
#1 0x00007ffff6c9a895 in abort () at /lib64/libc.so.6
#2 0x00007fffd6cd8a28 in std::__replacement_assert(char const*, int, char const*, char const*)
    (__file=__file@entry=0x7fffd7815558 "/usr/include/c++/9/bits/stl_vector.h", __line=__line@entry=1042, __function=__function@entry=0x7fffd78218d8 "std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&;"..., __condition=__condition@entry=0x7fffd78155a8 "__builtin_expect(__n < this->size(), true)")
    at /usr/include/c++/9/x86_64-redhat-linux/bits/c++config.h:2510
#3 0x00007fffd6efbd94 in std::vector<int, std::allocator<int> >::operator[](unsigned long) (__n=0, this=0x7fffffff74d0) at /usr/src/debug/kicad-5.1.2-1.fc30.x86_64/pcbnew/exporters/export_vrml.cpp:1853
#4 0x00007fffd6efbd94 in create_vrml_shell(IFSG_TRANSFORM&, VRML_COLOR_INDEX, VRML_LAYER*, double, double) (PcbOutput=..., colorID=VRML_COLOR_TIN, layer=<optimized out>, top_z=<optimized out>, bottom_z=<optimized out>)
    at /usr/src/debug/kicad-5.1.2-1.fc30.x86_64/pcbnew/exporters/export_vrml.cpp:1833
#5 0x00007fffd6efc6e8 in write_layers(MODEL_VRML&, char const*, std::ofstream*, BOARD*) (aModel=
    ..., aFileName=0x555557eb5cb0 "/home/imcinerney/Documents/dev/kicad/testItems/BugSpecific/1838448/00_pr_invest/00_pr_invest.wrl", aOutputFile=aOutputFile@entry=0x0, aPcb=<optimized out>)
    at /usr/src/debug/kicad-5.1.2-1.fc30.x86_64/pcbnew/exporters/export_vrml.cpp:525
#6 0x00007fffd6f0260a in PCB_EDIT_FRAME::ExportVRML_File(wxString const&, double, bool, bool, bool, wxString const&, double, double)
    (this=<optimized out>, aFullFileName=..., aMMtoWRMLunit=<optimized out>, aExport3DFiles=<optimized out>, aUseRelativePaths=<optimized out>, aUsePlainPCB=<optimized out>, a3D_Subdir=..., aXRef=<optimized out>, aYRef=<optimized out>)
    at /usr/include/wx-3.0/wx/buffer.h:157
#7 0x00007fffd6d611f8 in PCB_EDIT_FRAME::OnExportVRML(wxCommandEvent&) () at /usr/src/debug/kicad-5.1.2-1.fc30.x86_64/pcbnew/dialogs/dialog_export_vrml.cpp:249
#8 0x00007ffff746df0b in wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) () at /lib64/libwx_baseu-3.0.so.0
#9 0x00007ffff746e013 in wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*) () at /lib64/libwx_baseu-3.0.so.0...

Read more...

Revision history for this message
Rene Poeschl (poeschlr) wrote :

I hope i did everything correctly (never created a backtrace before)

Revision history for this message
Seth Hillbrand (sethh) wrote :

@SteveFalco- It looks like the fedora release is being built with debug symbols. These asserts should not be crashing releases. Other platforms had issues because they were being compiled with RelWithDebInfo instead of Release. This apparently enables the asserts.

We should still fix this. It appears that the VRML triangulation is failing to provide an indexPlane for some reason

Changed in kicad:
status: Incomplete → Confirmed
milestone: none → 5.1.4
importance: Undecided → Critical
importance: Critical → High
summary: - crash on vrml export
+ crash on vrml export in Fedora
Revision history for this message
KiCad Janitor (kicad-janitor) wrote :

Fixed in revision ff28501d6d647a14423c15165db4295f360b0f53
https://git.launchpad.net/kicad/patch/?id=ff28501d6d647a14423c15165db4295f360b0f53

Changed in kicad:
status: Confirmed → Fix Committed
assignee: nobody → Seth Hillbrand (sethh)
Revision history for this message
Rene Poeschl (poeschlr) wrote :

Regarding debug stuff: I installed debug symbols (It is a separate package under fedora) after this crash appeared to be able to make the backtrace. Might this be the reason for it appearing to have debug enabled?

Revision history for this message
Ian McInerney (imcinerney) wrote :

@Seth, @SteveFalco - Fedora appears to enable building with the debug symbols by default (independent of the CMAKE type flag) by passing -g to the compiler in a system-level CFLAGS/CXXFLAGS (see here: https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/master/f/buildflags.md#individual-compiler-flags).

Revision history for this message
Seth Hillbrand (sethh) wrote :

@Rene- Got it. Perhaps I am mistaken then. My statement was based on the fact that this error is an assert and not a crash per se. When built for Release, CMake specifies NDEBUG=1, this makes the asserts into nop statements. When built for RelWithDebInfo, it doesn't set the NDEBUG flag and the asserts remain enabled.

Revision history for this message
Ian McInerney (imcinerney) wrote :

@Seth, the assert is being generated in the standard library not KiCad so the NDEBUG flag from KiCad's cmake won't affect it since it is controlled by different flags. What seems to be happening is that Fedora is enabling the standard library asserts by passing "-D_GLIBCXX_ASSERTIONS" to the build as one of the compile flags from the system. This enables the checking of array indices at runtime with assert statements inside the library itself.

We need the Fedora build system to not send the "-D_GLIBCXX_ASSERTIONS" flag to the compiler to disable these checks (or somehow remove it from the generated CXXFLAGS variable).

Revision history for this message
Seth Hillbrand (sethh) wrote :

@Ian- Are you sure that the asserts in the header files aren't controlled by the KiCad flags?

Revision history for this message
Seth Hillbrand (sethh) wrote :
Revision history for this message
Steven Falco (stevenfalco) wrote :

I'm not too familiar with this, but I do know that Fedora has recently been applying so-called "hardening" flags, in order to catch more run-time errors. The current flags are:

-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection

I could try adding:

%undefine _hardened_build

which should disable the hardening flags. Or I could experiment with overriding the %{optflags} macro, which should let me choose specific flags to include or omit. But I'd need some guidance on which flags we want.

But before I do that, I have to ask if we really want to override the hardening. I would have thought that the hardening was a desirable thing.

Please let me know how you'd like me to proceed.

Revision history for this message
Seth Hillbrand (sethh) wrote :

Hi Steven-

Thanks for looking into this. The flag that we'd like to omit is "-Wp,-D_GLIBCXX_ASSERTIONS" We shouldn't need to fully remove the hardening but this flag adds additional crashes (read: non-standard program exits) that do not exist on other platforms. These are useful for debugging but not for production systems.

Revision history for this message
Steven Falco (stevenfalco) wrote :

I'm able to turn off the "-Wp,-D_GLIBCXX_ASSERTIONS" by overriding the "__global_compiler_flags" macro that the Fedora build-system uses. I added the following line to the Fedora spec file:

%define __global_compiler_flags -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -grecord-gcc-switches %{_hardened_cflags} %{_annotated_cflags}

It is a bit ugly, because if Fedora ever changes the "__global_compiler_flags" macro, we will be out of sync with the rest of Fedora. Also, since that macro includes other macros, things could break badly. I'll try to keep an eye out for that.

I started the Fedora builds, but it will be a week or more before they become available, owing to the Fedora Karma requirements. If enough people test, the builds could pop out sooner, but historically that doesn't happen, so we will likely have to incur the 1 week Karma penalty.

I'm not sure if this change is also wanted for the nightly builds. Since those are debug builds, I don't think we want the change there, but please tell me if they should be changed too.

One thing bothers me - if the Fedora build had not crashed on @Rene, the underlying VRML bug would have gone unnoticed. That makes me wonder if we should really be adding "-Wp,-D_GLIBCXX_ASSERTIONS" to all the other builds (like Mac and Windows) rather than removing it from Fedora. Wasn't discovering the VRML bug a good thing?

Revision history for this message
Florian Weimer (fweimer) wrote :

I agree with Steven.

Seth, the cause of these crashes are out-of-bounds std::vector accesses. C programmers would call these “buffer overflows”. I do not think it is a good idea to remove these checks. Unlike C, C++ standard containers can actually perform such checks and reduce exposure of security vulnerabilities.

Revision history for this message
Ian McInerney (imcinerney) wrote :

The main problem with these assertions is they are turning problems that were non-fatal into fatal crashes (this is a good example, it didn't crash on every other platform that was tried, other than Fedora where the assertions were enabled). In release builds that is not usually desired.

I can see the utility of this for nightly and debug builds, so it might be worth adding this as an explicit flag we pass for debug release types and have the nightly builds add it, but I still don't think it belongs in the release builds.

Revision history for this message
Jeff Young (jeyjey) wrote :

Removing the checks will indeed increase the vulnerability surface area. There are existing methods "out in the wild" of exploiting buffer overflows to execute malicious code. The checks therefore need to be run even on release builds.

Which makes we wonder why we aren't seeing this on other platforms. Do we have the std::container checks turned off or something?

Revision history for this message
Ian McInerney (imcinerney) wrote :

@Jeff, these checks are off by default when compiled. They have to be explicitly enabled by passing the _GLIBCXX_ASSERTIONS flag to the preprocessor. So far only Fedora seems to pass that flag by default so it is the only build with these runtime checks.

Revision history for this message
Seth Hillbrand (sethh) wrote :

I would advocate for adding the check to all systems for debug builds and removing it from release builds like we do for all other assertions.

Revision history for this message
Seth Hillbrand (sethh) wrote :

Sent the previous message too quickly.

I have more concern about people losing work due to a crash than a malicious hack embedded in KiCad data files.

Revision history for this message
jean-pierre charras (jp-charras) wrote :

I share the Seth's opinion.

@Seth,
Are you sure about your fix in "static void create_vrml_plane()":
It test idxSide.empty() and exit the function if true.
But unless I missed something, idxSide is a not used variable, and idxSide.empty() is always true.
So create_vrml_plane() is never executed.

Revision history for this message
Jeff Young (jeyjey) wrote :

Just to be the devil's advocate, we felt the same way about PDF in the early days. When it caught up to us it was ugly (we lost a full 18-month development cycle and a lot of customer trust).

Revision history for this message
Seth Hillbrand (sethh) wrote :

@Jeff- How about we start with enabling additional checks for the debug builds? I think we are all in agreement there. This gets us additional testing surface against the access errors.

@jp- Thanks. copy-paste error there. I've fixed it in master and 5.1

Revision history for this message
Jeff Young (jeyjey) wrote :

@Seth, definitely.

Just to be clear, I'm a -0 on not enabling the checks on release (ie: I think it's the wrong decision, but I won't stand in the way of a consensus).

Revision history for this message
Seth Hillbrand (sethh) wrote :

@Jeff, good to know, thanks. I wonder if there's a way we could re-direct the assertions to the crash reporter? If there's a useful bit of information to be gleaned from the assertion during normal use, I'm less opposed. It's just the crash+disappear with no message that I think is not helpful in production systems.

Revision history for this message
Andrew Lutsenko (qu1ck) wrote :

I think we absolutely should enable all possible sanitizations and checks in all builds.

It's just pure luck that the bug is in vrml export and it's no big deal to ignore it. What if next time it will corrupt the project files? It's much better to "catch" that in a crash than silently ignore it.

KiCad doesn't have nearly enough automated test coverage so we rely heavily on user reports and there are not a lot of users running debug builds. I'm a dev and I only run debug build when I am triaging an issue because it's too slow for everyday use.

It's one thing to suppress gui library asserts because they are annoying and confusing in release build. But ignoring memory access violations can go very wrong very fast.

Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

The gcc assertions should be enabled in debug builds. If they were, this bug would have been found a long time ago. I am less convinced of their usefulness in release builds. Assertions are added as a form of defensive programming to find bugs before release not as a catch all to find bugs in release builds. I agree with Seth that users are going to be far more upset about loosing work due to an assertion crashing KiCad rather then a malformed vrml output file. If Fedora wants to do this, that is their prerogative but I don't want to force this on other platforms release builds.

Revision history for this message
Seth Hillbrand (sethh) wrote :

OK.

I'm working on adding the following flags to the debug builds:

-D_GLIBCXX_SANITIZE_VECTOR -D_GLIBCXX_DEBUG -fsanitize=address

The GLIBCXX_DEBUG define gets the assertions as well as other debugging info for libstdc++. Compiling this way exposes quite a few shortcuts, so will take a while to sort out.

Revision history for this message
Ian McInerney (imcinerney) wrote :

Seth, I would suggest making the sanitizer flags these to get nicer traces for any errors that appear (these are the flags I am currently running with):

-fsanitize=address -fno-optimize-sibling-calls -fsanitize-address-use-after-scope -fno-omit-frame-pointer

Also, if we wanted to we could add a -U_GLIBCXX_ASSERTIONS to the end of the CXX flags in release mode to get around any premade compile flags...

Revision history for this message
Steven Falco (stevenfalco) wrote :

I got curious, so I just checked the Fedora debug builds on Copr. They already have the _GLIBCXX_ASSERTIONS flag enabled, which is not surprising, given the similar infrastructure between Copr and the official release builder.

Then I went back to see when the change was introduced. It looks like that flag was added in Fedora 28 around January 2018.

The good news is that there must be very few of these bugs in the KiCAD code, or we would have seen more crashes by now. Either that or there are just not many users of KiCAD on Fedora. :-)

Revision history for this message
Seth Hillbrand (sethh) wrote :

Thanks Ian. I'm going to make these an optional flag for now. Trying to run KiCad at the moment with GLIBCXX_DEBUG on is nigh-on impossible, so we're going to need to have some serious debugging before this gets into mainstream builds.

In more annoying news, it appears that -fsanitize=address exposes a gold linker bug[1] when used with our raw asm inlines. We can get around this by using the naked attribute in an explicit function call but that's a large testing project as we need to verify the builds for all platforms (and this is not automated at the moment).

I'm going to encourage devs to use the flags when they can. They do substantially slow KiCad, so it might not be an all-the-time thing. However, at the moment, even clicking a menu with all debug options on throws an assert, so we've got a few issues to sort here.

[1] https://www.gitmemory.com/issue/rust-lang/rust/59652/482237905

Changed in kicad:
status: Fix Committed → Fix Released
Revision history for this message
Jonathan Wakely (jwakely) wrote :

-D_GLIBCXX_DEBUG enables the full Debug Mode for GCC's standard library:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode.html

That enables a large number of checks, but alters the class layout for all standard library containers and also changes the algorithmic complexity of some standard library algorithms, e.g. an O(log N) algorithm might become O(N) or worse. It's a very heavyweight set of checks, and affects the ABI of the compiled code.

While testing sometimes with _GLIBCXX_DEBUG defined may be valuable, it's probably not suitable for the normal debug builds and for CI testing. For a start, it would mean that a debug build is ABI-incompatible with a release build, due to the class layout changes.

The _GLIBCXX_ASSERTIONS macro defines a subset of the _GLIBCXX_DEBUG checks, with no impact on class layout and minimal impact on runtime (and no changes in algorithmic complexity). It is intended to be suitable for both debug builds and release builds. If you don't want to use it for your release builds that's fine, but it's probably the macro you want to define for regaulr debug builds and for CI testing.

N.B. defining _GLIBCXX_ASSERTIONS for other compilers is pointless, it's a macro belonging to GCC's standard library (libstdc++) and will be completely ignored by Clang's libc++ and Visual Studio's standard library.

Revision history for this message
Seth Hillbrand (sethh) wrote :

@Jonathan-

Thank you for the background. I didn't know you followed our bug tracker but it's always nice to get the definitive answer.

Revision history for this message
Jonathan Wakely (jwakely) wrote :

I don't usually, but this issue was pointed out on the Fedora dev list, and since I'm responsible for adding the _GLIBCXX_ASSERTIONS checks to libstdc++ I thought I should comment.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers