Assertion failed in flattenBVHTree() while using 3D viewer with raytracing

Bug #1762377 reported by Eugeniy Meshcheryakov
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
KiCad
Fix Released
Undecided
Mario Luzeiro

Bug Description

Attempting to use 3D viewer with raytracing engine results in the following error:

    cbvh_pbrt.cpp(1109): assert "node->nPrimitives < 65536" failed in flattenBVHTree().

After pressing "Continue" button (if dialog stays responsible) the board is drawn without any 3d models, even if they are enabled preferences. Removing most of the footprints from the board gets rid of the assertion, but the models are still invisible.

This bug is reproducible for example with "pic_programmer" demo. To reproduce:
 - load the project "pic_programmer"
 - open pcbnew
 - open 3D viewer (works fine with default settings)
 - push raytracer button in the toolbar (assertion window is shown, the board is rendered without models after pressing "Continue")

This is with kicad from Debian experimental. There was no assertion with some daily builds from PPA, but the models were not visible anyway.

Application: kicad
Version: 5.0.0-rc2-dev-unknown+dfsg1+20180318-2, release build
Libraries:
    wxWidgets 3.0.4
    libcurl/7.58.0 OpenSSL/1.0.2o zlib/1.2.8 libidn2/2.0.4 libpsl/0.20.1 (+libidn2/2.0.4) libssh2/1.8.0 nghttp2/1.31.0 librtmp/2.3
Platform: Linux 4.15.0-2-amd64 x86_64, 64 bit, Little endian, wxGTK
Build Info:
    wxWidgets: 3.0.4 (wchar_t,wx containers,compatible with 2.8) GTK+ 3.22
    Boost: 1.62.0
    Curl: 7.58.0
    Compiler: GCC 7.3.0 with C++ ABI 1011
Build settings:
    USE_WX_GRAPHICS_CONTEXT=ON
    USE_WX_OVERLAY=OFF
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_ACTION_MENU=OFF
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_SPICE=OFF

ASSERT INFO:
/build/kicad-ylqbzj/kicad-5.0.0~rc1+dfsg1+20180318/3d-viewer/3d_rendering/3d_render_raytracing/accelerators/cbvh_pbrt.cpp(1109): assert "node->nPrimitives < 65536" failed in flattenBVHTree().

BACKTRACE:
[1] wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&)
[2] wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*)
[3] wxEvtHandler::TryHereOnly(wxEvent&)
[4] wxEvtHandler::ProcessEventLocally(wxEvent&)
[5] wxEvtHandler::ProcessEvent(wxEvent&)
[6] wxEvtHandler::ProcessPendingEvents()
[7] wxAppConsoleBase::ProcessPendingEvents()
[8] wxApp::DoIdle()
[9] g_main_context_dispatch
[10] g_main_loop_run
[11] gtk_main
[12] wxGUIEventLoop::DoRun()
[13] wxEventLoopBase::Run()
[14] wxAppConsoleBase::MainLoop()
[15] wxEntry(int&, wchar_t**)
[16] __libc_start_main
[17] _start

Tags: 3d-viewer
Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

I can give a look on this once I get some time..
but still stranger, I remember to tested with that demo board.
also,
nPrimitives is a uint16_t
so this assert may not make sense and should always be true: "nPrimitives < 65536"

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

BVHBuildNode::nPrimitives is defined as int in cbvh_pbrt.cpp:127 so the assertion can be false.

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

ah ok thanks I was looking for the wrong declaration

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

I cannot reproduce, it sound to be some memory corruption issue that may be triggered with your gcc version?
I built mine with
    Boost: 1.58.0
    Compiler: GCC 5.4.0 with C++ ABI 1009

Do you have a simpler board where it crashs?

Related with missing 3D shapes, it looks it is related with 3D libraries, some models are missing on the old/new 3D libraries and this demoboard was not updated yet.

Changed in kicad:
assignee: nobody → Mario Luzeiro (mrluzeiro)
Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

Before using kicad from Debian unstable I was also using js-reynaud/ppa-kicad for 'bionic'. The assertion was not happening with that version, but the models still were absent when using raytracer (and everything was fine with OpenGL). I cannot test that PPA now because it became uninstallable due to dependencies.

The raytracer worked fine in the past with version from PPA. I've successfully used it with my own board in the end of March. The kicad shapshot was probably from the last year, I have no exact version, but the schematic sheet from that time contains "kicad 5.0-dev-unknown-d043ef562ubuntu18.04.1".

I'm attaching simplified version of pic_programmer that still triggers assertion on my machine. Assertion does not happen when any component is removed, but the board is rendered without any models.

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

Here is comparison of rendering the board I've posted without 2 extra copies of R4 (and no assertions failures on my machine) with OpenGL on the left and Raytracing on the right. The raytracer was used with default options, except the "Add Floor" option was turned off because of bug #1762379.

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

I've tried to bisect the assertion failure and here is the result:

8f42e128dae844bb1f74763025998a515295a897 is the first bad commit
commit 8f42e128dae844bb1f74763025998a515295a897
Author: Bernhard Stegmaier <email address hidden>
Date: Sun Mar 4 13:17:51 2018 +0100

    Fix missing C++-11 compiler flags when building on macOS with non-Xcode clang

:100644 100644 ea998ee314f02988549a4070c7d1b541515d3875 afc11a7fe9f5e4e718d5b248da401c9ed187a2a3 M CMakeLists.txt

And I'm not using macOS...

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

https://github.com/KiCad/kicad-source-mirror/commit/8f42e128dae844bb1f74763025998a515295a897
Looks not related.

Could it be instead the previous commit? It is 3D source code related:
https://github.com/KiCad/kicad-source-mirror/commit/a7860787472c121cb72a4bec90fe8398229e09cd
but it is not related with that assert (the Assert happens before this)

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

The crash is triggered by the change in CMakeLists.txt. It can be reproduced on my machine by applying the attached patch to the previous revision a7860787472c121cb72a4bec90fe8398229e09cd . So the cmake_policy addition for macOS is not important, but moving cmake_minimum_required_version() and project() calls is.

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

oks its a good finding, but should be still some issue with sourcecode.
But maybe Bernhard could help us understand what that changes in makefile are triggering...

Hi Bernhard could you come here have a look on this discussion?
What could that changes be interfering?

Revision history for this message
Bernhard Stegmaier (stegmaier) wrote :

Strange. According to cmake docs (https://cmake.org/cmake/help/latest/command/cmake_minimum_required.html) cmake_minimum_required should be the very first statement directly followed by project:
<<<
Call the cmake_minimum_required() command at the beginning of the top-level CMakeLists.txt file even before calling the project() command. It is important to establish version and policy settings before invoking other commands whose behavior they may affect. See also policy CMP0000.
>>>

Could you make a diff of cmake generated stuff with/without the patch (only run the cmake command with/without patch and compare/diff generated build folder)?
It might show what is done differently...

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

I'm configuring the build with the following command:

    cmake .. -DCMAKE_INSTALL_PREFIX:PATH=/opt/kicad -DKICAD_SPICE=OFF

The only difference that I can see is that before the patch was applied there were two additional arguments passed to the compiler "-O3 -DNDEBUG". I'guess NDEBUG just disables assertions, so the problem probably still was there, but it was ignored.

After moving those lines to the beginning of the file those two additional flags don't appear in the build log anymore (strangely, there is no -g either). So this comment in CMakeLists.txt:35 is not entirely true:

    # Default to CMAKE_BUILD_TYPE = Release unless overridden on command line

I guess I'll need to pass CMAKE_BUILD_TYPE=Debug to cmake to find actual commit that causes that assertion failure.

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

I can reproduce the bug with revision b656a8180a8c9c6153caa69e96176a52ba8e3416 from Mon Aug 22 15:50:38 2016 +1000. The same is true for visual artifacts (#1762379) and missing models. Given that everything was fine some time last year, I think that the problem is caused by either compiler change or by changes in some libraries.

Revision history for this message
Bernhard Stegmaier (stegmaier) wrote :

I always pass the build type on command line.
So, maybe I didn't notice that my change possibly did break setting release build as a default.
I'll check that.

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

This looks like a real bug in the code to me. Function CBVH_PBRT::recursiveBuild() returns a single node for primitives with close centroids. In my case there were too many such primitives (not sure why), so the assertion was triggered. The attached patch calls recursiveBuild recursively to split such primitives between multiple nodes.

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

Here is slightly cleaner version of the patch (removed extra parentheses).

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

I had look on that case at first, since it is the only easy source to get the variable >65535
I opened a bug to discuss this possible issue in the original project:
https://github.com/mmp/pbrt-v3/issues/182#issuecomment-380439470

However still something is different (or wrong) somewhere as only you can reproduce it at moment.
Does it only happen with the 3D model shapes right? Something wrong loading the model files?

Revision history for this message
Seth Hillbrand (sethh) wrote :

The proposed patch in #16 does not appear to have adverse consequences for my system (debian stable). However, I did not observe the reported issues.

I'd be curious to know how many primitives you observed (and therefore what recursion level your patch reaches)

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

@mrluzerio: the module matrix was not initialized correctly, see bug #1763026

@sethh: there were 65812 primitives in one node with simplified version of pic_programmer attached to this bug log, so this is ca. more 8 recursion levels with that board. And this should not happen when models are loaded correctly.

Revision history for this message
Seth Hillbrand (sethh) wrote :

I'm inclined to agree with @mrluzeiro here. The diff should have no effect on your code. I suspect that there may be a stale build cache on your system @eugen-debian. Can you try re-building in a clean build directory and see if the problem follows?

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

I'm always removing the build directory before rebuild. And the diff was definitely fixing the assertion with revisions before 18b0b78397a1f88996b7410a5f2abd936dde16b9.

The assertion should be easily reproducible with revision before 18b0b78397a1f88996b7410a5f2abd936dde16b9 (where bug #1763026 was fixed) and glm version 0.9.9-a2 (where matrices are not initialized by default as identity matrices anymore). With this combination all model matrices are zero matrices. This results in all primitives in those models having the same centroids (probably 0/0/0 or nan/nan/nan).

The bug should be hard to reproduce with current version where initialization fixes were committed. The two possibilities I can think of are:
  - create a board with more than 65535 identical components in one place
  - create a 3D model with more than 65535 primitives with the same center
The second case is probably easier to implement. I don't think any useful board will trigger this assertion anymore. But I also don't think that relying on undefined behavior of converting large integers to uint16 is a good idea when data can come from an untrusted source.

Revision history for this message
Eugeniy Meshcheryakov (eugen-debian) wrote :

Ignore that comment about undefined behavior. It seems conversion from signed to unsigned is well-defined. The extra primitives will be ignored.

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

Related to this (now possible) issue, it reminds me to implement something in the future to check for invalid primitives (or very small that will not be rendered) that are slowing down the render.
I believe some of this cases (the ones I am referring) could happen with automatic CAD->Triangle generation, that may create some small garbage that should be ignored.

Revision history for this message
Seth Hillbrand (sethh) wrote :

@eugen-debian - if I understand correctly, the error is not being tripped now that the matrices are correctly initialized?

If so, I'd have a preference for not modifying upstream code more than necessary as it'll make it more difficult to merge future revisions. @mrluzeiro, do you have a preference for how you'd like to proceed?

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

IMO if this will still be added to v5, and if bug is fixed, we should not add more changes now and maybe open a new ticket for future improvements discussed here regarding this subject.

Revision history for this message
Jeff Young (jeyjey) wrote :

@Mario, I'm in agreement with comment #25. Could you open a new wishlist item so we can close this one out? Thanks!

Revision history for this message
Mario Luzeiro (mrluzeiro) wrote :

I created the wishlist issue at:
https://bugs.launchpad.net/kicad/+bug/1792810
with the main ideas discussed here.

Feel free to close this issue.

@eugen-debian if you would like to so some kind of tests, as related with recent changed polygon triangulation algorithm, to see if you can catch anything using similar cases, as you tested for this issue...

Revision history for this message
Jeff Young (jeyjey) wrote :

Thanks, Mario.

Changed in kicad:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.