FTBFS (test fails with SIGBUS) on armhf in Hirsute
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Netgen | Fix Released | Unknown | |
netgen (Debian) | Fix Released | Unknown | |
netgen (Ubuntu) | Fix Released | Undecided | Unassigned |
opencascade (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
Hi,
I was checking a build failure in Ubuntu on armhf.
=> https:/
The actual build worked fine, but it then crashes in the self tests:
$ export PYTHONPATH=
$ apt install python3-tk python3-numpy
$ cd ~/netgen-
$ LD_LIBRARY_
...
test_pickling.py Bus error (core dumped)
This seems to be 100% reproducible if one follows the steps that the Debian package build does.
The other tests pass
test_pickling.
test_pickling.
test_pickling.
test_pickling.
Just test_pickle_csg fails.
And in this test the failing line is: geo_dump = pickle.dumps(geo)
With geo being <netgen.
Running that under python3-dbg and loading the core file in gdb shows that the crash happens while pickling,
deep in netgen's code (which is better than a generic pickling issue, I guess)
#0 0xf659c99e in ngcore:
#1 ngcore:
#2 0xf641d4de in netgen:
#3 netgen:
#4 netgen:
#5 0xf641dc00 in netgen:
#6 0xf6434c28 in ngcore:
#7 ngcore:
#8 0xf6430dca in ngcore:
#9 ngcore:
#10 ngcore:
#11 ngcore:
#12 netgen:
#13 0xf648a958 in ngcore:
#14 ngcore:
#15 0xf64a4218 in ngcore:
self=<optimized out>, this=<optimized out>) at /usr/include/
....
That is:
./libsrc/
721 private:
722 template <typename T>
723 Archive & Write (T x)
724 {
725 if (unlikely(ptr > BUFFERSIZE-
726 {
727 stream-
728 *reinterpret_
729 ptr = sizeof(T);
730 return *this;
731 }
732 *reinterpret_
733 ptr += sizeof(T);
734 return *this;
735 }
736 };
With the variables in the crash file being:
(gdb) p &buffer
$5 = (std::array<char, 1024> *) 0xffa90d40
(gdb) p ptr
$3 = 1
With ptr = 1 the next write would go to &buffer + 1. If the real code (not gdb on the crash file) resolves that pointer addition to just "0xffa90d41", as gdb does, then a store of anything wider than one byte is an unaligned access, which might explain the SIGBUS.
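The arithmetic behind that suspicion can be checked with a few lines of Python. This is only a sketch: the two constants are the values gdb printed above, `is_aligned` is a hypothetical helper, and it assumes the common rule that an N-byte value wants an N-byte-aligned address. Whether an unaligned store really faults on armhf depends on the instruction the compiler emits, which this does not model.

```python
BUFFER_ADDR = 0xFFA90D40  # address of `buffer` as printed by gdb
PTR = 1                   # value of `ptr` as printed by gdb


def is_aligned(address, size):
    """Return True if `address` is a multiple of `size` bytes."""
    return address % size == 0


# Address at which the next element would be stored.
write_addr = BUFFER_ADDR + PTR

for size in (1, 2, 4, 8):
    state = "aligned" if is_aligned(write_addr, size) else "unaligned"
    print(f"{size}-byte store at {write_addr:#x}: {state}")
```

Only the 1-byte store comes out as aligned; every wider store at 0xffa90d41 is unaligned, while the buffer start itself (0xffa90d40) is fine for all four sizes.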
I'm a bit lost as .hpp backends to serialize/pickle python files really isn't my home turf :-/
Therefore I wanted to reach out to you as experts on netgen if this makes sense to you.
I can keep the repro systems around for a while, so if you have debug questions or small modifications to try, I should be able to test them.
P.S. This didn't show up in the past because the tests were not run correctly at build time; the last Debian upload fixed that, and since then it is an FTBFS. But it does not seem to trigger in all environments, e.g. the Debian builds did not crash the same way.
FYI: there is also this recent bug about unaligned access. I'm not entirely sure it is related, as the logs linked there didn't look to be "the same": https:/
Note: I've reported the very same bug upstream and will link it; this LP bug is meant as a tracker to be found via the update-excuse tag.
Changed in netgen:
status: Unknown → New
Changed in netgen (Debian):
status: Unknown → New
Changed in netgen (Ubuntu):
status: New → Fix Released
Changed in opencascade (Ubuntu):
status: New → Fix Released
Changed in netgen:
status: New → Fix Released
Changed in netgen (Debian):
status: New → Fix Released
Since it was already broken before in
https://launchpad.net/ubuntu/+source/netgen/6.2.2006+really6.2.1905+dfsg-2
And only now shows up because of
" * [5426125] Fix running tests"
being in
https:/
I wonder if skipping this test on armhf would be the right way to mitigate it for the time being, so that things do not get stuck in proposed until it is really resolved. After all, it seems this would not be a regression compared to before (on armhf).
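If skipping turns out to be the way to go, the architecture check could look roughly like the sketch below. `is_32bit_arm` is a hypothetical helper, not something that exists in netgen or its packaging, and the machine names it matches are assumptions about what `platform.machine()` reports on armhf builders.

```python
import platform
import struct


def is_32bit_arm(machine=None, pointer_bits=None):
    """Guess whether the interpreter runs as a 32-bit ARM process.

    Both arguments default to the values of the running interpreter and
    can be passed explicitly for testing.
    """
    if machine is None:
        machine = platform.machine()
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8  # size of a C pointer in bits
    machine = machine.lower()
    # Assumption: armhf userland reports e.g. "armv7l"/"armv8l", or "aarch64"
    # when a 32-bit chroot runs on a 64-bit kernel.
    return pointer_bits == 32 and (machine.startswith("arm") or machine == "aarch64")


# In a pytest-based test file the check could then be applied like this
# (left as a comment so that the sketch runs without pytest installed):
#
#   @pytest.mark.skipif(is_32bit_arm(), reason="SIGBUS while pickling on armhf")
#   def test_pickle_csg():
#       ...

print(is_32bit_arm("armv7l", 32))   # True
print(is_32bit_arm("x86_64", 64))   # False
```

Checking the pointer size in addition to the machine name keeps the test enabled on arm64, where the crash was not reported.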