FTBFS ppc64el obj_basic_integration/TEST5 crashed

Bug #2061913 reported by Bryce Harrington
This bug affects 2 people
Affects        Status        Importance  Assigned to  Milestone
pmdk           New           Unknown
pmdk (Debian)  New           Unknown
pmdk (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

Affects ppc64el (only):

https://launchpadlibrarian.net/724116691/buildlog_ubuntu-noble-ppc64el.pmdk_1.13.1-1.1build1_BUILDING.txt.gz
https://launchpadlibrarian.net/724821331/buildlog_ubuntu-noble-ppc64el.pmdk_1.13.1-1.1build2_BUILDING.txt.gz

The same failure also appears to affect Debian on the same architecture:

https://buildd.debian.org/status/fetch.php?pkg=pmdk&arch=ppc64el&ver=1.13.1-1.1%2Bb1&stamp=1708597682&raw=0
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1064559

obj_basic_integration/TEST5 crashed (signal 4). err5.log below.
{ut_backtrace.c:175 ut_sighandler} obj_basic_integration/TEST5:

{ut_backtrace.c:176 ut_sighandler} obj_basic_integration/TEST5: Signal 4, backtrace:
{ut_backtrace.c:120 ut_dump_backtrace} obj_basic_integration/TEST5: 0: ./obj_basic_integration(+0xc9f8) [0x18c9f8]
{ut_backtrace.c:120 ut_dump_backtrace} obj_basic_integration/TEST5: 1: ./obj_basic_integration(+0xcb8c) [0x18cb8c]
{ut_backtrace.c:178 ut_sighandler} obj_basic_integration/TEST5:

err5.log below.
obj_basic_integration/TEST5 err5.log {ut_backtrace.c:175 ut_sighandler} obj_basic_integration/TEST5:
obj_basic_integration/TEST5 err5.log
obj_basic_integration/TEST5 err5.log {ut_backtrace.c:176 ut_sighandler} obj_basic_integration/TEST5: Signal 4, backtrace:
obj_basic_integration/TEST5 err5.log {ut_backtrace.c:120 ut_dump_backtrace} obj_basic_integration/TEST5: 0: ./obj_basic_integration(+0xc9f8) [0x18c9f8]
obj_basic_integration/TEST5 err5.log {ut_backtrace.c:120 ut_dump_backtrace} obj_basic_integration/TEST5: 1: ./obj_basic_integration(+0xcb8c) [0x18cb8c]
obj_basic_integration/TEST5 err5.log {ut_backtrace.c:178 ut_sighandler} obj_basic_integration/TEST5:
obj_basic_integration/TEST5 err5.log

Last 30 lines of memcheck5.log below (whole file has 48 lines).
obj_basic_integration/TEST5 memcheck5.log ==89952== by 0x4915EB7: util_pool_create_uuids (set.c:2521)
obj_basic_integration/TEST5 memcheck5.log ==89952== by 0x49160FB: util_pool_create (set.c:2563)
obj_basic_integration/TEST5 memcheck5.log ==89952== by 0x4941183: pmemobj_createU (obj.c:1164)
obj_basic_integration/TEST5 memcheck5.log ==89952== by 0x4941643: pmemobj_create (obj.c:1244)
obj_basic_integration/TEST5 memcheck5.log ==89952== Your program just tried to execute an instruction that Valgrind
obj_basic_integration/TEST5 memcheck5.log ==89952== did not recognise. There are two possible reasons for this.
obj_basic_integration/TEST5 memcheck5.log ==89952== 1. Your program has a bug and erroneously jumped to a non-code
obj_basic_integration/TEST5 memcheck5.log ==89952== location. If you are running Memcheck and you just saw a
obj_basic_integration/TEST5 memcheck5.log ==89952== warning about a bad jump, it's probably your program's fault.
obj_basic_integration/TEST5 memcheck5.log ==89952== 2. The instruction is legitimate but Valgrind doesn't handle it,
obj_basic_integration/TEST5 memcheck5.log ==89952== i.e. it's Valgrind's fault. If you think this is the case or
obj_basic_integration/TEST5 memcheck5.log ==89952== you are not sure, please let us know and we'll try to fix it.
obj_basic_integration/TEST5 memcheck5.log ==89952== Either way, Valgrind will now raise a SIGILL signal which will
obj_basic_integration/TEST5 memcheck5.log ==89952== probably kill your program.
obj_basic_integration/TEST5 memcheck5.log ==89952==
obj_basic_integration/TEST5 memcheck5.log ==89952== HEAP SUMMARY:
obj_basic_integration/TEST5 memcheck5.log ==89952== in use at exit: 3,172 bytes in 39 blocks
obj_basic_integration/TEST5 memcheck5.log ==89952== total heap usage: 193 allocs, 154 frees, 433,659 bytes allocated
obj_basic_integration/TEST5 memcheck5.log ==89952==
obj_basic_integration/TEST5 memcheck5.log ==89952== LEAK SUMMARY:
obj_basic_integration/TEST5 memcheck5.log ==89952== definitely lost: 0 bytes in 0 blocks
obj_basic_integration/TEST5 memcheck5.log ==89952== indirectly lost: 0 bytes in 0 blocks
obj_basic_integration/TEST5 memcheck5.log ==89952== possibly lost: 0 bytes in 0 blocks
obj_basic_integration/TEST5 memcheck5.log ==89952== still reachable: 3,172 bytes in 39 blocks
obj_basic_integration/TEST5 memcheck5.log ==89952== suppressed: 0 bytes in 0 blocks
obj_basic_integration/TEST5 memcheck5.log ==89952== Reachable blocks (those to which a pointer was found) are not shown.
obj_basic_integration/TEST5 memcheck5.log ==89952== To see them, rerun with: --leak-check=full --show-leak-kinds=all
obj_basic_integration/TEST5 memcheck5.log ==89952==
obj_basic_integration/TEST5 memcheck5.log ==89952== For lists of detected and suppressed errors, rerun with: -s
obj_basic_integration/TEST5 memcheck5.log ==89952== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

There are also some instances of valgrind crashes:

pmempool_feature/TEST4: SETUP (check/pmem/debug/memcheck)
../unittest/unittest.sh: line 747: 1396902 Illegal instruction /usr/bin/valgrind --tool=memcheck --log-file=memcheck4.log --suppressions=../memcheck-dlopen.supp --suppressions=../memcheck-dlopen.supp --leak-check=full --suppressions=../ld.supp --suppressions=../memcheck-libunwind.supp --suppressions=../memcheck-ndctl.supp ../../tools/pmempool/pmempool feature -d SHUTDOWN_STATE /tmp//test_pmempool_feature4😘⠏⠍⠙⠅ɗPMDKӜ⥺🙋/testset &>> grep4.log
pmempool_feature/TEST4 crashed (signal 4).
grep4.log below.

RUNTESTS: stopping: pmempool_feature/TEST4 failed, TEST=check FS=any BUILD=debug
pmempool_feature/TEST5: SETUP (check/pmem/debug/memcheck)
../unittest/unittest.sh: line 747: 1397154 Illegal instruction /usr/bin/valgrind --tool=memcheck --log-file=memcheck5.log --suppressions=../memcheck-dlopen.supp --suppressions=../memcheck-dlopen.supp --leak-check=full --suppressions=../ld.supp --suppressions=../memcheck-libunwind.supp --suppressions=../memcheck-ndctl.supp ../../tools/pmempool/pmempool feature -d SHUTDOWN_STATE /tmp//test_pmempool_feature5😘⠏⠍⠙⠅ɗPMDKӜ⥺🙋/testset &>> grep5.log
pmempool_feature/TEST5 crashed (signal 4).
grep5.log below.
pmempool_feature/TEST5 grep5.log query SHUTDOWN_STATE result is 1

1

Last 30 lines of memcheck5.log below (whole file has 65 lines).
pmempool_feature/TEST5 memcheck5.log ==1397154== Illegal opcode at address 0x4B59240
pmempool_feature/TEST5 memcheck5.log ==1397154== at 0x4B59240: ppc_flush (init.c:53)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x4B519C7: pmem_flush (pmem.c:229)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x4B51A6B: pmem_persist (pmem.c:240)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x492CA93: util_persist (util_pmem.h:27)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x492CBA7: util_persist_auto (util_pmem.h:40)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x492DDC3: set_hdr (feature.c:256)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x492E143: feature_set (feature.c:325)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x492E967: disable_shutdown_state (feature.c:500)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x492EF2F: pmempool_feature_disableU (feature.c:662)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x492F1AB: pmempool_feature_disable (feature.c:738)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x196897: feature_perform (feature.c:110)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x196897: pmempool_feature_func (feature.c:206)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x18A45B: main (pmempool.c:271)
pmempool_feature/TEST5 memcheck5.log ==1397154==
pmempool_feature/TEST5 memcheck5.log ==1397154== HEAP SUMMARY:
pmempool_feature/TEST5 memcheck5.log ==1397154== in use at exit: 52,839 bytes in 21 blocks
pmempool_feature/TEST5 memcheck5.log ==1397154== total heap usage: 64 allocs, 43 frees, 108,953 bytes allocated
pmempool_feature/TEST5 memcheck5.log ==1397154==
pmempool_feature/TEST5 memcheck5.log ==1397154== LEAK SUMMARY:
pmempool_feature/TEST5 memcheck5.log ==1397154== definitely lost: 0 bytes in 0 blocks
pmempool_feature/TEST5 memcheck5.log ==1397154== indirectly lost: 0 bytes in 0 blocks
pmempool_feature/TEST5 memcheck5.log ==1397154== possibly lost: 0 bytes in 0 blocks
pmempool_feature/TEST5 memcheck5.log ==1397154== still reachable: 50,479 bytes in 16 blocks
pmempool_feature/TEST5 memcheck5.log ==1397154== suppressed: 2,360 bytes in 5 blocks
pmempool_feature/TEST5 memcheck5.log ==1397154== Reachable blocks (those to which a pointer was found) are not shown.
pmempool_feature/TEST5 memcheck5.log ==1397154== To see them, rerun with: --leak-check=full --show-leak-kinds=all
pmempool_feature/TEST5 memcheck5.log ==1397154==
pmempool_feature/TEST5 memcheck5.log ==1397154== For lists of detected and suppressed errors, rerun with: -s
pmempool_feature/TEST5 memcheck5.log ==1397154== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
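
When triaging logs like the pmempool_feature/TEST5 excerpt above, it can help to pull out the frame where Valgrind reported the illegal opcode. The following is a minimal sketch, not part of the pmdk test suite: the function name `faulting_frames` and the regular expressions are assumptions written for illustration, and the sample input is a few lines copied from the log above. It only catches logs that contain an "Illegal opcode at address" line, so it would not match the obj_basic_integration/TEST5 excerpt as shown.

```python
import re

# Excerpt copied from the pmempool_feature/TEST5 memcheck5.log above.
LOG = """\
pmempool_feature/TEST5 memcheck5.log ==1397154== Illegal opcode at address 0x4B59240
pmempool_feature/TEST5 memcheck5.log ==1397154== at 0x4B59240: ppc_flush (init.c:53)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x4B519C7: pmem_flush (pmem.c:229)
pmempool_feature/TEST5 memcheck5.log ==1397154== by 0x4B51A6B: pmem_persist (pmem.c:240)
"""

# Hypothetical patterns, written to match the line shapes seen in this report.
OPCODE = re.compile(r"==\d+==\s+Illegal opcode at address (0x[0-9A-Fa-f]+)")
FRAME = re.compile(r"==\d+==\s+(?:at|by) (0x[0-9A-Fa-f]+): (\S+) \(([^)]+)\)")


def faulting_frames(text):
    """Return (address, function, location) for the stack frame that
    immediately follows each 'Illegal opcode' line."""
    results = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if OPCODE.search(line) and i + 1 < len(lines):
            match = FRAME.search(lines[i + 1])
            if match:
                results.append(match.groups())
    return results


frames = faulting_frames(LOG)
for address, function, location in frames:
    print(f"{function} at {location} (address {address})")
```

For the excerpt above this reports `ppc_flush` at `init.c:53`, which is the frame the log itself names as the site of the illegal opcode.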

Changed in pmdk:
status: Unknown → New
Changed in pmdk (Debian):
status: Unknown → New
Athos Ribeiro (athos-ribeiro) wrote :

Before the latest delta, I see:

20 tests failed:
obj_basic_integration/TEST5
obj_basic_integration/TEST6
obj_action/TEST1
obj_ctl_arenas/TEST6
obj_ctl_debug/TEST1
obj_locks/TEST1
obj_locks/TEST2
obj_mem/TEST1
obj_memcheck_register/TEST0
obj_pmalloc_mt/TEST2
obj_tx_alloc_mt/TEST2
obj_tx_locks/TEST1
obj_tx_locks/TEST2
obj_tx_locks_abort/TEST1
obj_tx_locks_abort/TEST2
out_err_mt/TEST1
out_err_mt/TEST2
pmempool_create/TEST7
pmempool_feature/TEST4
pmempool_feature/TEST5

After the delta:

19 tests failed:
obj_basic_integration/TEST6
obj_action/TEST1
obj_ctl_arenas/TEST6
obj_ctl_debug/TEST1
obj_locks/TEST1
obj_locks/TEST2
obj_mem/TEST1
obj_memcheck_register/TEST0
obj_pmalloc_mt/TEST2
obj_tx_alloc_mt/TEST2
obj_tx_locks/TEST1
obj_tx_locks/TEST2
obj_tx_locks_abort/TEST1
obj_tx_locks_abort/TEST2
out_err_mt/TEST1
out_err_mt/TEST2
pmempool_create/TEST7
pmempool_feature/TEST4
pmempool_feature/TEST5

I suppose that if we are skipping tests, we want to add all of these to the ppc64el skip list so that we get a successful build here.
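
The difference between the two lists above can be checked mechanically. This is a small sketch for illustration only; the variable names are assumptions, the test names are copied from the two lists in this comment, and the output is a plain list of names rather than any particular skip-list format.

```python
# Failed tests before the latest delta, copied from the first list above.
before = """
obj_basic_integration/TEST5 obj_basic_integration/TEST6 obj_action/TEST1
obj_ctl_arenas/TEST6 obj_ctl_debug/TEST1 obj_locks/TEST1 obj_locks/TEST2
obj_mem/TEST1 obj_memcheck_register/TEST0 obj_pmalloc_mt/TEST2
obj_tx_alloc_mt/TEST2 obj_tx_locks/TEST1 obj_tx_locks/TEST2
obj_tx_locks_abort/TEST1 obj_tx_locks_abort/TEST2 out_err_mt/TEST1
out_err_mt/TEST2 pmempool_create/TEST7 pmempool_feature/TEST4
pmempool_feature/TEST5
""".split()

# Failed tests after the delta, copied from the second list above.
after = """
obj_basic_integration/TEST6 obj_action/TEST1 obj_ctl_arenas/TEST6
obj_ctl_debug/TEST1 obj_locks/TEST1 obj_locks/TEST2 obj_mem/TEST1
obj_memcheck_register/TEST0 obj_pmalloc_mt/TEST2 obj_tx_alloc_mt/TEST2
obj_tx_locks/TEST1 obj_tx_locks/TEST2 obj_tx_locks_abort/TEST1
obj_tx_locks_abort/TEST2 out_err_mt/TEST1 out_err_mt/TEST2
pmempool_create/TEST7 pmempool_feature/TEST4 pmempool_feature/TEST5
""".split()

# Tests that stopped failing, and tests that newly fail, after the delta.
no_longer_failing = sorted(set(before) - set(after))
newly_failing = sorted(set(after) - set(before))

print(len(before), len(after))
print("no longer failing:", no_longer_failing)
print("newly failing:", newly_failing)
print("still failing:")
for name in after:
    print("  " + name)
```

The only test that drops out after the delta is obj_basic_integration/TEST5, and no new failures appear, so the remaining 19 names are the candidates for skipping.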

Bryce Harrington (bryce) wrote :

I also counted 19 failed tests (that we know of). I don't have a solid feeling for whether these share the same root cause or stem from multiple underlying issues. It would not surprise me, for example, if there are ppc64el-specific issues in valgrind itself, in addition to separate and unrelated issues in pmdk. Also, if there is one specific opcode that causes all these failures, I have not had any luck pinpointing it, and it feels like doing so might require at least a deep dive or even manual debugging on a ppc64el host. I also don't think it is wise to assume Debian or upstream is prepared and ready to do that at the moment.

Given the uncertainty and the short timeframe, the server team discussed this and determined that the best approach would be to bypass the tests on ppc64el to get a successful build, which will hopefully migrate and allow its rdepends to resolve.

We can't be certain whether these tests represent actual faults that will affect users, or are false positives or testsuite-specific issues that won't affect them. Just in case it's the former, this should be identified as a known issue in the release notes, priority should be given to ascertaining which is the case, and the release notes should then be updated and follow-up SRU bugs filed accordingly.

For now, we'll use this bug report to track the investigation of the test failures generally, but we may wish to split it into separate bug reports for more specific issues and keep this one as an umbrella bug report. I'll prioritize this as a "server-todo" bug for this work.

tags: added: server-todo
Bryce Harrington (bryce)
tags: added: update-excuse
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pmdk - 1.13.1-1.1ubuntu2

---------------
pmdk (1.13.1-1.1ubuntu2) noble; urgency=medium

  * Fix FTBFS issues in ppc64el:
    - d/rules: skip ppc64el build time tests
    - d/p/debian-changes: remove bogus file

 -- Athos Ribeiro <email address hidden> Thu, 18 Apr 2024 09:44:59 -0300

Changed in pmdk (Ubuntu):
status: New → Fix Released