2024-04-05 06:55:53 |
Martin Pitt |
description |
In Cockpit's CI we see a lot of pmproxy crashes like [1] in a test which starts/stops/reconfigures pmlogger, pmproxy, and redis. The journal (some examples are [2][3][4]) always shows a similar stack trace:
pmproxy[9832]: segfault at 3 ip 0000767961047e45 sp 00007ffe97e825d0 error 4 in libpcp_web.so.1[767961018000+5c000] likely on CPU 0 (core 0, socket 0)
Stack trace of thread 9832:
#0 0x0000767961047e45 n/a (libpcp_web.so.1 + 0x38e45)
#1 0x0000767961059745 n/a (libpcp_web.so.1 + 0x4a745)
#2 0x0000767961056311 n/a (libpcp_web.so.1 + 0x47311)
#3 0x0000767960f5c52b n/a (libuv.so.1 + 0x2752b)
#4 0x0000767960f5dbdb n/a (libuv.so.1 + 0x28bdb)
#5 0x0000767960f44ce8 uv_run (libuv.so.1 + 0xfce8)
#6 0x00005cae24f55097 n/a (pmproxy + 0xb097)
#7 0x00005cae24f53b6d n/a (pmproxy + 0x9b6d)
#8 0x000076796062a1ca __libc_start_call_main (libc.so.6 + 0x2a1ca)
#9 0x000076796062a28b __libc_start_main_impl (libc.so.6 + 0x2a28b)
#10 0x00005cae24f54135 n/a (pmproxy + 0xa135)
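The kernel's segfault line can be decoded a little further. The following is a minimal sketch, assuming the usual x86 page-fault error-code bits (bit 0: page was present, bit 1: write access, bit 2: user mode) and that the bracketed range is the start and size of the mapping that contains the instruction pointer. The helper name `decode_segfault` is made up for illustration.

```python
import re

# Kernel message copied from the journal above (CPU suffix omitted).
KERNEL_LINE = (
    "pmproxy[9832]: segfault at 3 ip 0000767961047e45 sp 00007ffe97e825d0 "
    "error 4 in libpcp_web.so.1[767961018000+5c000]"
)

# All numbers in this message are hexadecimal without a 0x prefix.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<module>\S+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (hypothetical helper)."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    return {
        "fault_addr": int(m["addr"], 16),
        "module": m["module"],
        # Distance of the faulting instruction from the start of its mapping.
        "offset_in_mapping": int(m["ip"], 16) - int(m["start"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(KERNEL_LINE)
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

Under those assumptions, this is a user-mode read of an unmapped page at address 3, which looks like a dereference of a near-NULL pointer. The offset within the mapping (0x2fe45) differs from the 0x38e45 in frame #0 by 0x9000, presumably because the stack trace counts from the library's load base while the kernel counts from the start of the executable mapping. This narrows down the kind of fault, but does not name the function.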
Unfortunately the unsymbolized trace is not very useful. I managed to reproduce the crash once locally and got a core dump (attached), but running it through gdb isn't enlightening either. gdb spends several minutes downloading debug symbols, but apparently not the right ones, as the pcp frames remain unresolved:
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
Downloading separate debug info for /lib/libpcp_web.so.1
[... lots more ...]
(gdb) bt
#0 0x00007b1d588cbe45 in ?? () from /lib/libpcp_web.so.1
#1 0x00007b1d588dd745 in ?? () from /lib/libpcp_web.so.1
#2 0x00007b1d588da311 in ?? () from /lib/libpcp_web.so.1
#3 0x00007b1d587e052b in uv__inotify_read (loop=0x7b1d587ed180 <default_loop_struct>, dummy=<optimized out>, events=1)
at /usr/src/libuv1-1.48.0-1/src/unix/linux.c:2466
#4 0x00007b1d587e1bdb in uv__io_poll (loop=0x7b1d587ed180 <default_loop_struct>, timeout=<optimized out>)
at /usr/src/libuv1-1.48.0-1/src/unix/linux.c:1528
#5 0x00007b1d587c8ce8 in uv_run (loop=0x7b1d587ed180 <default_loop_struct>, mode=UV_RUN_DEFAULT) at /usr/src/libuv1-1.48.0-1/src/unix/core.c:448
#6 0x00005b98349dd097 in ?? ()
#7 0x00005b98349dbb6d in ?? ()
#8 0x00007b1d57e2a1ca in __libc_start_call_main (main=main@entry=0x5b98349db610, argc=argc@entry=3, argv=argv@entry=0x7ffc673aeac8)
at ../sysdeps/nptl/libc_start_call_main.h:58
#9 0x00007b1d57e2a28b in __libc_start_main_impl (main=0x5b98349db610, argc=3, argv=0x7ffc673aeac8, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7ffc673aeab8) at ../csu/libc-start.c:360
#10 0x00005b98349dc135 in ?? ()
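Even without symbols, the local core dump can be checked against the CI crash. The sketch below assumes that CI and the local machine run the same binary builds. It takes the module offsets from the journal trace, subtracts them from the gdb addresses frame by frame, and checks that every module ends up with a single, page-aligned load base. The helper name `implied_bases` is made up for illustration, and the gdb frames are reduced to frame number, address and symbol.

```python
import re

# Frames from the CI journal: "#N <address> <symbol> (<module> + <offset>)".
CI_JOURNAL = """\
#0 0x0000767961047e45 n/a (libpcp_web.so.1 + 0x38e45)
#1 0x0000767961059745 n/a (libpcp_web.so.1 + 0x4a745)
#2 0x0000767961056311 n/a (libpcp_web.so.1 + 0x47311)
#3 0x0000767960f5c52b n/a (libuv.so.1 + 0x2752b)
#4 0x0000767960f5dbdb n/a (libuv.so.1 + 0x28bdb)
#5 0x0000767960f44ce8 uv_run (libuv.so.1 + 0xfce8)
#6 0x00005cae24f55097 n/a (pmproxy + 0xb097)
#7 0x00005cae24f53b6d n/a (pmproxy + 0x9b6d)
#8 0x000076796062a1ca __libc_start_call_main (libc.so.6 + 0x2a1ca)
#9 0x000076796062a28b __libc_start_main_impl (libc.so.6 + 0x2a28b)
#10 0x00005cae24f54135 n/a (pmproxy + 0xa135)
"""

# Frames from the local gdb backtrace, arguments and source locations elided.
LOCAL_GDB = """\
#0 0x00007b1d588cbe45 in ?? () from /lib/libpcp_web.so.1
#1 0x00007b1d588dd745 in ?? () from /lib/libpcp_web.so.1
#2 0x00007b1d588da311 in ?? () from /lib/libpcp_web.so.1
#3 0x00007b1d587e052b in uv__inotify_read
#4 0x00007b1d587e1bdb in uv__io_poll
#5 0x00007b1d587c8ce8 in uv_run
#6 0x00005b98349dd097 in ?? ()
#7 0x00005b98349dbb6d in ?? ()
#8 0x00007b1d57e2a1ca in __libc_start_call_main
#9 0x00007b1d57e2a28b in __libc_start_main_impl
#10 0x00005b98349dc135 in ?? ()
"""

JOURNAL_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+\S+\s+\((\S+) \+ (0x[0-9a-f]+)\)", re.M)
GDB_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+in\b", re.M)


def implied_bases(journal, gdb):
    """Map each module to the set of load bases implied by the gdb frames."""
    offsets = {int(n): (mod, int(off, 16)) for n, _addr, mod, off in JOURNAL_RE.findall(journal)}
    bases = {}
    for n, addr in GDB_RE.findall(gdb):
        mod, off = offsets[int(n)]
        bases.setdefault(mod, set()).add(int(addr, 16) - off)
    return bases


for module, found in implied_bases(CI_JOURNAL, LOCAL_GDB).items():
    consistent = len(found) == 1 and all(base % 0x1000 == 0 for base in found)
    print(module, sorted(hex(base) for base in found), "consistent" if consistent else "MISMATCH")
```

For the traces quoted here, every module resolves to exactly one page-aligned base. That suggests the local core dump shows the same call chain as the CI crashes, only at different ASLR addresses.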
So I followed the "good old dbgsym" way [5], but:
E: Unable to locate package libpcp-web1-dbgsym
E: Unable to locate package libpcp3-dbgsym
E: Unable to locate package pcp-dbgsym
The build log [6] also doesn't mention any dbgsym builds, so it seems they are missing?
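Once matching debug symbols exist, the three libpcp_web.so.1 offsets from the journal trace could be resolved with binutils' `addr2line`. This is only a sketch: it assumes `addr2line` is installed, that the library's offsets can be passed to it unchanged, and that the debug info is in the file given to `-e` or can be found through it. The helper name `addr2line_cmd` is made up for illustration, and the library path is the one gdb printed above.

```python
import os
import shutil
import subprocess


def addr2line_cmd(binary, offsets):
    """Build an addr2line command line (hypothetical helper).

    -f prints function names, -C demangles them, -i expands inlined calls,
    -e selects the ELF file to look the addresses up in.
    """
    return ["addr2line", "-f", "-C", "-i", "-e", binary, *[hex(o) for o in offsets]]


# Offsets of frames #0 to #2 in the journal trace.
LIBPCP_WEB_OFFSETS = [0x38e45, 0x4a745, 0x47311]
cmd = addr2line_cmd("/lib/libpcp_web.so.1", LIBPCP_WEB_OFFSETS)
print(" ".join(cmd))

# Only run the tool where both it and the library are actually present.
if shutil.which("addr2line") and os.path.exists("/lib/libpcp_web.so.1"):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
```

Without the dbgsym packages this is expected to print only `??` placeholders, the same picture as the gdb backtrace.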
[1] https://cockpit-logs.us-east-1.linodeobjects.com/pull-20264-13fcc041-20240404-201827-ubuntu-stable-other/log.html#34
[2] https://cockpit-logs.us-east-1.linodeobjects.com/pull-20264-13fcc041-20240404-201827-ubuntu-stable-other/TestHistoryMetrics-testPmProxySettings-ubuntu-stable-127.0.0.2-2201-FAIL.log.gz
[3] https://cockpit-logs.us-east-1.linodeobjects.com/pull-6177-6626b317-20240404-225904-ubuntu-stable-other-cockpit-project-cockpit/TestHistoryMetrics-testPmProxySettings-ubuntu-stable-127.0.0.2-2401-FAIL.log.gz
[4] https://cockpit-logs.us-east-1.linodeobjects.com/pull-20261-d1621935-20240404-105717-ubuntu-stable-other/TestHistoryMetrics-testPmProxySettings-ubuntu-stable-127.0.0.2-2201-FAIL.log.gz
[5] https://wiki.ubuntu.com/DebuggingProgramCrash
[6] https://launchpadlibrarian.net/714485247/buildlog_ubuntu-noble-amd64.pcp_6.2.0-1_BUILDING.txt.gz
Ubuntu 24.04
pcp 6.2.0-1 |
|