Activity log for bug #1915828

Date Who What changed Old value New value Message
2021-02-16 14:15:16 Jim Dickie bug added bug
2021-02-16 22:52:48 Dominique Poulain bug added subscriber Dominique Poulain
2021-02-17 15:25:00 Sergio Durigan Junior bug added subscriber Lucas Kanashiro
2021-02-19 00:28:21 Dominique Poulain tags sts
2021-02-19 01:48:46 Nivedita Singhvi bug added subscriber Nivedita Singhvi
2021-02-23 07:11:51 Christian Ehrhardt bug added subscriber Ubuntu Server
2021-02-23 12:24:30 Christian Ehrhardt tags sts server-next sts
2021-02-23 16:01:04 Dariusz Gadomski pacemaker (Ubuntu): assignee Dariusz Gadomski (dgadomski)
2021-02-23 16:01:12 Dariusz Gadomski pacemaker (Ubuntu): importance Undecided Medium
2021-03-01 21:29:21 Dan Streetman description When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held. Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion. This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed. Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion. root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398 dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that QB_LOG_INIT_DATA will only be called by the library. External programs such as pacemaker that end up calling CRM_TRACE_INIT_DATA will suffer the same assertion. This patch from clusterlabs is an attempt to resolve the assertion, but is still not sufficient. https://lists.clusterlabs.org/pipermail/users/2018-February/023614.html Taking out the assertion in <qb/qblog.h> and recompiling pacemaker appears to be the only way to allow dlm_stonith to work. 
journalctl shows dlm_controld keeps trying to get a successful response from dlm_stonith Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence result 2 pid 26568 result -1 term signal 6 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence status 2 receive -1 from 1 walltime 1613481117 local 4389 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence request 2 pid 26607 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence result 2 pid 26607 result -1 term signal 6 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence status 2 receive -1 from 1 walltime 1613481118 local 4391 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence request 2 pid 26637 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence result 2 pid 26637 result -1 term signal 6 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence status 2 receive -1 from 1 walltime 1613481120 local 4392 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence request 2 pid 26693 nodedown time 1613481102 fence_all dlm_stonith .... Calling 'dlm_tool fence_ack 2' by hand immediately releases the dlm resource locks. 
root@u2004-1:~# lsb_release -rd Description: Ubuntu 20.04 LTS Release: 20.04 root@u2004-1:~# apt-cache policy pacemaker pacemaker: Installed: 2.0.3-3ubuntu4.1 Candidate: 2.0.3-3ubuntu4.1 Version table: *** 2.0.3-3ubuntu4.1 500 500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages 500 http://gb.archive.ubuntu.com/ubuntu focal-security/main amd64 Packages 100 /var/lib/dpkg/status 2.0.3-3ubuntu3 500 500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages [impact] programs using libqb logging exit due to failed assertion on qb log init [test case] test program: #include <qb/qblog.h> QB_LOG_INIT_DATA(test); int main(int argc, char* argv[]) { return 0; } compile and run: $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) Note the error is slightly different when compiling without lto: $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed. Aborted (core dumped) [regression potential] any regression would likely involve problems during logging using the libqb logging functions, which could include failure to log or even program exit and/or crash. [scope] this appears to be needed only for focal; the issue seems to be an interaction between the focal version of binutils and some linker "magic" that libqb used in the focal version. 
The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing. https://github.com/ClusterLabs/libqb/pull/322 The binutils in bionic and earlier does not appear to cause the problematic behavior with the libqb linker "magic", so no change is needed there. [other info] related debian binutils bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246 related gcc bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=24276 however, those appear to only have changed binutils to ignore the issue to allow the build to stop failing. The libqb docs do contain two suggestions to possibly work around this bug, specifically using either -l:libqb.so.0 or -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help with the simple test case, but more testing is needed that actually exercises the log functionality to make sure nothing else breaks. $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) $ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl $ ./test $ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test [original description] When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held. 
Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion. This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed. Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion. root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398 dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that QB_LOG_INIT_DATA will only be called by the library. External programs such as pacemaker that end up calling CRM_TRACE_INIT_DATA will suffer the same assertion. This patch from clusterlabs is an attempt to resolve the assertion, but is still not sufficient. https://lists.clusterlabs.org/pipermail/users/2018-February/023614.html Taking out the assertion in <qb/qblog.h> and recompiling pacemaker appears to be the only way to allow dlm_stonith to work. 
journalctl shows dlm_controld keeps trying to get a successful response from dlm_stonith Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence result 2 pid 26568 result -1 term signal 6 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence status 2 receive -1 from 1 walltime 1613481117 local 4389 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence request 2 pid 26607 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence result 2 pid 26607 result -1 term signal 6 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence status 2 receive -1 from 1 walltime 1613481118 local 4391 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence request 2 pid 26637 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence result 2 pid 26637 result -1 term signal 6 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence status 2 receive -1 from 1 walltime 1613481120 local 4392 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence request 2 pid 26693 nodedown time 1613481102 fence_all dlm_stonith .... Calling 'dlm_tool fence_ack 2' by hand immediately releases the dlm resource locks. root@u2004-1:~# lsb_release -rd Description: Ubuntu 20.04 LTS Release: 20.04 root@u2004-1:~# apt-cache policy pacemaker pacemaker:   Installed: 2.0.3-3ubuntu4.1   Candidate: 2.0.3-3ubuntu4.1   Version table:  *** 2.0.3-3ubuntu4.1 500         500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages         500 http://gb.archive.ubuntu.com/ubuntu focal-security/main amd64 Packages         100 /var/lib/dpkg/status      2.0.3-3ubuntu3 500         500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages
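Note on the linker "magic" named in [scope]: the report does not spell it out. As a rough illustration only, and assuming it refers to the common GNU toolchain pattern of collecting records in a named ELF section and locating them through the linker-generated __start_/__stop_ symbols (the QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP assertion above suggests as much), the idea can be sketched as below. The names demo_callsite, demo_sites and DEMO_SITE are hypothetical and are not part of libqb; the sketch assumes GCC or Clang with GNU ld on Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record type; libqb's real callsite structure differs. */
struct demo_callsite {
    const char *label;
    int lineno;
};

/* Place one record in the ELF section "demo_sites". The "used" attribute
 * keeps the compiler from discarding a record nothing refers to by name. */
#define DEMO_SITE(name) \
    static struct demo_callsite name \
        __attribute__((section("demo_sites"), used)) = { #name, __LINE__ }

DEMO_SITE(site_a);
DEMO_SITE(site_b);
DEMO_SITE(site_c);

/* GNU ld defines these two symbols automatically for any output section
 * whose name is a valid C identifier. */
extern struct demo_callsite __start_demo_sites[];
extern struct demo_callsite __stop_demo_sites[];

/* Number of records the linker collected into the section. */
size_t demo_site_count(void)
{
    return (size_t)(__stop_demo_sites - __start_demo_sites);
}

/* Sanity check in the spirit of the assertion quoted in this bug:
 * abort if the section turned out to be empty. */
void demo_check_sites(void)
{
    assert(demo_site_count() != 0);
}

/* Walk the section from its start symbol to its stop symbol. */
void demo_print_sites(void)
{
    for (struct demo_callsite *s = __start_demo_sites;
         s < __stop_demo_sites; s++)
        printf("%s (line %d)\n", s->label, s->lineno);
}
```

Calling demo_check_sites() and demo_print_sites() from a main() lists the three records. If the toolchain fails to expose the section boundaries to the program, a check of this kind aborts at startup, which is the sort of failure the test case in this bug demonstrates.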
2021-03-01 21:36:20 Dan Streetman nominated for series Ubuntu Focal
2021-03-01 21:36:20 Dan Streetman bug task added pacemaker (Ubuntu Focal)
2021-03-01 21:36:26 Dan Streetman pacemaker (Ubuntu): status New Fix Released
2021-03-01 21:36:36 Dan Streetman pacemaker (Ubuntu Focal): assignee Dariusz Gadomski (dgadomski)
2021-03-01 21:36:37 Dan Streetman pacemaker (Ubuntu Focal): importance Undecided Medium
2021-03-01 21:36:39 Dan Streetman pacemaker (Ubuntu Focal): status New In Progress
2021-03-01 21:36:47 Dan Streetman pacemaker (Ubuntu): assignee Dariusz Gadomski (dgadomski)
2021-03-01 21:36:49 Dan Streetman pacemaker (Ubuntu): importance Medium Undecided
2021-03-01 22:06:08 Dan Streetman description [impact] programs using libqb logging exit due to failed assertion on qb log init [test case] test program: #include <qb/qblog.h> QB_LOG_INIT_DATA(test); int main(int argc, char* argv[]) { return 0; } compile and run: $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) Note the error is slightly different when compiling without lto: $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed. Aborted (core dumped) [regression potential] any regression would likely involve problems during logging using the libqb logging functions, which could include failure to log or even program exit and/or crash. [scope] this appears to be needed only for focal; the issue seems to be an interaction between the focal version of binutils and some linker "magic" that libqb used in the focal version. The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing. https://github.com/ClusterLabs/libqb/pull/322 The binutils in bionic and earlier does not appear to cause the problematic behavior with the libqb linker "magic", so no change is needed there. 
[other info] related debian binutils bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246 related gcc bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=24276 however, those appear to only have changed binutils to ignore the issue to allow the build to stop failing. The libqb docs do contain two suggestions to possibly work around this bug, specifically using either -l:libqb.so.0 or -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help with the simple test case, but more testing is needed that actually exercises the log functionality to make sure nothing else breaks. $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) $ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl $ ./test $ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test [original description] When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held. Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion. This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed. 
Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion. root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398 dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that QB_LOG_INIT_DATA will only be called by the library. External programs such as pacemaker that end up calling CRM_TRACE_INIT_DATA will suffer the same assertion. This patch from clusterlabs is an attempt to resolve the assertion, but is still not sufficient. https://lists.clusterlabs.org/pipermail/users/2018-February/023614.html Taking out the assertion in <qb/qblog.h> and recompiling pacemaker appears to be the only way to allow dlm_stonith to work. journalctl shows dlm_controld keeps trying to get a successful response from dlm_stonith Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence result 2 pid 26568 result -1 term signal 6 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence status 2 receive -1 from 1 walltime 1613481117 local 4389 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence request 2 pid 26607 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence result 2 pid 26607 result -1 term signal 6 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence status 2 receive -1 from 1 walltime 1613481118 local 4391 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence request 2 pid 26637 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence result 2 pid 26637 result -1 term signal 6 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence status 2 receive -1 from 1 walltime 1613481120 local 4392 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence request 2 pid 26693 nodedown time 1613481102 fence_all 
dlm_stonith .... Calling 'dlm_tool fence_ack 2' by hand immediately releases the dlm resource locks. root@u2004-1:~# lsb_release -rd Description: Ubuntu 20.04 LTS Release: 20.04 root@u2004-1:~# apt-cache policy pacemaker pacemaker:   Installed: 2.0.3-3ubuntu4.1   Candidate: 2.0.3-3ubuntu4.1   Version table:  *** 2.0.3-3ubuntu4.1 500         500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages         500 http://gb.archive.ubuntu.com/ubuntu focal-security/main amd64 Packages         100 /var/lib/dpkg/status      2.0.3-3ubuntu3 500         500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages [impact] programs using libqb logging exit due to failed assertion on qb log init [test case] test program: #include <qb/qblog.h> QB_LOG_INIT_DATA(test); int main(int argc, char* argv[]) {   return 0; } compile and run: $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) Note the error is slightly different when compiling without lto: $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed. Aborted (core dumped) [regression potential] any regression would likely involve problems during logging using the libqb logging functions, which could include failure to log or even program exit and/or crash. 
[scope] this appears to be needed only for focal; the issue seems to be an interaction between the focal version of binutils and some linker "magic" that libqb used in the focal version. The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing. https://github.com/ClusterLabs/libqb/pull/322 The libqb code in bionic does not include the linker "magic" and so does not have this problem. [other info] related debian binutils bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246 related gcc bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=24276 however, those appear to only have changed binutils to ignore the issue to allow the build to stop failing. The libqb docs do contain two suggestions to possibly work around this bug, specifically using either -l:libqb.so.0 or -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help with the simple test case, but more testing is needed that actually exercises the log functionality to make sure nothing else breaks. $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) $ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl $ ./test $ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test [original description] When a clustered node is detected as failed the remaining node tries to fence the resources. 
When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held. Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion. This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed. Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion. root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398 dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that QB_LOG_INIT_DATA will only be called by the library. External programs such as pacemaker that end up calling CRM_TRACE_INIT_DATA will suffer the same assertion. This patch from clusterlabs is an attempt to resolve the assertion, but is still not sufficient. https://lists.clusterlabs.org/pipermail/users/2018-February/023614.html Taking out the assertion in <qb/qblog.h> and recompiling pacemaker appears to be the only way to allow dlm_stonith to work. 
journalctl shows dlm_controld keeps trying to get a successful response from dlm_stonith Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence result 2 pid 26568 result -1 term signal 6 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence status 2 receive -1 from 1 walltime 1613481117 local 4389 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence request 2 pid 26607 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence result 2 pid 26607 result -1 term signal 6 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence status 2 receive -1 from 1 walltime 1613481118 local 4391 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence request 2 pid 26637 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence result 2 pid 26637 result -1 term signal 6 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence status 2 receive -1 from 1 walltime 1613481120 local 4392 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence request 2 pid 26693 nodedown time 1613481102 fence_all dlm_stonith .... Calling 'dlm_tool fence_ack 2' by hand immediately releases the dlm resource locks. root@u2004-1:~# lsb_release -rd Description: Ubuntu 20.04 LTS Release: 20.04 root@u2004-1:~# apt-cache policy pacemaker pacemaker:   Installed: 2.0.3-3ubuntu4.1   Candidate: 2.0.3-3ubuntu4.1   Version table:  *** 2.0.3-3ubuntu4.1 500         500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages         500 http://gb.archive.ubuntu.com/ubuntu focal-security/main amd64 Packages         100 /var/lib/dpkg/status      2.0.3-3ubuntu3 500         500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages
2021-03-04 09:26:09 Dariusz Gadomski nominated for series Ubuntu Groovy
2021-03-04 09:26:09 Dariusz Gadomski bug task added pacemaker (Ubuntu Groovy)
2021-03-05 06:59:10 Rafael David Tinoco bug added subscriber Rafael David Tinoco
2021-03-05 14:06:14 Dariusz Gadomski attachment added focal.debdiff https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5473371/+files/focal.debdiff
2021-03-05 14:28:11 Dariusz Gadomski pacemaker (Ubuntu Groovy): assignee Dariusz Gadomski (dgadomski)
2021-03-08 09:18:43 Dariusz Gadomski attachment removed focal.debdiff https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5473371/+files/focal.debdiff
2021-03-08 09:19:09 Dariusz Gadomski attachment added focal.debdiff https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5474407/+files/focal.debdiff
2021-03-08 09:20:02 Dariusz Gadomski attachment added groovy.debdiff https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5474408/+files/groovy.debdiff
2021-03-08 14:03:57 Łukasz Zemczak pacemaker (Ubuntu Groovy): status New Fix Committed
2021-03-08 14:03:58 Łukasz Zemczak bug added subscriber Ubuntu Stable Release Updates Team
2021-03-08 14:04:01 Łukasz Zemczak bug added subscriber SRU Verification
2021-03-08 14:04:06 Łukasz Zemczak tags server-next sts server-next sts verification-needed verification-needed-groovy
2021-03-08 14:55:29 Łukasz Zemczak pacemaker (Ubuntu Focal): status In Progress Fix Committed
2021-03-08 14:55:35 Łukasz Zemczak tags server-next sts verification-needed verification-needed-groovy server-next sts verification-needed verification-needed-focal verification-needed-groovy
2021-03-08 15:13:27 Dariusz Gadomski description [impact] programs using libqb logging exit due to failed assertion on qb log init [test case] test program: #include <qb/qblog.h> QB_LOG_INIT_DATA(test); int main(int argc, char* argv[]) {   return 0; } compile and run: $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) Note the error is slightly different when compiling without lto: $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed. Aborted (core dumped) [regression potential] any regression would likely involve problems during logging using the libqb logging functions, which could include failure to log or even program exit and/or crash. [scope] this appears to be needed only for focal; the issue seems to be an interaction between the focal version of binutils and some linker "magic" that libqb used in the focal version. The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing. https://github.com/ClusterLabs/libqb/pull/322 The libqb code in bionic does not include the linker "magic" and so does not have this problem. 
[other info] related debian binutils bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246 related gcc bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=24276 however, those appear to only have changed binutils to ignore the issue to allow the build to stop failing. The libqb docs do contain two suggestions to possibly work around this bug, specifically using either -l:libqb.so.0 or -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help with the simple test case, but more testing is needed that actually exercises the log functionality to make sure nothing else breaks. $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) $ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl $ ./test $ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test [original description] When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held. Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion. This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed. 
Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion. root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398 dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that QB_LOG_INIT_DATA will only be called by the library. External programs such as pacemaker that end up calling CRM_TRACE_INIT_DATA will suffer the same assertion. This patch from clusterlabs is an attempt to resolve the assertion, but is still not sufficient. https://lists.clusterlabs.org/pipermail/users/2018-February/023614.html Taking out the assertion in <qb/qblog.h> and recompiling pacemaker appears to be the only way to allow dlm_stonith to work. journalctl shows dlm_controld keeps trying to get a successful response from dlm_stonith Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence result 2 pid 26568 result -1 term signal 6 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence status 2 receive -1 from 1 walltime 1613481117 local 4389 Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence request 2 pid 26607 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence result 2 pid 26607 result -1 term signal 6 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence status 2 receive -1 from 1 walltime 1613481118 local 4391 Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence request 2 pid 26637 nodedown time 1613481102 fence_all dlm_stonith Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence result 2 pid 26637 result -1 term signal 6 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence status 2 receive -1 from 1 walltime 1613481120 local 4392 Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence request 2 pid 26693 nodedown time 1613481102 fence_all 
dlm_stonith .... Calling 'dlm_tool fence_ack 2' by hand immediately releases the dlm resource locks. root@u2004-1:~# lsb_release -rd Description: Ubuntu 20.04 LTS Release: 20.04 root@u2004-1:~# apt-cache policy pacemaker pacemaker:   Installed: 2.0.3-3ubuntu4.1   Candidate: 2.0.3-3ubuntu4.1   Version table:  *** 2.0.3-3ubuntu4.1 500         500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages         500 http://gb.archive.ubuntu.com/ubuntu focal-security/main amd64 Packages         100 /var/lib/dpkg/status      2.0.3-3ubuntu3 500         500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages [impact] programs using libqb logging exit due to failed assertion on qb log init [test case] test program: #include <qb/qblog.h> QB_LOG_INIT_DATA(test); int main(int argc, char* argv[]) {   return 0; } compile and run: $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. Aborted (core dumped) Note the error is slightly different when compiling without lto: $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T? $ ./test test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed. Aborted (core dumped) [regression potential] any regression would likely involve problems during logging using the libqb logging functions, which could include failure to log or even program exit and/or crash. 
additionally, altering of build flags (namely -DQB_KILL_ATTRIBUTE_SECTION) removes some symbols from pacemaker libraries (please see the debdiffs for the full list of them). Those seem to be previously defined by macros (resolved in the end to QB_LOG_INIT_DATA) and used internally by libqb for logging purposes. If there was anything using those symbols build time or runtime missing symbols may be reported.

[scope]

this appears to be needed only for focal; the issue seems to be an interaction between the focal version of binutils and some linker "magic" that libqb used in the focal version.

The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing.
https://github.com/ClusterLabs/libqb/pull/322

The libqb code in bionic does not include the linker "magic" and so does not have this problem.

[other info]

related debian binutils bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246

related gcc bug report:
https://sourceware.org/bugzilla/show_bug.cgi?id=24276

however, those appear to only have changed binutils to ignore the issue to allow the build to stop failing.

The libqb docs do contain two suggestions to possibly work around this bug, specifically using either -l:libqb.so.0 or -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help with the simple test case, but more testing is needed that actually exercises the log functionality to make sure nothing else breaks.

$ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl
/usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
$ ./test
test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
Aborted (core dumped)

$ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl
$ ./test

$ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl
/usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
$ ./test

[original description]

When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held.

Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion. This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed.

Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion.

root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398
dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.

It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that QB_LOG_INIT_DATA will only be called by the library. External programs such as pacemaker that end up calling CRM_TRACE_INIT_DATA will suffer the same assertion.

This patch from clusterlabs is an attempt to resolve the assertion, but is still not sufficient.
https://lists.clusterlabs.org/pipermail/users/2018-February/023614.html

Taking out the assertion in <qb/qblog.h> and recompiling pacemaker appears to be the only way to allow dlm_stonith to work.
journalctl shows dlm_controld keeps trying to get a successful response from dlm_stonith

Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence result 2 pid 26568 result -1 term signal 6
Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence status 2 receive -1 from 1 walltime 1613481117 local 4389
Feb 16 13:11:57 u2004-1 dlm_controld[9344]: 4389 fence request 2 pid 26607 nodedown time 1613481102 fence_all dlm_stonith
Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence result 2 pid 26607 result -1 term signal 6
Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence status 2 receive -1 from 1 walltime 1613481118 local 4391
Feb 16 13:11:58 u2004-1 dlm_controld[9344]: 4391 fence request 2 pid 26637 nodedown time 1613481102 fence_all dlm_stonith
Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence result 2 pid 26637 result -1 term signal 6
Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence status 2 receive -1 from 1 walltime 1613481120 local 4392
Feb 16 13:12:00 u2004-1 dlm_controld[9344]: 4392 fence request 2 pid 26693 nodedown time 1613481102 fence_all dlm_stonith
....

Calling 'dlm_tool fence_ack 2' by hand immediately releases the dlm resource locks.

root@u2004-1:~# lsb_release -rd
Description: Ubuntu 20.04 LTS
Release: 20.04

root@u2004-1:~# apt-cache policy pacemaker
pacemaker:
  Installed: 2.0.3-3ubuntu4.1
  Candidate: 2.0.3-3ubuntu4.1
  Version table:
 *** 2.0.3-3ubuntu4.1 500
        500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        500 http://gb.archive.ubuntu.com/ubuntu focal-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.3-3ubuntu3 500
        500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages
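
The journal excerpt in the description reports every dlm_stonith run ending with "term signal 6". On Linux signal 6 is SIGABRT, which is what a failed assert() raises, so those lines are the same assertion failure seen from dlm_controld's side. A minimal sketch for tallying such lines follows. The function name fence_signals is hypothetical (it is not part of dlm or pacemaker), and the line layout is assumed only from the excerpt above.

```shell
#!/bin/sh
# Sketch only: fence_signals is a hypothetical helper, not a dlm/pacemaker tool.
# It assumes "fence result" lines shaped like the journal excerpt, i.e.
#   ... dlm_controld[PID]: N fence result NODE pid P result R term signal S
# It reads journal text on stdin and prints one line per terminating signal
# with the number of fence helper runs that ended that way.
fence_signals() {
    awk '
        /dlm_controld\[[0-9]+\]: [0-9]+ fence result / {
            # Find the "term signal <number>" triple anywhere in the line.
            for (i = 1; i + 2 <= NF; i++)
                if ($i == "term" && $(i + 1) == "signal")
                    count[$(i + 2)]++
        }
        END {
            for (sig in count)
                printf "signal %s: %d\n", sig, count[sig]
        }
    '
}
```

It could be used as "journalctl | fence_signals". Fed the three "fence result" lines from the excerpt, it would report "signal 6: 3". A count that keeps growing after a node failure suggests dlm_controld is still retrying a helper that aborts every time.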
2021-03-08 15:13:33 Dariusz Gadomski pacemaker (Ubuntu Groovy): importance Undecided Medium
2021-03-17 14:46:50 Dariusz Gadomski tags server-next sts verification-needed verification-needed-focal verification-needed-groovy server-next sts verification-done-focal verification-needed verification-needed-groovy
2021-03-17 15:33:38 Dariusz Gadomski tags server-next sts verification-done-focal verification-needed verification-needed-groovy server-next sts verification-done-focal verification-done-groovy verification-needed
2021-03-17 15:33:43 Dariusz Gadomski tags server-next sts verification-done-focal verification-done-groovy verification-needed server-next sts verification-done verification-done-focal verification-done-groovy
2021-03-18 12:18:28 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team
2021-03-18 12:28:32 Launchpad Janitor pacemaker (Ubuntu Groovy): status Fix Committed Fix Released
2021-03-18 13:24:37 Launchpad Janitor pacemaker (Ubuntu Focal): status Fix Committed Fix Released