Activity log for bug #2012740

Date Who What changed Old value New value Message
2023-03-24 14:39:44 Scati Labs I+D bug added bug
2023-03-24 14:39:44 Scati Labs I+D attachment added apport crash view https://bugs.launchpad.net/bugs/2012740/+attachment/5657282/+files/pacemaker_crash.txt
2023-03-27 07:23:17 Athos Ribeiro bug added subscriber Ubuntu Server
2023-03-27 07:23:26 Athos Ribeiro pacemaker (Ubuntu): status New Triaged
2023-03-27 07:24:07 Athos Ribeiro tags bitesize server-todo
2023-03-27 07:24:46 Athos Ribeiro nominated for series Ubuntu Jammy
2023-03-27 07:24:46 Athos Ribeiro bug task added pacemaker (Ubuntu Jammy)
2023-03-27 07:24:53 Athos Ribeiro pacemaker (Ubuntu Jammy): status New Triaged
2023-03-27 07:24:58 Athos Ribeiro pacemaker (Ubuntu): status Triaged Fix Released
2023-03-28 15:16:59 Christian Ehrhardt  pacemaker (Ubuntu Jammy): assignee Michał Małoszewski (michal-maloszewski99)
2023-04-01 13:35:55 Launchpad Janitor merge proposal linked https://code.launchpad.net/~michal-maloszewski99/ubuntu/+source/pacemaker/+git/pacemaker/+merge/440191
2023-04-06 08:30:46 Michał Małoszewski description

  Old value (the original report):

    After migrating a mysql cluster from bionic to jammy (pacemaker 2.1.2-1ubuntu3), pacemaker started to malfunction because of pacemaker-controld crashes. It is easy to reproduce by putting the promoted node into standby. An apport crash view has been attached; this is the same bug reported in Red Hat's bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=2039675) and it was fixed upstream in commit https://github.com/ClusterLabs/pacemaker/commit/ed8b2c86ab77aaa3d7fd688c049ad5e1b922a9c6. Please provide an update for pacemaker, because it is unusable this way.

  New value (SRU template added above the original report):

    [Impact]

    pacemaker-controld is Pacemaker's coordinator: it maintains a consistent view of the cluster membership and orchestrates all the other components. Users of mysql clusters migrating from bionic to jammy reported a crash. The crash is caused by lrmd_dispatch_internal(), which assigns the exit_reason string directly from an XML node to a new lrmd_event_data_t object (without duplicating it), so the string gets freed twice. The fix is to make a copy of event.exit_reason in lrmd_dispatch_internal() before the callback.

    [Test Plan]

    lxc launch ubuntu:22.04 node1
    lxc shell node1
      apt update && apt dist-upgrade -y
      apt install pcs mysql-server resource-agents -y
      echo hacluster:hacluster | chpasswd
      mysql -e "CREATE USER 'replicator'@'localhost'"
      mysql -e "GRANT RELOAD, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'localhost'"
      systemctl disable mysql.service
      systemctl stop mysql.service
      exit
    lxc copy node1 node2
    lxc start node2
    lxc shell node1
      pcs host auth node1 node2 -u hacluster -p hacluster
      pcs cluster setup --force mysqlclx node1 node2 transport udpu
      pcs cluster enable --all
      pcs cluster start --all
      pcs property set stonith-enabled=false
      pcs property set no-quorum-policy=ignore
      pcs resource create p_mysql ocf:heartbeat:mysql \
        replication_user=replicator \
        test_user=root \
        op demote interval=0s timeout=120 monitor interval=20 timeout=30 monitor \
        interval=10 role=Master timeout=30 monitor interval=30 role=Slave timeout=30 \
        notify interval=0s timeout=90 promote interval=0s timeout=120 start \
        interval=0s timeout=120 stop interval=0s timeout=120 meta notify=true
      pcs resource promotable p_mysql p_mysql-master notify=true

    Example of failed output: a crash file appears under /var/crash/ on one of the nodes.

    Example of successful output: no crash file appears under /var/crash/.

    [Where problems could occur]

    The patch modifies only the lrmd code, so regressions should be limited to lrmd behavior. Since the changes touch event dispatching and memory allocation, potential regressions would most likely show up there.

    --------------------------------- original report ---------------------------------

    (the original report quoted under "Old value" above, unchanged)
2023-04-06 08:33:27 Christian Ehrhardt description

  Old value: the description as set in the 2023-04-06 08:30:46 entry above.

  New value: the same description, with the [Impact] and [Where problems could occur] sections reworked as bullet lists:

    [Impact]

     * pacemaker-controld is Pacemaker's coordinator: it maintains a consistent view of the cluster membership and orchestrates all the other components.

     * Users of mysql clusters migrating from bionic to jammy reported a crash.

     * The crash is caused by lrmd_dispatch_internal(), which assigns the exit_reason string directly from an XML node to a new lrmd_event_data_t object (without duplicating it), so the string gets freed twice. The fix is to make a copy of event.exit_reason in lrmd_dispatch_internal() before the callback.

    [Test Plan]

    (unchanged from the previous entry)

    [Where problems could occur]

     * The patch modifies only the lrmd code, so regressions should be limited to lrmd behavior.

     * Since the changes touch event dispatching and memory allocation, potential regressions would most likely show up there.

    --------------------------------- original report ---------------------------------

    (unchanged from the previous entry)
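For readers following the [Impact] text above, here is a minimal, self-contained C sketch of the ownership bug and of the copy-before-callback fix. It is not the actual Pacemaker source: only lrmd_dispatch_internal(), lrmd_event_data_t and exit_reason are real Pacemaker names; the event type, helper functions and strings below are illustrative assumptions.

    /*
     * Minimal sketch (not the actual Pacemaker source) of the double free
     * described above.  The event must own its own copy of exit_reason,
     * otherwise both the XML teardown and the event cleanup free the same
     * string.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char *exit_reason;          /* owned by the event once it is set */
    } event_t;

    /* Buggy pattern: the event borrows the string that the XML node owns,
     * so freeing the XML document and freeing the event both release it. */
    void dispatch_borrowing(event_t *event, char *xml_owned)
    {
        event->exit_reason = xml_owned;                 /* no copy made */
    }

    /* Fixed pattern (conceptually what the upstream commit does):
     * duplicate the string before the event is handed to the callback. */
    void dispatch_copying(event_t *event, const char *xml_owned)
    {
        event->exit_reason = xml_owned ? strdup(xml_owned) : NULL;
    }

    void free_event(event_t *event)
    {
        free(event->exit_reason);   /* safe only if the event owns the string */
        event->exit_reason = NULL;
    }

    int main(void)
    {
        /* Stand-in for the string owned by the parsed XML reply. */
        char *xml_owned = strdup("monitor operation failed");

        event_t event = { NULL };
        dispatch_copying(&event, xml_owned);

        free(xml_owned);        /* XML teardown releases its copy...       */
        free_event(&event);     /* ...and the event releases its own copy. */

        printf("no double free\n");
        return 0;
    }

Run with the copying variant this exits cleanly; swapping in the borrowing variant makes the two free() calls release the same allocation, which is the double free the upstream commit removes.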
2023-04-06 08:33:56 Christian Ehrhardt  pacemaker (Ubuntu Jammy): status Triaged Fix Committed
2023-04-06 08:33:59 Christian Ehrhardt  pacemaker (Ubuntu Jammy): status Fix Committed In Progress
2023-04-14 22:19:08 Steve Langasek pacemaker (Ubuntu Jammy): status In Progress Fix Committed
2023-04-14 22:19:10 Steve Langasek bug added subscriber Ubuntu Stable Release Updates Team
2023-04-14 22:19:12 Steve Langasek bug added subscriber SRU Verification
2023-04-14 22:19:15 Steve Langasek tags bitesize server-todo bitesize server-todo verification-needed verification-needed-jammy
2023-04-17 18:54:17 Michał Małoszewski tags bitesize server-todo verification-needed verification-needed-jammy bitesize server-todo verification-done-jammy verification-needed
2023-04-18 08:05:43 Christian Ehrhardt  tags bitesize server-todo verification-done-jammy verification-needed bitesize server-todo verification-done verification-done-jammy
2023-04-27 14:53:16 Launchpad Janitor pacemaker (Ubuntu Jammy): status Fix Committed Fix Released
2023-04-27 14:53:22 Andreas Hasenack removed subscriber Ubuntu Stable Release Updates Team