During an upgrade from pacemaker version 1.1.14-2~u14.04+mos1 to version 1.1.14-2~u14.04+mos2, the lrmd process hangs and prevents pacemaker from recovering from a corosync outage.
Long way to reproduce:
~~~~~~~~~~~~~~~~~~~~~~
1. Install 9.1 with one controller node in HA mode.
2. Upgrade to 9.2.
------------------------------
Expected result:
~~~~~~~~~~~~~~~~
The upgrade finishes without problems.
------------------------------
Result:
~~~~~~~
The upgrade fails because of an outage of a seemingly random component.
The log contains the following errors:
error: mainloop_add_ipc_server: Could not start pengine IPC server: Address already in use (-98)
error: main: Failed to create IPC server: shutting down and inhibiting respawn
info: crm_xml_cleanup: Cleaning up memory from libxml2
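To check whether a node has hit this failure, one can search its Pacemaker log for the two error messages quoted above. This is a minimal sketch under assumptions: the log file path is not given in this report, so it is passed as an argument (use whichever file your node's logging setup writes Pacemaker messages to), and `scan_log` is a hypothetical helper name, not part of Pacemaker or Fuel.

```shell
#!/usr/bin/env bash
# Search a Pacemaker log for the failure signature quoted in this report.
# Assumption: the log path is passed as $1 (it depends on the node's logging
# setup). Run as a script without an argument, it does nothing.
set -eu

# Fixed-string messages copied from the log excerpt in this report.
PATTERNS=(
    "Could not start pengine IPC server: Address already in use"
    "Failed to create IPC server: shutting down and inhibiting respawn"
)

# Hypothetical helper: print every matching line. Exit status follows grep:
# 0 if any line matched, 1 if none did. Reads stdin when no file is given.
scan_log() {
    local args=()
    local p
    for p in "${PATTERNS[@]}"; do
        args+=(-e "$p")
    done
    if [ "$#" -ge 1 ]; then
        grep -F "${args[@]}" -- "$1"
    else
        grep -F "${args[@]}"
    fi
}

if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -ge 1 ]; then
    scan_log "$1"
fi
```

Matching on fixed strings (`grep -F`) avoids having to escape the parentheses and colons in the messages; a zero exit status means the signature was found.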
Fast way to reproduce:
~~~~~~~~~~~~~~~~~~~~~~
1. Install 9.0 or 9.1 with one controller node in HA mode.
2. Log in to the node via SSH.
3. service corosync stop
4. Update the packages pacemaker-cli-utils, pacemaker-common, pacemaker-resource-agents, and pacemaker to version 1.1.14-2~u14.04+mos2.
5. service corosync start
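Steps 3-5 of the fast reproduction can be scripted roughly as follows. This is a sketch under assumptions: it runs as root on the single controller, it installs the target version with `apt-get install <package>=<version>` from whatever repository already provides the mos2 build, and the `run` and `reproduce` helpers and the `DRY_RUN` switch are hypothetical names. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-5 of the "fast way to reproduce" (run as root).
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Target version and package list taken from step 4 of the report.
TARGET_VERSION="1.1.14-2~u14.04+mos2"
PACKAGES=(pacemaker-cli-utils pacemaker-common pacemaker-resource-agents pacemaker)

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    local pkgs=()
    local p
    for p in "${PACKAGES[@]}"; do
        pkgs+=("${p}=${TARGET_VERSION}")   # apt-get's package=version syntax
    done
    run service corosync stop              # step 3
    run apt-get install -y "${pkgs[@]}"    # step 4 (assumes the repo has mos2)
    run service corosync start             # step 5
}

if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    reproduce
fi
```

Steps 1 and 2 (deploying the environment and logging in) are not covered. Defaulting to a dry run makes it possible to review the exact package pins before anything on the controller is changed.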
------------------------------
Expected result:
~~~~~~~~~~~~~~~~
Pacemaker recovers from the corosync outage.
------------------------------
Result:
~~~~~~~
Pacemaker fails to communicate with lrmd and restarts constantly.
The Pacemaker process restarts every 2-3 minutes.
For an example, see https://bugs.launchpad.net/fuel/+bug/1641947