Pacemaker hang during upgrade to 9.2

Bug #1644152 reported by Anton Chevychalov
This bug affects 3 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Anton Chevychalov

Bug Description

During the upgrade of pacemaker from version 1.1.14-2~u14.04+mos1 to version 1.1.14-2~u14.04+mos2, the lrmd process hangs and prevents pacemaker from recovering from a corosync outage.

Long way to reproduce:
~~~~~~~~~~~~~~~~~~~~~
1. Install 9.1 with one controller node in HA mode.
2. Try to upgrade to 9.2
------------------------------

Expected result:
~~~~~~~~~~~~~~~~
Upgrade finished without problems.
------------------------------

Result:
~~~~~~
The upgrade fails because of an outage of some random component.

The pacemaker log contains these errors:
error: mainloop_add_ipc_server: Could not start pengine IPC server: Address already in use (-98)
error: main: Failed to create IPC server: shutting down and inhibiting respawn

The pacemaker process restarts every 2-3 minutes.

For an example, see https://bugs.launchpad.net/fuel/+bug/1641947
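
A quick way to check whether a node shows this symptom is to count the two error signatures quoted above in the pacemaker log. This is only a minimal sketch: the function name `count_ipc_errors` and the default log path are assumptions, so pass the real log location for your deployment.

```shell
#!/bin/sh
# Count the IPC failure signatures from this report in a pacemaker log.
# The default path below is an assumption; pass the real log file as $1.
count_ipc_errors() {
    log_file="${1:-/var/log/pacemaker.log}"
    # grep -c prints the number of matching lines; -E enables alternation.
    grep -c -E 'Could not start .* IPC server: Address already in use|Failed to create IPC server' "$log_file"
}
```

Usage: `count_ipc_errors /path/to/pacemaker.log`. A count that keeps growing every few minutes would be consistent with the restart loop described here.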
==============================

Fast way to reproduce:
~~~~~~~~~~~~~~~~~~~~~
1. Install 9.0 or 9.1 with one controller node in HA mode.
2. Log in to the controller over SSH
3. service corosync stop
4. Update packages pacemaker-cli-utils, pacemaker-common, pacemaker-resource-agents, pacemaker to 1.1.14-2~u14.04+mos2
5. service corosync start
6. Wait 60 seconds for pacemaker to respawn
7. service pacemaker restart
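
Steps 3-7 above can be sketched as a script to run on the controller. The report does not say how the packages were updated, so the `apt-get install --only-upgrade` call is an assumption, as are the helper names `run` and `reproduce_hang`. By default the script only prints the commands (dry run); set `DRY_RUN=0` to actually execute them on a test node.

```shell
#!/bin/sh
# Sketch of the fast reproduction steps. With DRY_RUN=1 (the default)
# each command is only printed, so the sequence can be reviewed safely.
PKG_VERSION='1.1.14-2~u14.04+mos2'
PACKAGES='pacemaker-cli-utils pacemaker-common pacemaker-resource-agents pacemaker'

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_hang() {
    run service corosync stop
    # Build "name=version" arguments. Using apt-get here is an assumption;
    # it requires the mos2 packages to be available in a configured repo.
    pkg_args=''
    for pkg in $PACKAGES; do
        pkg_args="$pkg_args $pkg=$PKG_VERSION"
    done
    # Word splitting of $pkg_args is intended.
    run apt-get install -y --only-upgrade $pkg_args
    run service corosync start
    run sleep 60    # give pacemaker time to respawn
    run service pacemaker restart
}
```

Calling `reproduce_hang` with the default settings prints the five commands in order without touching the cluster.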
------------------------------

Expected result:
~~~~~~~~~~~~~~~~
Pacemaker recovers from corosync outage.
------------------------------

Result:
~~~~~~~
Pacemaker fails to communicate with the zombie lrmd and restarts constantly.

Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Anton Chevychalov (achevychalov)
milestone: none → 9.2
Changed in fuel:
status: Confirmed → In Progress
Anton Chevychalov (achevychalov) wrote :

A quick workaround is to force a pacemaker restart right after the package update and before the corosync restart.
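
The ordering in this workaround could look roughly like the sketch below. Only the order (package update, then pacemaker restart, then corosync restart) comes from the comment above; the `apt-get` invocation and the helper names `run` and `upgrade_with_workaround` are assumptions for illustration. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the workaround ordering; prints commands unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_with_workaround() {
    # 1. Update the pacemaker packages (apt-get usage is an assumption).
    run apt-get install -y --only-upgrade \
        pacemaker-cli-utils pacemaker-common pacemaker-resource-agents pacemaker
    # 2. Restart pacemaker immediately, while corosync is still up.
    run service pacemaker restart
    # 3. Only then restart corosync.
    run service corosync restart
}
```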

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/401268

Nastya Urlapova (aurlapova) wrote :

Importance was increased because the issue is a blocker for the update functionality.

tags: added: blocker-for-qa
Anton Chevychalov (achevychalov) wrote :

The pacemaker init script does not work well when the lrmd process is still in memory. It takes only pacemakerd into account.
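
To illustrate the condition described here, the sketch below reports pacemaker child daemons that are still running while pacemakerd itself is gone. The daemon list reflects the children pacemakerd spawns in Pacemaker 1.1 as far as I know, and the function name `find_orphaned_daemons` is hypothetical; it is not taken from the init script.

```shell
#!/bin/sh
# Child daemons spawned by pacemakerd in Pacemaker 1.1 (assumed list).
PACEMAKER_CHILDREN='cib stonithd lrmd attrd pengine crmd'

# Reads process names (one per line, e.g. from `ps -e -o comm=`) on stdin
# and prints each child daemon that is running while pacemakerd is not.
find_orphaned_daemons() {
    names=$(cat)
    if printf '%s\n' "$names" | grep -qx 'pacemakerd'; then
        return 0    # parent is alive, so its children are expected
    fi
    for daemon in $PACEMAKER_CHILDREN; do
        if printf '%s\n' "$names" | grep -qx "$daemon"; then
            echo "$daemon"
        fi
    done
}
```

Usage: `ps -e -o comm= | find_orphaned_daemons`. Any output (for example `lrmd`) points at a leftover process that a check based on pacemakerd alone would miss.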

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/trusty/pacemaker (9.0)

Fix proposed to branch: 9.0
Change author: Anton Chevychalov <email address hidden>
Review: https://review.fuel-infra.org/28909

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/mitaka)

Change abandoned by Anton Chevychalov (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/401268
Reason: No need to change anything in fuel-library anymore. Fixed in pacemaker package.

description: updated
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/trusty/pacemaker (9.0)

Reviewed: https://review.fuel-infra.org/28909
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0

Commit: e41689f2e6039e9fb92f84fe9a269dee81c46446
Author: Anton Chevychalov <email address hidden>
Date: Tue Nov 29 10:26:55 2016

Fix stop procedure of init script

Fix a problem with stoping of partial runned pacemaker.
Now it count not only pacemakerd but all other processes too.

Change-Id: Idfe51d725a98b5deed686ee36d6aa608c6a069d4
Closes-Bug: #1644152
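
The idea behind this commit, a stop procedure that waits for every pacemaker daemon rather than pacemakerd alone, might be sketched as follows. This is not the merged patch: the function names, the daemon list, the 30-second default timeout, and the `PGREP` override (there only so the check can be exercised without a live cluster) are all assumptions.

```shell
#!/bin/sh
# All Pacemaker 1.1 daemons the stop procedure should account for (assumed list).
PACEMAKER_DAEMONS='pacemakerd cib stonithd lrmd attrd pengine crmd'

# Prints the daemons that are still running, one per line.
running_daemons() {
    for daemon in $PACEMAKER_DAEMONS; do
        if ${PGREP:-pgrep} -x "$daemon" >/dev/null 2>&1; then
            echo "$daemon"
        fi
    done
}

# Stop pacemaker and wait up to $1 seconds until no daemon is left.
stop_all_daemons() {
    timeout="${1:-30}"
    if ${PGREP:-pgrep} -x pacemakerd >/dev/null 2>&1; then
        pkill -TERM -x pacemakerd      # normal path: pacemakerd stops its children
    else
        for daemon in $(running_daemons); do
            pkill -TERM -x "$daemon"   # partial run: signal the leftovers directly
        done
    fi
    while [ "$timeout" -gt 0 ] && [ -n "$(running_daemons)" ]; do
        sleep 1
        timeout=$((timeout - 1))
    done
    # Exit status: success only if nothing is left running.
    [ -z "$(running_daemons)" ]
}
```

The point of the design is the final check: the stop is reported as successful only when the whole daemon set has exited, so a lingering lrmd can no longer go unnoticed.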

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
TatyanaGladysheva (tgladysheva) wrote :

Verified on 9.2 snapshot #576.

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released
no longer affects: pacemaker (Ubuntu)