Pacemaker service is dead after corosync package upgrade

Bug #1614893 reported by Ivan Berezovskiy
This bug affects 1 person
Affects               Status         Importance   Assigned to      Milestone
Fuel for OpenStack    Fix Released   High         Denis Egorenko
  Mitaka              Fix Released   High         Denis Egorenko

Bug Description

Steps to reproduce:
1. Deploy a 9.0 environment.
2. Add the 9.1 mirrors that contain the new corosync package.
3. Run either of:
   apt-get upgrade
   apt-get install corosync

Expected result:
The corosync package is updated, corosync is restarted, and pacemaker keeps running.

Actual result:
The pacemaker service is dead:

root@node-1:~# pcs resource
Error: unable to get cluster status from crm_mon
Connection to cluster failed: Transport endpoint is not connected

root@node-1:~# service pacemaker status
pacemakerd is stopped

Workaround:
Start the pacemaker service manually after the corosync package upgrade.
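The manual workaround can be scripted. The following is a minimal sketch, not part of the fix: it assumes the same `service` wrapper used in the transcript above and that `service <name> status` exits non-zero when the daemon is stopped, as LSB-style init scripts normally do. The function name `ensure_pacemaker_running` is hypothetical.

```shell
#!/bin/sh
# ensure_pacemaker_running (hypothetical helper name)
# Restores pacemaker after a corosync package upgrade has killed pacemakerd.
# Assumption: "service <name> status" exits non-zero when the daemon is
# stopped, as LSB-style init scripts normally do.
ensure_pacemaker_running() {
  # pacemaker cannot connect to the cluster without corosync, so check it first.
  if ! service corosync status >/dev/null 2>&1; then
    echo "corosync is not running; start it before pacemaker" >&2
    return 1
  fi
  if service pacemaker status >/dev/null 2>&1; then
    echo "pacemaker is already running"
    return 0
  fi
  echo "pacemaker is stopped; starting it"
  service pacemaker start
}
```

Run it as root right after `apt-get install corosync`; once it reports that pacemaker was started, `pcs resource` should again be able to query the cluster.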

Changed in fuel:
status: New → Confirmed
tags: added: area-packaging
tags: added: area-puppet
removed: area-packaging
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/358702

Ivan Berezovskiy (iberezovskiy) wrote :
no longer affects: fuel/newton
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/355525
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=ecfe03f3a8d6d8ebe2c2261c34370d5df66b153c
Submitter: Jenkins
Branch: stable/mitaka

commit ecfe03f3a8d6d8ebe2c2261c34370d5df66b153c
Author: Denis Egorenko <email address hidden>
Date: Mon Aug 15 18:17:52 2016 +0300

    Add upgrade stuff for new versions

    The following items were added to perform an environment update:

      1. Add a task that performs the upgrade by calling apt-get
         dist-upgrade. This command forces the old configuration files
         to be kept for all services.

      2. If the MU upgrade is enabled, all services are triggered to
         restart, except Ceph, MySQL and RabbitMQ.

    Workarounds were implemented for the following issues:

      MySQL and RabbitMQ:
        MySQL and RabbitMQ restarts are managed separately from other
        services and are disabled by default. To enable them, set
        mu_upgrade['restart_mysql'] and mu_upgrade['restart_rabbitmq']
        to true in astute.yaml.

      Pacemaker service issue:
        If the corosync package is upgraded, APT (or dpkg) restarts the
        corosync service, and this restart kills the pacemaker service,
        so pacemaker has to be started again. APT must also be prevented
        from stopping the pacemaker service during the upgrade, because
        stopping it unloads all pacemaker resources (every service it
        manages is stopped). That is not acceptable during an update.
        Puppet manages the pacemaker service itself.

      Ceph:
        Ceph should be upgraded by following the guide [1]; otherwise, a
        simple restart of its services might break the cluster or cause
        data loss.

        [1] http://docs.ceph.com/docs/master/install/upgrading-ceph/#upgrade-procedures

    DocImpact

    Related-bug: #1614893

    Change-Id: I0d7231aa5900318f75f71c698f3e1c07f8e5cfbe
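The pacemaker workaround in the commit message says APT must be prevented from stopping pacemaker, but not how. The sketch below assumes the standard Debian policy-rc.d hook: invoke-rc.d, which package maintainer scripts use to control services, consults /usr/sbin/policy-rc.d first and skips the action when that script exits with 101. The actual fuel-library change is implemented in Puppet and may use a different mechanism. Both function names are hypothetical, and the script overwrites any policy-rc.d that is already installed.

```shell
#!/bin/sh
# write_pacemaker_policy PATH (hypothetical helper)
# Writes a policy-rc.d script to PATH that forbids stopping or restarting
# pacemaker (exit 101) and allows every other action (exit 0).
write_pacemaker_policy() {
  cat > "$1" <<'EOF'
#!/bin/sh
# Skip leading options such as --quiet.
while [ $# -gt 0 ]; do
  case "$1" in
    --*) shift ;;
    *) break ;;
  esac
done
[ "$1" = "pacemaker" ] || exit 0
# $2 may hold several space-separated actions; deny if any would stop it.
for action in $2; do
  case "$action" in
    stop|restart) exit 101 ;;
  esac
done
exit 0
EOF
  chmod 755 "$1"
}

# upgrade_keeping_pacemaker (hypothetical helper; must run as root)
# Upgrades with pacemaker protected, then makes sure pacemaker is running.
upgrade_keeping_pacemaker() {
  # Assumption: no other policy-rc.d is installed; an existing one is overwritten.
  write_pacemaker_policy /usr/sbin/policy-rc.d
  # Keep the existing configuration files of every upgraded package.
  DEBIAN_FRONTEND=noninteractive apt-get -y \
    -o Dpkg::Options::=--force-confdef \
    -o Dpkg::Options::=--force-confold dist-upgrade
  status=$?
  rm -f /usr/sbin/policy-rc.d
  # A corosync restart still kills pacemakerd, so start it again if needed.
  service pacemaker status >/dev/null 2>&1 || service pacemaker start
  return $status
}
```

Denying only pacemaker's stop/restart leaves the corosync restart in place, which is why the function still checks pacemaker afterwards.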

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/358702
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=c154093cd5c493188669f8c47c77857b0507a77b
Submitter: Jenkins
Branch: master

commit c154093cd5c493188669f8c47c77857b0507a77b
Author: Denis Egorenko <email address hidden>
Date: Mon Aug 15 18:17:52 2016 +0300

    Add upgrade stuff for new versions

    The rest of the commit message is identical to commit
    ecfe03f3a8d6d8ebe2c2261c34370d5df66b153c (stable/mitaka) above.

    DocImpact

    Related-bug: #1614893

    Change-Id: I0d7231aa5900318f75f71c698f3e1c07f8e5cfbe

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Timur Nurlygayanov (tnurlygayanov) wrote :

Tested during the upgrade from MOS 9.0 to MOS 9.1

Changed in fuel:
status: Fix Committed → Fix Released
Maksym Shalamov (mshalamov) wrote :

Verified on MOS 9.1 (snapshot #240)

tags: removed: on-verification