pcmk remote and HA is broken in train

Bug #1859945 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Michele Baldessari
Milestone: (none listed)

Bug Description

Currently an HA deployment making use of PacemakerRemote for any HA role
will fail with the following:
2020-01-16 08:40:22.707 33489 DEBUG paunch [ ] Start container mysql_restart_bundle as mysql_restart_bundle.
2020-01-16 08:40:22.708 33489 DEBUG paunch [ ] Path seperator found in volume (/etc/corosync/corosync.conf), but did not exist on the file system
2020-01-16 08:40:22.708 33489 ERROR paunch [ ] /etc/corosync/corosync.conf is not a valid volume source
...
2020-01-16 08:40:53.026 33489 ERROR paunch [ ] The following containers failed validations and were not started: mysql_restart_bundle

The cause is change I92d4ddf2feeac06ce14468ae928c283f3fd04f45 (HA: fix
<service>_restart_bundle with minor update workflow), which consolidated
all the restart bundles into a single place inside
containers-common.yaml but did not make the inclusion of the
/etc/corosync/corosync.conf bind mount conditional. Pacemaker remote
nodes have no such file, so the volume validation fails there. In fact,
this bind mount has not been needed since we moved to RHEL/CentOS 8
(i.e. since podman was introduced). See
I399098bf734aa3b2862e1713d4b1f429d180afbc (Fix pcmk remote podman bundle
restarts) for more context.
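
The validation failure in the log above can be pictured with a small sketch. This is not paunch's actual code: the function names `is_valid_volume_source` and `invalid_volumes` are hypothetical, and the rule used here (a source containing a path separator must exist on the host, anything else is treated as a named volume) is an assumption inferred only from the log messages.

```python
import os
from typing import List


def is_valid_volume_source(source: str) -> bool:
    """Return True if a volume source looks usable.

    Assumption inferred from the log: a source containing a path
    separator is a host path and must exist on the file system; a
    source without one is treated as a named volume and accepted.
    """
    if os.sep in source:
        return os.path.exists(source)
    return True


def invalid_volumes(volumes: List[str]) -> List[str]:
    """Return the sources of all volume specs that fail validation.

    Each spec is expected in the "source:target[:options]" form.
    """
    bad = []
    for spec in volumes:
        # Everything before the first colon is the host-side source.
        source = spec.split(":", 1)[0]
        if not is_valid_volume_source(source):
            bad.append(source)
    return bad
```

On a node without /etc/corosync/corosync.conf, a spec such as `/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro` would be reported by `invalid_volumes`, which matches the "is not a valid volume source" error shown in the log.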

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/702826

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/702826
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a30342f253bf63db00a9545aebe50187b9c0324a
Submitter: Zuul
Branch: master

commit a30342f253bf63db00a9545aebe50187b9c0324a
Author: Michele Baldessari <email address hidden>
Date: Thu Jan 16 10:13:07 2020 +0100

    Fix deployment on pacemaker remote nodes

    Currently an HA deployment making use of PacemakerRemote for any HA role
    will fail with the following:
    2020-01-16 08:40:22.707 33489 DEBUG paunch [ ] Start container mysql_restart_bundle as mysql_restart_bundle.
    2020-01-16 08:40:22.708 33489 DEBUG paunch [ ] Path seperator found in volume (/etc/corosync/corosync.conf), but did not exist on the file system
    2020-01-16 08:40:22.708 33489 ERROR paunch [ ] /etc/corosync/corosync.conf is not a valid volume source
    ...
    2020-01-16 08:40:53.026 33489 ERROR paunch [ ] The following containers failed validations and were not started: mysql_restart_bundle

    The reason for this is that via I92d4ddf2feeac06ce14468ae928c283f3fd04f45 (HA: fix
    <service>_restart_bundle with minor update workflow), we consolidated
    all the restart bundles into a single place inside
    containers-common.yaml but we forgot to conditionalize the inclusion of
    the /etc/corosync/corosync.conf bind mount. In fact this bind mount is
    not needed since we started using RHEL/CentOS 8 (i.e. since the podman
    introduction). See I399098bf734aa3b2862e1713d4b1f429d180afbc (Fix pcmk
    remote podman bundle restarts) for more context

    Tested in a composable HA deployment where the Messaging and the
    Database roles were using PacemakerRemote and correctly deployed the
    environment (which would previously fail):
    [root@messaging-0 ~]# crm_mon -1 |grep -e database -e messaging
    RemoteOnline: [ database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
     database-0 (ocf::pacemaker:remote): Started controller-0
     database-1 (ocf::pacemaker:remote): Started controller-1
     database-2 (ocf::pacemaker:remote): Started controller-2
     messaging-0 (ocf::pacemaker:remote): Started controller-0
     messaging-1 (ocf::pacemaker:remote): Started controller-1
     messaging-2 (ocf::pacemaker:remote): Started controller-2
     galera-bundle-0 (ocf::heartbeat:galera): Master database-0
     galera-bundle-1 (ocf::heartbeat:galera): Master database-1
     galera-bundle-2 (ocf::heartbeat:galera): Master database-2
     rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started messaging-0
     rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started messaging-1
     rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started messaging-2

    Change-Id: I7766a75414bf8db75ccd233677e9ffe13ff28e23
    Closes-Bug: #1859945
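
A check like the one done by hand with `crm_mon -1 | grep` in the commit message could be scripted roughly as follows. This is only a sketch: the regular expression is based solely on the resource lines quoted above, `parse_resources` and `unhealthy` are hypothetical helper names, and both the "Stopped" line format and the set of states treated as healthy are assumptions. The output of crm_mon may differ between Pacemaker versions.

```python
import re
from typing import Dict, List, Optional

# Matches lines such as:
#   " database-0 (ocf::pacemaker:remote): Started controller-0"
# The trailing node name is optional (assumed absent for stopped resources).
RESOURCE_RE = re.compile(
    r"^\s*(\S+)\s+\((ocf::[^)]+)\):\s+(\S+)(?:\s+(\S+))?\s*$"
)

# States considered healthy in this sketch (an assumption).
HEALTHY_STATES = ("Started", "Master")


def parse_resources(output: str) -> List[Dict[str, Optional[str]]]:
    """Extract name, agent, state and node from crm_mon-style lines."""
    resources = []
    for line in output.splitlines():
        match = RESOURCE_RE.match(line)
        if match:
            name, agent, state, node = match.groups()
            resources.append(
                {"name": name, "agent": agent, "state": state, "node": node}
            )
    return resources


def unhealthy(resources: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return the names of resources that are not in a healthy state."""
    return [r["name"] for r in resources if r["state"] not in HEALTHY_STATES]
```

Feeding the captured `crm_mon -1` text to `parse_resources` and then to `unhealthy` would return an empty list for the output shown in the commit message, since every listed resource is either Started or Master.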

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/703226

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/703226
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f90eb2caa7b0ea5cfa4b489a0be70be205e1a853
Submitter: Zuul
Branch: stable/train

commit f90eb2caa7b0ea5cfa4b489a0be70be205e1a853
Author: Michele Baldessari <email address hidden>
Date: Thu Jan 16 10:13:07 2020 +0100

    Fix deployment on pacemaker remote nodes

    Change-Id: I7766a75414bf8db75ccd233677e9ffe13ff28e23
    Closes-Bug: #1859945
    (cherry picked from commit a30342f253bf63db00a9545aebe50187b9c0324a)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.1.0

This issue was fixed in the openstack/tripleo-heat-templates 12.1.0 release.

tags: added: stein-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/716593

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/716593
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=2d9486adc0d60c522dc8c0ff3d8b443c9db396f2
Submitter: Zuul
Branch: stable/stein

commit 2d9486adc0d60c522dc8c0ff3d8b443c9db396f2
Author: Michele Baldessari <email address hidden>
Date: Thu Jan 16 10:13:07 2020 +0100

    Fix deployment on pacemaker remote nodes

    Change-Id: I7766a75414bf8db75ccd233677e9ffe13ff28e23
    Closes-Bug: #1859945
    (cherry picked from commit a30342f253bf63db00a9545aebe50187b9c0324a)
    (cherry picked from commit f90eb2caa7b0ea5cfa4b489a0be70be205e1a853)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates stein-eol

This issue was fixed in the openstack/tripleo-heat-templates stein-eol release.
