galera(galera)[34]: ERROR: Could not determine galera name from pacemaker node <galera-bundle-0>

Bug #1721497 reported by Michele Baldessari on 2017-10-05
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: John Trowbridge

Bug Description

In we currently fail to bring up galera:

Oct 5 08:24:28 localhost galera(galera)[34]: ERROR: Could not determine galera name from pacemaker node <galera-bundle-0>.
Oct 5 08:24:28 localhost pacemaker_remoted[13]: notice: galera_start_0:34:stderr [ ocf-exit-reason:Could not determine galera name from pacemaker node <galera-bundle-0>. ]
Oct 5 08:24:28 localhost crmd[30980]: notice: Result of start operation for galera on galera-bundle-0: 6 (not configured)
Oct 5 08:24:28 localhost crmd[30980]: notice: galera-bundle-0-galera_start_0:7 [ ocf-exit-reason:Could not determine galera name from pacemaker node <galera-bundle-0>.\n ]
Oct 5 08:24:28 localhost crmd[30980]: warning: Action 37 (galera:0_start_0) on galera-bundle-0 failed (target: 0 vs. rc: 6): Error
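
A note for reading the crmd lines above: `rc: 6` is the standard OCF resource agent exit code OCF_ERR_CONFIGURED, which Pacemaker reports as "not configured", and the target `0` is OCF_SUCCESS. A minimal lookup sketch follows. The function name `ocf_rc_name` is made up for illustration and is not part of Pacemaker or resource-agents.

```shell
# ocf_rc_name: hypothetical helper that maps a numeric OCF exit code
# (as printed in "rc: N" log lines) to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}
```

For example, `ocf_rc_name 6` prints `OCF_ERR_CONFIGURED`.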

The reason is that now that has merged it needs new pacemaker and resource agents. Those do already exist on the host:

The problem is that the following three containers (the ones with OCF resources inside) need to be rebuilt with those packages:
- rabbitmq
- mariadb/galera
- redis

Repos are here

tags: added: containers
Michele Baldessari (michele) wrote :

Once lands we should get: has pacemaker-1.1.16-12.el7_4.2.0.0.rdo1.x86_64 and resource-agents-3.9.5-105.el7.0.0.rdo1.x86_64

Martin André (mandre) wrote :

This should be fixed with

Alex Schultz (alex-schultz) wrote :

Patch is merged, moving to fixed release. If this is still a problem let's reopen it.

Changed in tripleo:
assignee: nobody → John Trowbridge (trown)
status: Triaged → Fix Released
Martin André (mandre) wrote :

Re-opened, this is still occurring in gate-tripleo-ci-centos-7-scenario004-multinode-oooq-container even with the new images from tripleomaster:

Changed in tripleo:
status: Fix Released → Confirmed
Michele Baldessari (michele) wrote :

So when I checked packages yesterday I used this one:
[root@bandini ~]# docker run -it /bin/bash -c "rpm -q pacemaker resource-agents"

But CI pulls 'passed-ci' (vs passed-ci-test which I used to look at):
[root@bandini ~]# docker run -it /bin/bash -c "rpm -q pacemaker resource-agents"
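
To avoid checking the wrong tag again, the same `rpm -q` check could be scripted against whichever tag CI actually pulls. This is only a sketch: the expected builds are the ones quoted earlier in this report, the function names `missing_pkgs` and `check_image` are made up for illustration, and the image reference passed to `check_image` is a placeholder to be filled in by the reader.

```shell
# Expected package builds, copied from the earlier comment in this report.
EXPECTED_PKGS="pacemaker-1.1.16-12.el7_4.2.0.0.rdo1.x86_64
resource-agents-3.9.5-105.el7.0.0.rdo1.x86_64"

# missing_pkgs: read `rpm -q` output on stdin and print every expected
# build that does not appear as a whole line. Prints nothing if all match.
missing_pkgs() {
    installed=$(cat)
    printf '%s\n' "$EXPECTED_PKGS" | while IFS= read -r pkg; do
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}

# check_image: hypothetical wrapper. $1 is the full image reference,
# including the tag that CI pulls (placeholder, not taken from this report).
# No -it here, so the output has plain line endings and can be piped.
check_image() {
    docker run --rm "$1" rpm -q pacemaker resource-agents | missing_pkgs
}
```

Empty output from `check_image` would mean both expected builds are present in that image; any line printed names a build that is missing.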

Gabriele Cerami (gcerami) wrote :

passed-ci-test was only a tag used to test container images upload after promotion. Don't consider that, I'll remove the tag from all the containers.

John Trowbridge (trown) wrote :

I think the correct thing to do is to revert

It clearly would have failed the scenario004 job, but that job did not run and is now broken.

The passed-ci-test tag on dockerhub is/was just there for testing the new pipeline, and the full set of containers with that tag has not actually passed the CI pipeline.

If we don't want to revert the patch that actually broke this, the only other option is to manually tag a new mariadb container that has not passed the promote CI with everything else. This seems fine(ish) for the current issue, but is a pretty bad habit to carry forward.

Changed in tripleo:
status: Confirmed → In Progress
Gabriele Cerami (gcerami) wrote :

The passed-ci-test tag and its associated hash tag were removed from all the containers. Sorry for the confusion.

Submitter: Jenkins
Branch: master

commit 1681d3bceb2834e8788cc4456d65a76bcf4e1e55
Author: John Trowbridge <email address hidden>
Date: Fri Oct 6 12:44:16 2017 +0000

    Revert "Set meta container-attribute-target=host attribute"

    This patch broke the containers scenario004 test because it relies on a
    newer mariadb container than has actually passed CI at this time.

    To revert this revert, we need to make sure we test
    scenario004-containers against that patch.

    This reverts commit 6bcb011723ad7b75f18914c887dc4fa4bad4d620.

    Closes-Bug: 1721497

    Change-Id: I34c7c388eed94db1735c45e26661a0af8cdce8e9

Changed in tripleo:
status: In Progress → Fix Released
John Trowbridge (trown) wrote :

Removed alert and lowered to High since the revert landed. We still need to either promote next week or upload updated HA containers to get the original patch landed.

Changed in tripleo:
importance: Critical → High
tags: removed: alert

This issue was fixed in the openstack/puppet-tripleo 8.0.0 release.
