ovb jobs broken because pacemaker is unconfigured

Bug #1818994 reported by Juan Antonio Osorio Robles
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Juan Antonio Osorio Robles

Bug Description

The error that we're seeing is:

2019-03-07 10:09:00 | "Error: Evaluation Error: Error while evaluating a Function Call, The 'hacluster_pwd' hiera key is undefined, did you forget to include ::tripleo::profile::base::pacemaker in your role? (file: /etc/puppet/modules/tripleo/manifests/profile/base/pacemaker.pp, line: 94, column: 5) on node overcloud-controller-2.localdomain", [1]

And checking the services:

http://logs.rdoproject.org/89/641589/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/feb7adf/logs/overcloud-controller-0/etc/puppet/hieradata/service_names.json.txt.gz

it seems that pacemaker is not configured... even though in these jobs we are clearly enabling docker-ha.yaml

[1] http://logs.rdoproject.org/89/641589/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/feb7adf/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
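The service check described above can be sketched in a few lines of Python. This is only an illustration: it assumes service_names.json holds a list of enabled services under a "service_names" key, and the helper name has_service and the sample contents are hypothetical, not taken from the job logs.

```python
import json


def has_service(service_names_json, service="pacemaker"):
    """Return True if `service` is listed in a service_names.json document."""
    data = json.loads(service_names_json)
    # Assumption: the enabled services sit in a list under "service_names".
    names = data.get("service_names", [])
    return service in names


# Hypothetical sample contents, not copied from the job logs.
broken = '{"service_names": ["haproxy", "mysql", "rabbitmq"]}'
healthy = '{"service_names": ["haproxy", "mysql", "pacemaker", "rabbitmq"]}'

print("broken has pacemaker:", has_service(broken))
print("healthy has pacemaker:", has_service(healthy))
```

Running the same check against the hieradata file of each controller would show whether the pacemaker service made it into the role at all.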

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → stein-3
tags: added: alert
tags: added: ci
Changed in tripleo:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)
status: Triaged → In Progress
tags: added: promotion-blocker
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/641660
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=0786ed26d79342df35d35a952c8a496d989b8d75
Submitter: Zuul
Branch: master

commit 0786ed26d79342df35d35a952c8a496d989b8d75
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Thu Mar 7 13:18:46 2019 +0000

    Revert "Modify roles to remove unused services"

    This reverts commit 52fb87ce976d78450f141716cec779cd1daba134.

    The patch resulted in pacemaker not being configured properly, and
    making the OVB jobs fail.

    Closes-Bug: #1818994
    Change-Id: I5b852ad87a2c79b04ca860ce181bdccd1641247c

Changed in tripleo:
status: In Progress → Fix Released
Rafael Folco (rafaelfolco) wrote :
Changed in tripleo:
status: Fix Released → Triaged
Rafael Folco (rafaelfolco) wrote :

The job used openstack-tripleo-common-10.4.1-0.20190308095406.2616a99.el7.noarch, which is built from commit 2616a99a2b428c4c543f13f165f466642de87366. That commit includes the revert patch #641660 https://review.openstack.org/#/c/641660/ which was supposed to solve this, but it did not actually fix the issue.

commit 2616a99a2b428c4c543f13f165f466642de87366
Merge: e509e1f4 0786ed26
Author: Zuul <email address hidden>
Date: Fri Mar 8 09:46:36 2019 +0000

    Merge "Revert "Modify roles to remove unused services""

2019-03-10 17:16:01 | "Error: Evaluation Error: Error while evaluating a Function Call, The 'hacluster_pwd' hiera key is undefined, did you forget to include ::tripleo::profile::base::pacemaker in your role? (file: /etc/puppet/modules/tripleo/manifests/profile/base/pacemaker.pp, line: 94, column: 5) on node overcloud-controller-1.localdomain",

yatin (yatinkarel) wrote :

<<< Job used openstack-tripleo-common-10.4.1-0.20190308095406.2616a99.el7.noarch, which is commit 2616a99a2b428c4c543f13f165f466642de87366 and it has the revert patch #641660 https://review.openstack.org/#/c/641660/ supposed to solve it, but actually did not fix the issue.

>>> In this case the tripleo-common package is used from the mistral containers, and there it is an older version (without the fix https://review.openstack.org/#/c/641660/):
openstack-tripleo-common-10.4.1-0.20190307155827.957c9bd.el7.noarch

So it looks like the new container-build-push job is not working as expected (the built/pushed container images, both tripleo-ci-testing and version-hashed, should contain packages from the tripleo-ci-testing repo).
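To make the mismatch concrete, here is a rough Python sketch that compares the two package strings quoted in this bug. It assumes the release field follows the "0.<14-digit build timestamp>.<7-character commit hash>" pattern seen in those two strings; the helper name build_info is hypothetical.

```python
import re

# Assumption: the release part looks like "-0.<14-digit timestamp>.<7-char hash>."
_BUILD_RE = re.compile(r"-0\.(\d{14})\.([0-9a-f]{7})\.")


def build_info(nvr):
    """Return (build_timestamp, short_commit_hash) parsed from a package string."""
    match = _BUILD_RE.search(nvr)
    if match is None:
        raise ValueError("unrecognised package string: " + nvr)
    return match.group(1), match.group(2)


# Package strings as quoted in the comments above.
job_pkg = "openstack-tripleo-common-10.4.1-0.20190308095406.2616a99.el7.noarch"
container_pkg = "openstack-tripleo-common-10.4.1-0.20190307155827.957c9bd.el7.noarch"

job_ts, job_hash = build_info(job_pkg)
container_ts, container_hash = build_info(container_pkg)

# Equal-length digit strings compare correctly as plain strings.
container_is_older = container_ts < job_ts
print("container build:", container_ts, container_hash)
print("job build:      ", job_ts, job_hash)
print("container is older:", container_is_older)
```

A check like this only shows that the container carries an earlier build than the job's repo; it does not by itself prove that the earlier build lacks the revert.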

Marios Andreou (marios-b) wrote :

o/ ykarel we should be using tripleo-ci-testing but we also could have missed something. It was added in https://tree.taiga.io/project/tripleo-ci-board/task/773 https://review.rdoproject.org/r/#/c/19114/ https://review.openstack.org/#/c/638652/

or it isn't about tripleo-ci-testing repos and I misunderstood the comment

Marios Andreou (marios-b) wrote :

thanks ykarel for more discussion on irc.

Looks like the re-tag for tripleo-ci-testing is not happening. For example, using skopeo inspect we can see that docker://trunk.registry.rdoproject.org/tripleomaster/centos-binary-rabbitmq:tripleo-ci-testing has "Created": "2019-03-07T22:38:02.787234492Z", vs the most recently built one, docker://trunk.registry.rdoproject.org/tripleomaster/centos-binary-rabbitmq:1ac63709436a0230f547040e4a514470a3c19d78_9c2c4c8f, which has "Created": "2019-03-11T15:00:25.259008772Z".
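As a rough illustration of how stale the tripleo-ci-testing tag is, the two "Created" values above can be compared in Python. This is a sketch only: the helper names parse_created and tag_lag are hypothetical, and the parser assumes UTC timestamps ending in "Z" with an optional fractional part, truncated to microseconds.

```python
import re
from datetime import datetime, timezone

_CREATED_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_created(value):
    """Parse a "Created" value such as 2019-03-07T22:38:02.787234492Z."""
    match = _CREATED_RE.fullmatch(value)
    if match is None:
        raise ValueError("unexpected timestamp format: " + value)
    base, frac = match.groups()
    # datetime only keeps microseconds, so drop digits beyond the sixth.
    micro = (frac or "0")[:6].ljust(6, "0")
    parsed = datetime.strptime(base + "." + micro, "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def tag_lag(floating_created, hashed_created):
    """Return how far the floating tag trails the hash-tagged image."""
    return parse_created(hashed_created) - parse_created(floating_created)


# "Created" values quoted in the comment above.
lag = tag_lag("2019-03-07T22:38:02.787234492Z", "2019-03-11T15:00:25.259008772Z")
print("tripleo-ci-testing tag trails the newest build by", lag)
```

With a working re-tag the lag should be zero, since both tags would point at the same image.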

It looks like this is not working https://review.openstack.org/#/c/641348/13/playbooks/tripleo-buildcontainers/run.yaml@89 i.e. the conditional.

Anyway filed a different bug for this so we don't confuse things ==> @ https://bugs.launchpad.net/tripleo/+bug/1819583

Bogdan Dobrelya (bogdando) wrote :

Just a side note, I personally find such mismatching for packages in containers quite annoying to handle on a constant basis. Shall we switch to building containers ad-hoc for the CI jobs perhaps? and ideally leverage that registry as a one-time artifact for the check/gate pipeline in the progress, so other jobs there could share it?

Bogdan Dobrelya (bogdando) wrote :

...to building containers ad-hoc for the CI jobs - and ditching that update packages in containers scripts

tags: added: containers tech-debt
Bogdan Dobrelya (bogdando) wrote :

10:58:16 - sshnaidm: bogdando, would be easier just to rebuild all containers every N hours and just download them in jobs

Marios Andreou (marios-b) wrote :

@ykarel does it matter that we are not tagging container images with 'tripleo-ci-testing'? Please see comment https://bugs.launchpad.net/tripleo/+bug/1819583/comments/2 ... containers *are* being pushed with the tripleo-ci-testing delorean hash but not with tripleo-ci-testing. Do we care? If not, we'll remove the retag altogether.

@bogdando that's what the periodic containers build job discussed in https://bugs.launchpad.net/tripleo/+bug/1819583 is doing. It runs every 12 hrs, I believe (twice/day) @ https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic&job_name=periodic-tripleo-centos-7-master-containers-build-push.... i.e. it builds containers and pushes them tagged with the tripleo-ci-testing repo delorean hash. The periodic jobs that use them use that hash when pulling the containers.

Or we missed something and we need both @ykarel?

Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

So, the issue we're experiencing is related to our tagging of containers. This bug has been fixed by the revert though.

Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 10.5.0

This issue was fixed in the openstack/tripleo-common 10.5.0 release.
