ovb jobs broken because pacemaker is unconfigured

Bug #1818994 reported by Juan Antonio Osorio Robles on 2019-03-07
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Juan Antonio Osorio Robles

Bug Description

The error that we're seeing is:

2019-03-07 10:09:00 | "Error: Evaluation Error: Error while evaluating a Function Call, The 'hacluster_pwd' hiera key is undefined, did you forget to include ::tripleo::profile::base::pacemaker in your role? (file: /etc/puppet/modules/tripleo/manifests/profile/base/pacemaker.pp, line: 94, column: 5) on node overcloud-controller-2.localdomain", [1]

And checking the services:

http://logs.rdoproject.org/89/641589/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/feb7adf/logs/overcloud-controller-0/etc/puppet/hieradata/service_names.json.txt.gz

It seems that pacemaker is not configured, even though in these jobs we are clearly enabling docker-ha.yaml.

[1] http://logs.rdoproject.org/89/641589/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/feb7adf/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
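The service check described above could be sketched roughly as follows. This assumes the hieradata file is a JSON object with a `service_names` list, which is an assumption about the file layout and is not confirmed in this report; the `has_service` helper and the sample data are hypothetical.

```python
import json


def has_service(service_names_json: str, service: str) -> bool:
    """Return True if `service` appears in the service_names list."""
    data = json.loads(service_names_json)
    # Assumed layout: {"service_names": ["svc_a", "svc_b", ...]}
    return service in data.get("service_names", [])


# Hypothetical sample, not taken from the actual log file.
sample = json.dumps({"service_names": ["haproxy", "mysql", "rabbitmq"]})
print(has_service(sample, "pacemaker"))
```

On a controller where pacemaker is configured, one would expect the check to succeed for "pacemaker"; the sample above only illustrates the failing case.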

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → stein-3
tags: added: alert
tags: added: ci
Changed in tripleo:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)
status: Triaged → In Progress
tags: added: promotion-blocker

Reviewed: https://review.openstack.org/641660
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=0786ed26d79342df35d35a952c8a496d989b8d75
Submitter: Zuul
Branch: master

commit 0786ed26d79342df35d35a952c8a496d989b8d75
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Thu Mar 7 13:18:46 2019 +0000

    Revert "Modify roles to remove unused services"

    This reverts commit 52fb87ce976d78450f141716cec779cd1daba134.

    The patch resulted in pacemaker not being configured properly, and
    making the OVB jobs fail.

    Closes-Bug: #1818994
    Change-Id: I5b852ad87a2c79b04ca860ce181bdccd1641247c

Changed in tripleo:
status: In Progress → Fix Released
Changed in tripleo:
status: Fix Released → Triaged
Rafael Folco (rafaelfolco) wrote :

The job used openstack-tripleo-common-10.4.1-0.20190308095406.2616a99.el7.noarch, which is commit 2616a99a2b428c4c543f13f165f466642de87366. It includes the revert patch #641660 (https://review.openstack.org/#/c/641660/) that was supposed to solve this, but it did not actually fix the issue.

commit 2616a99a2b428c4c543f13f165f466642de87366
Merge: e509e1f4 0786ed26
Author: Zuul <email address hidden>
Date: Fri Mar 8 09:46:36 2019 +0000

    Merge "Revert "Modify roles to remove unused services""

2019-03-10 17:16:01 | "Error: Evaluation Error: Error while evaluating a Function Call, The 'hacluster_pwd' hiera key is undefined, did you forget to include ::tripleo::profile::base::pacemaker in your role? (file: /etc/puppet/modules/tripleo/manifests/profile/base/pacemaker.pp, line: 94, column: 5) on node overcloud-controller-1.localdomain",

yatin (yatinkarel) wrote :

<<< Job used openstack-tripleo-common-10.4.1-0.20190308095406.2616a99.el7.noarch, which is commit 2616a99a2b428c4c543f13f165f466642de87366 and it has the revert patch #641660 https://review.openstack.org/#/c/641660/ supposed to solve it, but actually did not fix the issue.

>>> The tripleo-common package in this case is used from the mistral containers, and there it is an older version (without the fix https://review.openstack.org/#/c/641660/):
openstack-tripleo-common-10.4.1-0.20190307155827.957c9bd.el7.noarch

So it looks like the new container-build-push job is not working as expected (the built/pushed container images, both tripleo-ci-testing and version-hashed, should contain packages from the tripleo-ci-testing repo).
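A rough way to compare the two package builds quoted above is to pull the timestamp and short hash out of the package names. This is only a sketch: it assumes the release field has the form `0.<YYYYMMDDHHMMSS>.<short hash>`, and the `build_info` helper is hypothetical.

```python
import re
from datetime import datetime

# Assumed release layout for these builds: 0.<YYYYMMDDHHMMSS>.<short hash>
NVR_RE = re.compile(r"-0\.(\d{14})\.([0-9a-f]{7})\.")


def build_info(nvr: str):
    """Extract (timestamp, short commit hash) from a package name."""
    match = NVR_RE.search(nvr)
    if match is None:
        raise ValueError(f"unrecognised package name: {nvr}")
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return stamp, match.group(2)


job_pkg = "openstack-tripleo-common-10.4.1-0.20190308095406.2616a99.el7.noarch"
container_pkg = "openstack-tripleo-common-10.4.1-0.20190307155827.957c9bd.el7.noarch"

job_stamp, job_hash = build_info(job_pkg)
container_stamp, container_hash = build_info(container_pkg)
container_is_older = container_stamp < job_stamp
print(container_hash, "older than", job_hash, ":", container_is_older)
```

An older timestamp alone does not prove the fix is missing; it only shows that the package in the container predates the one installed by the job.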

Marios Andreou (marios-b) wrote :

o/ ykarel we should be using tripleo-ci-testing but we also could have missed something. It was added in https://tree.taiga.io/project/tripleo-ci-board/task/773 https://review.rdoproject.org/r/#/c/19114/ https://review.openstack.org/#/c/638652/

or it isn't about tripleo-ci-testing repos and I misunderstood the comment

Marios Andreou (marios-b) wrote :

thanks ykarel for more discussion on irc.

Looks like the re-tag for tripleo-ci-testing is not happening. For example, using skopeo inspect we can see that docker://trunk.registry.rdoproject.org/tripleomaster/centos-binary-rabbitmq:tripleo-ci-testing has "Created": "2019-03-07T22:38:02.787234492Z", whereas the most recently built one, docker://trunk.registry.rdoproject.org/tripleomaster/centos-binary-rabbitmq:1ac63709436a0230f547040e4a514470a3c19d78_9c2c4c8f, has "Created": "2019-03-11T15:00:25.259008772Z".
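To make the gap between the two "Created" values concrete, here is a small sketch that parses them and computes the difference. The `parse_created` helper is hypothetical; it truncates the nanosecond fraction to microseconds because Python's `datetime` does not store nanoseconds.

```python
from datetime import datetime, timezone


def parse_created(value: str) -> datetime:
    """Parse a UTC timestamp with a nanosecond fraction, truncated to microseconds."""
    base, _, frac = value.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]  # pad or truncate the fraction to 6 digits
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


tagged = parse_created("2019-03-07T22:38:02.787234492Z")
latest = parse_created("2019-03-11T15:00:25.259008772Z")
lag = latest - tagged
retag_is_stale = tagged < latest
print(f"tripleo-ci-testing tag is behind the latest build by {lag}")
```

The tagged image is more than three days older than the latest hashed build, which is consistent with the re-tag step not running.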

It looks like the conditional at https://review.openstack.org/#/c/641348/13/playbooks/tripleo-buildcontainers/run.yaml@89 is not working.

Anyway, I filed a separate bug for this so we don't confuse things: https://bugs.launchpad.net/tripleo/+bug/1819583

Bogdan Dobrelya (bogdando) wrote :

Just a side note: I personally find such package mismatches in containers quite annoying to handle on a constant basis. Perhaps we should switch to building containers ad hoc for the CI jobs, and ideally leverage that registry as a one-time artifact for the check/gate pipeline in progress, so other jobs there could share it?

Bogdan Dobrelya (bogdando) wrote :

...to building containers ad hoc for the CI jobs, and ditching those scripts that update packages in containers

tags: added: containers tech-debt
Bogdan Dobrelya (bogdando) wrote :

10:58:16 - sshnaidm: bogdando, would be easier just to rebuild all containers every N hours and just download them in jobs

Marios Andreou (marios-b) wrote :

@ykarel does it matter that we are not tagging container images with 'tripleo-ci-testing'? Please see comment https://bugs.launchpad.net/tripleo/+bug/1819583/comments/2 ... containers *are* being pushed with the tripleo-ci-testing delorean hash but not with the tripleo-ci-testing tag. Do we care? If not, we'll remove the retag altogether.

@bogdando that's what the containers build periodic job discussed in https://bugs.launchpad.net/tripleo/+bug/1819583 is doing. It runs every 12 hours, I believe (twice a day) @ https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic&job_name=periodic-tripleo-centos-7-master-containers-build-push.... i.e. it builds containers and pushes them tagged with the tripleo-ci-testing repo delorean hash. The periodic jobs that use them use that hash when pulling the containers.

Or did we miss something and we need both, @ykarel?

So, the issue we're experiencing now is related to our tagging of containers. This bug itself has been fixed by the revert, though.

Changed in tripleo:
status: Triaged → Fix Released

This issue was fixed in the openstack/tripleo-common 10.5.0 release.
