periodic train rhel8 ovb overcloud deployment failed with Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers

Bug #1853978 reported by Marios Andreou on 2019-11-26
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Unassigned

Bug Description

At [1][2][3] the periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train job fails during the overcloud deploy with a trace like:

"+ TAGS=file",
2019-11-25 06:55:23 | "+ CONFIG='include ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers'",
2019-11-25 06:55:23 | "+ EXTRA_ARGS=",
2019-11-25 06:55:23 | "+ '[' -d /tmp/puppet-etc ']'",
2019-11-25 06:55:23 | "+ cp -a /tmp/puppet-etc/auth.conf /tmp/puppet-etc/hieradata /tmp/puppet-etc/hiera.yaml /tmp/puppet-etc/modules /tmp/puppet-etc/puppet.conf /tmp/puppet-etc/ssl /etc/puppet",
2019-11-25 06:55:23 | "+ echo '{\"step\": 4}'",
2019-11-25 06:55:23 | "+ export FACTER_deployment_type=containers",
2019-11-25 06:55:23 | "+ FACTER_deployment_type=containers",
2019-11-25 06:55:23 | "+ set +e",
2019-11-25 06:55:23 | "+ puppet apply --verbose --detailed-exitcodes --summarize --color=false --modulepath /etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --tags file -e 'noop_resource('\\''package'\\''); include ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers'",
2019-11-25 06:55:23 | "Error: Facter: error while resolving custom fact \"stonith_levels\": execution of command \"crm_node -n 2> /dev/null\" failed: command not found.",
2019-11-25 06:55:23 | "Warning: ModuleLoader: module 'tripleo' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\\n (file & line not available)",
2019-11-25 06:55:23 | "Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers for overcloud-novacompute-0.localdomain (line: 1, column: 27) on node overcloud-novacompute-0.localdomain",
2019-11-25 06:55:23 | "+ rc=1",
2019-11-25 06:55:23 | "+ set -e",
2019-11-25 06:55:23 | "+ set +ux"
2019-11-25 06:55:23 | ]
2019-11-25 06:55:23 | }

This is a promotion blocker: it blocks rhel8 train promotions.

[1] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/bc4219e/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[2] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/d2f08c6/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[3] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/26ae272/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
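
The "Could not find class" error in the trace means that Puppet found no manifest for the class in any directory on the module path. The lookup can be sketched roughly as follows, assuming the standard Puppet autoload layout (<modulepath>/<module>/manifests/<rest of the class path>.pp). The helper names manifest_candidates and find_manifest are hypothetical; the module path is copied from the puppet apply command above.

```python
from pathlib import Path

# Module path copied from the `puppet apply` command in the trace.
MODULEPATH = (
    "/etc/puppet/modules:"
    "/opt/stack/puppet-modules:"
    "/usr/share/openstack-puppet/modules"
)


def manifest_candidates(class_name, modulepath=MODULEPATH):
    """List the manifest files where the class could be defined.

    Assumes the standard autoload layout; this is a hypothetical helper,
    not part of Puppet or TripleO.
    """
    parts = class_name.strip(":").split("::")
    module, rest = parts[0], parts[1:]
    # A bare module name maps to init.pp; anything deeper maps to a .pp file.
    relative = Path(*rest[:-1], rest[-1] + ".pp") if rest else Path("init.pp")
    return [
        Path(directory) / module / "manifests" / relative
        for directory in modulepath.split(":")
        if directory
    ]


def find_manifest(class_name, modulepath=MODULEPATH):
    """Return the first candidate that exists on this host, or None."""
    for candidate in manifest_candidates(class_name, modulepath):
        if candidate.is_file():
            return candidate
    return None
```

Running find_manifest("::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers") on the failing overcloud node would be expected to return None if the installed puppet-tripleo does not ship that manifest.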

Changed in tripleo:
assignee: nobody → chandan kumar (chkumar246)
milestone: none → ussuri-1
chandan kumar (chkumar246) wrote :

From passed logs: http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/f332f10/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz -> this error is still there:
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] start: '2019-11-21 17:44:52.291213'
2019-11-21 17:44:57,317 24198 INFO tripleo_common.image.image_uploader [ ] secontext: unconfined_u:object_r:user_tmp_t:s0
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] stderr: 'Error: Unknown repo: ''gating-repo'''
2019-11-21 17:44:57,317 24198 INFO tripleo_common.image.image_uploader [ ] size: 0
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] stderr_lines: <omitted>
2019-11-21 17:44:57,317 24198 INFO tripleo_common.image.image_uploader [ ] state: file
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] stdout: No packages were f

Maybe something is going wrong there; investigating.

summary: - periodic rhel-8-ovb-3ctlr_1comp-featureset001-train fail with 'Error:
- Unknown repo: ''gating-repo'''
+ periodic rhel8 ovb overcloud deployment failed with error while
+ resolving custom fact \"stonith_levels\"
summary: - periodic rhel8 ovb overcloud deployment failed with error while
- resolving custom fact \"stonith_levels\"
+ periodic rhel8 ovb overcloud deployment failed with Could not find class
+ ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers
description: updated
chandan kumar (chkumar246) wrote :

https://review.opendev.org/#/c/696019/ fixed only the Facter error (Error: Facter: error while resolving custom fact \"stonith_levels\": execution of command \"crm_node -n 2> /dev/null\" failed: command not found.). The other error, Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers, is still failing.

Changed in tripleo:
assignee: chandan kumar (chkumar246) → nobody
summary: - periodic rhel8 ovb overcloud deployment failed with Could not find class
- ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers
+ periodic train rhel8 ovb overcloud deployment failed with Could not find
+ class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers
Emilien Macchi (emilienm) wrote :

puppet-tripleo-11.3.1-0.20191122171710.bad7160.el8.noarch is installed on the overcloud nodes. It's the same problem every cycle: we need to produce a new release at the beginning of each cycle, or the rpms don't get updated to the right tag.

Emilien Macchi (emilienm) wrote :

This should help:

1) First merge https://review.opendev.org/696273
2) Then release new tags: https://review.opendev.org/696147

Marios Andreou (marios-b) wrote :

The changes from comment #5 merged, but we haven't had a green run in the periodic job yet (it is being skipped due to other issues): https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train

Trying it with testproject @ https://review.rdoproject.org/r/23869 for now

Marios Andreou (marios-b) wrote :

Bump: still skipping at https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train, this time because the image build fails for https://bugs.launchpad.net/tripleo/+bug/1854685

Rechecking at https://review.rdoproject.org/r/#/c/23869/; if we get a green run I'll consider moving this to Fix Released.

Marios Andreou (marios-b) wrote :

Nope, still seeing the same issue @ http://logs.rdoproject.org/69/23869/1/check/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/99659fd/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

        * 2019-12-02 06:52:48 | "Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers for overcloud-novacompute-0.localdomain (line: 1, column: 27) on node overcloud-novacompute-0.localdomain",

Marios Andreou (marios-b) wrote :

Regarding comment #8: it still fails in the test because the latest puppet-tripleo in train current-tripleo is

        * https://trunk.rdoproject.org/rhel8-train/current-tripleo/
        * puppet-tripleo-11.3.1-0.20191121191711.bc934d2.el8.noarch.rpm

and the node then gets

        * http://logs.rdoproject.org/69/23869/1/check/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/99659fd/logs/undercloud/var/log/extra/rpm-list.txt.gz
        * puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch

We need > 12.0, based on the new versions from comment #5. I suspect we get this on master too, as the versions there are the same:

        * https://trunk.rdoproject.org/rhel8-master/current-tripleo/
        * puppet-tripleo-11.3.1-0.20191114063720.b66ee38.el8.noarch.rpm

So will we have to skip this job for a promotion to fix it? If it is train only then the severity/alert is lower, but if it affects master too then it's a bigger problem (for rhel8 at least).
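
The version check described above can be sketched in Python. This is only an illustration: it assumes the package strings follow the name-version-release.arch pattern seen in this report, with a release of the form 0.<14-digit timestamp>.<short commit>.<dist>, and it reads "we need > 12.0" as a threshold of (12, 0). The names Build, parse_build, meets_minimum and MINIMUM are hypothetical.

```python
import re
from typing import NamedTuple

# Threshold taken from "we need > 12.0"; treating it as (12, 0) is an assumption.
MINIMUM = (12, 0)


class Build(NamedTuple):
    name: str
    version: tuple
    timestamp: str
    commit: str


def parse_build(filename):
    """Split a package string such as the ones quoted in this report."""
    nvra = filename[:-4] if filename.endswith(".rpm") else filename
    nvr, _arch = nvra.rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    # Assumed release layout: 0.<timestamp>.<short commit>.<dist>
    match = re.match(r"\d+\.(\d{14})\.([0-9a-f]+)\.", release)
    if match is None:
        raise ValueError(f"unexpected release format: {release!r}")
    numbers = tuple(int(part) for part in version.split("."))
    return Build(name, numbers, match.group(1), match.group(2))


def meets_minimum(build, minimum=MINIMUM):
    """True when the upstream version is at or above the threshold."""
    return build.version >= minimum


node = parse_build("puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch")
print(node.version, node.commit, meets_minimum(node))
```

The tuple comparison avoids the usual pitfall of comparing version strings lexically, where "9.0" would sort after "12.0".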

Marios Andreou (marios-b) wrote :

Per comment #8: I just checked (thanks rlandy), and tripleo-ci-testing is still getting old content, so we need to promote/skip.

https://trunk.rdoproject.org/rhel8-train/tripleo-ci-testing/

puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch.rpm

chandan kumar (chkumar246) wrote :

https://review.opendev.org/#/c/696273/ creates the first tag for ussuri. That tag was created on master -> https://github.com/openstack/puppet-tripleo/commit/4db4af996cca1058f0fd4c0e707a122772b780b2 and will be used by the FS01 master job.

But the failure is in the train job. Do we need a new puppet-tripleo tag from the stable/train branch that the FS01 train job can use?

Emilien Macchi (emilienm) wrote :

The job has puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch on the undercloud, which is the correct version to pull on the stable/train branch.
I also checked THT:
openstack-tripleo-heat-templates-11.3.1-0.20191129134212.8343952.el8.noarch
Which is also good.

However I see the version of puppet-tripleo on the overcloud:
puppet-tripleo-11.3.1-0.20191125170655.de4a1bc.el8.noarch
Which is a WRONG version, it's the version taken from master and not from stable/train branch.
So it's likely related to the image deployed for the overcloud, that contains wrong rpms.
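
The undercloud/overcloud comparison above can be automated along these lines. A minimal sketch, assuming each node's package list is available as text with one name-version-release.arch string per line (as in the rpm-list.txt.gz logs); the function names index_by_name and mismatched are hypothetical.

```python
def index_by_name(rpm_list_text):
    """Map package name to its version-release.arch string."""
    packages = {}
    for line in rpm_list_text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not name-version-release.
        if line.count("-") < 2:
            continue
        name, version, release = line.rsplit("-", 2)
        packages[name] = f"{version}-{release}"
    return packages


def mismatched(undercloud_text, overcloud_text, names):
    """Return {name: (undercloud build, overcloud build)} where they differ."""
    under = index_by_name(undercloud_text)
    over = index_by_name(overcloud_text)
    return {
        name: (under.get(name), over.get(name))
        for name in names
        if under.get(name) != over.get(name)
    }


# Package strings quoted in the comment above.
undercloud = "puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch\n"
overcloud = "puppet-tripleo-11.3.1-0.20191125170655.de4a1bc.el8.noarch\n"
print(mismatched(undercloud, overcloud, ["puppet-tripleo"]))
```

Comparing the full version-release string matters here because both builds share the upstream version 11.3.1 and differ only in the timestamp and commit part of the release.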

description: updated
Marios Andreou (marios-b) wrote :

"Use release and dlrn_hash/tag var instead of hardcoded value" https://review.opendev.org/#/c/697423/ Change-Id: I737c6c272448eca14683b845563102afd0fc0f96 openstack/tripleo-ci

is the fix from chkumar|ruck for this per comments #12 and #13
