periodic train rhel8 ovb overcloud deployment failed with Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers

Bug #1853978 reported by Marios Andreou
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | Unassigned |

Bug Description

At [1][2][3] the periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train fails during the overcloud deploy with trace like:

"+ TAGS=file",
2019-11-25 06:55:23 | "+ CONFIG='include ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers'",
2019-11-25 06:55:23 | "+ EXTRA_ARGS=",
2019-11-25 06:55:23 | "+ '[' -d /tmp/puppet-etc ']'",
2019-11-25 06:55:23 | "+ cp -a /tmp/puppet-etc/auth.conf /tmp/puppet-etc/hieradata /tmp/puppet-etc/hiera.yaml /tmp/puppet-etc/modules /tmp/puppet-etc/puppet.conf /tmp/puppet-etc/ssl /etc/puppet",
2019-11-25 06:55:23 | "+ echo '{\"step\": 4}'",
2019-11-25 06:55:23 | "+ export FACTER_deployment_type=containers",
2019-11-25 06:55:23 | "+ FACTER_deployment_type=containers",
2019-11-25 06:55:23 | "+ set +e",
2019-11-25 06:55:23 | "+ puppet apply --verbose --detailed-exitcodes --summarize --color=false --modulepath /etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --tags file -e 'noop_resource('\\''package'\\''); include ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers'",
2019-11-25 06:55:23 | "Error: Facter: error while resolving custom fact \"stonith_levels\": execution of command \"crm_node -n 2> /dev/null\" failed: command not found.",
2019-11-25 06:55:23 | "Warning: ModuleLoader: module 'tripleo' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\\n (file & line not available)",
2019-11-25 06:55:23 | "Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers for overcloud-novacompute-0.localdomain (line: 1, column: 27) on node overcloud-novacompute-0.localdomain",
2019-11-25 06:55:23 | "+ rc=1",
2019-11-25 06:55:23 | "+ set -e",
2019-11-25 06:55:23 | "+ set +ux"
2019-11-25 06:55:23 | ]
2019-11-25 06:55:23 | }
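
When scanning a long overcloud_deploy.log for this failure, it can help to pull out just the class and node named in the Puppet error. The following is a minimal sketch, not part of the CI tooling: the function name `find_missing_classes` and the regular expression are illustrative assumptions based only on the message format shown in the trace above.

```python
import re

# Assumed pattern, derived from the "Could not find class ... for <node>"
# message quoted in the trace above.
MISSING_CLASS = re.compile(
    r"Could not find class (?P<cls>::[\w:]+) for (?P<node>\S+)"
)


def find_missing_classes(lines):
    """Return sorted, unique (class, node) pairs found in the log lines."""
    found = set()
    for line in lines:
        match = MISSING_CLASS.search(line)
        if match:
            found.add((match.group("cls"), match.group("node")))
    return sorted(found)


# Two lines copied from the trace above.
sample = [
    '2019-11-25 06:55:23 | "+ set +e",',
    '2019-11-25 06:55:23 | "Error: Evaluation Error: Error while evaluating '
    'a Function Call, Could not find class '
    '::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers for '
    'overcloud-novacompute-0.localdomain (line: 1, column: 27) on node '
    'overcloud-novacompute-0.localdomain",',
]

for cls, node in find_missing_classes(sample):
    print(f"{node}: missing {cls}")
```

The same function can be fed the lines of a downloaded and decompressed log file instead of the inline sample.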

This is a promotion blocker: it blocks rhel8 train promotions.

[1] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/bc4219e/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[2] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/d2f08c6/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[3] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/26ae272/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Changed in tripleo:
assignee: nobody → chandan kumar (chkumar246)
milestone: none → ussuri-1
Revision history for this message
chandan kumar (chkumar246) wrote :

From passed logs: http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/f332f10/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz -> this error is still there:
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] start: '2019-11-21 17:44:52.291213'
2019-11-21 17:44:57,317 24198 INFO tripleo_common.image.image_uploader [ ] secontext: unconfined_u:object_r:user_tmp_t:s0
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] stderr: 'Error: Unknown repo: ''gating-repo'''
2019-11-21 17:44:57,317 24198 INFO tripleo_common.image.image_uploader [ ] size: 0
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] stderr_lines: <omitted>
2019-11-21 17:44:57,317 24198 INFO tripleo_common.image.image_uploader [ ] state: file
2019-11-21 17:44:57,317 24196 INFO tripleo_common.image.image_uploader [ ] stdout: No packages were f

Maybe something is going wrong there; investigating.

summary: - periodic rhel-8-ovb-3ctlr_1comp-featureset001-train fail with 'Error:
- Unknown repo: ''gating-repo'''
+ periodic rhel8 ovb overcloud deployment failed with error while
+ resolving custom fact \"stonith_levels\"
summary: - periodic rhel8 ovb overcloud deployment failed with error while
- resolving custom fact \"stonith_levels\"
+ periodic rhel8 ovb overcloud deployment failed with Could not find class
+ ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers
description: updated
Revision history for this message
chandan kumar (chkumar246) wrote : Re: periodic rhel8 ovb overcloud deployment failed with Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers

https://review.opendev.org/#/c/696019/ in openstack/puppet-tripleo will fix the issue.
testing here: https://review.rdoproject.org/r/23825

Revision history for this message
chandan kumar (chkumar246) wrote :

https://review.opendev.org/#/c/696019/ fixed only the error 'Error: Facter: error while resolving custom fact "stonith_levels": execution of command "crm_node -n 2> /dev/null" failed: command not found.' The other error, for ovn_metadata_agent_wrappers, is still failing.

Changed in tripleo:
assignee: chandan kumar (chkumar246) → nobody
summary: - periodic rhel8 ovb overcloud deployment failed with Could not find class
- ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers
+ periodic train rhel8 ovb overcloud deployment failed with Could not find
+ class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers
Revision history for this message
Emilien Macchi (emilienm) wrote :

puppet-tripleo-11.3.1-0.20191122171710.bad7160.el8.noarch is installed on the overcloud nodes. It's the same problem every cycle: we need to produce a new release at the beginning of each cycle, or the rpms don't get updated to the right tag.

Revision history for this message
Emilien Macchi (emilienm) wrote :

This should help:

1) First merge https://review.opendev.org/696273
2) Then release new tags: https://review.opendev.org/696147

Revision history for this message
Marios Andreou (marios-b) wrote :

The things in comment #5 merged but we haven't had a green run in the periodic yet (skips due to other issues) https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train

Trying it with testproject @ https://review.rdoproject.org/r/23869 for now

Revision history for this message
Marios Andreou (marios-b) wrote :

bump still skipping at https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train this time because image build fails for https://bugs.launchpad.net/tripleo/+bug/1854685

rechecking at https://review.rdoproject.org/r/#/c/23869/ if we get a green run i'll consider moving to fix released

Revision history for this message
Marios Andreou (marios-b) wrote :

nope still seeing the same issue @ http://logs.rdoproject.org/69/23869/1/check/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/99659fd/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

        * 2019-12-02 06:52:48 | "Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::tripleo::profile::base::neutron::ovn_metadata_agent_wrappers for overcloud-novacompute-0.localdomain (line: 1, column: 27) on node overcloud-novacompute-0.localdomain",

Revision history for this message
Marios Andreou (marios-b) wrote :

for comment #8 still fails in test because the latest for train current-tripleo is

        * https://trunk.rdoproject.org/rhel8-train/current-tripleo/
        * puppet-tripleo-11.3.1-0.20191121191711.bc934d2.el8.noarch.rpm

and then node gets that

        * http://logs.rdoproject.org/69/23869/1/check/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/99659fd/logs/undercloud/var/log/extra/rpm-list.txt.gz
        * puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch

we need > 12.0 based on the new versions from comment 5. I suspect we get this on master too as the versions there are the same with

        * https://trunk.rdoproject.org/rhel8-master/current-tripleo/
        * puppet-tripleo-11.3.1-0.20191114063720.b66ee38.el8.noarch.rpm

So will we have to skip this job for a promotion to fix it? If it is train only then the severity/alert is lower, but if it affects master too then it's a bigger problem (for rhel8 at least).

Revision history for this message
Marios Andreou (marios-b) wrote :

per comment #8, and just checked (thanks rlandy): ci-testing is still getting old content, so we need to promote/skip

https://trunk.rdoproject.org/rhel8-train/tripleo-ci-testing/

puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch.rpm

Revision history for this message
chandan kumar (chkumar246) wrote :

https://review.opendev.org/#/c/696273/ creates the first tag for ussuri. That tag got created in master -> https://github.com/openstack/puppet-tripleo/commit/4db4af996cca1058f0fd4c0e707a122772b780b2 and will be used by the FS01 master job.

But the failure is in the train job. Do we need a new tag from the stable/train branch for puppet-tripleo that can be used in the FS01 train job?

Revision history for this message
Emilien Macchi (emilienm) wrote :

The job has puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch on the undercloud, which is the correct version to pull on the stable/train branch.
I also checked THT:
openstack-tripleo-heat-templates-11.3.1-0.20191129134212.8343952.el8.noarch
Which is also good.

However I see the version of puppet-tripleo on the overcloud:
puppet-tripleo-11.3.1-0.20191125170655.de4a1bc.el8.noarch
Which is a WRONG version, it's the version taken from master and not from stable/train branch.
So it's likely related to the image deployed for the overcloud, that contains wrong rpms.
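
Note that both packages report version 11.3.1, so the mismatch only shows up in the release field. The sketch below compares two package strings on that field. It assumes the release has the layout `0.<14-digit timestamp>.<7-character commit hash>.el8`, which is inferred from the package names quoted in this bug; `TrunkBuild`, `parse_build` and `same_build` are hypothetical helper names, not existing tooling.

```python
import re
from typing import NamedTuple


class TrunkBuild(NamedTuple):
    name: str
    version: str
    timestamp: str
    commit: str


# Assumed layout: <name>-<version>-0.<timestamp>.<commit>.el<N>.noarch[.rpm]
PKG = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-0\.(?P<timestamp>\d{14})"
    r"\.(?P<commit>[0-9a-f]{7})\.el\d+\.noarch(?:\.rpm)?$"
)


def parse_build(pkg: str) -> TrunkBuild:
    """Split a package string into name, version, timestamp and commit."""
    match = PKG.match(pkg.strip())
    if match is None:
        raise ValueError(f"unrecognised package string: {pkg!r}")
    return TrunkBuild(**match.groupdict())


def same_build(a: str, b: str) -> bool:
    """True when both strings name the same package at the same commit."""
    x, y = parse_build(a), parse_build(b)
    return (x.name, x.commit) == (y.name, y.commit)


# Package strings quoted in the comment above.
undercloud = "puppet-tripleo-11.3.1-0.20191129095904.602547e.el8.noarch"
overcloud = "puppet-tripleo-11.3.1-0.20191125170655.de4a1bc.el8.noarch"

print(parse_build(undercloud))
print(parse_build(overcloud))
print("same build:", same_build(undercloud, overcloud))
```

Comparing on the commit hash rather than the version is a deliberate choice here, since the version string alone does not distinguish the two builds.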

description: updated
Revision history for this message
chandan kumar (chkumar246) wrote :
Revision history for this message
chandan kumar (chkumar246) wrote :
Revision history for this message
Marios Andreou (marios-b) wrote :

"Use release and dlrn_hash/tag var instead of hardcoded value" https://review.opendev.org/#/c/697423/ Change-Id: I737c6c272448eca14683b845563102afd0fc0f96 openstack/tripleo-ci

is the fix from chkumar|ruck for this per comments #12 and #13

Revision history for this message
Marios Andreou (marios-b) wrote :

@chandan after https://review.opendev.org/#/c/697423/ it looks like it fails differently now. Do we need a new bug for that?

        * http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-master/ad73dd9/job-output.txt

        * 2019-12-08 20:35:52.180180 | primary | fatal: [undercloud]: FAILED! => {
        2019-12-08 20:35:52.180322 | primary | "attempts": 10,
        2019-12-08 20:35:52.180360 | primary | "changed": true,
        2019-12-08 20:35:52.180387 | primary | "cmd": [
        2019-12-08 20:35:52.180404 | primary | "curl",
        2019-12-08 20:35:52.180414 | primary | "-skfL",
        2019-12-08 20:35:52.180424 | primary | "http://38.145.34.141/rcm-guest/images/redhat8/master/rdo_trunk/bd316fa91fad3df7c4b2e7847399d03e16625a42_3599f536/overcloud-full.tar.md5"
        2019-12-08 20:35:52.180453 | primary | ],
        2019-12-08 20:35:52.180466 | primary | "delta": "0:00:00.022345",
        2019-12-08 20:35:52.180477 | primary | "end": "2019-12-08 20:35:52.129822",
        2019-12-08 20:35:52.180488 | primary | "rc": 22,
        2019-12-08 20:35:52.180498 | primary | "start": "2019-12-08 20:35:52.107477"
        2019-12-08 20:35:52.180561 | primary | }
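
For reading the task output above: curl documents exit code 22 as an HTTP error response (status 400 or above) that is reported because `-f` was given. So `"rc": 22` after 10 attempts suggests the server kept answering with an error for the overcloud-full.tar.md5 URL; the actual HTTP status is not in the log. The sketch below only decodes the flag bundle and the exit code; the helper names are hypothetical.

```python
# Meanings of the short options bundled in "-skfL", per the curl manual.
FLAG_MEANINGS = {
    "s": "silent: no progress meter or error messages",
    "k": "insecure: skip TLS certificate verification",
    "f": "fail: exit non-zero on HTTP errors instead of printing the page",
    "L": "location: follow redirects",
}


def explain_flags(bundle):
    """Describe each short option in a bundle such as '-skfL'."""
    return [
        (ch, FLAG_MEANINGS.get(ch, "not covered by this sketch"))
        for ch in bundle.lstrip("-")
    ]


def is_http_error(rc, bundle):
    """True when the exit code is 22 and -f was part of the options."""
    return rc == 22 and "f" in bundle.lstrip("-")


for flag, meaning in explain_flags("-skfL"):
    print(f"-{flag}: {meaning}")

print("HTTP error reported by curl:", is_http_error(22, "-skfL"))
```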

Revision history for this message
chandan kumar (chkumar246) wrote :
Revision history for this message
Marios Andreou (marios-b) wrote :

@chkumar ACK, just grabbed an rdo vm and checked a bit; commented at https://bugs.launchpad.net/tripleo/+bug/1855826/comments/1

this bug needs to stay open for now then

Revision history for this message
Marios Andreou (marios-b) wrote :

so https://bugs.launchpad.net/tripleo/+bug/1855826 is now fix-released

I *think* this bug is fix-released, with the fixes in https://bugs.launchpad.net/tripleo/+bug/1853978/comments/5 and then https://bugs.launchpad.net/tripleo/+bug/1853978/comments/15

But we can't get a green run to confirm it yet ... latest run seems to be hitting a new issue :/
(will file a bug in due course) like
13:14 < marios|rover> chkumar|ruck: rhel8 fs1 train
13:14 < marios|rover> chkumar|ruck: http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/6c35249/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
 2019-12-10 19:21:14 | Error: unable to exec into nova_compute: no container with name or ID nova_compute found: no such container

Revision history for this message
chandan kumar (chkumar246) wrote :

From the current run http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/3f9326f/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

2019-12-11 06:37:38 | "changed": false
2019-12-11 06:37:38 | }
2019-12-11 06:37:38 |
2019-12-11 06:37:38 | MSG:
2019-12-11 06:37:38 |
2019-12-11 06:37:38 | Unable to enable service network: network.service is not a native service, redirecting to systemd-sysv-install.
2019-12-11 06:37:38 | Executing: /usr/lib/systemd/systemd-sysv-install enable network
2019-12-11 06:37:38 | failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network: No such file or directory
2019-12-11 06:37:38 |
2019-12-11 06:37:38 | fatal: [overcloud-controller-2]: FAILED! => {
2019-12-11 06:37:38 | "changed": false
2019-12-11 06:37:38 | }
2019-12-11 06:37:38 |
2019-12-11 06:37:38 | MSG:
2019-12-11 06:37:38 |
2019-12-11 06:37:38 | Unable to enable service network: network.service is not a native service, redirecting to systemd-sysv-install.
2019-12-11 06:37:38 | Executing: /usr/lib/systemd/systemd-sysv-install enable network
2019-12-11 06:37:38 | failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network: No such file or directory
2019-12-11 06:37:38 |

Revision history for this message
Marios Andreou (marios-b) wrote :

@chkumar ... same again in the later run after comment 20 (haven't seen the error from comment 19 again)

* http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/1753240/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

        * 2019-12-11 18:30:09 | "[2019/12/11 06:29:38 PM] [WARNING] Error in 'ip route add 0.0.0.0/0 via 10.0.0.1 dev br-ex', restarting br-ex:",
2019-12-11 18:30:09 | "Unexpected error while running command.",
2019-12-11 18:30:09 | "Command: /sbin/ip route add 0.0.0.0/0 via 10.0.0.1 dev br-ex",
2019-12-11 18:30:09 | "Exit code: 1",
2019-12-11 18:30:09 | "Stdout: ''",
2019-12-11 18:30:09 | "Stderr: 'Cannot find device \"br-ex\"\\n'",

        * 2019-12-11 18:30:09 | TASK [Ensure network service is enabled] ***************************************
        2019-12-11 18:30:09 | Wednesday 11 December 2019 18:30:08 -0500 (0:00:00.244) 0:02:46.209 ****
        2019-12-11 18:30:09 | fatal: [overcloud-controller-0]: FAILED! => {
        2019-12-11 18:30:09 | "changed": false
        2019-12-11 18:30:09 | }
        2019-12-11 18:30:09 |
        2019-12-11 18:30:09 | MSG:
        2019-12-11 18:30:09 |
        2019-12-11 18:30:09 | Unable to enable service network: network.service is not a native service, redirecting to systemd-sysv-install.
        2019-12-11 18:30:09 | Executing: /usr/lib/systemd/systemd-sysv-install enable network
        2019-12-11 18:30:09 | failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network: No such file or directory

so indeed this seems to be blocked by https://bugs.launchpad.net/tripleo/+bug/1853028

Revision history for this message
Marios Andreou (marios-b) wrote :

well to clarify... not blocked by +bug/1853028 exactly... we think this is fixed per https://bugs.launchpad.net/tripleo/+bug/1853978/comments/19 but +bug/1853028 is blocking us from proving it ;)

Revision history for this message
Marios Andreou (marios-b) wrote :

the latest errors we are seeing in comments #20/21 above are selinux related (see https://bugs.launchpad.net/tripleo/+bug/1853028/comments/13), so we should be able to confirm this today; we need to disable selinux

Revision history for this message
Marios Andreou (marios-b) wrote :

looking at

        * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train

latest fail is on https://bugs.launchpad.net/tripleo/+bug/1855826 with trace like

        * 2019-12-17 07:33:24.733905 | primary | FAILED - RETRYING: Get image expected checksum (1 retries left).
2019-12-17 07:33:40.165438 | primary | fatal: [undercloud]: FAILED! => {
        * 2019-12-17 07:33:40.165550 | primary | "http://38.145.34.141/rcm-guest/images/redhat8/train/rdo_trunk/e9d2261951f30a9fce5b5194f5149520b10932f2_7c944a5e/overcloud-full.tar.md5"

        * http://logs.rdoproject.org/89/24189/3/check/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/e6207e3/job-output.txt

i think maybe something more is needed for train? https://review.rdoproject.org/r/#/c/24058/3/zuul.d/tripleo.yaml looks like it covers it though not sure yet

Changed in tripleo:
milestone: ussuri-1 → ussuri-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/700045

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/700045
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=6969a3f5f3d059668f067d8201a07b7c82fc6dde
Submitter: Zuul
Branch: stable/train

commit 6969a3f5f3d059668f067d8201a07b7c82fc6dde
Author: Wes Hayutin <email address hidden>
Date: Thu Dec 19 11:03:45 2019 -0700

    bump metadata for new train version

    Request from RDO team to bump train
    versions.

    Related-Bug: #1853978
    Change-Id: I98de5aa974ebbe6e3e9b48555cd582e59692f755

tags: added: in-stable-train
Revision history for this message
Marios Andreou (marios-b) wrote :

we had a green run on the 18th in the openstack-periodic-latest-released pipeline: http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/71adc35/

so i think we can call this fix-released now

Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
Marios Andreou (marios-b) wrote :

note: even though this was moved to fix released, the job is not stable yet... latest run failed with

        * http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-train/4d09b8c/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

        * 2019-12-19 18:32:45 | 2019-12-19 18:27:53 131238 INFO tripleo_common.image.image_uploader [ ] stderr: 'Error: Unknown repo: ''gating-repo'''

not sure if that is a new issue yet but in any case that will not be tracked in this LP.
