scenario010 jobs running on train/16.2 are failing standalone deploy - Set up group_vars

Bug #1938834 reported by Ronelle Landy
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

periodic-tripleo-ci-rhel-8-scenario010-standalone-rhos-16.2 and upstream equivalent periodic-tripleo-ci-centos-8-scenario010-standalone-train are failing standalone deploy:

2021-08-03 19:58:53.551169 | fa163e0c-5ae7-1260-1698-0000000000d1 | FATAL | Set up group_vars | undercloud | error={"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"}

Full logs are included below:

https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-standalone-train/03556fe/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-scenario010-standalone-rhos-16.2/cc94078/logs/undercloud/home/zuul/standalone_deploy.log

The real error is not displayed here.
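
When triaging a failure like this, it can help to pull every FATAL line out of the deploy log and flag the ones whose details were censored by no_log. The following is a minimal sketch that assumes the pipe-separated line layout seen in the excerpt above; the function name and the field names are illustrative only and are not part of any tripleo tooling.

```python
import re

# Assumed layout, taken from the log excerpt in this report:
# "<timestamp> | <task uuid> | <STATUS> | <task name> | <host> | <details>"
LINE_RE = re.compile(
    r"^\s*(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
    r" \| (?P<uuid>[0-9a-f-]+)"
    r" \| +(?P<status>[A-Z]+)"
    r" \| (?P<task>.*?)"
    r" \| (?P<host>[^|]+?)"
    r"(?: \| (?P<details>.*))?$"
)


def find_fatal_tasks(lines):
    """Return (timestamp, task, host, censored) for every FATAL line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match.group("status") != "FATAL":
            continue
        details = match.group("details") or ""
        # Ansible replaces the result with this message when no_log is set.
        censored = "'no_log: true'" in details
        hits.append(
            (match.group("timestamp"), match.group("task"),
             match.group("host"), censored)
        )
    return hits


SAMPLE = [
    "2021-08-03 19:58:53.551169 | fa163e0c-5ae7-1260-1698-0000000000d1 | "
    "FATAL | Set up group_vars | undercloud | error={\"censored\": \"the "
    "output has been hidden due to the fact that 'no_log: true' was "
    "specified for this result\"}",
]

for hit in find_fatal_tasks(SAMPLE):
    print(hit)
```

A censored hit only says which task failed; the underlying error still has to be recovered some other way, for example by rerunning with log hiding disabled.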

Note that in 16.2, scenario010 tests were passing before:

http://osp-trunk.hosted.upshift.rdu2.redhat.com/api-rhel8-osp16-2/api/civotes_agg_detail.html?ci_name=periodic-tripleo-ci-rhel-8-scenario010-standalone-rhos-16.2

Aggregate hash cd67435d88da87a3ca2084a358eda707 had passing tests.

Ronelle Landy (rlandy) wrote :
Changed in tripleo:
milestone: none → xena-3
importance: Undecided → Critical
status: New → Triaged
tags: added: ci promotion-blocker
Ronelle Landy (rlandy) wrote :

Repo compare:

[delorean-component-tripleo]
name=delorean-tripleo-ansible-36705150b1082763a9597da4e89be03fbae788a2
baseurl=http://osp-trunk.hosted.upshift.rdu2.redhat.com/rhel8-osp16-2/component/tripleo/36/70/36705150b1082763a9597da4e89be03fbae788a2_6edf151d_b2f2fc7e_d232eeaa
enabled=1
gpgcheck=0
priority=1

vs
[delorean-component-tripleo]
name=delorean-ansible-role-tripleo-modify-image-e4c84adc36e4aadee6188778d616a39e7d9bb8f5
baseurl=http://osp-trunk.hosted.upshift.rdu2.redhat.com/rhel8-osp16-2/component/tripleo/e4/c8/e4c84adc36e4aadee6188778d616a39e7d9bb8f5_3e3f659d_f0ec9b13_c57a6712
enabled=1
gpgcheck=0
priority=1

validation also changed (not sure if that matters in this case)
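
The repo compare above can be done mechanically. This is a small sketch using Python's standard configparser; the two stanzas are copied from this comment, while the helper names (repo_fields, diff_repos) and the FIRST/SECOND labels are made up for illustration, since the comment does not say which stanza belongs to the passing run.

```python
import configparser

FIRST_REPO = """
[delorean-component-tripleo]
name=delorean-tripleo-ansible-36705150b1082763a9597da4e89be03fbae788a2
baseurl=http://osp-trunk.hosted.upshift.rdu2.redhat.com/rhel8-osp16-2/component/tripleo/36/70/36705150b1082763a9597da4e89be03fbae788a2_6edf151d_b2f2fc7e_d232eeaa
enabled=1
gpgcheck=0
priority=1
"""

SECOND_REPO = """
[delorean-component-tripleo]
name=delorean-ansible-role-tripleo-modify-image-e4c84adc36e4aadee6188778d616a39e7d9bb8f5
baseurl=http://osp-trunk.hosted.upshift.rdu2.redhat.com/rhel8-osp16-2/component/tripleo/e4/c8/e4c84adc36e4aadee6188778d616a39e7d9bb8f5_3e3f659d_f0ec9b13_c57a6712
enabled=1
gpgcheck=0
priority=1
"""


def repo_fields(text):
    """Parse .repo text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def diff_repos(first_text, second_text):
    """Return {(section, key): (first, second)} for every differing value."""
    first, second = repo_fields(first_text), repo_fields(second_text)
    changes = {}
    for section in sorted(set(first) & set(second)):
        keys = set(first[section]) | set(second[section])
        for key in sorted(keys):
            a, b = first[section].get(key), second[section].get(key)
            if a != b:
                changes[(section, key)] = (a, b)
    return changes


for (section, key), (a, b) in diff_repos(FIRST_REPO, SECOND_REPO).items():
    print(f"[{section}] {key}:\n  {a}\n  {b}")
```

Only name and baseurl differ between the two stanzas, which is consistent with the component hash having moved between the two runs.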

Marios Andreou (marios-b) wrote :

Looking at the Zuul builds [1] and the attached screenshot, the issue started around 23 July.

The earliest example I can find is from the 23rd [2]:

        * 2021-07-23 19:56:23.927128 | fa163eb9-abe9-8c58-d720-0000000000d1 | FATAL | Set up group_vars | undercloud | error={"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"}

The job isn't in the promotion criteria. I just checked the train promoter and see:

        # wes waive
        # - periodic-tripleo-ci-centos-8-scenario010-standalone-train
       ...
        # wes waive
        #- periodic-tripleo-ci-centos-8-scenario010-ovn-provider-standalone-train

[1] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-scenario010-standalone-train
[2] https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-standalone-train/d44e997/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz

Marios Andreou (marios-b) wrote :

I see an error in the neutron server log [1]:

    2021-08-03 19:56:52.875 17 ERROR ovsdbapp.backend.ovs_idl.idlutils [-] Unable to open stream to tcp:192.168.24.1:6641 to retrieve schema: Connection refused
    ...
    2021-08-03 19:57:08.829 18 INFO networking_ovn.ovsdb.impl_idl_ovn [-] Getting OvsdbNbOvnIdl for RpcWorker with retry
    2021-08-03 19:57:08.913 18 INFO ovsdbapp.backend.ovs_idl.vlog [-] tcp:192.168.24.1:6641: connected
    2021-08-03 19:57:08.928 17 INFO ovsdbapp.backend.ovs_idl.vlog [-] tcp:192.168.24.1:6641: connecting...

So it *does* manage to connect ultimately... not sure if this error is related yet.
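
To check quickly whether a "Connection refused" was only a startup race, one can look for a later "connected" line for the same endpoint. A rough sketch, assuming the ovsdbapp message wording seen in the excerpt above; the function name is made up for this example.

```python
import re

REFUSED_RE = re.compile(
    r"Unable to open stream to (?P<endpoint>tcp:[\d.]+:\d+) .*Connection refused"
)
CONNECTED_RE = re.compile(r"(?P<endpoint>tcp:[\d.]+:\d+): connected")


def endpoints_never_recovered(lines):
    """Return endpoints whose last refusal is not followed by a connect."""
    last_refused = {}
    last_connected = {}
    for index, line in enumerate(lines):
        refused = REFUSED_RE.search(line)
        if refused:
            last_refused[refused.group("endpoint")] = index
        connected = CONNECTED_RE.search(line)
        if connected:
            last_connected[connected.group("endpoint")] = index
    return sorted(
        endpoint
        for endpoint, index in last_refused.items()
        if last_connected.get(endpoint, -1) < index
    )


SAMPLE = [
    "2021-08-03 19:56:52.875 17 ERROR ovsdbapp.backend.ovs_idl.idlutils [-] "
    "Unable to open stream to tcp:192.168.24.1:6641 to retrieve schema: "
    "Connection refused",
    "2021-08-03 19:57:08.829 18 INFO networking_ovn.ovsdb.impl_idl_ovn [-] "
    "Getting OvsdbNbOvnIdl for RpcWorker with retry",
    "2021-08-03 19:57:08.913 18 INFO ovsdbapp.backend.ovs_idl.vlog [-] "
    "tcp:192.168.24.1:6641: connected",
    "2021-08-03 19:57:08.928 17 INFO ovsdbapp.backend.ovs_idl.vlog [-] "
    "tcp:192.168.24.1:6641: connecting...",
]

print(endpoints_never_recovered(SAMPLE))
```

An empty result for the excerpt matches the observation that the connection is eventually established; it does not prove the earlier refusal is unrelated to the deploy failure.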

The failing "Set up group_vars" task comes from [2], but git blame says that file hasn't been touched in a long time [3], so I don't think the problem is in the octavia template itself.

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-standalone-train/03556fe/logs/undercloud/var/log/containers/neutron/server.log.txt.gz
[2] https://opendev.org/openstack/tripleo-heat-templates/src/commit/34ea70f9089467fafae37101ada12b560ea411d3/deployment/octavia/octavia-deployment-config.j2.yaml#L218
[3] https://opendev.org/openstack/tripleo-heat-templates/blame/branch/master/deployment/octavia/octavia-deployment-config.j2.yaml

Rabi Mishra (rabi) wrote :

Looks like https://review.opendev.org/c/openstack/tripleo-heat-templates/+/791036 was merged around the time both scenario010 jobs started failing.

The hide_sensitive_logs var was only added in ussuri with https://review.opendev.org/c/openstack/tripleo-heat-templates/+/721247

I've backported it to stable/train with https://review.opendev.org/c/openstack/tripleo-heat-templates/+/803438. Hopefully that will fix the issue.

Rabi Mishra (rabi) wrote :

https://review.opendev.org/c/openstack/tripleo-heat-templates/+/801499 is the train patch that seems to have broken the job. I wrongly mentioned the master patch https://review.opendev.org/c/openstack/tripleo-heat-templates/+/791036 above.

Brent Eagles (beagles) wrote :

I've proposed a revert https://review.opendev.org/c/openstack/tripleo-heat-templates/+/803467 to the train branch. We'll resubmit an adjusted patch once the train job is stabilized.

Rabi Mishra (rabi) wrote :

> We'll resubmit an adjusted patch once the train job is stabilized.

What're we going to adjust? The missing patch in train that adds 'hide_sensitive_logs' ansible var is https://review.opendev.org/c/openstack/tripleo-heat-templates/+/803438 and the job is green with it.
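
A gap like this, where a task references a var that the branch never defines, can be caught with a simple pre-merge check. The sketch below assumes the var is referenced through a Jinja-style "{{ name }}" expression; the task text and the set of defined vars are hypothetical and are not copied from tripleo-heat-templates.

```python
import re

# Matches the first identifier inside a Jinja-style "{{ ... }}" expression.
VAR_RE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def undefined_vars(task_text, defined):
    """Return names referenced as {{ name ... }} that are not in `defined`."""
    return sorted(set(VAR_RE.findall(task_text)) - set(defined))


# Hypothetical task text, for illustration only.
HYPOTHETICAL_TASK = """
- name: Set up group_vars
  no_log: "{{ hide_sensitive_logs | bool }}"
"""

# Hypothetical var sets: one branch without the var, one with it.
BRANCH_WITHOUT_VAR = {"some_other_var"}
BRANCH_WITH_VAR = {"some_other_var", "hide_sensitive_logs"}

print(undefined_vars(HYPOTHETICAL_TASK, BRANCH_WITHOUT_VAR))
print(undefined_vars(HYPOTHETICAL_TASK, BRANCH_WITH_VAR))
```

This only inspects the first name in each expression, so it is a coarse filter rather than a real template analysis.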

Marios Andreou (marios-b) wrote :

I agree with Rabi; I think we are done here. I came to move the bug to Fix Released but am holding for now. Let's give Brent a chance to check the latest comments when he comes online. @Brent, if you agree, please move it to Fix Released.

Marios Andreou (marios-b) wrote :

Moving to Fix Released. Please move it back if you disagree, and update with more info. Thank you.

Changed in tripleo:
status: Triaged → Fix Released