train minor update job times out reading overcloud_deployment_result.json

Bug #1911451 reported by Marios Andreou
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Triaged | Critical | Unassigned |

Bug Description

At [1][2][3] the tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train job times out after a successful deployment (and before performing the minor update). It hangs on the include_vars task that reads "{{ local_working_dir }}/overcloud_deployment_result.json":

 2021-01-12 20:35:50.644339 | primary | TASK [ensure the deployment result has been read into memory] ******************
 2021-01-12 20:35:50.644360 | primary | Tuesday 12 January 2021 20:35:50 +0000 (0:00:15.227) 0:19:32.951 *******
 2021-01-12 22:25:48.189821 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
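
The gap between the task banner and the RUN END line shows how long the job sat on that task before Zuul killed it. A minimal sketch for computing it, assuming the Zuul job-output.txt format shown in the excerpt above (timestamp first, fields separated by " | "); the helper names `log_timestamp` and `hang_duration` are made up for illustration:

```python
from datetime import datetime

# The two relevant lines, copied from the log excerpt above.
TASK_LINE = (
    "2021-01-12 20:35:50.644360 | primary | "
    "Tuesday 12 January 2021 20:35:50 +0000 (0:00:15.227) 0:19:32.951 *******"
)
END_LINE = (
    "2021-01-12 22:25:48.189821 | RUN END RESULT_TIMED_OUT: "
    "[untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]"
)


def log_timestamp(line):
    """Parse the leading timestamp of a job-output.txt line."""
    stamp = line.split(" | ", 1)[0].strip()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def hang_duration(task_line, end_line):
    """Time elapsed between the last task banner and the end of the run."""
    return log_timestamp(end_line) - log_timestamp(task_line)


stalled = hang_duration(TASK_LINE, END_LINE)
print(stalled)  # prints 1:49:57.545461
```

So the run spent roughly 1h 50m on this one task with no further output before the timeout.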

The failing task is at [4] and the file it is trying to read seems OK at [5]. I suspect some issue with "localhost" vs "undercloud" and ssh, but I have not tracked it down yet. For some reason this affects only train; e.g. victoria is all green at [6]. This is a train gate blocker.

[1] https://2cee87dfabf00a3eba01-5dd750a0358fa670bf85ccc2ef690f1c.ssl.cf1.rackcdn.com/769744/1/gate/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train/647c611/job-output.txt
[2] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_55a/761413/4/check/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train/55a9e97/job-output.txt
[3] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9da/770184/2/check/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train/9da224a/job-output.txt
[4] https://opendev.org/openstack/tripleo-quickstart-extras/src/commit/deef6e9641a80cf195dbf5d93bf7d69afa431e05/playbooks/multinode-overcloud.yml#L16
[5] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_55a/761413/4/check/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train/55a9e97/logs/quickstart_files/overcloud_deployment_result.json
[6] https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-victoria

Tags: alert ci
Marios Andreou (marios-b) wrote :

thanks rlandy and sshnaidm for the hint yesterday in irc, this seems to be a duplicate of https://bugs.launchpad.net/tripleo/+bug/1883843

going to try the single playbook approach here too

Marios Andreou (marios-b) wrote :

posted https://review.opendev.org/c/openstack/tripleo-ci/+/770766 Switch tripleo-ci-base-multinode job to use single playbook

and tests @

  https://review.rdoproject.org/r/31555 Run tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train

  https://review.rdoproject.org/r/31556 Run victoria undercloud/overcloud upgrade with single playbook

Marios Andreou (marios-b) wrote :

Looks like the alternative at tripleo-quickstart-extras/+/770924, i.e. avoiding the use of delegate_to, is not working according to the testproject run at https://review.rdoproject.org/r/31555 (train test). We see the same behaviour:

https://logserver.rdoproject.org/55/31555/2/check/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train/e7fb664/job-output.txt

 2021-01-15 11:43:21.521529 | primary | TASK [ensure the deployment result has been read into memory] ******************
 2021-01-15 11:43:21.521972 | primary | Friday 15 January 2021 11:43:21 +0000 (0:00:11.067) 0:16:51.825 ********
 2021-01-15 13:42:33.099810 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]

Marios Andreou (marios-b) wrote :

followup from comment #5

sshnaidm had some success with v3 at https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/770924

at https://review.rdoproject.org/r/#/c/31555/ => tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train SUCCESS in 1h 25m 44s

we now need to decide how to proceed. I just added some comments at https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/770924/1/playbooks/multinode-overcloud.yml#16 and https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/770924/3/playbooks/multinode-overcloud.yml#b13 and posted v4 of https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/770924 ... not sure if it is acceptable for this task to only execute in CI?

Marios Andreou (marios-b) wrote :

and followup from comment #6... actually, as far as I can see, the task wasn't executed in the test, so I am no longer sure that v3 of https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/770924 (or v4, which is based on it) works ... I can't find any evidence of the "ensure the deployment result has been read into memory" task being executed at all...
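
One way to check whether the task ran in a given job is to search its job-output.txt for the Ansible task banner. A minimal sketch, assuming the log has already been downloaded as text and that banners look like the `TASK [...]` lines quoted in this report; the function name `task_ran` is hypothetical, and the sample lines are the ones from the earlier comment:

```python
import re

# Sample lines copied from the job-output.txt excerpt quoted in this report.
SAMPLE_LOG = """\
2021-01-15 11:43:21.521529 | primary | TASK [ensure the deployment result has been read into memory] ******************
2021-01-15 13:42:33.099810 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
"""


def task_ran(log_text, task_name):
    """Return True if a TASK banner for task_name appears in the log text.

    Also matches the "TASK [role : task name]" form that Ansible prints
    for tasks that live inside a role.
    """
    banner = re.compile(r"TASK \[(?:.+ : )?" + re.escape(task_name) + r"\]")
    return any(banner.search(line) for line in log_text.splitlines())


found = task_ran(SAMPLE_LOG, "ensure the deployment result has been read into memory")
print(found)
```

A False result for the SUCCESS run would support the suspicion that the job passed only because the task was skipped, not because the hang was fixed.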

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Sorin Sbârnea <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/770766
Reason: Abandoning as I suppose that was a leftover. Resurrect it if not.
