periodic OVB train centos 7 failing ovb-manage No server with a name or ID

Bug #1891179 reported by Marios Andreou
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

At [1][2] the periodic centos *7* train fs1 jobs, and at [3] the fs2-train-upload job, are failing during creation of the OVB environment with a trace like:

        2020-08-10 12:21:34.475631 | TASK [ovb-manage : Attach instance to provision OVB network]
        2020-08-10 12:21:39.637852 | primary | No server with a name or ID of '3424d7e6-3dfe-41ed-a99f-7f4240dc093b' exists.
        2020-08-10 12:21:37.539681 | primary | ERROR
        2020-08-10 12:21:37.540105 | primary | {
        2020-08-10 12:21:37.540205 | primary | "delta": "0:00:02.382104",
        2020-08-10 12:21:37.540271 | primary | "end": "2020-08-10 12:21:39.708015",
        2020-08-10 12:21:37.540397 | primary | "msg": "non-zero return code",
        2020-08-10 12:21:37.540464 | primary | "rc": 1,
        2020-08-10 12:21:37.540523 | primary | "start": "2020-08-10 12:21:37.325911"
        2020-08-10 12:21:37.540580 | primary | }

This blocks the centos 7 train promotion to current-tripleo [4].

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable2-centos7/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train/5479602/job-output.txt
[2] https://logserver.rdoproject.org/openstack-periodic-integration-stable2-centos7/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train/b972c84/job-output.txt
[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable2-centos7/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-train-upload/a2ed9b0/job-output.txt
[4] http://38.102.83.109/config/CentOS-7/train.ini

Revision history for this message
Alex Schultz (alex-schultz) wrote :

Just for others who wander across this: ovb-manage is a role in rdoproject. https://github.com/rdo-infra/review.rdoproject.org-config/tree/master/roles/ovb-manage

Revision history for this message
Alex Schultz (alex-schultz) wrote :

So the error message comes from the step where they try to attach the undercloud to the public network.

https://github.com/rdo-infra/review.rdoproject.org-config/blob/master/roles/ovb-manage/tasks/ovb-create-stack.yml#L90-L92

This adds a network to the server identified by undercloud_uuid. The undercloud_uuid is (evidently) queried from the metadata URL.

https://github.com/rdo-infra/review.rdoproject.org-config/blob/master/roles/ovb-manage/tasks/find_undercloud_uuid.yml#L2-L8

IMHO this points to the metadata not returning the correct UUID, or to the request being handled incorrectly. Perhaps we used to get the instance ID from a metadata file and now we're using the metadata request?
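
For anyone who wants to check what the metadata service actually returns, here is a minimal Python sketch. It assumes the standard OpenStack metadata endpoint (http://169.254.169.254/openstack/latest/meta_data.json) and its "uuid" key; the function names are illustrative only and are not taken from the ovb-manage role, which may do the lookup differently.

```python
import json
import urllib.request

# Standard OpenStack metadata endpoint; assumed reachable from inside the instance.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def uuid_from_metadata(raw_json):
    """Extract the instance UUID from a meta_data.json document."""
    data = json.loads(raw_json)
    return data["uuid"]


def fetch_instance_uuid(url=METADATA_URL, timeout=10):
    """Query the metadata service from inside the instance and return its UUID."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return uuid_from_metadata(resp.read().decode("utf-8"))
```

Running fetch_instance_uuid() on the undercloud node and comparing the result with the UUID in the error message would show whether the metadata itself is wrong or whether the lookup is going to the wrong place.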

Revision history for this message
Ronelle Landy (rlandy) wrote :

Rerunning:

periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train
periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train
periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-train-upload
periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-train
periodic-tripleo-ci-centos-7-ovb-1ctlr_1cellctrl_1comp-featureset063-train

^^ none of these jobs show that issue. Infra hitch?

Revision history for this message
Marios Andreou (marios-b) wrote :

rlandy tests @

16:38 < rlandy> marios|ruck: https://review.rdoproject.org/r/#/c/28705/

Revision history for this message
Marios Andreou (marios-b) wrote :

Still a thing today: see buildset [1] and the example at [2].

        2020-08-16 12:06:17.237192 | TASK [ovb-manage : Attach instance to provision OVB network]
        2020-08-16 12:06:20.352113 | primary | No server with a name or ID of '9c3fabb9-b96a-42c7-9f80-cd9b316fd1af' exists.
        2020-08-16 12:06:20.801618 | primary | ERROR

Looking at the dashboard [3], it might be related to the heat stacks, per comment #6 above.

[1] https://review.rdoproject.org/zuul/buildset/b4ff6c16c67d46b5b3e2c183c0e94902
[2] https://logserver.rdoproject.org/openstack-periodic-integration-stable2-centos7/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-train-upload/a30aa0f/job-output.txt
[3] http://dashboard-ci.tripleo.org/d/wb8HBhrWk/cockpit?orgId=1&fullscreen&panelId=231

Revision history for this message
Marios Andreou (marios-b) wrote :

10:10 < ykarel> marios|ruck, https://review.rdoproject.org/r/28982 should clear : No server with a name or ID issue in train ovb

Revision history for this message
yatin (yatinkarel) wrote :

<< IMHO this points to the metadata not returning the correct UUID or the request is being handled incorrectly. Perhaps we used to get the instance id from a metadata file and now we're using the metadata request?

Actually this happened due to a mismatch between the nodeset used and the cloud_name used: the undercloud was launched in vexxhost, but because of the wrong cloud_name setting the undercloud_uuid was searched for in rdocloud, and the lookup failed. Proposed https://review.rdoproject.org/r/28982 to fix cloud_name for the train centos7 ovb jobs.
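
One way to confirm this kind of mismatch is to look the UUID up in each candidate cloud. The Python sketch below is only an illustration under stated assumptions: it shells out to the `openstack` CLI with `--os-cloud`, it assumes clouds.yaml has entries named `vexxhost` and `rdocloud`, and the helper names are hypothetical (they do not come from the ovb-manage role or from the fix in the review).

```python
import subprocess


def server_show_cmd(cloud_name, server_id):
    """Build the CLI call that looks up a server in one named cloud."""
    return ["openstack", "--os-cloud", cloud_name,
            "server", "show", server_id, "-f", "value", "-c", "id"]


def clouds_with_server(server_id, cloud_names, run=subprocess.run):
    """Return the clouds (clouds.yaml entry names) in which the server exists."""
    found = []
    for cloud in cloud_names:
        # A non-zero return code is treated as "not found in this cloud".
        result = run(server_show_cmd(cloud, server_id),
                     capture_output=True, text=True)
        if result.returncode == 0:
            found.append(cloud)
    return found
```

For example, clouds_with_server("9c3fabb9-b96a-42c7-9f80-cd9b316fd1af", ["vexxhost", "rdocloud"]) returning only "vexxhost" while the job is configured with cloud_name set to rdocloud would match the failure described here. The `run` parameter is injectable so the logic can be exercised without a real cloud.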

Revision history for this message
Marios Andreou (marios-b) wrote :

So the review from ykarel at [1] is now merged, and it looks like the timing was right, since the latest run at [2] seems to be free of this error. The ovb jobs are still failing there, but that appears to be a new issue [3] (trace below), so we'll file a new bug for that one.

I think this can be closed.

        2020-08-17 13:57:44 | The action raised an exception [action_ex_id=9b08c7fb-827c-4e1c-8396-286a4153684f, msg='[Errno 18] Invalid cross-device link', action_cls='<class 'mistral.actions.action_factory.AnsibleGenerateInventoryAction'>', attributes='{}', params='{u'work_dir': u'/var/lib/mistral/overcloud', u'ansible_python_interpreter': None, u'ansible_ssh_user': u'tripleo-admin', u'undercloud_key_file': u'/var/lib/mistral/.ssh/tripleo-admin-rsa', u'plan_name': u'overcloud', u'ssh_network': u'ctlplane'}']

[1] https://review.rdoproject.org/r/#/c/28982/
[2] https://review.rdoproject.org/zuul/buildset/435b17a1c46d44b19362bf0c36d2f25a
[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable2-centos7/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-train-upload/8a9c52f/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Changed in tripleo:
status: Triaged → Fix Released