tripleo-quickstart-promote-master-current-tripleo-delorean-minimal is failing to provision nodes - Version requested but version discovery document was not found and allow_version_hack was False

Bug #1951752 reported by Ronelle Landy
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

https://jenkins-cloudsig-ci.apps.ocp.ci.centos.org/job/tripleo-quickstart-promote-master-current-tripleo-delorean-minimal/103/console shows that the node provision step fails:

fatal: [undercloud]: FAILED! => {
    "ansible_job_id": "326603130289.97796",
    "changed": false,
    "cmd": "source /home/stack/stackrc; openstack overcloud node provision -o $PROVISION_OUTPUT --stack $PROVISION_STACK /home/stack/overcloud_baremetal_deploy.yaml >/home/stack/overcloud_node_provision.log 2>&1",

This job has had many failures, so the last passing run is not representative of when the current failure started.

The error log:

https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-103/undercloud/var/log/extra/errors.txt

shows:

2021-11-21 19:31:50.319 ERROR /var/log/containers/neutron/l3-agent.log: 50978 ERROR neutron_lib.rpc [req-86e1aa6d-e33d-4cec-9ae4-495301a52c74 - - - - -] Timeout in RPC method get_host_ha_router_count. Waiting for 12 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 9f8d11928d50490297a92af593460595
2021-11-21 19:30:49.457 ERROR /var/log/containers/neutron/openvswitch-agent.log: 51147 ERROR ovsdbapp.backend.ovs_idl.idlutils [-] Unable to open stream to tcp:127.0.0.1:6640 to retrieve schema: Connection refused
2021-11-21 19:30:55.299 ERROR /var/log/containers/neutron/ironic-neutron-agent.log: 51597 ERROR networking_baremetal.ironic_client [req-241d5be0-22ce-4b4c-a062-61bf882d7e1f - - - - -] Failed to establish a connection with ironic, reason: Version requested but version discovery document was not found and allow_version_hack was False: keystoneauth1.exceptions.discovery.DiscoveryFailure: Version requested but version discovery document was not found and allow_version_hack was False
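When triaging a long errors.txt, it can help to group the entries by the log file they came from. The following is a minimal sketch that assumes every line follows the layout of the three lines quoted above (timestamp, level, source log path, pid, level, logger, message). The function names `parse_error_line` and `count_by_source` are hypothetical helpers, not part of any TripleO tooling.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the errors.txt lines quoted in this report:
#   <date> <time> <LEVEL> <source log path>: <pid> <LEVEL> <logger> <message>
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>\S+): "
    r"(?P<pid>\d+) [A-Z]+ "
    r"(?P<logger>\S+) "
    r"(?P<message>.*)$"
)


def parse_error_line(line):
    """Return the fields of one errors.txt line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_by_source(lines):
    """Count matching entries per source log file; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        parsed = parse_error_line(line)
        if parsed is not None:
            counts[parsed["source"]] += 1
    return counts
```

Sorting the resulting counter with `most_common()` gives a quick view of which service logged the most errors, which is only a starting point since the noisiest log is not necessarily the root cause.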

Ronelle Landy (rlandy) wrote :

The node provision log:

https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-103/undercloud/home/stack/overcloud_node_provision.log

shows that the failure occurs while waiting for the nodes to boot:

2021-11-21 20:07:57.144096 | 00f4794c-8b38-106c-9a8c-00000000000a | FATAL | Wait for provisioned nodes to boot | overcloud-controller-0 | error={"ansible_facts": {"discovered_interpreter_python": "/usr/libexec/platform-python"}, "changed": false, "elapsed": 601, "msg": "Timeout waiting for provisioned nodes to become available"}
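The FATAL line above packs the task name, host, and a JSON error payload into a single pipe-separated record. As a rough sketch, assuming all FATAL lines in overcloud_node_provision.log share this six-field layout and that the last field is `error=` followed by valid JSON, the pieces can be pulled apart as follows. `parse_fatal_line` is a hypothetical helper name.

```python
import json


def parse_fatal_line(line):
    """Split one 'timestamp | task id | FATAL | task | host | error={...}' line.

    Returns a dict, or None if the line does not have the assumed layout.
    The split is limited to five separators so that a ' | ' inside the JSON
    payload cannot break the parse; a ' | ' inside the task name would.
    """
    fields = line.strip().split(" | ", 5)
    if len(fields) != 6 or fields[2] != "FATAL":
        return None
    timestamp, task_id, _status, task, host, detail = fields
    if not detail.startswith("error="):
        return None
    try:
        error = json.loads(detail[len("error="):])
    except json.JSONDecodeError:
        return None
    return {
        "timestamp": timestamp,
        "task_id": task_id,
        "task": task,
        "host": host,
        "error": error,
    }
```

For the line quoted above, the parsed payload exposes `elapsed` and `msg` directly, which makes it easy to tell a wait timeout apart from other provisioning errors.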

Changed in tripleo:
milestone: none → yoga-1
importance: Undecided → Critical
status: New → Triaged
tags: added: ci promotion-blocker

Ronelle Landy (rlandy) wrote :

https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-103/undercloud/var/log/containers/neutron/ironic-neutron-agent.log shows the error in the description:

2021-11-21 19:30:55.299 51597 ERROR networking_baremetal.ironic_client [req-241d5be0-22ce-4b4c-a062-61bf882d7e1f - - - - -] Failed to establish a connection with ironic, reason: Version requested but version discovery document was not found and allow_version_hack was False: keystoneauth1.exceptions.discovery.DiscoveryFailure: Version requested but version discovery document was not found and allow_version_hack was False
2021-11-21 19:30:55.302 51597 CRITICAL neutron [req-241d5be0-22ce-4b4c-a062-61bf882d7e1f - - - - -] Unhandled error: keystoneauth1.exceptions.discovery.DiscoveryFailure: Version requested but version discovery document was not found and allow_version_hack was False

Rabi Mishra (rabi) wrote :

It sounds like the deployment is trying to use UEFI boot with a whole-disk image, and hence the node is not able to boot.

2021-11-21 19:55:37.451 7 DEBUG ironic.drivers.modules.boot_mode_utils [req-94c79c56-2277-42ce-b5d5-4e66cb44ad08 - - - - -] Cannot determine node fd4a9f2c-9085-40f6-9ad5-07db90011520 boot mode: Driver ipmi does not support get_boot_mode (disabled or not implemented). sync_boot_mode /usr/lib/python3.6/site-packages/ironic/drivers/modules/boot_mode_utils.py:105
2021-11-21 19:55:37.452 7 DEBUG ironic.drivers.modules.boot_mode_utils [req-94c79c56-2277-42ce-b5d5-4e66cb44ad08 - - - - -] Deploy boot mode is uefi for fd4a9f2c-9085-40f6-9ad5-07db90011520. get_boot_mode_for_deploy /usr/lib/python3.6/site-packages/ironic/drivers/modules/boot_mode_utils.py:279

This should have been fixed with https://review.opendev.org/c/openstack/tripleo-quickstart/+/818010 I guess? Not sure what job config this one is using.
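To check which boot mode Ironic selected for each node across a whole conductor log, the "Deploy boot mode is ..." DEBUG lines quoted in this comment can be scanned with a regular expression. This is only a sketch that assumes the message wording matches the lines above; `deploy_boot_modes` is a hypothetical helper name.

```python
import re

# Matches the message text seen in the ironic DEBUG lines quoted above, e.g.
# "Deploy boot mode is uefi for fd4a9f2c-9085-40f6-9ad5-07db90011520."
BOOT_MODE_RE = re.compile(
    r"Deploy boot mode is (?P<mode>\w+) for (?P<node>[0-9a-fA-F-]{36})\."
)


def deploy_boot_modes(lines):
    """Map node UUID -> last deploy boot mode reported in the given log lines."""
    modes = {}
    for line in lines:
        match = BOOT_MODE_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the final value wins.
            modes[match.group("node")] = match.group("mode")
    return modes
```

A node that reports `uefi` here while the job deploys an image that cannot boot under UEFI would be consistent with the boot timeout seen in the provision log.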

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/818837

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Juan Badia Payno (jbadiapa) wrote :

Not sure if this is relevant:
There are several warnings like this one at [1]:

2021-11-21 19:45:07.301 30 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://192.168.24.3:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://192.168.24.3:5000: HTTPConnectionPool(host='192.168.24.3', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb199674ef0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))

[1] - https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-103/undercloud/var/log/containers/neutron/server.log
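The ECONNREFUSED in the warning above means nothing accepted the TCP connection on that address and port at that moment. A small probe like the sketch below, run from the undercloud, can help tell "nothing is listening" apart from an HTTP-level problem. The host and port in the usage comment are taken from the log line; `can_connect` is a hypothetical helper name, and the probe says nothing about whether Keystone itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Connection refused, unreachable hosts, and timeouts all raise OSError
    subclasses, so they are reported uniformly as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example, using the identity endpoint from the warning above:
# print(can_connect("192.168.24.3", 5000))
```

If the probe succeeds later in the run, the refused connections may simply reflect services still starting up at the time of the warning, which would make them a symptom of ordering rather than the cause of the provision failure.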

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by "Ronelle Landy <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/818837

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by "Ronelle Landy <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/818838

Ronelle Landy (rlandy) wrote (last edit ):

Ronelle Landy (rlandy) wrote :
Changed in tripleo:
status: Triaged → Fix Released