OVB jobs are failing on the master branch; imported nodes are not transitioning to the manageable state with 'Error: Unable to establish IPMI v2 / RMCP+ session\n'

Bug #1900949 reported by Sandeep Yadav
Affects: tripleo
Status: Won't Fix
Importance: Critical
Assigned to: Unassigned

Bug Description

Description:-

OVB jobs are failing on the master branch; imported nodes are not transitioning to the manageable state with 'Error: Unable to establish IPMI v2 / RMCP+ session\n'

Affected jobs:-

periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master
periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master
periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master

Logs:-

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/fef9133/job-output.txt

~~~
2020-10-22 02:01:56.168099 | primary | TASK [overcloud-prep-images : Wait until nodes will be manageable] *************
2020-10-22 02:01:56.170670 | primary | Thursday 22 October 2020 02:01:56 +0000 (0:00:00.121) 0:01:07.345 ******
2020-10-22 02:01:59.672773 | primary | FAILED - RETRYING: Wait until nodes will be manageable (10 retries left).
2020-10-22 02:02:32.564128 | primary | FAILED - RETRYING: Wait until nodes will be manageable (9 retries left).
2020-10-22 02:03:05.401209 | primary | FAILED - RETRYING: Wait until nodes will be manageable (8 retries left).
2020-10-22 02:03:38.298295 | primary | FAILED - RETRYING: Wait until nodes will be manageable (7 retries left).
2020-10-22 02:04:11.198026 | primary | FAILED - RETRYING: Wait until nodes will be manageable (6 retries left).
2020-10-22 02:04:44.192785 | primary | FAILED - RETRYING: Wait until nodes will be manageable (5 retries left).
2020-10-22 02:05:17.153385 | primary | FAILED - RETRYING: Wait until nodes will be manageable (4 retries left).
2020-10-22 02:05:50.138978 | primary | FAILED - RETRYING: Wait until nodes will be manageable (3 retries left).
2020-10-22 02:06:23.151856 | primary | FAILED - RETRYING: Wait until nodes will be manageable (2 retries left).
2020-10-22 02:06:56.063303 | primary | FAILED - RETRYING: Wait until nodes will be manageable (1 retries left).
2020-10-22 02:07:29.069739 | primary | fatal: [undercloud]: FAILED! => {
2020-10-22 02:07:29.069775 | primary | "attempts": 10,
2020-10-22 02:07:29.069783 | primary | "changed": false,
2020-10-22 02:07:29.069789 | primary | "cmd": "set -o pipefail && openstack --os-cloud undercloud baremetal node list -f value -c \"Provisioning State\" | grep -v -e manageable -e available",
2020-10-22 02:07:29.069796 | primary | "delta": "0:00:02.637190",
2020-10-22 02:07:29.069802 | primary | "end": "2020-10-22 02:07:29.047163",
2020-10-22 02:07:29.069808 | primary | "failed_when_result": true,
2020-10-22 02:07:29.069815 | primary | "rc": 0,
2020-10-22 02:07:29.069820 | primary | "start": "2020-10-22 02:07:26.409973"
2020-10-22 02:07:29.069826 | primary | }
2020-10-22 02:07:29.069832 | primary |
2020-10-22 02:07:29.069838 | primary | STDOUT:
2020-10-22 02:07:29.069844 | primary |
2020-10-22 02:07:29.069849 | primary | enroll
2020-10-22 02:07:29.069855 | primary | enroll
2020-10-22 02:07:29.069860 | primary |
2020-10-22 02:07:29.069866 | primary |
2020-10-22 02:07:29.069872 | primary | STDERR:
2020-10-22 02:07:29.069878 | primary |
2020-10-22 02:07:29.069883 | primary | /usr/lib64/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__
~~~
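
For reference, the failing task simply polls the provisioning state of every registered node and fails while anything other than manageable/available remains; in this run the nodes never leave enroll. The same check expressed through openstacksdk rather than the CLI pipeline would look roughly like this (a sketch, assuming a clouds.yaml entry named "undercloud" as the job uses):

~~~python
# Rough equivalent of the CLI check in the task above: list all baremetal
# nodes and report any whose provision state is not yet manageable/available.
# Assumes a clouds.yaml entry named "undercloud", as in the job.
import openstack

conn = openstack.connect(cloud="undercloud")
pending = [node.name or node.id
           for node in conn.baremetal.nodes()
           if node.provision_state not in ("manageable", "available")]
if pending:
    print("still waiting on:", pending)  # in the failed runs these sit in 'enroll'
~~~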

Other examples:-

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/2b51790/job-output.txt

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master/1e7e855/job-output.txt

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

In bmc log:

[ 83.096478] openstackbmc[9312]: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://phx2.cloud.rdoproject.org:13000/v3/auth/tokens: HTTPSConnectionPool(host='phx2.cloud.rdoproject.org', port=13000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2609a03450>: Failed to establish a new connection: [Errno -2] Name or service not known',))

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/fef9133/logs/bmc-console.log

Seems like a DNS problem on the BMC host.
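
A quick way to confirm or rule that out from the BMC host is to resolve the keystone endpoint from the traceback directly, e.g. with a small Python 3 snippet (hostname and port are taken from the error above; nothing else is job-specific):

~~~python
# Minimal DNS sanity check for the keystone endpoint in the BMC traceback.
# Run this on the BMC host; a gaierror here matches the
# "[Errno -2] Name or service not known" seen in the log.
import socket

HOST = "phx2.cloud.rdoproject.org"
PORT = 13000

try:
    for family, _, _, _, sockaddr in socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP):
        print(family, sockaddr)
except socket.gaierror as exc:
    print("DNS resolution failed:", exc)
~~~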

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

In another job there is a different failure:

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/e2a8445/logs/bmc-console.log

[ 2992.583377] openstackbmc[9221]: Traceback (most recent call last):
[ 2992.583729] openstackbmc[9221]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/bmc.py", line 175, in handle_raw_request
[ 2992.591755] openstackbmc[9221]: return self.get_chassis_status(session)
[ 2992.591962] openstackbmc[9221]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/bmc.py", line 91, in get_chassis_status
[ 2992.592623] openstackbmc[9221]: powerstate = self.get_power_state()
[ 2992.592911] openstackbmc[9221]: File "/usr/local/bin/openstackbmc", line 178, in get_power_state
[ 2992.599443] openstackbmc[9221]: state = self._instance_active()
[ 2992.599653] openstackbmc[9221]: File "/usr/local/bin/openstackbmc", line 163, in _instance_active
[ 2992.599946] openstackbmc[9221]: instance = self.novaclient.servers.get(self.instance)
[ 2992.600561] openstackbmc[9221]: File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 855, in get
[ 2992.600872] openstackbmc[9221]: return self._get("/servers/%s" % base.getid(server), "server")
[ 2992.601518] openstackbmc[9221]: File "/usr/lib/python2.7/site-packages/novaclient/base.py", line 353, in _get
[ 2992.611250] openstackbmc[9221]: resp, body = self.api.client.get(url)
[ 2992.611471] openstackbmc[9221]: File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 375, in get
[ 2992.615348] openstackbmc[9221]: return self.request(url, 'GET', **kwargs)
[ 2992.615553] openstackbmc[9221]: File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 78, in request
[ 2992.615856] openstackbmc[9221]: raise exceptions.from_response(resp, body, url, method)
[ 2992.622285] openstackbmc[9221]: ClientException: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-8ecf7656-6f37-4525-b60d-5dec7a17dd54)

There are also ironic and openvswitch errors:

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/e2a8445/logs/undercloud/var/log/extra/errors.txt.txt.gz

2020-10-21 18:33:07.913 ERROR /var/log/containers/ironic/ironic-conductor.log: 8 ERROR ironic.drivers.modules.ipmitool [req-00fa43f1-14a8-407d-a301-9e0ea0e00da4 57d3e8dce1894e0595a167631727f7a3 f08d5f81df9647eba972c6a6faeaf7f3 - default default] IPMI Error while attempting "ipmitool -I lanplus -H 192.168.100.173 -L ADMINISTRATOR -U admin -R 1 -N 5 -f /tmp/tmpi42i22zj power status" for node 7c28dd1b-ee41-4dc9-bdff-87ca137a36b3. Error: Unexpected error while running command.
2020-10-21 18:33:07.914 ERROR /var/log/containers/ironic/ironic-conductor.log: 8 ERROR ironic.conductor.manager [req-00fa43f1-14a8-407d-a301-9e0ea0e00da4 57d3e8dce1894e0595a167631727f7a3 f08d5f81df9647eba972c6a6faeaf7f3 - default default] Failed to get power state for node 7c28dd1b-ee41-4dc9-bdff-87ca137a36b3. Error: IPMI call failed: power status.: ironic.common.excep...
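
To tell a BMC-side problem apart from an ironic-side one, the power-status query that ironic issues through ipmitool can be reproduced by hand against the virtual BMC. A rough sketch using pyghmi's client API (the address comes from the log above, the credentials are placeholders for whatever instackenv.json carries, and pyghmi is assumed to be installed wherever this runs):

~~~python
# Manual IPMI power-status check against the virtual BMC, mirroring the
# "ipmitool ... power status" call that ironic-conductor makes.
# BMC address is from the log above; credentials are placeholders.
from pyghmi.ipmi import command

BMC_ADDR = "192.168.100.173"
USER = "admin"          # placeholder
PASSWORD = "password"   # placeholder

ipmi = command.Command(bmc=BMC_ADDR, userid=USER, password=PASSWORD)
print(ipmi.get_power())  # a healthy BMC answers e.g. {'powerstate': 'on'}
~~~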


Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

After investigation we can see that some BMC machines don't get a default GW, and some of them get multiple IPs on a single interface (a minimal reproduction of the resulting bind failure is sketched after the traceback below).

Common errors:
[ OK ] Started Execute cloud user/final scripts.
[ OK ] Reached target Cloud-init target.
[ 78.245078] openstackbmc[9145]: Traceback (most recent call last):
[ 78.245971] openstackbmc[9145]: File "/usr/local/bin/openstackbmc", line 335, in <module>
[ 78.246607] openstackbmc[9145]: main()
[ 78.246923] openstackbmc[9145]: File "/usr/local/bin/openstackbmc", line 330, in main
[ 78.247544] openstackbmc[9145]: os_cloud=args.os_cloud)
[ 78.247864] openstackbmc[9145]: File "/usr/local/bin/openstackbmc", line 52, in __init__
[ 78.248486] openstackbmc[9145]: address=address)
[ 78.248791] openstackbmc[9145]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/serversession.py", line 277, in __init__
[ 78.249133] openstackbmc[9145]: self.serversocket = ipmisession.Session._assignsocket(addrinfo)
[ 78.249420] openstackbmc[9145]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/session.py", line 377, in _assignsocket
[ 78.249736] openstackbmc[9145]: tmpsocket.bind(server[4])
[ 78.250064] openstackbmc[9145]: File "/usr/lib64/python2.7/socket.py", line 224, in meth
[ 78.250361] openstackbmc[9145]: return getattr(self._sock,name)(*args)
[ 78.250668] openstackbmc[9145]: socket.error: [Errno 99] Cannot assign requested address
[ 78.540080] openstackbmc[9144]: Traceback (most recent call last):
[ 78.540396] openstackbmc[9144]: File "/usr/local/bin/openstackbmc", line 335, in <module>
[ 78.540689] openstackbmc[9144]: main()
[ 78.541024] openstackbmc[9144]: File "/usr/local/bin/openstackbmc", line 330, in main
[ 78.541330] openstackbmc[9144]: os_cloud=args.os_cloud)
[ 78.541644] openstackbmc[9144]: File "/usr/local/bin/openstackbmc", line 52, in __init__
[ 78.541974] openstackbmc[9144]: address=address)
[ 78.542611] openstackbmc[9144]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/serversession.py", line 277, in __init__
[ 78.542935] openstackbmc[9144]: self.serversocket = ipmisession.Session._assignsocket(addrinfo)
[ 78.543566] openstackbmc[9144]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/session.py", line 377, in _assignsocket
[ 78.543891] openstackbmc[9144]: tmpsocket.bind(server[4])
[ 78.544523] openstackbmc[9144]: File "/usr/lib64/python2.7/socket.py", line 224, in meth
[ 78.544844] openstackbmc[9144]: return getattr(self._sock,name)(*args)
[ 78.545479] openstackbmc[9144]: socket.error: [Errno 99] Cannot assign requested address
[ 78.640063] openstackbmc[9143]: Traceback (most recent call last):
[ 78.640709] openstackbmc[9143]: File "/usr/local/bin/openstackbmc", line 335, in <module>
[ 78.640995] openstackbmc[9143]: main()
[ 78.641611] openstackbmc[9143]: File "/usr/local/bin/openstackbmc", line 330, in main
[ 78.641931] openstackbmc[9143]: os_cloud=args.os_cloud)
[ 78.642547] openstackbmc[9143]: File "/usr/local/bin/openstackbmc", line 52, in __init__
[ 78.642872] openstackbmc[9143]: address=address)
[ 78.643485] openstackbmc[9143]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/ser...
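
That bind failure is exactly what happens when the address openstackbmc is told to listen on was never configured on the instance's interface; a minimal, self-contained reproduction (the address below is an arbitrary TEST-NET example, not one from the job):

~~~python
# Reproduces the "[Errno 99] Cannot assign requested address" from the
# traceback: binding a UDP socket to an address that is not assigned to any
# local interface fails the same way openstackbmc does when the BMC port IP
# never made it onto the instance.
import errno
import socket

ADDR = "192.0.2.123"  # example address, deliberately not present on this host
PORT = 623            # RMCP+/IPMI port the BMC listens on

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    sock.bind((ADDR, PORT))
except socket.error as exc:
    # exc.errno == errno.EADDRNOTAVAIL, i.e. 99 on Linux
    print("bind failed:", exc)
finally:
    sock.close()
~~~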


Changed in tripleo:
milestone: victoria-rc1 → wallaby-1
Revision history for this message
Marios Andreou (marios-b) wrote :

Jobs were moved to vexxhost; it seems there were issues with the RDO cloud.

I just checked fs1, which was referenced in the description, for example:

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/51c2621/zuul-info/inventory.yaml

        cloud: vexxhost-nodepool-tripleo

Going to close this out as Won't Fix; please move it back if you disagree.

Changed in tripleo:
status: Triaged → Won't Fix
status: Won't Fix → Incomplete
status: Incomplete → Won't Fix