Possible network issues in rdo-cloud causing introspection failures

Bug #1824256 reported by wes hayutin on 2019-04-11
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Ronelle Landy

Bug Description

This issue needs more debugging and information than what I am providing here; there is more work to be done.

Introspection is failing in master because the BMC node is missing its IP address

http://logs.rdoproject.org/98/604298/318/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/3159fad/logs/bmc-console.log

[ 103.282595] os-net-config[2876]: [2019/04/10 04:18:49 PM] [INFO] No changes required for interface: eth0
[ 103.575088] openstackbmc[2891]: /usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.16) or chardet (2.2.1) doesn't match a supported version!
[ 103.577337] openstackbmc[2891]: RequestsDependencyWarning)
[ 103.737079] openstackbmc[2891]: Traceback (most recent call last):
[ 103.739160] openstackbmc[2891]: File "/usr/local/bin/openstackbmc", line 322, in <module>
[ 103.740951] openstackbmc[2891]: main()
[ 103.742694] openstackbmc[2891]: File "/usr/local/bin/openstackbmc", line 317, in main
[ 103.744453] openstackbmc[2891]: os_cloud=args.os_cloud)
[ 103.746509] openstackbmc[2891]: File "/usr/local/bin/openstackbmc", line 52, in __init__
[ 103.748794] openstackbmc[2891]: address=address)
[ 103.750739] openstackbmc[2891]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/serversession.py", line 271, in __init__
[ 103.752503] openstackbmc[2891]: self.serversocket = ipmisession.Session._assignsocket(addrinfo)
[ 103.753479] openstackbmc[2891]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/session.py", line 373, in _assignsocket
[ 103.754463] openstackbmc[2891]: tmpsocket.bind(server[4])
[ 103.755443] openstackbmc[2891]: File "/usr/lib64/python2.7/socket.py", line 224, in meth
[ 103.756442] openstackbmc[2891]: return getattr(self._sock,name)(*args)
[ 103.757425] openstackbmc[2891]: socket.error: [Errno 99] Cannot assign requested address
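
For anyone reproducing this: Errno 99 (EADDRNOTAVAIL) on bind() generally means the requested address is not configured on any local interface at that moment, which fits the "bmc node missing ip" theory. Below is a minimal sketch to check whether an address is bindable on the BMC host. The helper name can_bind is hypothetical and not part of openstackbmc or pyghmi, and it assumes a UDP socket because IPMI over LAN uses UDP.

```python
import errno
import socket


def can_bind(address, port=0):
    """Try to bind a UDP socket to (address, port).

    Returns (True, None) on success, or (False, errno_value) on failure.
    Port 0 lets the kernel pick a free port, so only the address is tested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))
        return True, None
    except OSError as exc:
        return False, exc.errno
    finally:
        sock.close()


def explain(address):
    """Return a one-line, human-readable result for a bind attempt."""
    ok, err = can_bind(address)
    if ok:
        return "%s: bindable" % address
    if err == errno.EADDRNOTAVAIL:
        return "%s: not assigned to any local interface" % address
    return "%s: bind failed (%s)" % (address, errno.errorcode.get(err, err))
```

Running explain() with the BMC's expected address right after os-net-config finishes would show whether the address was actually present when openstackbmc started.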

Failing IPMI get power state (power status) call

http://logs.rdoproject.org/93/604293/166/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/1db457d/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz

2019-04-11 01:24:39 | Exception registering nodes: {u'status': u'FAILED', u'message': [{u'result': u'Node 47dbac7f-0d9b-45af-a2e8-641b5fe96124 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 47dbac7f-0d9b-45af-a2e8-641b5fe96124. Error: IPMI call failed: power status.'}, {u'result': u'Node 7638a404-85d1-461b-a655-96ad8e19bdd9 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 7638a404-85d1-461b-a655-96ad8e19bdd9. Error: IPMI call failed: power status.'}, {u'result': u'Node 68b50ec8-e43d-4bc5-9cfd-fbc08bc9a779 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 68b50ec8-e43d-4bc5-9cfd-fbc08bc9a779. Error: IPMI call failed: power status.'}, {u'result': u'Node 452912ad-c247-4dca-8063-b28306de4ee9 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 452912ad-c247-4dca-8063-b28306de4ee9. Error: IPMI call failed: power status.'}], u'result': u'Failure caused by error in tasks: send_message\n\n send_message [task_ex_id=6bcb8d08-d813-4f62-b929-078b352cd795] -> Workflow failed due to message status\n [wf_ex_id=43b1173a-ccb2-4703-8aea-1be1b3484e06, idx=0]: Workflow failed due to message status\n'}
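
The registration error above is a single long line, so a small parser can help when comparing jobs. This is a rough sketch, assuming the payload after "Exception registering nodes: " is always a Python dict repr like the one shown. The function name failed_nodes and the regular expression are illustrative and not part of any TripleO tooling.

```python
import ast
import re

MARKER = "Exception registering nodes: "

# Matches the per-node message format seen in the log line above.
NODE_RE = re.compile(
    r'Node (?P<node>[0-9a-f-]{36}) did not reach state "(?P<wanted>\w+)", '
    r'the state is "(?P<actual>\w+)", error: (?P<error>.*)',
    re.DOTALL,
)


def failed_nodes(log_line):
    """Return a list of dicts (node, wanted, actual, error) from one log line.

    Returns an empty list if the marker is not present in the line.
    """
    _, _, payload = log_line.partition(MARKER)
    if not payload:
        return []
    # The payload is a Python 2 dict repr; u'' literals also parse on Python 3.
    data = ast.literal_eval(payload.strip())
    failures = []
    for entry in data.get("message", []):
        match = NODE_RE.match(entry.get("result", ""))
        if match:
            failures.append(match.groupdict())
    return failures
```

Feeding it the line from overcloud_prep_images.log should list each node that stayed in "enroll" along with its IPMI error text.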

http://logs.rdoproject.org/93/604293/166/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/1db457d/logs/bmc-console.log

[ 304.614225] cloud-init[1952]: ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[ 304.735875] cloud-init[1952]: 2019-04-11 00:22:36,948 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: unexpected error ['NoneType' object has no attribute 'status_code']
[ 305.738430] cloud-init[1952]: 2019-04-11 00:22:37,952 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: unexpected error ['NoneType' object has no attribute 'status_code']
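
To tell a missing route apart from a slow metadata service, it may help to probe the same URL by hand from the BMC instance. A minimal sketch follows, using the URL from the cloud-init warning. The function name probe_metadata, the 2 second timeout, and the injectable opener parameter are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# URL taken from the cloud-init warning in bmc-console.log.
METADATA_URL = "http://169.254.169.254/2009-04-04/meta-data/instance-id"


def probe_metadata(url=METADATA_URL, timeout=2.0, opener=urllib.request.urlopen):
    """Fetch the metadata URL once.

    Returns (True, body) on success or (False, error_text) on failure.
    The opener argument exists only so the call can be stubbed out.
    """
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace").strip()
            return True, body
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)
```

A "Network is unreachable" style error would point at the missing route or IP, while a timeout would point more toward the metadata service or the tenant network.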

wes hayutin (weshayutin) wrote :

A number of engineers have seen these issues lately. The main intent here is to warn infra, collect data, and report details as we discover them.

tags: added: promotion-blocker
removed: promo
Changed in tripleo:
importance: Critical → Medium
wes hayutin (weshayutin) on 2019-04-11
summary: - network issues in rdo-cloud causing introspection failures
+ Possible network issues in rdo-cloud causing introspection failures
Ronelle Landy (rlandy) on 2019-04-11
Changed in tripleo:
assignee: nobody → Ronelle Landy (rlandy)
Quique Llorente (quiquell) wrote :

Looks like this is similar to https://bugs.launchpad.net/tripleo/+bug/1790127/, cshastri is looking into it.

Changed in tripleo:
status: Triaged → In Progress
importance: Medium → Critical
Ronelle Landy (rlandy) wrote :

<nhicher> apevec, weshay, rlandy: ticket #1690 for network issues on openstack-nodepool tenant

Changed in tripleo:
milestone: stein-rc1 → train-1
wes hayutin (weshayutin) on 2019-05-15
Changed in tripleo:
status: In Progress → Invalid
status: Invalid → Incomplete
Marios Andreou (marios-b) wrote :

also removing promotion-blocker tag

tags: removed: promotion-blocker
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3