Possible network issues in rdo-cloud causing introspection failures

Bug #1824256 reported by wes hayutin
This bug affects 1 person
Affects: tripleo
Status: Incomplete
Importance: Wishlist
Assigned to: Ronelle Landy
Milestone: (none listed)

Bug Description

This issue needs more debugging and more information than I am providing here; more work remains to be done.

Introspection failing in master due to bmc node missing ip

http://logs.rdoproject.org/98/604298/318/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/3159fad/logs/bmc-console.log

[ 103.282595] os-net-config[2876]: [2019/04/10 04:18:49 PM] [INFO] No changes required for interface: eth0
[ 103.575088] openstackbmc[2891]: /usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.16) or chardet (2.2.1) doesn't match a supported version!
[ 103.577337] openstackbmc[2891]: RequestsDependencyWarning)
[ 103.737079] openstackbmc[2891]: Traceback (most recent call last):
[ 103.739160] openstackbmc[2891]: File "/usr/local/bin/openstackbmc", line 322, in <module>
[ 103.740951] openstackbmc[2891]: main()
[ 103.742694] openstackbmc[2891]: File "/usr/local/bin/openstackbmc", line 317, in main
[ 103.744453] openstackbmc[2891]: os_cloud=args.os_cloud)
[ 103.746509] openstackbmc[2891]: File "/usr/local/bin/openstackbmc", line 52, in __init__
[ 103.748794] openstackbmc[2891]: address=address)
[ 103.750739] openstackbmc[2891]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/serversession.py", line 271, in __init__
[ 103.752503] openstackbmc[2891]: self.serversocket = ipmisession.Session._assignsocket(addrinfo)
[ 103.753479] openstackbmc[2891]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/session.py", line 373, in _assignsocket
[ 103.754463] openstackbmc[2891]: tmpsocket.bind(server[4])
[ 103.755443] openstackbmc[2891]: File "/usr/lib64/python2.7/socket.py", line 224, in meth
[ 103.756442] openstackbmc[2891]: return getattr(self._sock,name)(*args)
[ 103.757425] openstackbmc[2891]: socket.error: [Errno 99] Cannot assign requested address
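The traceback ends in `[Errno 99] Cannot assign requested address`, which on Linux is EADDRNOTAVAIL: the address passed to bind() is not configured on any local interface at that moment. A minimal Python 3 sketch for checking this by hand on the bmc node follows. The helper names `can_bind` and `wait_for_address` are hypothetical and are not part of openstackbmc or pyghmi; the retry counts are arbitrary.

```python
import errno
import ipaddress
import socket
import time


def can_bind(address, port=0):
    """Return True if a UDP socket can bind to `address`.

    Returns False only for EADDRNOTAVAIL (Errno 99 on Linux), the error
    seen in the bmc console log; any other error is re-raised.
    """
    ip = ipaddress.ip_address(address)
    family = socket.AF_INET6 if ip.version == 6 else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise
    finally:
        sock.close()


def wait_for_address(address, attempts=30, delay=2.0):
    """Poll until `address` is bindable; give up after `attempts` tries."""
    for attempt in range(attempts):
        if can_bind(address):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

IPMI normally uses UDP port 623, which needs elevated privileges to bind, so the sketch defaults to an ephemeral port (0). That is enough to tell whether the address itself is present on the node.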

Failing ipmi set power state

http://logs.rdoproject.org/93/604293/166/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/1db457d/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz

2019-04-11 01:24:39 | Exception registering nodes: {u'status': u'FAILED', u'message': [{u'result': u'Node 47dbac7f-0d9b-45af-a2e8-641b5fe96124 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 47dbac7f-0d9b-45af-a2e8-641b5fe96124. Error: IPMI call failed: power status.'}, {u'result': u'Node 7638a404-85d1-461b-a655-96ad8e19bdd9 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 7638a404-85d1-461b-a655-96ad8e19bdd9. Error: IPMI call failed: power status.'}, {u'result': u'Node 68b50ec8-e43d-4bc5-9cfd-fbc08bc9a779 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 68b50ec8-e43d-4bc5-9cfd-fbc08bc9a779. Error: IPMI call failed: power status.'}, {u'result': u'Node 452912ad-c247-4dca-8063-b28306de4ee9 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 452912ad-c247-4dca-8063-b28306de4ee9. Error: IPMI call failed: power status.'}], u'result': u'Failure caused by error in tasks: send_message\n\n send_message [task_ex_id=6bcb8d08-d813-4f62-b929-078b352cd795] -> Workflow failed due to message status\n [wf_ex_id=43b1173a-ccb2-4703-8aea-1be1b3484e06, idx=0]: Workflow failed due to message status\n'}
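The failure message above is a Python 2 dict repr, which is hard to scan by eye. A rough Python 3 sketch for pulling the affected node UUIDs and states out of such a line is shown here. The function name `failed_nodes` and the regular expression are assumptions made for illustration, and the sample is a shortened copy of the log line with a single node.

```python
import ast
import re

# Matches the per-node wording seen in overcloud_prep_images.log.
NODE_RE = re.compile(
    r'Node (?P<uuid>[0-9a-f-]{36}) did not reach state "(?P<want>\w+)", '
    r'the state is "(?P<have>\w+)"'
)


def failed_nodes(log_line):
    """Return (uuid, expected_state, actual_state) tuples from one log line."""
    _, _, payload = log_line.partition("Exception registering nodes: ")
    # u'...' literals are still valid syntax in Python 3, so the repr can be
    # parsed safely without eval().
    data = ast.literal_eval(payload)
    nodes = []
    for entry in data.get("message", []):
        match = NODE_RE.search(entry.get("result", ""))
        if match:
            nodes.append((match["uuid"], match["want"], match["have"]))
    return nodes


SAMPLE = (
    "2019-04-11 01:24:39 | Exception registering nodes: "
    "{u'status': u'FAILED', u'message': [{u'result': u'Node "
    "47dbac7f-0d9b-45af-a2e8-641b5fe96124 did not reach state "
    "\"manageable\", the state is \"enroll\", error: Failed to get power "
    "state for node 47dbac7f-0d9b-45af-a2e8-641b5fe96124. Error: IPMI call "
    "failed: power status.'}]}"
)

for uuid, want, have in failed_nodes(SAMPLE):
    print(uuid, "expected", want, "got", have)
```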

http://logs.rdoproject.org/93/604293/166/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/1db457d/logs/bmc-console.log

[ 304.614225] cloud-init[1952]: ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[ 304.735875] cloud-init[1952]: 2019-04-11 00:22:36,948 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: unexpected error ['NoneType' object has no attribute 'status_code']
[ 305.738430] cloud-init[1952]: 2019-04-11 00:22:37,952 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: unexpected error ['NoneType' object has no attribute 'status_code']
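The cloud-init warnings show the bmc instance could not reach the metadata service at 169.254.169.254. A small Python 3 sketch for probing that endpoint manually from an affected node is given here. The function name `metadata_reachable` and the timeout are assumptions; the URL is the one from the log.

```python
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/2009-04-04/meta-data/instance-id"


def metadata_reachable(url=METADATA_URL, timeout=5.0):
    """Return True if `url` answers with a 2xx status within `timeout`."""
    # Bypass any proxy from the environment: the metadata address is
    # link-local and should be reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, missing routes, timeouts.
        return False


if __name__ == "__main__":
    print("metadata reachable:", metadata_reachable())
```

A False result here only says the endpoint did not answer; it does not distinguish a missing route from a metadata service that is down.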

wes hayutin (weshayutin) wrote :

A number of engineers have seen these issues lately. The main intent here is to warn infra, collect data, and report details as we discover them.

tags: added: promotion-blocker
removed: promo
Changed in tripleo:
importance: Critical → Medium
wes hayutin (weshayutin)
summary: - network issues in rdo-cloud causing introspection failures
+ Possible network issues in rdo-cloud causing introspection failures
Ronelle Landy (rlandy)
Changed in tripleo:
assignee: nobody → Ronelle Landy (rlandy)
wes hayutin (weshayutin) wrote :
Quique Llorente (quiquell) wrote :

Looks like this is similar to https://bugs.launchpad.net/tripleo/+bug/1790127/, cshastri is looking into it.

Changed in tripleo:
status: Triaged → In Progress
importance: Medium → Critical
wes hayutin (weshayutin) wrote :
Ronelle Landy (rlandy) wrote :

<nhicher> apevec, weshay, rlandy: ticket #1690 for network issues on openstack-nodepool tenant

Changed in tripleo:
milestone: stein-rc1 → train-1
wes hayutin (weshayutin)
Changed in tripleo:
status: In Progress → Invalid
status: Invalid → Incomplete
Marios Andreou (marios-b) wrote :

also removing promotion-blocker tag

tags: removed: promotion-blocker
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → train-rc1
Changed in tripleo:
importance: Critical → Wishlist