Nova operates network failed by "Maximum attempts reached" error when Cloud under heavy workload

Bug #1304481 reported by Zhi Yan Liu
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Invalid
Undecided
Zhi Yan Liu
neutron
Won't Fix
Undecided
Unassigned

Bug Description

In my deployment when I provision or terminate numerous VM, e.g. 100+, I got some "Maximum attempts reached" issues in Nova internal which happened on Nova operates network/neutron by neutronclient. You might check follow log snippet for details.

I propose adding a "neutron_http_retries" option to Nova, and to make each neutron client object leverage this option (in nova/network/neutronv2/api.py). IMO we could make the fix base on this change: https://review.openstack.org/#/c/71464/

IFY, in my deployment I verified that when I set http retries of neutronclient to 5 times (most time 3 is just fine) the problem is gone.

216492 2014-04-08 02:16:09.844 3777 ERROR oslo.messaging._executors.base [-] Exception during message handling
216493 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base Traceback (most recent call last):
216494 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/oslo/messaging/_executors/base.py", line 36, in _dispatch
216495 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base incoming.reply(self.callback(incoming.ctxt, incoming.message))
216496 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in __call__
216497 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base return self._dispatch(endpoint, method, ctxt, args)
216498 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 92, in _dispatch
216499 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base result = getattr(endpoint, method)(ctxt, **new_args)
216500 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/exception.py", line 88, in wrapped
216501 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base payload)
216502 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
216503 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base six.reraise(self.type_, self.value, self.tb)
216504 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/exception.py", line 71, in wrapped
216505 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base return f(self, context, *args, **kw)
216506 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 249, in decorated_function
216507 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base pass
216508 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
216509 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base six.reraise(self.type_, self.value, self.tb)
216510 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 235, in decorated_function
216511 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base return function(self, context, *args, **kwargs)
216512 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 300, in decorated_function
216513 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base function(self, context, *args, **kwargs)
216514 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 277, in decorated_function
216515 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base e, sys.exc_info())
216516 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
216517 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base six.reraise(self.type_, self.value, self.tb)
216518 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 264, in decorated_function
216519 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base return function(self, context, *args, **kwargs)
216520 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1894, in run_instance
216521 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base do_run_instance()
216522 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py", line 249, in inner
216523 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base return f(*args, **kwargs)
216524 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1893, in do_run_instance
216525 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base legacy_bdm_in_spec)
216526 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1008, in _run_instance
216527 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base notify("error", msg=unicode(e)) # notify that build failed
216528 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
216529 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base six.reraise(self.type_, self.value, self.tb)
216530 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 992, in _run_instance
216531 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base instance, image_meta, legacy_bdm_in_spec)
216532 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1110, in _build_instance
216533 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base filter_properties, bdms, legacy_bdm_in_spec)
216534 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1154, in _reschedule_or_error
216535 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base self._log_original_error(exc_info, instance_uuid)
216536 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
216537 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base six.reraise(self.type_, self.value, self.tb)
216538 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1149, in _reschedule_or_error
216539 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base bdms, requested_networks)
216540 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1920, in _shutdown_instance
216541 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base network_info = self._get_instance_nw_info(context, instance)
216542 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 920, in _get_instance_nw_info
216543 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base instance)
216544 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/network/api.py", line 48, in wrapper
216545 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base res = f(self, context, *args, **kwargs)
216546 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/network/neutronv2/api.py", line 451, in get_instance_nw_info
216547 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base result = self._get_instance_nw_info(context, instance, networks)
216548 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/network/neutronv2/api.py", line 458, in _get_instance_nw_info
216549 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base nw_info = self._build_network_info_model(context, instance, networks)
216550 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/nova/network/neutronv2/api.py", line 1022, in _build_network_info_model
216551 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base data = client.list_ports(**search_opts)
216552 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 111, in with_params
216553 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base ret = self.function(instance, *args, **kwargs)
216554 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 306, in list_ports
216555 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base **_params)
216556 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1250, in list
216557 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base for r in self._pagination(collection, path, **params):
216558 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1263, in _pagination
216559 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base res = self.get(path, params=params)
216560 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1236, in get
216561 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base headers=headers, params=params)
216562 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1228, in retry_request
216563 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base raise exceptions.ConnectionFailed(reason=_("Maximum attempts reached"))
216564 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base ConnectionFailed: Connection to neutron failed: Maximum attempts reached
216565 2014-04-08 02:16:09.844 3777 TRACE oslo.messaging._executors.base
216566 2014-04-08 02:16:09.847 3777 ERROR oslo.messaging._drivers.common [-] Returning exception Connection to neutron failed: Maximum attempts reached to caller

Tags: network
Zhi Yan Liu (lzy-dev)
Changed in nova:
assignee: nobody → Zhi Yan Liu (lzy-dev)
Tracy Jones (tjones-i)
tags: added: network
Revision history for this message
Joe Gordon (jogo) wrote :

This isn't a nova issue directly, nova is failing to reach neutron. So adding neutron to this bug.

Changed in nova:
status: New → Invalid
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

This is a known issue with neutron having lock wait timeout issues.
We have opened bugs for all the instances of lock wait timeouts observed in upstream testing.

closing this one as targeted fixes are already being proposed for the relevant bugs.

Changed in neutron:
status: New → Invalid
status: Invalid → Won't Fix
Revision history for this message
Zhi Yan Liu (lzy-dev) wrote :

Hi @Joe Gordon, I agreed this is not a bug for nova directly as I mentioned in above report, but IIUC I think follow current solution of neutron side made/making, after neutron fixes it own issues [0][1], nova side needs to leverage the capability from neutron/neutronclient to fix this "Maximum attempts reached" issue. This is a real problem from end user's POV.

Any thoughts? thanks.

[0] https://review.openstack.org/#/c/71464/
[1] https://review.openstack.org/#/c/90104/

Zhi Yan Liu (lzy-dev)
Changed in nova:
status: Invalid → New
Revision history for this message
Zhi Yan Liu (lzy-dev) wrote :
Revision history for this message
Zhi Yan Liu (lzy-dev) wrote :

Oops, pls ignore my above posting, wrong place.

Revision history for this message
haruka tanizawa (h-tanizawa) wrote :

Hi Salvatore

'the relevant bugs' means https://review.openstack.org/#/c/100963/ ?
If not, could you tell me that bug report or patch ?

Sean Dague (sdague)
Changed in nova:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.