Activity log for bug #1381094

Date Who What changed Old value New value Message
2014-10-14 14:29:31 Han Zhou bug added bug
2014-10-16 09:12:13 Han Zhou summary NSX plugin request retry should delay in case of "[Errno 104] Connection reset by peer" NSX plugin request retry failure with "[Errno 104] Connection reset by peer"
2014-10-16 09:26:20 Han Zhou description When sending request to NSX controller, the TCP connection is reset by remote (in our case, by Load-Balancer), which is a transient failure (could be caused by LB ratelimit). The return code is [Errno 104] Connection reset by peer. In this case, retry should work. However, error 104 is returned immediately after sending the request, and the retry logic in NSX plugin doesn't delay at all, so continuous retries in a row failed. From the log below we can see all the 3 requests were performed and failed in 1 ms. 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port ... Above log is from Havana. Below is my patch based on master branch to solve the issue: diff --git a/neutron/plugins/vmware/api_client/eventlet_request.py b/neutron/plugins/vmware/api_client/eventlet_request.py index c55ccfa..46232ea 100644 --- a/neutron/plugins/vmware/api_client/eventlet_request.py +++ b/neutron/plugins/vmware/api_client/eventlet_request.py @@ -17,6 +17,8 @@ import eventlet import httplib import urllib +import socket +import errno from neutron.openstack.common import jsonutils from neutron.openstack.common import log as logging @@ -152,6 +154,9 @@ class EventletApiRequest(request.ApiRequest): self._request_error = None response = req else: + if (isinstance(req, socket.error) and + req.errno == errno.ECONNRESET): + timeout = 0.5 LOG.info(_('[%(rid)d] Error while handling request: %(req)s'), {'rid': self._rid(), 'req': req}) self._request_error = req Http connections in connection pool to NSX controller can be reset due to reasons such as when LB is in the middle, and connection idle timeout. In such case, while recreating the connection, NSX plugin would retry the request with "next" connection, which could also be reset already because of idle timeout. This would leads to continuous retry failure and the request fails finally, with a misleading error log: "Request timeout ...". Error log example: 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port ... Suggestion to the fix issue by using the newly created connection when retry instead of using the "next" connection.
2014-10-16 10:16:25 Han Zhou attachment added patch.txt https://bugs.launchpad.net/neutron/+bug/1381094/+attachment/4238072/+files/patch.txt
2014-10-16 10:45:16 OpenStack Infra neutron: status New In Progress
2014-10-16 10:45:16 OpenStack Infra neutron: assignee Han Zhou (zhouhan)
2014-10-16 15:04:41 OpenStack Infra neutron: status In Progress Fix Committed
2014-10-30 13:32:37 Eugene Nikanorov neutron: importance Undecided Medium
2014-10-30 13:32:45 Eugene Nikanorov neutron: milestone kilo-1
2014-12-18 14:30:59 Thierry Carrez neutron: status Fix Committed Fix Released
2015-04-30 09:33:10 Thierry Carrez neutron: milestone kilo-1 2015.1.0