2014-10-16 09:26:20 |
Han Zhou |
description |
When sending request to NSX controller, the TCP connection is reset by remote (in our case, by Load-Balancer), which is a transient failure (could be caused by LB ratelimit). The return code is [Errno 104] Connection reset by peer.
In this case, retry should work. However, error 104 is returned immediately after sending the request, and the retry logic in NSX plugin doesn't delay at all, so continuous retries in a row failed.
From the log below we can see all the 3 requests were performed and failed in 1 ms.
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
...
Above log is from Havana. Below is my patch based on master branch to solve the issue:
diff --git a/neutron/plugins/vmware/api_client/eventlet_request.py b/neutron/plugins/vmware/api_client/eventlet_request.py
index c55ccfa..46232ea 100644
--- a/neutron/plugins/vmware/api_client/eventlet_request.py
+++ b/neutron/plugins/vmware/api_client/eventlet_request.py
@@ -17,6 +17,8 @@
import eventlet
import httplib
import urllib
+import socket
+import errno
from neutron.openstack.common import jsonutils
from neutron.openstack.common import log as logging
@@ -152,6 +154,9 @@ class EventletApiRequest(request.ApiRequest):
self._request_error = None
response = req
else:
+ if (isinstance(req, socket.error) and
+ req.errno == errno.ECONNRESET):
+ timeout = 0.5
LOG.info(_('[%(rid)d] Error while handling request: %(req)s'),
{'rid': self._rid(), 'req': req})
self._request_error = req |
Http connections in connection pool to NSX controller can be reset due to reasons such as when LB is in the middle, and connection idle timeout.
In such case, while recreating the connection, NSX plugin would retry the request with "next" connection, which could also be reset already because of idle timeout. This would leads to continuous retry failure and the request fails finally, with a misleading error log: "Request timeout ...".
Error log example:
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
...
Suggestion to the fix issue by using the newly created connection when retry instead of using the "next" connection. |
|