NSX Plugin: UnAuthorizedRequest without retry
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
vmware-nsx |
Fix Released
|
Undecided
|
Han Zhou |
Bug Description
An occasional failure found in our production environment because of a race condition in NSX plugin, which results in neutron port creation failure. Here is the root cause.
When authentication cookie of NSX controller expires, it is expected that NSX plugin client will retry with new authentication cookie after re-login. The code logic is: when a connection is acquired, if cookie is None then it initiate a new login request for new cookie.
Otherwise, it continue to use old cookie to issue request, and if request failed because of authentication failure, it then checks if cookie is None then abort retrying.
However, there is a race condition that after connection is acquired but before request is issued, there can be another thread failed to authenticate and resetting the cookie (which is shared data between connections to same controller) to None (so that during retry it will know to re-login after acquiring the connection). So the first thread will then abort retrying because of the logic checking the cookie and aborting if the cookie is None.
This has high probability because there is a blocking operation after connection is acquired: the sock.connect(). It is likely the cookie being changed to None during this blocking operation.
The fix would be simply remove the abort logic, because even if the cookie is None, it would then re-login and get a new cookie before retry. Another fix could be moving the assignment of "cookie" before the socket.connect() operation.
Related logs:
(note that the log is from havana version, but the code logic is the same in upstream)
-------
2015-01-13 06:44:21,326 64499216 INFO [neutron.
2015-01-13 06:44:21,326 64499216 DEBUG [neutron.
2015-01-13 06:44:21,327 64499216 DEBUG [neutron.
2015-01-13 06:44:21,389 64496976 INFO [neutron.
2015-01-13 06:44:21,389 64496976 DEBUG [neutron.
2015-01-13 06:44:21,390 64496976 DEBUG [neutron.
2015-01-13 06:44:21,394 64499216 DEBUG [neutron.
2015-01-13 06:44:21,398 64499216 DEBUG [neutron.
2015-01-13 06:44:21,399 64499216 DEBUG [neutron.
2015-01-13 06:44:21,399 64499216 DEBUG [neutron.
2015-01-13 06:44:21,400 64496176 DEBUG [neutron.
2015-01-13 06:44:21,400 64496176 DEBUG [neutron.
2015-01-13 06:44:21,403 64496976 DEBUG [neutron.
2015-01-13 06:44:21,439 64496976 DEBUG [neutron.
2015-01-13 06:44:21,439 64496976 DEBUG [neutron.
2015-01-13 06:44:21,439 64496976 DEBUG [neutron.
2015-01-13 06:44:21,439 57729424 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
...
NvpPluginException: An unexpected error occurred in the NVP Plugin:An exception occured while selecting logical switch for the port
2015-01-13 06:44:21,584 64496176 DEBUG [neutron.
2015-01-13 06:44:21,585 64496176 DEBUG [neutron.
2015-01-13 06:44:21,585 64499216 DEBUG [neutron.
2015-01-13 06:44:21,585 64499216 DEBUG [neutron.
2015-01-13 06:44:21,585 64499216 DEBUG [neutron.
2015-01-13 06:44:21,694 64499216 DEBUG [neutron.
2015-01-13 06:44:21,694 64499216 DEBUG [neutron.
2015-01-13 06:44:21,694 64499216 DEBUG [neutron.
tags: | added: vmware |
Changed in neutron: | |
assignee: | nobody → Han Zhou (zhouhan) |
affects: | neutron → vmware-nsx |
Changed in vmware-nsx: | |
status: | New → In Progress |
Changed in vmware-nsx: | |
status: | Fix Committed → Fix Released |
Please review the fix: /review. openstack. org/#/c/ 147165/
https:/