NSX Plugin: UnAuthorizedRequest without retry

Bug #1410805 reported by Han Zhou on 2015-01-14
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
vmware-nsx
Undecided
Han Zhou

Bug Description

An occasional failure found in our production environment because of a race condition in NSX plugin, which results in neutron port creation failure. Here is the root cause.

When authentication cookie of NSX controller expires, it is expected that NSX plugin client will retry with new authentication cookie after re-login. The code logic is: when a connection is acquired, if cookie is None then it initiate a new login request for new cookie.
Otherwise, it continue to use old cookie to issue request, and if request failed because of authentication failure, it then checks if cookie is None then abort retrying.

However, there is a race condition that after connection is acquired but before request is issued, there can be another thread failed to authenticate and resetting the cookie (which is shared data between connections to same controller) to None (so that during retry it will know to re-login after acquiring the connection). So the first thread will then abort retrying because of the logic checking the cookie and aborting if the cookie is None.

This has high probability because there is a blocking operation after connection is acquired: the sock.connect(). It is likely the cookie being changed to None during this blocking operation.

The fix would be simply remove the abort logic, because even if the cookie is None, it would then re-login and get a new cookie before retry. Another fix could be moving the assignment of "cookie" before the socket.connect() operation.

Related logs:
(note that the log is from havana version, but the code logic is the same in upstream)
---------------------------------------------------------------------
2015-01-13 06:44:21,326 64499216 INFO [neutron.plugins.nicira.api_client.client] [5672] Connection https://xxx:443 idle for 1614.74 seconds; reconnecting.
2015-01-13 06:44:21,326 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 14 connection(s) available.
2015-01-13 06:44:21,327 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus
2015-01-13 06:44:21,389 64496976 INFO [neutron.plugins.nicira.api_client.client] [5673] Connection https://xxx:443 idle for 5833.87 seconds; reconnecting.
2015-01-13 06:44:21,389 64496976 DEBUG [neutron.plugins.nicira.api_client.client] [5673] Acquired connection https://xxx:443. 13 connection(s) available.
2015-01-13 06:44:21,390 64496976 DEBUG [neutron.plugins.nicira.api_client.request] [5673] Issuing - request GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus
2015-01-13 06:44:21,394 64499216 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,398 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 401 (0.07 seconds)
2015-01-13 06:44:21,399 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 14 connection(s) available.
2015-01-13 06:44:21,399 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 13 connection(s) available.
2015-01-13 06:44:21,400 64496176 DEBUG [neutron.plugins.nicira.api_client.request] [5674] Issuing - request POST https://xxx:443//ws.v1/login
2015-01-13 06:44:21,400 64496176 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,403 64496976 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.request] [5673] Completed request 'GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401 (0.05 seconds)
2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.client] [5673] Released connection https://xxx:443. 14 connection(s) available.
2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5673] Completed request 'GET /ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401
2015-01-13 06:44:21,439 57729424 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
...
NvpPluginException: An unexpected error occurred in the NVP Plugin:An exception occured while selecting logical switch for the port

2015-01-13 06:44:21,584 64496176 DEBUG [neutron.plugins.nicira.api_client.request] [5674] Completed request 'POST https://xxx:443//ws.v1/login': 200 (0.18 seconds)
2015-01-13 06:44:21,585 64496176 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5674] Completed request 'POST /ws.v1/login': 200
2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.client_eventlet] Saving new authentication cookie 'nvp_sessionid=15d4602ff7fb3dc34c00901ef80a7cc4; Path=/; secure'
2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus
2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200 (0.11 seconds)
2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 15 connection(s) available.
2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5672] Completed request 'GET /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200

Han Zhou (zhouhan) on 2015-01-14
tags: added: vmware
Han Zhou (zhouhan) on 2015-01-14
Changed in neutron:
assignee: nobody → Han Zhou (zhouhan)
affects: neutron → vmware-nsx
Changed in vmware-nsx:
status: New → In Progress
Han Zhou (zhouhan) wrote :

Reviewed: https://review.openstack.org/147165
Committed: https://git.openstack.org/cgit/stackforge/vmware-nsx/commit/?id=89beca8f87056cf5bac90e4456f4297d2f0d92e8
Submitter: Jenkins
Branch: master

commit 89beca8f87056cf5bac90e4456f4297d2f0d92e8
Author: Han Zhou <email address hidden>
Date: Wed Jan 14 14:10:56 2015 +0000

    Fix retry logic for UnAuthorizedRequest in race-condition

    Authentication cookie can be set to None by another thread thus
    cause current thread retry get aborted by mistake.

    The fix is removing the abort logic, because even if the cookie is
    None, it would then re-login and get a new cookie before retry.

    Change-Id: I8b94de41fa2f3be17d317fd80e4becdad9285f7d
    Closes-Bug: 1410805

Changed in vmware-nsx:
status: In Progress → Fix Committed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers