NSX-mh: bad retry behaviour on controller connection issues

Bug #1485883 reported by Salvatore Orlando on 2015-08-18
This bug affects 1 person
Affects      Importance   Assigned to
neutron      Undecided    Unassigned
Juno         Undecided    Salvatore Orlando
vmware-nsx   High         Salvatore Orlando

Bug Description

If the connection to an NSX-mh controller fails, for instance because of a network issue or because the controller is unreachable, the neutron plugin keeps retrying the same controller until the request times out. The correct behaviour would be to try the other controllers in the cluster.

The issue can be reproduced with the following steps:
1. Set up a cluster with three controllers: 10.25.56.223, 10.25.101.133, 10.25.56.222
2. Run neutron net-create dummy-1 from the OpenStack CLI
3. VNC into controller-1 and run ifconfig eth0 down
4. Run neutron net-create dummy-2 from the OpenStack CLI

The API requests were originally forwarded to 10.25.56.223. After the eth0 interface was shut down on 10.25.56.223, the requests continued to be forwarded to the same controller and timed out.
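
The expected behaviour can be illustrated with a minimal sketch. This is not the plugin's code: ControllerUnreachable, request_with_failover and the send callable are hypothetical names used only to show the idea of moving on to another controller instead of retrying the one that failed.

```python
class ControllerUnreachable(Exception):
    """Hypothetical error raised when a controller cannot be reached."""


def request_with_failover(controllers, send):
    """Try each controller once, in order, and return the first success.

    `send` is a hypothetical callable taking a controller address and
    returning a response; it raises ControllerUnreachable on failure.
    """
    last_error = None
    for address in controllers:
        try:
            return address, send(address)
        except ControllerUnreachable as exc:
            # Do not retry the same controller; move on to the next one.
            last_error = exc
    raise ControllerUnreachable("all controllers failed") from last_error


# Usage with the cluster from the reproduction steps, where the first
# controller has been taken offline (simulated, no network access).
CLUSTER = ["10.25.56.223", "10.25.101.133", "10.25.56.222"]
OFFLINE = {"10.25.56.223"}


def fake_send(address):
    if address in OFFLINE:
        raise ControllerUnreachable(address)
    return "created"


served_by, response = request_with_failover(CLUSTER, fake_send)
```

With this behaviour the second net-create would be served by another controller in the cluster rather than timing out.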

Changed in vmware-nsx:
importance: Undecided → High

Fix proposed to branch: master
Review: https://review.openstack.org/214060

Changed in vmware-nsx:
status: New → In Progress

Reviewed: https://review.openstack.org/214060
Committed: https://git.openstack.org/cgit/openstack/vmware-nsx/commit/?id=1602baf661b7e2cd951bf1603c6e99ab9638e1b0
Submitter: Jenkins
Branch: master

commit 1602baf661b7e2cd951bf1603c6e99ab9638e1b0
Author: Salvatore Orlando <email address hidden>
Date: Tue Aug 18 00:55:32 2015 -0700

    NSX-mh: Failover controller connections on socket failures

    Upon a socket connection failure, release the current connection
    and acquire a new one to a different controller.
    This is achieved by treating socket connection failures as 503
    errors returned by the controller.

    Also, ensure an even distribution of initial connection priorities
    across controllers.

    Change-Id: I988b46a4d1f51e4ad6b22ed3d892eab6a96a3acd
    Closes-Bug: 1485883
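
The two ideas in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the change: status_of and build_pool are hypothetical helpers, and interleaving priorities round-robin is only one plausible reading of "even distribution of initial connection priorities".

```python
import queue

SERVICE_UNAVAILABLE = 503


def status_of(do_request):
    """Run a request and return its status code.

    Socket-level failures (connection refused, timeouts, unreachable
    host) are OSError subclasses in Python 3; they are reported as 503
    so the caller's existing "controller unavailable" path handles them.
    """
    try:
        return do_request()
    except OSError:
        return SERVICE_UNAVAILABLE


def build_pool(controllers, conns_per_controller):
    """Fill a priority queue so that priorities interleave controllers.

    Lower numbers are served first; consecutive priorities belong to
    different controllers rather than clustering on the first one.
    """
    pool = queue.PriorityQueue()
    priority = 0
    for _ in range(conns_per_controller):
        for address in controllers:
            pool.put((priority, address))
            priority += 1
    return pool


def refused():
    # Simulates a socket connection failure without touching the network.
    raise ConnectionRefusedError("controller is down")


pool = build_pool(["10.25.56.223", "10.25.101.133", "10.25.56.222"], 2)
first_three = [pool.get()[1] for _ in range(3)]
```

Mapping the socket failure onto an existing error code means no separate recovery path is needed: whatever already happens on a 503 (release the connection, acquire one to a different controller) also happens when the socket cannot be opened.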

Changed in vmware-nsx:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/216261
Committed: https://git.openstack.org/cgit/openstack/vmware-nsx/commit/?id=d8108779eaf65e04e88e958c105d6ccb7eae0ca4
Submitter: Jenkins
Branch: stable/kilo

commit d8108779eaf65e04e88e958c105d6ccb7eae0ca4
Author: Salvatore Orlando <email address hidden>
Date: Tue Aug 18 00:55:32 2015 -0700

    NSX-mh: Failover controller connections on socket failures

    Upon a socket connection failure, release the current connection
    and acquire a new one to a different controller.
    This is achieved by treating socket connection failures as 503
    errors returned by the controller.

    Also, ensure an even distribution of initial connection priorities
    across controllers.

    Cherry-picked from commit: 1602baf661b7e2cd951bf1603c6e99ab9638e1b0

    Change-Id: I988b46a4d1f51e4ad6b22ed3d892eab6a96a3acd
    Closes-Bug: 1485883

tags: added: in-stable-kilo
Alan Pevec (apevec) on 2015-11-14
Changed in neutron:
status: New → Invalid