ovn ml2 mechanism driver tcp connectors

Bug #1604064 reported by Daniel
This bug affects 1 person
Affects          Status          Importance   Assigned to     Milestone
networking-ovn   Fix Released    Undecided    Richard Theis
neutron          Invalid         Undecided    Unassigned

Bug Description

When a TCP connection from the OVN ML2 mechanism driver dies (in my scenario, because of a UCARP failover), a new TCP connection is not established for port monitoring.

Reproduction steps:
1. Set up UCARP between 2 nodes
2. Set up the OVN north database and south database on both nodes
3. Point the ml2 driver to the UCARP address (north and south ports)
4. Point the ovn-controllers to the UCARP address (south database port)
5. Boot a VM
6. View VM entries in the north database and south database OVN tables
7. See that port status is UP in north database
8. See that Neutron still shows the VM's status as DOWN
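Steps 7 and 8 describe a disagreement between the port's `up` column in the north database and Neutron's own port status. A minimal sketch for spotting such mismatches, assuming both sides have already been exported into dictionaries keyed by port ID (the function name and input shapes are hypothetical, not part of networking-ovn or any CLI):

```python
from typing import Dict, List


def find_status_mismatches(nb_up: Dict[str, bool],
                           neutron_status: Dict[str, str]) -> List[str]:
    """Return port IDs that OVN reports as up but Neutron does not show as ACTIVE.

    nb_up:          port ID -> value of the port's 'up' column in the north DB
    neutron_status: port ID -> Neutron port status ("ACTIVE", "DOWN", ...)
    Both inputs are assumed to have been collected beforehand (hypothetical shapes).
    """
    mismatches = []
    for port_id, is_up in sorted(nb_up.items()):
        # Only ports that OVN already considers up are interesting for this bug.
        if is_up and neutron_status.get(port_id) != "ACTIVE":
            mismatches.append(port_id)
    return mismatches


if __name__ == "__main__":
    nb = {"port-a": True, "port-b": True, "port-c": False}
    neutron = {"port-a": "ACTIVE", "port-b": "DOWN", "port-c": "DOWN"}
    print(find_status_mismatches(nb, neutron))  # ['port-b']
```

A port that is down in both places is not reported, since that is the expected state before ovn-controller binds it.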

**Temporary workaround: restart neutron-server, which resets the TCP connections.
**I have not verified that the problem is the TCP connections, but it's currently my best guess.

Linux Version: Ubuntu 14.04

Tags: ovn ml2

Ryan Moats (rmoats) wrote :

This may have neutron pieces that need to be fixed, but the defect as written should also include the networking-ovn project.

Also, removed the ovn tag because that's not valid. Does neutron have a pure ml2 tag now?

Russell Bryant (russellb) wrote :

Do you know that the plugin actually sees the connection go down, and it just doesn't try to reconnect?

Daniel (dlevy) wrote :

@Russell, I'll look at this in more detail in about a week, but currently I'm not sure.

Russell Bryant (russellb) wrote :

I'd start with a simpler test without failover. Just try restarting the OVN database server and make sure that Neutron reconnects successfully.

Richard Theis (rtheis) wrote :

I can confirm that a simple reconnect works. Here's my test scenario.

1) Start neutron server.
2) Kill ovsdb-server processes.
3) Create a neutron network. This will fail.
4) Start ovsdb-server processes.
5) Create a neutron network. This will succeed.

However, the following reconnect scenario doesn't work:
1) Kill ovsdb-server processes.
2) Start neutron server.
3) Create a neutron network. This will fail.
4) Start ovsdb-server processes.
5) Create a neutron network. This will fail.
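One simplified way to read the two scenarios: a client that can recover from a dropped connection, but only if its first connection attempt at start-up succeeded, behaves exactly as observed. The self-contained model below contrasts that with a client that (re)connects on demand before every request. `FakeServer`, `GiveUpAtStartupClient`, `LazyReconnectClient`, and `run_scenario` are hypothetical stand-ins; this is not a claim about how networking-ovn is actually structured.

```python
class FakeServer:
    """Stand-in for an ovsdb-server process that can be killed and restarted."""

    def __init__(self, running: bool) -> None:
        self.running = running


class GiveUpAtStartupClient:
    """Recovers from a dropped connection, but only if the first connect worked."""

    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.usable = server.running  # single connection attempt at start-up

    def create_network(self) -> bool:
        # If start-up failed, the client is never usable again.
        return self.usable and self.server.running


class LazyReconnectClient:
    """Attempts a (re)connect before every request, whatever happened at start-up."""

    def __init__(self, server: FakeServer) -> None:
        self.server = server

    def create_network(self) -> bool:
        # Reconnect on demand: succeed whenever the server is reachable now.
        return self.server.running


def run_scenario(client_cls, running_at_start: bool) -> list:
    """Replay a kill/restart sequence and record each create_network result."""
    server = FakeServer(running=running_at_start)
    client = client_cls(server)          # start neutron server
    server.running = False               # kill ovsdb-server
    first = client.create_network()      # expected to fail
    server.running = True                # start ovsdb-server
    second = client.create_network()
    return [first, second]


if __name__ == "__main__":
    for cls in (GiveUpAtStartupClient, LazyReconnectClient):
        print(cls.__name__,
              run_scenario(cls, running_at_start=True),
              run_scenario(cls, running_at_start=False))
```

With `running_at_start=False` (the second scenario), the give-up-at-startup model returns `[False, False]`, while the lazy model returns `[False, True]`; both return `[False, True]` for the first scenario.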

Richard Theis (rtheis) wrote :

I've opened https://bugs.launchpad.net/networking-ovn/+bug/1612435 for the start-up reconnect problem.

Richard Theis (rtheis) wrote :

I can recreate this failure. It appears that this is limited to the notification connection.

1) Start neutron server.
2) Kill ovsdb-server processes.
3) Create a neutron network. This will fail.
4) Start ovsdb-server processes.
5) Create a neutron network. This will succeed.
6) Launch a VM. This will fail due to the port not moving to Up status.
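If only the notification connection is affected, one plausible failure mode is a client that reconnects at the transport level but never re-issues its OVSDB `monitor` request. Per RFC 7047, a monitor lasts only as long as its JSON-RPC session, so each new session must register it again. A minimal sketch of building that request; the database name `OVN_Northbound` is real, but the `Logical_Switch_Port` table and column selection and the `on_reconnect` hook are illustrative assumptions, not what networking-ovn actually sends:

```python
import itertools
import json

# JSON-RPC ids let the client match the server's reply to this request.
_request_ids = itertools.count(1)


def build_monitor_request(monitor_id: str) -> dict:
    """Build an RFC 7047 'monitor' request for logical port status.

    params = [database name, monitor id echoed in 'update' notifications,
              per-table monitor requests]. The table/columns are assumptions.
    """
    monitor_requests = {
        "Logical_Switch_Port": {"columns": ["name", "up"]},
    }
    return {
        "method": "monitor",
        "params": ["OVN_Northbound", monitor_id, monitor_requests],
        "id": next(_request_ids),
    }


def on_reconnect(send) -> None:
    """Hypothetical hook: call after every (re)connect, not only at start-up.

    A new session starts with no subscriptions until this is sent again.
    """
    send(json.dumps(build_monitor_request("port-status")))


if __name__ == "__main__":
    sent = []
    on_reconnect(sent.append)  # first connection
    on_reconnect(sent.append)  # after an ovsdb-server restart
    print(len(sent), json.loads(sent[1])["method"])
```

The server's reply to `monitor` carries the current contents of the monitored columns, so re-sending it also resynchronizes any port that went up while the connection was down.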

Changed in networking-ovn:
assignee: nobody → Richard Theis (rtheis)
status: New → Confirmed

Richard Theis (rtheis) wrote :

@Daniel, in my recreate, ovn-controller went down after the ovsdb-server processes were restarted. Once I started ovn-controller again, VMs could be launched successfully. Can you please confirm that ovn-controller is up and marking the VM port up?
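The port only reaches Up if every link in the chain works: ovn-controller binds the port on its chassis, ovn-northd reflects that in the north database's `up` column, and the driver's notification handler then tells Neutron. If ovn-controller is down, the first link never happens, whatever the state of the plugin's connections. A sketch of the last link, assuming a hypothetical handler that receives the old and new values of `up` and a hypothetical `mark_port_up` callback (neither is the networking-ovn API):

```python
from typing import Callable, Optional


def handle_up_change(port_id: str,
                     old_up: Optional[bool],
                     new_up: Optional[bool],
                     mark_port_up: Callable[[str], None]) -> bool:
    """React to a change of a port's 'up' column in the north database.

    Calls mark_port_up(port_id) only on a transition to True. old_up is None
    for rows seen for the first time (e.g. the initial sync after a reconnect).
    Returns True if Neutron was notified.
    """
    if new_up and not old_up:
        mark_port_up(port_id)
        return True
    return False


if __name__ == "__main__":
    notified = []
    handle_up_change("port-b", None, True, notified.append)  # initial sync row
    handle_up_change("port-b", True, True, notified.append)  # no transition
    print(notified)  # ['port-b']
```

Treating a first-seen row as a transition is a deliberate choice here: it lets a port that went up during an outage still be reported once the connection is back.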

Richard Theis (rtheis) wrote :

@Daniel, I've narrowed down the ovn-controller crash further. Testing in both a networking-ovn Vagrant environment and a DevStack environment, if I restart the ovsdb-server for the SB DB, ovn-controller crashes on all compute nodes. Ryan Moats mentioned that he is seeing possible "port/string corruption which could lead to the ovn-controller process segfaulting". He was planning to investigate further.

Richard Theis (rtheis) wrote :

I don't think this bug impacts neutron.

Changed in neutron:
status: New → Invalid

Richard Theis (rtheis) wrote :

This bug was reported to ovs-dev mailing list: http://openvswitch.org/pipermail/dev/2016-August/078578.html

Richard Theis (rtheis) wrote :

Using Ryan Moats' add-quiet-mode branch on https://github.com/jayhawk87/ovs.git fixed the problem listed in comment #7.

Richard Theis (rtheis) wrote :

@Daniel, can you please re-run your test using the same fix listed in comment #12?

Daniel (dlevy) wrote :

@Richard, I am no longer able to reproduce the problem.

Richard Theis (rtheis) wrote :

Closing this bug since Ryan's and Ben's OVN patches to remove incremental processing have merged upstream (see https://patchwork.ozlabs.org/patch/664565/ and related).

Changed in networking-ovn:
status: Confirmed → Fix Released