[OVN] The "neutron_sync_mode = repair" option breaks the whole cloud!

Bug #1689880 reported by Thiago Martins
This bug affects 2 people
Affects: neutron
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello,

 I'm using Ubuntu 16.04 with kernel 4.8 (HWE), plus Ocata from the Cloud Archive. I'm playing with Networking OVN and planning to deploy it into production in a couple of weeks.

 After deploying everything and starting to use OVN, with Floating IPs, Security Groups, multiple compute nodes and everything else, I can say that it looks awesome! Way better than the "neutron-*-agents".

 However, I noticed that after running:

---
 systemctl restart neutron-server
---

 Literally ALL my stacks, on all projects, become unreachable!!!

 The instances could not even ping their own Floating IPs anymore, let alone reach the Internet.

 Also, the Internet could not reach the instances via their Floating IPs...

 After double-checking the config files and comparing them with the doc*, I made one single change in my ml2_conf.ini, from:

---
neutron_sync_mode = repair
---

 To:

---
# neutron_sync_mode = off
---

 Then, problem solved!

 Now, I can restart the neutron-server without any problems!
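Commenting the line out does not select "off": with the line absent, the option falls back to its default value. A minimal Python sketch of how the effective value can be worked out from ml2_conf.ini, assuming the option is read from the `[ovn]` section and defaults to `log` (our reading of networking-ovn's option definition; verify against your release). `effective_sync_mode` is a hypothetical helper, not part of networking-ovn:

```python
import configparser

# Assumptions: networking-ovn reads neutron_sync_mode from the [ovn] section
# and falls back to "log" when the option is absent or commented out.
VALID_MODES = ("off", "log", "repair")
DEFAULT_MODE = "log"


def effective_sync_mode(ini_text: str) -> str:
    """Return the sync mode implied by this ml2_conf.ini text (under the assumptions above)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Full-line "#" comments are skipped by the parser, so a commented-out
    # option (or a missing [ovn] section) yields the fallback value.
    mode = parser.get("ovn", "neutron_sync_mode", fallback=DEFAULT_MODE).strip()
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported neutron_sync_mode: {mode!r}")
    return mode


before = "[ovn]\nneutron_sync_mode = repair\n"
after = "[ovn]\n# neutron_sync_mode = off\n"
print(effective_sync_mode(before))  # repair
print(effective_sync_mode(after))   # log
```

If the default is indeed `log`, the change above stops the automatic repair on restart while still logging any inconsistencies found.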

 What to do if this happens again?

 Ask ALL my customers to rebuild their stacks?!

 How can I REALLY repair OVN if something like this happens again?

 Now that the problem is solved, I'll keep using and stress testing it even more, but I'm losing confidence in Networking OVN in its current state.

* doc: https://docs.openstack.org/developer/networking-ovn/install.html

Tags: ovn
affects: neutron → networking-ovn
Numan Siddique (numansiddique) wrote :

Hi Thiago,

You can use the "neutron-ovn-db-sync-util" utility to repair the OVN database.
You can run it as "neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini"

Make sure to set the mode to repair in ml2_conf.ini, or pass neutron_sync_mode=repair on the command line.
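Both ways of selecting repair mode can be expressed as an argument list for the utility. A minimal Python sketch, assuming the config paths above; the `--ovn-neutron_sync_mode` flag spelling is copied from Justin's invocation further down this thread, and `sync_util_argv` is a hypothetical helper:

```python
import shlex
from typing import List, Optional

CONFIG_FILES = (
    "/etc/neutron/neutron.conf",
    "/etc/neutron/plugins/ml2/ml2_conf.ini",
)


def sync_util_argv(mode: Optional[str] = None) -> List[str]:
    """Build the neutron-ovn-db-sync-util command line.

    With mode=None the sync mode is taken from ml2_conf.ini; otherwise it is
    overridden on the command line.
    """
    argv = ["neutron-ovn-db-sync-util"]
    for path in CONFIG_FILES:
        argv += ["--config-file", path]
    if mode is not None:
        # Flag spelling copied from the invocation quoted in this thread.
        argv += ["--ovn-neutron_sync_mode", mode]
    return argv


print(shlex.join(sync_util_argv("repair")))
# To actually run it on a node with networking-ovn installed:
#   subprocess.run(sync_util_argv("repair"), check=True)
```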

Having said that, neutron-server should repair the OVN database as well.
The sync util and neutron-server share the same sync code.

Can you please share the neutron-server logs so we can see what is happening there?

Numan Siddique (numansiddique) wrote :

I tested with the master using devstack and I couldn't reproduce the issue. I deleted the OVN NB db and restarted the neutron-server with sync mode set to repair. The OVN DB is recreated successfully.

Justin (jneese) wrote :

I just ran into this issue on a brand-new Ocata deployment, built from scratch following the official documentation. It is a bare-minimum deployment for a proof of concept.

https://docs.openstack.org/networking-ovn/ocata/install.html

1 - controller
2 - compute nodes

Using neutron-ovn-db-sync-util or "neutron_sync_mode = repair" breaks floating IPs and gateways, even when ovnsb_db.db is deleted first. The only fix is to perform the following steps for each external network you want to fix:
1) Disassociate all floating IPs (releasing them is not required)
2) Clear the router's external gateway
3) Re-add the router's external gateway (all VMs now have full connectivity)
4) Re-attach the floating IPs
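The four steps above can be scripted per router. A minimal Python sketch that only prints python-openstackclient commands in that order; the router name, external network, and floating-IP-to-port mapping are hypothetical inputs, and the commands should be checked against your client version before running them:

```python
from typing import Dict, List


def gateway_reset_commands(router: str, external_net: str,
                           fip_to_port: Dict[str, str]) -> List[str]:
    """Return the workaround steps as openstack CLI commands, in order."""
    cmds = []
    # 1) Disassociate every floating IP (the floating IPs themselves are kept).
    for fip in fip_to_port:
        cmds.append(f"openstack floating ip unset --port {fip}")
    # 2) Clear the router's external gateway.
    cmds.append(f"openstack router unset --external-gateway {router}")
    # 3) Re-add the external gateway on the same external network.
    cmds.append(f"openstack router set --external-gateway {external_net} {router}")
    # 4) Re-attach each floating IP to the port it was associated with before.
    for fip, port in fip_to_port.items():
        cmds.append(f"openstack floating ip set --port {port} {fip}")
    return cmds


# Hypothetical example: two floating IPs behind a router named "router1".
for cmd in gateway_reset_commands("router1", "public",
                                  {"10.0.0.226": "port-a", "10.0.0.229": "port-b"}):
    print(cmd)
```

Recording the floating-IP-to-port mapping before step 1 matters, since step 4 needs it and the association is gone after step 1.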

Things I noticed after neutron-ovn-db-sync-util is used or neutron-server is restarted in repair mode:
* Instances can still talk to each other and even ping their gateways, but have no external access
* Even after things break, I can sometimes still ping the router gateway IPs from other machines
* All of my associated floating IPs that stopped working show up in "ovn-nbctl show" as switch ports, but after I do the 4 steps above they no longer show up (see below)

The floating IPs were 10.0.0.226 and 10.0.0.229; there were also 2 VMs without floating IPs.

BROKEN:
[root@controller ~(keystone_admin)]$ ovn-nbctl show | grep 10.200.80
            addresses: ["fa:16:3e:b1:55:26 10.0.0.227"]
            addresses: ["fa:16:3e:9b:32:fe 10.0.0.226"]
            addresses: ["fa:16:3e:78:5a:04 10.0.0.229"]
            networks: ["10.200.0.227/24"]

WORKING:
[root@controller ~(keystone_admin)]$ ovn-nbctl show | grep 10.200.80
            addresses: ["fa:16:3e:b1:55:26 10.0.0.227"]
            networks: ["10.0.0.227/24"]

* When running the sync in repair mode, I always get the same messages; they are never "fixed"

1st time
[root@controller openvswitch]# neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode repair
...
2017-10-04 18:06:26.350 6923 WARNING networking_ovn.ovn_db_sync [req-4cebd9e8-a4f1-438f-8d1e-370804eb9ca5 - - - - -] Router found in OVN but not in Neutron, router id=ogr-8e22bbc8-bd0c-4018-96de-ac289619ee91
2017-10-04 18:06:26.350 6923 WARNING networking_ovn.ovn_db_sync [req-4cebd9e8-a4f1-438f-8d1e-370804eb9ca5 - - - - -] Deleting the router ogr-8e22bbc8-bd0c-4018-96de-ac289619ee91 from OVN NB DB
2017-10-04 18:06:26.351 6923 WARNING networking_ovn.ovn_db_sync [req-4cebd9e8-a4f1-438f-8d1e-370804eb9ca5 - - - - -] Router found in OVN but not in Neutron, router id=ogr-2b37fdc2-64da-4521-9dc8-5bc5e5ea158d
2017-10-04 18:06:26.351 6923 WARNING networking_ovn.ovn_db_sync [req-4cebd9e8-a4f1-438f-8d1e-370804eb9ca5 - - - - -] Deleting the router ogr-2b37fdc2-64da-4521-9dc8-5bc5e5ea158d from OVN NB DB
2017-10-04 18:06:26.352 6923 WARNING networking_ovn.ovn_db_sync [req-4cebd9e8-a4f1-438f-8d1e-370804eb9ca5 - - - - -] Router found in OVN but not in Neutron, router id=ogr-ba2e6a92-76e8-44e1-9504-5839a3d0fe7f
2017-10-04 18:06:26.352 6923 WARNING networking_ovn.ovn_db_sync [req-4cebd9e8-a4f1-438f-8d1e-370804eb9ca5 - - - - -] Deleting the router ogr-ba2e6a92-76e8-44e1-9504-5839a3d0fe7f from OVN NB DB
2017-10-04 ...
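The BROKEN/WORKING difference shown earlier (extra switch ports carrying the floating IPs) can be checked mechanically. A minimal Python sketch that collects the IPs from the `addresses:` lines of two `ovn-nbctl show` captures and reports those present only in the broken one; the sample text is the output quoted in this comment, and `port_ips`/`stale_port_ips` are hypothetical helpers:

```python
import re
from typing import Set

# Matches lines such as:  addresses: ["fa:16:3e:b1:55:26 10.0.0.227"]
# Only handles the single-MAC, single-IPv4 form shown in this thread.
ADDR_RE = re.compile(r'addresses:\s*\["[0-9a-f:]+ ([0-9.]+)"\]')


def port_ips(nbctl_output: str) -> Set[str]:
    """Collect the IPv4 address from every matching 'addresses:' line."""
    return set(ADDR_RE.findall(nbctl_output))


def stale_port_ips(broken: str, working: str) -> Set[str]:
    """IPs that appear as switch ports only in the broken output."""
    return port_ips(broken) - port_ips(working)


BROKEN = '''
            addresses: ["fa:16:3e:b1:55:26 10.0.0.227"]
            addresses: ["fa:16:3e:9b:32:fe 10.0.0.226"]
            addresses: ["fa:16:3e:78:5a:04 10.0.0.229"]
            networks: ["10.200.0.227/24"]
'''
WORKING = '''
            addresses: ["fa:16:3e:b1:55:26 10.0.0.227"]
            networks: ["10.0.0.227/24"]
'''
print(sorted(stale_port_ips(BROKEN, WORKING)))  # ['10.0.0.226', '10.0.0.229']
```

The two stale addresses are exactly the floating IPs listed in the comment, which is consistent with the repair leaving floating IPs behind as switch ports.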


Justin (jneese) wrote :

Forgot to mention: this was on CentOS 7.

Changed in networking-ovn:
importance: Undecided → High
Lucas Alvares Gomes (lucasagomes) wrote :

The OVN driver now lives in the neutron repository. Moving this bug to their tracker.

tags: added: ovn
no longer affects: networking-ovn
summary: - The "neutron_sync_mode = repair" option breaks the whole cloud!
+ [OVN] The "neutron_sync_mode = repair" option breaks the whole cloud!
Brian Haley (brian-haley) wrote :

Is this still an issue in Zed?

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Closing this bug, as this issue is no longer happening.

Changed in neutron:
status: New → Won't Fix