Must run neutron-ovn-db-sync-util after launching multiple instances

Bug #1723701 reported by Justin
This bug affects 3 people
Affects          Status      Importance   Assigned to   Milestone
networking-ovn   Won't Fix   High         Unassigned

Bug Description

It appears that after launching multiple instances at once in OpenStack Pike with networking-ovn, ACLs are not automatically applied to the OVN database, and the command below must be run:

neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode repair
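
Before repairing, it can help to see what the sync utility would change. The sketch below only assembles the command line quoted above; it assumes that `log` is accepted as a value of `--ovn-neutron_sync_mode` alongside `repair` (report differences without modifying the OVN database), which should be checked against the installed networking-ovn version. The helper name `build_sync_command` is hypothetical.

```python
# Config file paths are taken from the command in this report.
CONFIG_FILES = [
    "/etc/neutron/neutron.conf",
    "/etc/neutron/plugins/ml2/ml2_conf.ini",
]


def build_sync_command(mode="log"):
    """Return the argv list for neutron-ovn-db-sync-util.

    Assumption: 'log' only reports differences, 'repair' fixes them.
    """
    if mode not in ("log", "repair"):
        raise ValueError("unsupported sync mode: %s" % mode)
    cmd = ["neutron-ovn-db-sync-util"]
    for path in CONFIG_FILES:
        cmd += ["--config-file", path]
    cmd += ["--ovn-neutron_sync_mode", mode]
    return cmd


if __name__ == "__main__":
    # Print the dry-run command; pass the list to subprocess.run to execute it.
    print(" ".join(build_sync_command("log")))
```

Running the `log` variant first and the `repair` variant only when differences are reported keeps the repair step from becoming a blind routine.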

If I launch 24 instances, some networking (such as DNS or LDAP) is broken until I run the command. The command then logs:

WARNING networking_ovn.ovn_db_sync [req-ff1a38d9-aeee-412e-b78f-09e807f63b91 - - - - -] ACLs-to-be-added 20 ACLs-to-be-removed 0
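
To track how far the databases drift after each batch launch, the counts in that warning can be extracted from the sync utility's output. This is a minimal sketch that assumes the message format is exactly as in the line above; the function name `parse_acl_sync_counts` is hypothetical.

```python
import re

# Matches the summary emitted by networking_ovn.ovn_db_sync, as quoted above.
SYNC_ACL_RE = re.compile(r"ACLs-to-be-added (\d+) ACLs-to-be-removed (\d+)")


def parse_acl_sync_counts(line):
    """Return (to_add, to_remove) from a sync log line, or None if absent."""
    match = SYNC_ACL_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "WARNING networking_ovn.ovn_db_sync "
    "[req-ff1a38d9-aeee-412e-b78f-09e807f63b91 - - - - -] "
    "ACLs-to-be-added 20 ACLs-to-be-removed 0"
)

if __name__ == "__main__":
    print(parse_acl_sync_counts(sample))
```

A non-zero first value means ACLs present in the Neutron database were missing from the OVN northbound database at the time of the sync.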

If I launch only a couple of instances at a time, the problem does not always occur.
I am not sure whether it is related, but I am also seeing very high load from the ovn-controller process on the compute nodes.

I also see these errors in the neutron-server log when restarting neutron-server:
2017-10-14 21:31:16.449 29172 INFO networking_ovn.ml2.mech_driver [-] OVN reports status up for port: b84c33ef-92a1-4e0e-b6c4-1fa418e9aaa9
2017-10-14 21:31:16.449 29172 ERROR networking_ovn.ovsdb.ovsdb_monitor [-] Unexpected exception in notify_loop: AttributeError: 'NoneType' object has no attribute 'get_parent_port'

I will include log files

------------------------------------------------------------------
Controller -- CentOS 7
$ rpm -qa | grep 'ovn-\|openvswitch'
python-networking-ovn-3.0.0-1.el7.noarch
openvswitch-ovn-central-2.7.2-3.1fc27.el7.x86_64
openvswitch-2.7.2-3.1fc27.el7.x86_64
openvswitch-ovn-common-2.7.2-3.1fc27.el7.x86_64
python2-openvswitch-2.7.2-3.1fc27.el7.noarch

/etc/neutron/neutron.conf
[DEFAULT]
verbose = True
core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
service_plugins = networking_ovn.l3.l3_ovn.OVNL3RouterPlugin
allow_overlapping_ips = True
notify_nova_on_port_status_changes = true
notify_nova_on_port_data_changes = true
auth_strategy = keystone

/etc/neutron/plugins/ml2/ml2_conf.ini
[DEFAULT]

[ml2]
mechanism_drivers = ovn
type_drivers = local,flat,vlan,geneve
tenant_network_types = geneve
extension_drivers = port_security
overlay_ip_version = 4

[ml2_type_flat]

[ml2_type_geneve]
vni_ranges = 1:65536
max_header_size = 38

[ml2_type_gre]

[ml2_type_vlan]
vlan_networks = extnet
network_vlan_ranges = extnet:30:300

[ml2_type_vxlan]

[securitygroup]
enable_security_group = true

[ovn]
ovn_nb_connection = tcp:192.168.195.210:6641
ovn_sb_connection = tcp:192.168.195.210:6642
ovn_l3_scheduler = leastloaded
------------------------------------------------------------------

------------------------------------------------------------------
Compute Nodes - CentOS 7:

# rpm -qa | grep 'ovn-\|openvswitch'
openvswitch-ovn-common-2.7.2-3.1fc27.el7.x86_64
openvswitch-2.7.2-3.1fc27.el7.x86_64
openvswitch-ovn-host-2.7.2-3.1fc27.el7.x86_64

ovn-bridge-mappings="extnet:br-ex", ovn-encap-ip="192.168.195.211", ovn-encap-type=geneve, ovn-remote="tcp:192.168.195.210:6642"

------------------------------------------------------------------

Numan Siddique (numansiddique) wrote :

To resolve the high load issue, could you please try OVS 2.7.3 or OVS 2.8.1? Both releases include a fix that should address it.

Changed in networking-ovn:
importance: Undecided → High
milestone: none → 2015.1.1
milestone: 2015.1.1 → none
Miguel Angel Ajo (mangelajo) wrote :

The 2.7.3 stable release of OVS/OVN should fix the high load. Still, I'm not sure the ACL issue you're seeing would be fixed by that (I say that because running ovn-db-sync fixes it).

Can you try this:

[DEFAULT]
service_plugins = trunk

To see if "Unexpected exception in notify_loop: AttributeError: 'NoneType' object has no attribute 'get_parent_port'" goes away? (just a blind shot)

Miguel Angel Ajo (mangelajo) wrote :

For the setting

[DEFAULT]
service_plugins = trunk

I mean in /etc/neutron/neutron.conf

Justin (jneese) wrote :

Thank you for the reply. Setting service_plugins = trunk did not resolve the get_parent_port error.

Fortunately, all connectivity works; having to run repair is just a nuisance for now.

As for the high load and upgrading to openvswitch-2.7.3: it isn't available in the CentOS 7 repos yet, and I ran into issues building it from source or making the RPM, so I will just wait for 2.7.3. Hopefully that comes soon.

The networking-ovn project has opened a lot of doors into the cloud for our company, and we appreciate all of the hard work.

Daniel Alvarez (dalvarezs) wrote :

@Justin, could you please repost the neutron server logs with debug enabled?
Thanks a lot!
Daniel

Lucas Alvares Gomes (lucasagomes) wrote :

Thanks for reporting.

I believe this is no longer the case: since this bug was reported, we have implemented the maintenance mechanism that should take care of syncing the databases. Marking as WONTFIX (please re-open it if the bug is still there).

Changed in networking-ovn:
status: New → Won't Fix