[OVN] infinite loop in ovsdb_monitor
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | New | High | Unassigned | | |
Bug Description
I am running the ovn sandbox, a second chassis, and neutron. I synchronize the neutron database with the databases of the sandbox, run neutron-server, and possibly run a few ovs-vsctl commands on the chassis to set up ovs ports.
I notice that some commands on the chassis can trigger some sort of infinite loop in neutron. For example:
ovs-vsctl set open . external-
ovs-vsctl set open . external-
ovs-vsctl set open . external-
on the second chassis, will trigger transactions "in a loop" on the neutron-server:
...
Successfully bumped revision number for resource f32ac6cc (type: ports) to 571
Router 079cde19-
Running txn n=1 command(idx=0): CheckRevisionNu
Running txn n=1 command(idx=1): UpdateLRouterPo
Running txn n=1 command(idx=2): SetLRouterPortI
Successfully bumped revision number for resource f32ac6cc (type: router_ports) to 572
Running txn n=1 command(idx=0): CheckRevisionNu
Running txn n=1 command(idx=1): SetLSwitchPortC
Running txn n=1 command(idx=2): PgDelPortCommand
Successfully bumped revision number for resource f32ac6cc (type: ports) to 572
Router 079cde19-
Running txn n=1 command(idx=0): CheckRevisionNu
Running txn n=1 command(idx=1): UpdateLRouterPo
Running txn n=1 command(idx=2): SetLRouterPortI
Successfully bumped revision number for resource f32ac6cc (type: router_ports) to 573
Running txn n=1 command(idx=0): CheckRevisionNu
Running txn n=1 command(idx=1): SetLSwitchPortC
Running txn n=1 command(idx=2): PgDelPortCommand
...
This is not limited to the change of external-
neutron-server CPU consumption jumps to 100% and the revision_number of the ports keeps increasing. Restarting neutron-server fixes the issue temporarily.
I am not sure how to provide a simple reproducer because I did not find any instructions for running neutron standalone with two OVN chassis. I will investigate what is happening locally.
Version: main branch from OVN (d41a337fe3b608
It's not a blocker as long as it happens only on my laptop.
Changed in neutron:
importance: Undecided → High
It looks like PortBindingChassisEvent is feeding itself with events.
When a PortBindingChassisEvent is received, the revision_number is incremented and the port bindings are updated (with the only change being the revision_number), which triggers a new PortBindingChassisEvent.
Looking at the ddlog replay, the update of the revision_number in northd for logical_switch_port and logical_router_port actually deletes and recreates the keys in the northbound db, which causes deletion and recreation of multicast_group and port_binding in the southbound db (so a change of just the revision_number does have some cost).
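To make the suspected feedback loop concrete, here is a toy simulation in plain Python. It does not use any Neutron or OVN code: `ToyPort`, `ToyMonitor`, and `simulate` are hypothetical names invented for this illustration, and the guard shown (skipping events where the chassis did not change) is only one possible way to break such a loop, not necessarily what the actual fix should be.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyPort:
    """Hypothetical stand-in for a port row; not a Neutron or OVN class."""
    chassis: str
    revision_number: int = 0


class ToyMonitor:
    """Toy event loop in which every write to the port queues an update event."""

    def __init__(self, port, ignore_revision_only_updates):
        self.port = port
        self.ignore_revision_only_updates = ignore_revision_only_updates
        self.queue = deque()
        self.handled = 0

    def write(self, chassis):
        # Any write bumps the revision and emits an event with old/new chassis.
        old_chassis = self.port.chassis
        self.port.chassis = chassis
        self.port.revision_number += 1
        self.queue.append((old_chassis, chassis))

    def run(self, max_events):
        # max_events is a safety cap so the unguarded case terminates.
        while self.queue and self.handled < max_events:
            old_chassis, new_chassis = self.queue.popleft()
            if self.ignore_revision_only_updates and old_chassis == new_chassis:
                continue  # nothing but the revision changed: do not react
            self.handled += 1
            # The handler rewrites the binding with identical content,
            # which still bumps the revision and queues another event.
            self.write(new_chassis)
        return self.handled


def simulate(guard, max_events=50):
    port = ToyPort(chassis="chassis-1")
    monitor = ToyMonitor(port, ignore_revision_only_updates=guard)
    monitor.write("chassis-2")  # the one real change, made from outside
    handled = monitor.run(max_events)
    return handled, port.revision_number


if __name__ == "__main__":
    print("without guard:", simulate(guard=False))
    print("with guard:   ", simulate(guard=True))
```

Without the guard, the handler keeps running until the safety cap is reached and the revision number grows with every iteration, which matches the symptom in the log above. With the guard, the single real change is handled once and the loop stops.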