interrupted vlan connection after live migration

Bug #1880455 reported by norman shen
This bug affects 1 person
Affects: neutron
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

After the patch https://github.com/openstack/neutron/commit/efa8dd08957b5b6b1a05f0ed412ff00462a9f216 I see an unexpected VLAN connectivity interruption after live migration.

The steps to reproduce the problem are simple:

First create two VMs, vm01 on compute01 and vm02 on compute02. Then live migrate vm02 to compute01 and, after that completes, live migrate vm02 back to compute02. After this, vm01 cannot reach vm02, and ovs-appctl dpif/dump-flows br-int shows that the flows from vm01 to vm02 are dropped.
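A minimal sketch for checking the last observation: it scans the text printed by ovs-appctl dpif/dump-flows br-int for datapath flows between two MAC addresses whose action is drop. The function name dropped_flows is hypothetical. The sample drop line is made up for illustration and does not come from this report. Treating fa:16:3e:aa:aa:aa as vm01 and fa:16:3e:33:33:33 as vm02 is an assumption based on the flow dumps posted later in this thread.

```python
import re

# Matches the eth(src=...,dst=...) clause of one datapath flow line.
# Assumption: MACs are printed without masks, as in the dumps in this report.
_ETH_RE = re.compile(r"eth\(src=([0-9a-f:]{17}),dst=([0-9a-f:]{17})\)")
# Matches the action list at the end of the line.
_ACTIONS_RE = re.compile(r"actions:(\S+)\s*$")


def dropped_flows(dump_text, src_mac, dst_mac):
    """Return flow lines from src_mac to dst_mac whose action is 'drop'."""
    wanted = (src_mac.lower(), dst_mac.lower())
    hits = []
    for line in dump_text.splitlines():
        eth = _ETH_RE.search(line)
        actions = _ACTIONS_RE.search(line)
        if not eth or not actions:
            continue  # not a flow line we know how to read
        if eth.groups() == wanted and actions.group(1) == "drop":
            hits.append(line)
    return hits


# Illustrative input only: this line is NOT taken from the bug report.
SAMPLE_DROP = (
    "recirc_id(0),in_port(17),eth(src=fa:16:3e:aa:aa:aa,dst=fa:16:3e:33:33:33),"
    "eth_type(0x0800),ipv4(frag=no), packets:3, bytes:294, used:0.412s, actions:drop"
)

if __name__ == "__main__":
    for flow in dropped_flows(SAMPLE_DROP, "fa:16:3e:aa:aa:aa", "fa:16:3e:33:33:33"):
        print(flow)
```

The dump text would typically be captured on the compute node hosting vm01 and passed in as a string. Lines the regular expressions cannot read are skipped rather than guessed at.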

I now suspect that the following code is never executed:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L685

because, with Nova, the port is removed before the delete-port handling gets called.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Norman:

Can you provide some info about the deployment you are using to test? Neutron version, deployment tool and definition, etc.

I guess, from your comment in the bug description, that you are not using a firewall. Can you print the flows on both compute nodes before and after the migration?

The code you are referring to [1] is executed when the OVS agent, in a polling cycle, detects that the port has been deleted (in this case, on the origin host). If you can confirm that this method is not called when the VM is moved (and the port deleted), then we have a culprit.

Can you please provide more info?

Regards.

[1] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L685
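To make the polling behaviour described above concrete, here is a deliberately simplified model of it. It is not the agent's actual implementation: the names diff_ports and run_polling_cycles and the on_port_removed hook are hypothetical. The sketch only shows the idea that a cleanup handler runs when a port seen in one polling snapshot is missing from the next.

```python
def diff_ports(current_ports, previous_ports):
    """Compare two polling snapshots of port IDs on the bridge."""
    current = set(current_ports)
    previous = set(previous_ports)
    return {
        "current": current,
        "added": current - previous,
        "removed": previous - current,
    }


def run_polling_cycles(snapshots, on_port_removed):
    """Walk through successive snapshots and fire the cleanup hook.

    `on_port_removed` stands in for whatever per-port cleanup the real
    agent performs; here it is just a callable taking a port ID.
    """
    previous = set()
    for snapshot in snapshots:
        info = diff_ports(snapshot, previous)
        for port_id in sorted(info["removed"]):
            on_port_removed(port_id)
        previous = info["current"]


if __name__ == "__main__":
    removed = []
    # vm02's port is present, disappears (migrated away), then comes back.
    run_polling_cycles([{"vm02-port"}, set(), {"vm02-port"}], removed.append)
    print(removed)
```

In this model the hook depends entirely on a snapshot that lacks the port. Whether the real agent observes such a state during the migration sequence in this report is exactly what the requested flow dumps and logs would help establish.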

norman shen (jshen28) wrote :

Hello,

Thank you for the reply. Actually, we are using the hybrid firewall with Open vSwitch (and I think it is not merged upstream yet: https://review.opendev.org/#/c/712640/, but I am guessing you can reproduce it without using any firewall at all).

The OpenStack deployment is containerized and uses the latest Rocky.

The flow dumps are attached below; please contact me if you need more info.

# before live migration
root@compute02:~# ovs-appctl dpif/dump-flows br-int | grep fa:16:3e:33:33:33
recirc_id(0),in_port(49),skb_mark(0),eth(src=fa:16:3e:33:33:33,dst=fa:16:3e:aa:aa:aa),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:4, bytes:392, used:0.505s, actions:set(tunnel(tun_id=0x270f,src=192.168.4.102,dst=192.168.4.101,ttl=64,tp_dst=4792,flags(df|key))),17
recirc_id(0),in_port(49),eth(src=fa:16:3e:33:33:33,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.30.0.33,tip=172.30.0.100,op=1/0xff,sha=fa:16:3e:33:33:33,tha=00:00:00:00:00:00), packets:0, bytes:0, used:never, actions:userspace(pid=2874133717,slow_path(action))
root@compute02:~# ovs-appctl dpif/dump-flows br-int | grep fa:16:3e:33:33:33
recirc_id(0),in_port(49),skb_mark(0),eth(src=fa:16:3e:33:33:33,dst=fa:16:3e:aa:aa:aa),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:6, bytes:588, used:0.708s, actions:set(tunnel(tun_id=0x270f,src=192.168.4.102,dst=192.168.4.101,ttl=64,tp_dst=4792,flags(df|key))),17
recirc_id(0),in_port(49),eth(src=fa:16:3e:33:33:33,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.30.0.33,tip=172.30.0.100,op=1/0xff,sha=fa:16:3e:33:33:33,tha=00:00:00:00:00:00), packets:0, bytes:0, used:never, actions:userspace(pid=2874133717,slow_path(action))
root@compute02:~# ovs-ofctl dump-flows br-int table=61,dl_dst=fa:16:3e:33:33:33 --names
cookie=0x90ab5cb199cefed4, duration=151.987s, table=61, n_packets=0, n_bytes=0, priority=12,dl_dst=fa:16:3e:33:33:33 actions=output:"qvoe371f3d5-56"
root@compute02:~# ovs-ofctl dump-flows br-int table=61,dl_dst=fa:16:3e:33:33:33 --no-names
NXST_FLOW reply (xid=0x4):
cookie=0x90ab5cb199cefed4, duration=154.296s, table=61, n_packets=0, n_bytes=0, idle_age=157, priority=12,dl_dst=fa:16:3e:33:33:33 actions=output:2542
root@compute02:~#

# after live migration
root@compute02:~# ovs-appctl dpif/dump-flows br-int | grep fa:16:3e:33:33:33
root@compute02:~# ovs-ofctl dump-flows br-int table=61,dl_dst=fa:16:3e:33:33:33 --names
cookie=0x90ab5cb199cefed4, duration=230.243s, table=61, n_packets=0, n_bytes=0, priority=12,dl_dst=fa:16:3e:33:33:33 actions=output:2542
root@compute02:~# ovs-ofctl dump-flows br-int table=61,dl_dst=fa:16:3e:33:33:33 --no-names
NXST_FLOW reply (xid=0x4):
cookie=0x90ab5cb199cefed4, duration=233.202s, table=61, n_packets=0, n_bytes=0, idle_age=236, priority=12,dl_dst=fa:16:3e:33:33:33 actions=output:2542
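The notable difference between the two --names dumps above is that the table 61 flow points at "qvoe371f3d5-56" before the migration and at the bare number 2542 afterwards. A small sketch for spotting that pattern automatically follows. The function name unresolved_outputs is hypothetical. The check assumes that ovs-ofctl --names prints a bare number only when it has no name for that port number, which suggests (but does not prove) that the flow is stale.

```python
import re

# An output action target: a quoted port name or a bare port number.
_OUTPUT_RE = re.compile(r'output:("[^"]+"|\d+)')


def unresolved_outputs(names_dump):
    """Return bare port numbers used as output targets in a --names dump."""
    numbers = []
    for line in names_dump.splitlines():
        if "actions=" not in line:
            continue  # skip reply headers and blank lines
        actions = line.split("actions=", 1)[1]
        for target in _OUTPUT_RE.findall(actions):
            if target.isdigit():
                numbers.append(int(target))
    return numbers


# The two flow lines below are copied from the dumps in this comment.
BEFORE = (
    "cookie=0x90ab5cb199cefed4, duration=151.987s, table=61, n_packets=0, "
    'n_bytes=0, priority=12,dl_dst=fa:16:3e:33:33:33 actions=output:"qvoe371f3d5-56"'
)
AFTER = (
    "cookie=0x90ab5cb199cefed4, duration=230.243s, table=61, n_packets=0, "
    "n_bytes=0, priority=12,dl_dst=fa:16:3e:33:33:33 actions=output:2542"
)

if __name__ == "__main__":
    print(unresolved_outputs(BEFORE))
    print(unresolved_outputs(AFTER))
```

Note that the cookie is identical in both dumps and the duration keeps increasing, so the flow present after the migration appears to be the same flow installed before it rather than a freshly programmed one.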

Changed in neutron:
importance: Undecided → Medium
Slawek Kaplonski (slaweq) wrote :

It looks to me like a duplicate of https://bugs.launchpad.net/neutron/+bug/1881070 - can you check whether that understanding is correct?

norman shen (jshen28) wrote :

Yes, it is a duplicate.
