Comment 2 for bug 1945306

Hua Zhang (zhhuabj) wrote : Re: north-south traffic not working when VM and main router are not on the same host

@Bence, thank you for confirming the problem.

We have also done some debugging. Here is what we have found so far:

1. There seems to be no problem with the datapath flows: the 'ovs-dpctl dump-flows' output on Stein and Ussuri is the same when pinging the VM from sg-xxx (only the input port, source MAC/IP, recirculation ID and counters differ).

# ussuri
recirc_id(0),in_port(13),ct_state(-trk),eth(src=fa:16:3e:d3:6f:80),eth_type(0x0800),ipv4(src=192.168.21.151,proto=6,frag=no), packets:24, bytes:2560, used:0.396s, flags:SP., actions:ct(zone=3),recirc(0x4b)

# stein
recirc_id(0),in_port(12),ct_state(-trk),eth(src=fa:16:3e:4c:29:6d),eth_type(0x0800),ipv4(src=192.168.21.5,proto=6,frag=no), packets:3271, bytes:307846, used:1.656s, flags:SP., actions:ct(zone=3),recirc(0x1)

We also reviewed the flows along the whole path (vm -> qrouter-xxx -> br-int -> br-tun -> vxlan-xxx); see https://pastebin.ubuntu.com/p/vzNjb3JT5W/
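To make "the same" precise, here is a hedged Python sketch that drops the per-flow statistics and masks the values expected to differ between deployments (input port, source MAC/IP, recirculation ID), then compares what is left. The regexes are tailored to the two flows quoted above; this is not a general `ovs-dpctl` parser.

```python
import re

# The two datapath flows quoted above, verbatim (Ussuri first, then Stein).
USSURI = ("recirc_id(0),in_port(13),ct_state(-trk),eth(src=fa:16:3e:d3:6f:80),"
          "eth_type(0x0800),ipv4(src=192.168.21.151,proto=6,frag=no), packets:24, "
          "bytes:2560, used:0.396s, flags:SP., actions:ct(zone=3),recirc(0x4b)")
STEIN = ("recirc_id(0),in_port(12),ct_state(-trk),eth(src=fa:16:3e:4c:29:6d),"
         "eth_type(0x0800),ipv4(src=192.168.21.5,proto=6,frag=no), packets:3271, "
         "bytes:307846, used:1.656s, flags:SP., actions:ct(zone=3),recirc(0x1)")

# Values that legitimately differ between deployments, replaced by placeholders.
_SUBS = [
    (re.compile(r"in_port\(\d+\)"), "in_port(N)"),
    (re.compile(r"src=[0-9a-f:]{17}"), "src=MAC"),
    (re.compile(r"src=\d+\.\d+\.\d+\.\d+"), "src=IP"),
    (re.compile(r"recirc\(0x[0-9a-f]+\)"), "recirc(ID)"),
]


def normalize(flow: str) -> str:
    """Drop the statistics and mask deployment-specific values in one flow."""
    match_part, actions = flow.split(" actions:", 1)
    # Statistics (packets, bytes, used, flags) follow the match key.
    match_part = match_part.split(", packets:", 1)[0]
    text = f"{match_part} actions:{actions}"
    for pattern, placeholder in _SUBS:
        text = pattern.sub(placeholder, text)
    return text


print(normalize(USSURI) == normalize(STEIN))  # True
```

Both flows reduce to the same match (untracked IPv4/TCP from the VM) and the same actions (send to conntrack zone 3, then recirculate).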

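2. There seems to be no problem with conntrack: 'conntrack -L | grep mark=1 |grep 192.168.21' returns nothing when pinging the VM from sg-xxx, so there are no entries for this subnet that the OVS firewall has marked as invalid (ct_mark=1).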
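The grep pipeline is slightly loose: `grep mark=1` also matches `mark=10`, and `grep 192.168.21` also matches 192.168.210.x. A stricter check, as a hedged Python sketch, parses the `mark=` and `src=`/`dst=` fields of `conntrack -L` lines and tests subnet membership with `ipaddress`. The sample lines are made up for illustration (not taken from the affected hosts), and reading mark=1 as the OVS firewall's "invalid" mark is an assumption.

```python
import ipaddress
import re

SUBNET = ipaddress.ip_network("192.168.21.0/24")


def suspicious_entries(lines, subnet=SUBNET):
    """Return conntrack -L lines whose mark is exactly 1 and that involve an address in subnet."""
    hits = []
    for line in lines:
        mark = re.search(r"\bmark=(\d+)", line)
        if mark is None or int(mark.group(1)) != 1:
            continue
        # Every src=/dst= value in the original and reply tuples.
        addrs = re.findall(r"\b(?:src|dst)=(\S+)", line)
        if any(ipaddress.ip_address(a) in subnet for a in addrs):
            hits.append(line)
    return hits


# Made-up sample lines in conntrack -L format (hypothetical data).
SAMPLE = [
    "tcp      6 431999 ESTABLISHED src=192.168.21.151 dst=10.5.0.10 sport=22 dport=53422 "
    "src=10.5.0.10 dst=192.168.21.151 sport=53422 dport=22 [ASSURED] mark=0 zone=3 use=1",
    "icmp     1 29 src=192.168.210.7 dst=10.5.0.10 type=8 code=0 id=1 "
    "src=10.5.0.10 dst=192.168.210.7 type=0 code=0 id=1 mark=1 zone=4 use=1",
    "icmp     1 29 src=192.168.21.151 dst=10.5.0.10 type=8 code=0 id=2 "
    "src=10.5.0.10 dst=192.168.21.151 type=0 code=0 id=2 mark=1 zone=3 use=1",
    "icmp     1 29 src=192.168.21.9 dst=10.5.0.10 type=8 code=0 id=3 "
    "src=10.5.0.10 dst=192.168.21.9 type=0 code=0 id=3 mark=10 zone=3 use=1",
]
print(len(suspicious_entries(SAMPLE)))  # 1
```

On these four sample lines the grep pipeline would print three of them, while the sketch keeps only the third.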

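3. There seems to be no problem with routing either: in the qrouter namespace, the source rule for 192.168.21.1/24 points to a table whose default route is via 192.168.21.168 on qr-218eed51-82.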

# ip netns exec qrouter-7a918e87-1cc8-4252-87d1-e84b4c12c616 ip rule list |grep 192
3232240897: from 192.168.21.1/24 lookup 3232240897
# ip netns exec qrouter-7a918e87-1cc8-4252-87d1-e84b4c12c616 ip route list table 3232240897
default via 192.168.21.168 dev qr-218eed51-82 proto static
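The number 3232240897, used as both the rule priority and the table ID, is the 32-bit integer value of 192.168.21.1, which is a quick way to confirm that the rule belongs to that interface address. This is consistent with neutron's DVR agent deriving its SNAT rule/table index from the interface address (our reading of the code, not something verified in this thread). A small Python check:

```python
import ipaddress

RULE_TABLE = 3232240897  # rule priority and table ID from 'ip rule list' above


def ipv4_to_int(addr: str) -> int:
    """Return the 32-bit integer value of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(addr))


print(ipv4_to_int("192.168.21.1"))        # 3232240897
print(ipaddress.IPv4Address(RULE_TABLE))  # 192.168.21.1
```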

4. There do seem to be differences in the firewall flows between Stein and Ussuri: Ussuri has flows in table 94. We traced the same packet on both with:

ovs-appctl ofproto/trace br-int 'in_port=9,ip,nw_proto=1,nw_src=192.168.21.151,nw_dst=192.168.21.168,dl_src=fa:16:3e:d3:6f:80,dl_dst=fa:16:3e:34:9e:64' --ct-next 'trk,est'

ussuri - https://pastebin.ubuntu.com/p/tSQXQFfPBw/
stein - https://pastebin.ubuntu.com/p/ZTfXd6rVZ9/

On Ussuri, the trace hits the following flow, which outputs the packet directly to ofport 8 on br-int, so it never goes through br-tun:

94. reg6=0x3,dl_dst=fa:16:3e:34:9e:64, priority 12, cookie 0x8a4738b01717a42e
    output:8

We then dumped the br-int flows matching the sg-xxx interface MAC on the compute nodes:

root@juju-21f0ba-focal-10:/home/ubuntu# ovs-ofctl dump-flows br-int | grep fa:16:3e:34:9e:64
 cookie=0x8a4738b01717a42e, duration=13204.575s, table=1, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:34:9e:64 actions=mod_dl_src:fa:16:3e:5e:d6:96,resubmit(,60)
 cookie=0x8a4738b01717a42e, duration=13204.573s, table=60, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:34:9e:64 actions=strip_vlan,output:8
 cookie=0x8a4738b01717a42e, duration=13202.646s, table=94, n_packets=12485, n_bytes=1063359, idle_age=0, priority=12,reg6=0x3,dl_dst=fa:16:3e:34:9e:64 actions=output:8
 cookie=0x8a4738b01717a42e, duration=13202.646s, table=94, n_packets=0, n_bytes=0, idle_age=65534, priority=10,reg6=0x3,dl_src=fa:16:3e:34:9e:64,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:3,output:2
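A toy model of the two table=94 flows above, as a hedged Python sketch. OpenFlow applies the highest-priority flow whose match fits the packet: the priority-12 flow matches any packet to the sg-xxx MAC on reg6=0x3 and outputs it to port 8, while the priority-10 flow matches unicast packets *from* that MAC and tags them with VLAN 3 before output:2. reg6 appears to carry the network's local VLAN (it lines up with dl_vlan=3 in tables 1 and 60); that reading, and everything outside these two flows, is an assumption and is not modelled.

```python
from dataclasses import dataclass


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


@dataclass
class Flow:
    priority: int
    match: dict  # field -> (value, mask); mask None means exact match
    actions: str

    def matches(self, pkt: dict) -> bool:
        for field, (value, mask) in self.match.items():
            if field not in pkt:
                return False
            if mask is None:
                if pkt[field] != value:
                    return False
            elif (pkt[field] & mask) != (value & mask):
                return False
        return True


SG_MAC = mac_to_int("fa:16:3e:34:9e:64")
VM_MAC = mac_to_int("fa:16:3e:d3:6f:80")
MCAST_BIT = mac_to_int("01:00:00:00:00:00")  # dl_dst=00:../01:.. means "unicast"

# The two table=94 flows from the dump above; other flows in the table are ignored.
TABLE_94 = [
    Flow(12, {"reg6": (0x3, None), "dl_dst": (SG_MAC, None)}, "output:8"),
    Flow(10, {"reg6": (0x3, None), "dl_src": (SG_MAC, None),
              "dl_dst": (0, MCAST_BIT)}, "mod_vlan_vid:3,output:2"),
]


def lookup(table, pkt):
    """Return the actions of the highest-priority matching flow, or None."""
    candidates = [f for f in table if f.matches(pkt)]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.priority).actions


# VM -> sg-xxx (the traced packet) and sg-xxx -> VM, both on reg6=0x3.
FROM_VM = {"reg6": 0x3, "dl_src": VM_MAC, "dl_dst": SG_MAC}
FROM_SG = {"reg6": 0x3, "dl_src": SG_MAC, "dl_dst": VM_MAC}
print(lookup(TABLE_94, FROM_VM))  # output:8
print(lookup(TABLE_94, FROM_SG))  # mod_vlan_vid:3,output:2
```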

root@juju-824e75-train2-8:~# ovs-ofctl dump-flows br-int | grep fa:16:3e:6b:60:7d
 cookie=0x4580a22bf3824b00, duration=13818.613s, table=1, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:6b:60:7d actions=mod_dl_src:fa:16:3e:4b:f5:19,resubmit(,60)
 cookie=0x4580a22bf3824b00, duration=13818.611s, table=60, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:6b:60:7d actions=strip_vlan,output:7
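A hedged Python sketch that makes the comparison between the two dumps mechanical: it parses `table=`, `n_packets=` and `priority=` from `ovs-ofctl dump-flows` lines, counts flows per table, and lists the flows that actually saw packets. The dump strings are abbreviated copies of the two outputs above (cookie, duration, n_bytes and idle_age dropped, and the priority-10 match shortened), named after the hosts they came from, and the regex assumes the field order shown there.

```python
import re
from collections import Counter

# Abbreviated copies of the two dumps above.
FOCAL_DUMP = """\
table=1, n_packets=0, priority=20,dl_vlan=3,dl_dst=fa:16:3e:34:9e:64
table=60, n_packets=0, priority=20,dl_vlan=3,dl_dst=fa:16:3e:34:9e:64
table=94, n_packets=12485, priority=12,reg6=0x3,dl_dst=fa:16:3e:34:9e:64
table=94, n_packets=0, priority=10,reg6=0x3,dl_src=fa:16:3e:34:9e:64
"""
TRAIN2_DUMP = """\
table=1, n_packets=0, priority=20,dl_vlan=3,dl_dst=fa:16:3e:6b:60:7d
table=60, n_packets=0, priority=20,dl_vlan=3,dl_dst=fa:16:3e:6b:60:7d
"""

# Assumes the field order printed above: table, n_packets, ..., priority.
FLOW_RE = re.compile(
    r"table=(?P<table>\d+).*?n_packets=(?P<n_packets>\d+).*?priority=(?P<priority>\d+)"
)


def summarize(dump: str):
    """Return (flows per table, [(table, priority) of flows with n_packets > 0])."""
    per_table = Counter()
    active = []
    for line in dump.splitlines():
        m = FLOW_RE.search(line)
        if not m:
            continue
        table, n_packets, priority = (int(m[g]) for g in ("table", "n_packets", "priority"))
        per_table[table] += 1
        if n_packets > 0:
            active.append((table, priority))
    return per_table, active


focal_tables, focal_active = summarize(FOCAL_DUMP)
train2_tables, train2_active = summarize(TRAIN2_DUMP)
print(sorted(set(focal_tables) - set(train2_tables)))  # [94]
print(focal_active)                                    # [(94, 12)]
```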

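5. So table 94 now looks like the key. Of the two dumps above, only the first (juju-21f0ba-focal-10) has table=94 flows, and its priority-12 flow is the only one with traffic (n_packets=12485). Table 94 is ACCEPTED_EGRESS_TRAFFIC_NORMAL_TABLE, which is related to the OVS firewall, so I suspect the following commits: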

$ git log --oneline 1c2e10f859...16.0.0 neutron/agent/linux/openvswitch_firewall/firewall.py
6dbba8d5ce Check SG members instead of ports to skip flow update
efa8dd0895 Add accepted egress direct flow
991126eb6e Merge "[OVS FW] Clean port rules if port not found in ovsdb"
b01e0c2aa9 [OVS FW] Clean port rules if port not found in ovsdb
5cb0ff418a Add more condition to check sg member exist
a94cb83e18 Merge "Handle OVSFWPortNotFound and OVSFWTagNotFound in ovs firewall"
e801159003 Handle OVSFWPortNotFound and OVSFWTagNotFound in ovs firewall
4b67a06403 Log OVS firewall conjunction creation

For efa8dd0895, I tried setting explicitly_egress_direct=false, but the problem still exists.