[dvr+l3ha] north-south traffic not working when VM and main router are not on the same host

Bug #1945306 reported by Hua Zhang
This bug affects 2 people
Affects: neutron
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

Some newly created VMs cannot reach "outside" resources (e.g. apt repositories) in an l3ha + dvr environment: the 'apt update' command fails inside the VM, so north-south traffic is broken. The problem is easy to reproduce as long as the VM and the main router are not on the same host.

Steps to reproduce:

1. Set up a Wallaby or Ussuri VRRP + DVR environment (it works on Train, but not on Ussuri or Wallaby).
2. Create a test VM and find its host: nova show <VM> | grep host
3. Find the host of the main router: neutron l3-agent-list-hosting-router $(openstack router show provider-router -f value -c id)
4. Make sure the VM and the main router are not on the same host.
5. On the main router host, this command fails: ip netns exec snat-xxx ping <VM-IP> -c1
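Steps 3 and 4 can be scripted. The sketch below is a hypothetical helper, `active_router_host`, which assumes the legacy neutron CLI prints an ASCII table with `host` and `ha_state` columns for HA routers; it prints the host of the active instance so it can be compared with the VM's host from step 2. Columns are found by header name, so the exact column order does not matter.

```shell
#!/bin/sh
# Hypothetical helper (not part of neutron): read the ASCII table printed by
# `neutron l3-agent-list-hosting-router <router-id>` on stdin and print the
# host(s) whose ha_state is "active". Columns are located by header name.
active_router_host() {
    awk -F'|' '
        # Header row: remember which fields hold "host" and "ha_state".
        !hcol && /ha_state/ {
            for (i = 1; i <= NF; i++) {
                f = $i
                gsub(/[ \t]/, "", f)
                if (f == "host") hcol = i
                if (f == "ha_state") scol = i
            }
            next
        }
        # Data rows (separator lines contain no "|" and are skipped).
        hcol && NF > 1 {
            h = $hcol; s = $scol
            gsub(/[ \t]/, "", h); gsub(/[ \t]/, "", s)
            if (s == "active") print h
        }
    '
}

# Example (needs the OpenStack CLIs and credentials):
# router_id=$(openstack router show provider-router -f value -c id)
# neutron l3-agent-list-hosting-router "$router_id" | active_router_host
```

If the printed host differs from the VM's host, the setup matches step 4.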

I did some bisecting and found:

15.3.4 (bionic-train) - no problem
1c2e10f859 - no problem
16.4.0 (bionic-ussuri) - has problem
16.0.0-0ubuntu3 - has problem, and also has the multiple active routers problem
16.0.0~b3~git2020041516.5f42488a9a-0ubuntu2 - BAD version, all routers are in standby state, so no testing was possible
16.1.0 (focal) - has problem, and also has the multiple active routers problem
16.2.0 (focal) - has problem
16.3.0 (focal) - has problem
16.4.0 (focal-ussuri) - has problem
focal-wallaby - has problem

I could not continue bisecting because some commits (e.g. 14dd3e95ca) often hit the multiple standby routers issue.

I also used 'ovs-appctl ofproto/trace' and tcpdump to debug; the results are as follows.

train - works
sg-xxx -> vm - https://pastebin.ubuntu.com/p/MHNVf8wXtb/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/Fqxp4mvkgV/
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/YppWc2Pg33/
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/MPmQ5xbnT2/ - can get icmp reply

ussuri - does not work
sg-xxx -> vm - https://pastebin.ubuntu.com/p/hKfSB9gmd9/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/NCcnGS4gdj/ - sg-xxx can't get icmp reply
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/DHdVbB66NT/ - VM can't get sg-xxx's arp reply
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/4hJ7vdRRC4/ - can't get arp reply

It looks like the VM can't get the ARP reply for the sg-xxx interface.

Hua Zhang (zhhuabj)
description: updated
Bence Romsics (bence-romsics) wrote :

I think I'm able to reproduce this on master (neutron commit ae4d8a0c20). I used a two-host ml2/ovs devstack setup:

devstack0 - all in one
local.conf excerpt:

[[local|localrc]]
Q_DVR_MODE=dvr_snat
[[post-config|/etc/neutron/neutron.conf]]
[DEFAULT]
router_distributed = True
l3_ha = True
l3_ha_net_cidr = 169.254.192.0/18
max_l3_agents_per_router = 2
[[post-config|/etc/neutron/plugins/ml2/ml2_conf.ini]]
[agent]
enable_distributed_routing = True
l2_population = True
[[post-config|/etc/neutron/l3_agent.ini]]
[DEFAULT]
agent_mode = dvr_snat
ha_vrrp_auth_password = password
ha_vrrp_health_check_interval = 0

devstack0a - compute
local.conf excerpt:

[[local|localrc]]
Q_DVR_MODE=dvr
[[post-config|/etc/neutron/neutron.conf]]
[DEFAULT]
router_distributed = True
[[post-config|/etc/neutron/plugins/ml2/ml2_conf.ini]]
[agent]
enable_distributed_routing = True
l2_population = True
[[post-config|/etc/neutron/l3_agent.ini]]
[DEFAULT]
agent_mode = dvr

Then I opened up the default security group completely:

project_id="$( openstack project show "$OS_PROJECT_NAME" | awk '/ id / { print $4 }' )"
default_sg_id="$( neutron security-group-list --tenant-id "$project_id" | awk '/ default / { print $2 }' )"
openstack security group rule list "$default_sg_id"
openstack security group rule list "$default_sg_id" | egrep -w None | egrep -wv 'None.*None.*None' | awk '{ print $2 }' | xargs -r openstack security group rule delete
neutron security-group-rule-create --direction ingress --ethertype IPv4 "$default_sg_id"
neutron security-group-rule-create --direction ingress --ethertype IPv6 "$default_sg_id"
openstack security group rule list "$default_sg_id"

devstack's default router1 was indeed in dvr+l3ha mode:

$ openstack router show router1 -f table -c ha -c distributed
+-------------+-------+
| Field | Value |
+-------------+-------+
| distributed | True |
| ha | True |
+-------------+-------+

Booted a vm on the connected private network:
$ openstack server create --image cirros-0.5.2-x86_64-disk --flavor cirros256 --nic net-id=private --availability-zone :devstack0a vm0 --wait

Took its address and pinged it:
$ openstack server show vm0 -f yaml -c addresses
$ sudo ip netns exec snat-$( openstack router show router1 -f value -c id ) ping -c3 10.0.0.55

And got no response.

While pinging, tcpdump on the relevant subnet's sg interface captured this:
$ sudo ip netns exec snat-$( openstack router show router1 -f value -c id ) tcpdump -i sg-7a37d0b0-e6 -n -vvv
tcpdump: listening on sg-7a37d0b0-e6, link-type EN10MB (Ethernet), capture size 262144 bytes
^C13:03:57.204512 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.55 tell 10.0.0.45, length 28
13:03:58.228329 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.55 tell 10.0.0.45, length 28
13:03:59.252240 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.55 tell 10.0.0.45, length 28
13:04:00.276460 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.55 tell 10.0.0.45, length 28
13:04:01.300116 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.55 tell 10.0.0.45, length 28

5 packets captured
5 packets received by filter
0 packets dropped by kernel
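The capture shows five ARP requests for the VM address and no reply. When repeating this test, a small filter over saved `tcpdump -n` text can summarise requests versus replies for one address. `arp_summary` is a hypothetical helper, and it assumes tcpdump's usual `Request who-has <ip> tell <ip>` and `Reply <ip> is-at <mac>` wording for ARP.

```shell
#!/bin/sh
# Hypothetical helper: count ARP requests for, and replies from, one IPv4
# address in tcpdump text output read on stdin.
arp_summary() {
    awk -v ip="$1" '
        # The trailing space stops 10.0.0.5 from matching 10.0.0.55.
        index($0, "Request who-has " ip " ") { req++ }
        index($0, "Reply " ip " is-at ") { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Example, reusing the namespace and interface names from the capture above;
# -c 5 makes tcpdump exit so the summary is printed:
# sudo ip netns exec snat-<router-id> tcpdump -n -c 5 -i sg-7a37d0b0-e6 arp \
#     | arp_summary 10.0.0.55
```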

Changed in neutron:
status: New → Triaged
importance: Undecided → High
tags: added: l3-dvr-backlog l3-ha
Hua Zhang (zhhuabj) wrote :

@Bence, thank you for confirming the problem.

We also did some debugging; here is what we have found so far:

1. The flows seem fine: the 'ovs-dpctl dump-flows' outputs on Stein and Ussuri are the same when pinging the VM from sg-xxx.

# ussuri
recirc_id(0),in_port(13),ct_state(-trk),eth(src=fa:16:3e:d3:6f:80),eth_type(0x0800),ipv4(src=192.168.21.151,proto=6,frag=no), packets:24, bytes:2560, used:0.396s, flags:SP., actions:ct(zone=3),recirc(0x4b)

#stein
recirc_id(0),in_port(12),ct_state(-trk),eth(src=fa:16:3e:4c:29:6d),eth_type(0x0800),ipv4(src=192.168.21.5,proto=6,frag=no), packets:3271, bytes:307846, used:1.656s, flags:SP., actions:ct(zone=3),recirc(0x1)

We also reviewed the flows along the whole path (vm -> qrouter-xxx -> br-int -> br-tun -> vxlan-xxx); see https://pastebin.ubuntu.com/p/vzNjb3JT5W/

2. Conntrack seems fine: 'conntrack -L | grep mark=1 |grep 192.168.21' returns nothing when pinging the VM from sg-xxx.

3. The routes seem fine:

# ip netns exec qrouter-7a918e87-1cc8-4252-87d1-e84b4c12c616 ip rule list |grep 192
3232240897: from 192.168.21.1/24 lookup 3232240897
# ip netns exec qrouter-7a918e87-1cc8-4252-87d1-e84b4c12c616 ip route list table 3232240897
default via 192.168.21.168 dev qr-218eed51-82 proto static
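The table number 3232240897 in the output above is exactly the 32-bit integer value of the gateway address 192.168.21.1 (192*2^24 + 168*2^16 + 21*2^8 + 1), and the rule uses the same number as its priority. The two converters below are illustrative helpers (not part of neutron or iproute2) for mapping such numbers back and forth when reading DVR `ip rule` output.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to its unsigned 32-bit integer value.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    # Split the address on dots into $1..$4 (no globbing characters expected).
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

ip_to_int 192.168.21.1     # prints 3232240897
int_to_ip 3232240897       # prints 192.168.21.1
```

For example, `ip netns exec qrouter-<id> ip route list table "$(ip_to_int 192.168.21.1)"` lists the same table as the command above.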

4. The firewall rules differ between Stein and Ussuri: Ussuri has table 94.

ovs-appctl ofproto/trace br-int 'in_port=9,ip,nw_proto=1,nw_src=192.168.21.151,nw_dst=192.168.21.168,dl_src=fa:16:3e:d3:6f:80,dl_dst=fa:16:3e:34:9e:64' --ct-next 'trk,est'

ussuri - https://pastebin.ubuntu.com/p/tSQXQFfPBw/
stein - https://pastebin.ubuntu.com/p/ZTfXd6rVZ9/

Ussuri has the following flow rule, so the traffic does not go through br-tun:

94. reg6=0x3,dl_dst=fa:16:3e:34:9e:64, priority 12, cookie 0x8a4738b01717a42e
    output:8

As a result, the compute node has the following flow rules for the sg-xxx interface:

root@juju-21f0ba-focal-10:/home/ubuntu# ovs-ofctl dump-flows br-int | grep fa:16:3e:34:9e:64
 cookie=0x8a4738b01717a42e, duration=13204.575s, table=1, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:34:9e:64 actions=mod_dl_src:fa:16:3e:5e:d6:96,resubmit(,60)
 cookie=0x8a4738b01717a42e, duration=13204.573s, table=60, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:34:9e:64 actions=strip_vlan,output:8
 cookie=0x8a4738b01717a42e, duration=13202.646s, table=94, n_packets=12485, n_bytes=1063359, idle_age=0, priority=12,reg6=0x3,dl_dst=fa:16:3e:34:9e:64 actions=output:8
 cookie=0x8a4738b01717a42e, duration=13202.646s, table=94, n_packets=0, n_bytes=0, idle_age=65534, priority=10,reg6=0x3,dl_src=fa:16:3e:34:9e:64,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:3,output:2
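The priority-12 table-94 flow above is the one that sends traffic for the sg-xxx MAC straight to a local port (output:8) instead of through br-tun. To check whether a node still has it, the dump can be filtered as in the sketch below; `find_egress_direct_flows` is a hypothetical helper that assumes the plain-text `ovs-ofctl dump-flows br-int` format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `ovs-ofctl dump-flows br-int` text on stdin and
# print the table 94, priority 12 flows that match dl_dst=<mac> and output
# directly to a port number (the egress-direct flow discussed above).
find_egress_direct_flows() {
    awk -v mac="$1" '
        index($0, "table=94,") && index($0, "priority=12,") &&
        index($0, "dl_dst=" mac " ") && $0 ~ /actions=output:[0-9]+/
    '
}

# Example on a compute node:
# sudo ovs-ofctl dump-flows br-int | find_egress_direct_flows fa:16:3e:34:9e:64
```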

root@juju-824e75-train2-8:~# ovs-ofctl dump-flows br-int | grep fa:16:3e:6b:60:7d
 cookie=0x4580a22bf3824b00, duration=13818.613s, table=1, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:6b:60:7d actions=mod_dl_src:fa:16:3e:4b:f5:19,resubmit(,60)
 cookie=0x4580a22bf3824b00, duration=13818.611s, table=60, n_packets=0, n_bytes=0, idle_age=65534, priority=20,dl_vlan=3,dl_dst=fa:16:3e:6b:60:7d actions=s...


Hua Zhang (zhhuabj) wrote :

I have found the root cause.

It works after running the following command:

sudo ovs-ofctl -Oopenflow13 --strict del-flows br-int "table=94, n_packets=12485, n_bytes=1063359, idle_age=0, priority=12,reg6=0x3,dl_dst=fa:16:3e:34:9e:64"

This rule was introduced by this patch: https://review.opendev.org/c/openstack/neutron/+/704506/1/neutron/agent/linux/openvswitch_firewall/firewall.py#1096

So it is a regression from the fix for LP bug https://bugs.launchpad.net/neutron/+bug/1732067

tags: added: sts
summary: - north-south traffic not working when VM and main router are not on the
- same host
+ [dvr+l3ha] north-south traffic not working when VM and main router are
+ not on the same host
Trent Lloyd (lathiat) wrote :

From:
https://docs.openstack.org/releasenotes/neutron/queens.html

It states:
"A new config option explicitly_egress_direct, with default value False, was added for the aim of distinguishing clouds which are running the network node mixed with compute services, upstream neutron CI should be an example. In such situation, this explicitly_egress_direct should be set to False, because there are numerous cases from HA routers which can not be covered, particularly when you have centralized floating IPs running in such mixed hosts."

The documentation is not very clear about what it means by "running the network node mixed with compute services". I think the original use case simply meant running the neutron L3 routers and compute on the same node (even in non-DVR, non-HA mode), but without dvr_snat (which has always been possible, although in the Ubuntu case charm-neutron-gateway did not provide a way to do it).

But it seems that might also cover a DVR environment (which, by design, mixes network and compute functions more). In that case, explicitly_egress_direct=True should not be used, and Neutron should probably prevent this combination.

charm-neutron-openvswitch now always sets explicitly_egress_direct=True on Ussuri, which is a bug if the above is true:
https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/798072
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1931696
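A per-host check for the combination described above (explicitly_egress_direct=True on a host whose L3 agent runs in dvr_snat mode) could look like the sketch below. It assumes explicitly_egress_direct is set in the `[agent]` section of the OVS agent's ini file and agent_mode in the `[DEFAULT]` section of l3_agent.ini (as in the devstack excerpt in this report). The file paths and the `ini_get`/`check_egress_direct` names are hypothetical, the ini parsing is deliberately naive, and the check cannot tell whether nova-compute also runs on the host.

```shell
#!/bin/sh
# Hypothetical naive ini reader: print the value of key $3 in section [$2]
# of file $1 (last occurrence wins; no includes or config directories).
ini_get() {
    awk -v sect="$2" -v key="$3" '
        /^[ \t]*\[/ {
            s = $0; gsub(/^[ \t]*\[|\][ \t]*$/, "", s)
            in_sect = (s == sect); next
        }
        in_sect {
            line = $0; sub(/[#;].*/, "", line)
            n = index(line, "=")
            if (n == 0) next
            k = substr(line, 1, n - 1); v = substr(line, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k); gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) val = v
        }
        END { print val }
    ' "$1"
}

# Warn (and return 1) if egress-direct is enabled on a dvr_snat host.
check_egress_direct() {
    ovs_ini=$1
    l3_ini=$2
    direct=$(ini_get "$ovs_ini" agent explicitly_egress_direct | tr '[:upper:]' '[:lower:]')
    mode=$(ini_get "$l3_ini" DEFAULT agent_mode)
    case $direct in
        true|1|yes|on)  # common oslo.config spellings of a true boolean
            if [ "$mode" = "dvr_snat" ]; then
                echo "WARN: explicitly_egress_direct=True with agent_mode=dvr_snat"
                return 1
            fi
            ;;
    esac
    echo "OK"
}

# Example (assumed Ubuntu paths):
# check_egress_direct /etc/neutron/plugins/ml2/openvswitch_agent.ini /etc/neutron/l3_agent.ini
```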

Hua Zhang (zhhuabj) wrote :

Setting explicitly_egress_direct=False fixes this problem for the dvr_snat+l3ha case, but it introduces another problem when offload is enabled (see LP bug 1931696).

Patch [1] is now trying to allow setting explicitly_egress_direct to False while also allowing offload to work. It is not resolved yet; see the information in https://bugs.launchpad.net/neutron/+bug/1931696/comments/8.

[1] https://review.opendev.org/c/openstack/neutron/+/812641

Hemanth Nakkina (hemanth-n) wrote :

Just for information, here is why zhhuabj did not observe the problem on Train.

The deployment tool used sets explicitly_egress_direct to True only from the Ussuri release onwards. On Train the option keeps its default value, False, so the problem does not occur.

Edward Hope-Morley (hopem) wrote :

I have reverted the charm patch that sets explicitly_egress_direct=True (see bug 1931696), which will fix this issue. I am also proposing to revert the neutron patch from bug 1897637 that breaks networking when offload is enabled. I will also use bug 1931696 for the revert.

LIU Yulong (dragon889) wrote (last edit ):

The patch https://review.opendev.org/c/openstack/neutron/+/666991, which introduced the config option ``explicitly_egress_direct=True/False``, fixed the following problems:
1. The egress flooding issue on br-int when the openvswitch (OpenFlow) security group driver is enabled
https://bugs.launchpad.net/neutron/+bug/1732067

2. Broken DVR east-west traffic
https://bugs.launchpad.net/neutron/+bug/1831534 (this bug is for VLAN networks, but the issue is not VLAN-only).

3. Some potential ingress flooding issues on br-int

I have also listed some issues here:
https://bugs.launchpad.net/neutron/+bug/1934666/comments/5

So, if you do not use explicitly_egress_direct=True, you will face these issues.

Another thing, as I said in the release note before: do not use ``explicitly_egress_direct=True`` on a host that runs both dvr_snat and the compute service. There are too many cases to cover; try combining the following cases for DVR:
1. vlan/vxlan
2. dvr/dvr+ha
3. agent mode(dvr, dvr_snat, dvr_no_external)
4. East-west and north-south traffic, with source and destination on the same host or on different hosts
5. IPv6
6. allowed_address_pair
7. OpenFlow firewall enabled/disabled
8. HA router failover
The resulting number of cases is too large to cover.
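To give a sense of scale, the sketch below enumerates the list above as a plain cartesian product. How each item is split into values is an assumption (item 4 as traffic direction times host placement; items 5 to 8 as on/off switches), and some combinations, such as HA failover on a non-HA router, are not meaningful, so the real set is somewhat smaller than the 768 rows this prints.

```shell
#!/bin/sh
# Enumerate the DVR test matrix listed above, one combination per line.
# Assumed split: item 4 becomes traffic direction x host placement, and
# items 5-8 are treated as simple on/off switches.
dvr_case_matrix() {
  for net in vlan vxlan; do
   for router in dvr dvr_ha; do
    for mode in dvr dvr_snat dvr_no_external; do
     for traffic in east_west north_south; do
      for place in same_host different_host; do
       for ipv6 in no_ipv6 ipv6; do
        for aap in no_aap aap; do
         for fw in fw_off fw_on; do
          for failover in no_failover failover; do
           echo "$net $router $mode $traffic $place $ipv6 $aap $fw $failover"
          done
         done
        done
       done
      done
     done
    done
   done
  done
}

# Counts 768 combinations (2*2*3*2*2*2*2*2*2).
dvr_case_matrix | wc -l
```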

And FYI, we have marked dvr_snat + compute services as not supported:
https://review.opendev.org/c/openstack/neutron/+/801503

Hua Zhang (zhhuabj) wrote :

I also did some tests to confirm whether any manual intervention is required when the flag is toggled from true to false [1].

None of the following tests deletes the flow [2]:

1. Update SG member

PROJECT_ID=$(openstack project show --domain admin_domain admin -f value -c id)
SECGRP_ID=$(openstack security group list --project ${PROJECT_ID} | awk '/default/ {print $2}')
openstack security group rule create ${SECGRP_ID} --protocol tcp --dst-port 2222 --ingress

2. Update port SG

openstack port set f9977d83-dade-4849-b0a5-b151c9812e94 --no-security-group
openstack port set f9977d83-dade-4849-b0a5-b151c9812e94 --disable-port-security
openstack port set f9977d83-dade-4849-b0a5-b151c9812e94 --enable-port-security
openstack port set f9977d83-dade-4849-b0a5-b151c9812e94 --security-group $SECGRP_ID

3. Update network SG

openstack network set private --disable-port-security
openstack network set private --enable-port-security

4. Restart neutron-openvswitch-agent.service

The flows are only removed if I restart the machine, but rebooting might not be acceptable in production. Removing the following flows by hand may therefore be a workaround that applies the config change with minimal disruption and avoids a reboot.

# fa:16:3e:7a:11:7d is the MAC of the sg-xxx interface
# ovs-ofctl dump-flows br-int |grep fa:16:3e:7a:11:7d |grep -E 'priority=12|priority=10'
 cookie=0xf6202ec41ea7282d, duration=329.516s, table=94, n_packets=0, n_bytes=0, idle_age=333, priority=12,reg6=0x2,dl_dst=fa:16:3e:7a:11:7d actions=output:5
 cookie=0xf6202ec41ea7282d, duration=329.516s, table=94, n_packets=0, n_bytes=0, idle_age=333, priority=10,reg6=0x2,dl_src=fa:16:3e:7a:11:7d,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:2,output:2
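The manual removal can be scripted as a dry run that prints one `ovs-ofctl --strict del-flows` command per table-94 flow at priority 12 or 10 mentioning a given sg-xxx MAC, keeping only the match fields (the text between `priority=` and `actions=`). `egress_direct_del_cmds` is a hypothetical helper; it assumes the plain `ovs-ofctl dump-flows` text format shown above and a host where explicitly_egress_direct has already been changed from True to False. Review the printed commands before running them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read `ovs-ofctl dump-flows <bridge>` text on
# stdin and print strict del-flows commands for the table 94, priority 10/12
# flows that mention the given MAC. Nothing is executed.
egress_direct_del_cmds() {
    bridge=$1
    mac=$2
    awk -v br="$bridge" -v mac="$mac" '
        # Only table 94 flows that mention the MAC ...
        !index($0, "table=94,") || !index($0, mac) { next }
        # ... at priority 12 or 10.
        !index($0, "priority=12,") && !index($0, "priority=10,") { next }
        {
            # The match fields run from "priority=" up to " actions=".
            start = index($0, "priority=")
            stop = index($0, " actions=")
            if (start == 0 || stop <= start) next
            match_str = substr($0, start, stop - start)
            printf "ovs-ofctl -Oopenflow13 --strict del-flows %s \"table=94,%s\"\n", br, match_str
        }
    '
}

# Example: print the commands, review them, then run them by hand.
# sudo ovs-ofctl dump-flows br-int | egress_direct_del_cmds br-int fa:16:3e:7a:11:7d
```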

[1] https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/813407
[2] https://bugs.launchpad.net/neutron/+bug/1945306/comments/3

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/814733

Hua Zhang (zhhuabj) wrote :

Following up on my comment #9 above: I've found a better workaround that deletes the two egress-direct flows without restarting the machine when the flag is toggled from true to false; see https://bugs.launchpad.net/neutron/+bug/1948656/comments/3

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/814733
Reason: Please, feel free to restore the patch, address the comments and rebase the patch.
