snat is used instead of dnat_and_snat when L3GW resides on chassis
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Invalid | Medium | frigo | | |
Bug Description
I run RHOSP 16.2 (Train) with OVN and enable_
On the same chassis I have a VM running and the L3GW port scheduled, and a FIP is associated with the VM.
I would expect the "dnat_and_snat" NAT to be used and traffic to egress with the FIP. However, as the L3GW is scheduled there too, I see the "snat" NAT is used instead.
I think this is a bug (unless I'm wrong), for two reasons:
- A VM that has a FIP should use this FIP for egress traffic. External firewalls expect it.
- The L3GW port is expected to move. If the port moves to a chassis where traffic is already flowing using the FIP, the presence of the L3GW port should not disrupt that traffic.
* Reproduction steps
We assume 2 chassis, cpu34d and cpu35d.
# Create a router and make sure its port is scheduled on cpu35d:
openstack router create router1 --availability-
openstack router set --external-gateway external1 --fixed-ip subnet=tenant_35 router1
openstack port show 34ef841f-
-c binding_host_id -c device_owner -c fixed_ips
+------
| Field | Value |
+------
| binding_host_id | cpu35d
| device_owner | network:
| fixed_ips | ip_address=
+------
openstack router add subnet router1 mysub
# We run a VM on cpu35d with floating IP 10.64.254.128 associated
openstack server create myserver --key-name stack --security-group prodlike --network private \
--image cirros --flavor m1.small --availability-zone nova:cpu35d
openstack server add floating ip myserver 10.64.254.128
ssh cirros@
ping external
# We run another VM, on the other chassis:
openstack server create myserver2 --key-name stack --security-group prodlike --network private \
--image cirros --flavor m1.small --availability-zone nova:cpu34d
openstack server add floating ip myserver2 10.64.254.135
ssh cirros@
ping external
We observe the egress traffic:
* Expected output: what did you hope to see?
from myserver: traffic coming from the FIP:
IP 10.64.254.128 > external
from myserver2:
IP 10.64.254.135 > external
* Actual output:
from myserver: traffic coming from the L3GW port:
IP 10.64.245.126 > external
from myserver2:
IP 10.64.254.135 > external
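The comparison between expected and actual output can be automated with a small check. This is only a sketch: the helper names `source_ip` and `egress_uses_fip` are hypothetical, and the regular expression assumes the abbreviated capture-line format quoted in this report (`IP <source> > <destination>`), not the full output of any particular capture tool.

```python
import re

# Expected egress source per VM: the floating IPs from the reproduction steps.
EXPECTED_FIP = {
    "myserver": "10.64.254.128",
    "myserver2": "10.64.254.135",
}

# Matches the abbreviated lines quoted in this report,
# e.g. "IP 10.64.254.128 > external". The format is an assumption.
LINE_RE = re.compile(r"^IP (\d{1,3}(?:\.\d{1,3}){3}) > ")


def source_ip(line):
    """Return the source IPv4 address of a capture line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.group(1) if match else None


def egress_uses_fip(vm, line):
    """Return True when the capture line shows the VM egressing with its floating IP."""
    return source_ip(line) == EXPECTED_FIP[vm]
```

With the lines from the actual output, `egress_uses_fip("myserver", "IP 10.64.245.126 > external")` is false (the source is the router's snat address), while `egress_uses_fip("myserver2", "IP 10.64.254.135 > external")` is true.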
* Version:
** OpenStack version RHOSP 16.2 (Train)
** RHEL 8.4
** deployed with tripleo (ovn-2021-
From OVN northdb:
router 6a1c6c1d-
port lrp-34ef841f-
mac: "fa:16:3e:22:3f:c0"
networks: ["10.64.
gateway chassis: [1126ea9a-
port lrp-e56e108f-
mac: "fa:16:3e:90:a7:3b"
networks: ["192.168.
nat 8e72663f-
external ip: "10.64.254.128"
logical ip: "192.168.200.21"
type: "dnat_and_snat"
nat b0d9f69a-
external ip: "10.64.245.126"
logical ip: "192.168.200.0/27"
type: "snat"
nat f5808a3a-
external ip: "10.64.254.135"
logical ip: "192.168.200.27"
type: "dnat_and_snat"
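To make the expectation concrete, here is a minimal model of the NAT choice I would expect for these three entries: the most specific `logical ip` that covers the source address wins, so a per-VM `dnat_and_snat` entry takes precedence over the subnet-wide `snat` entry. This is a sketch of the expected behaviour only, not of how ovn-northd builds its flows; the helper name `expected_egress_ip` and the longest-prefix rule are assumptions made for illustration.

```python
import ipaddress

# NAT entries copied from the northbound DB listing above.
NAT_RULES = [
    {"type": "dnat_and_snat", "external_ip": "10.64.254.128", "logical_ip": "192.168.200.21"},
    {"type": "snat", "external_ip": "10.64.245.126", "logical_ip": "192.168.200.0/27"},
    {"type": "dnat_and_snat", "external_ip": "10.64.254.135", "logical_ip": "192.168.200.27"},
]


def expected_egress_ip(src_ip, rules=NAT_RULES):
    """Return the external IP of the most specific NAT rule covering src_ip.

    Hypothetical helper: a bare address is treated as a /32 network, so a
    per-VM dnat_and_snat entry outranks the /27 snat entry.
    """
    src = ipaddress.ip_address(src_ip)
    candidates = [
        rule for rule in rules
        if src in ipaddress.ip_network(rule["logical_ip"])
    ]
    if not candidates:
        return None  # no NAT rule applies to this source address
    best = max(
        candidates,
        key=lambda rule: ipaddress.ip_network(rule["logical_ip"]).prefixlen,
    )
    return best["external_ip"]
```

Under this model, `expected_egress_ip("192.168.200.21")` gives 10.64.254.128, whereas the traffic actually observed from myserver leaves with 10.64.245.126, the external IP of the `snat` entry.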
From ovn-trace we read:
ingress(
-------
0. lr_in_admission (northd.c:10285): eth.dst == fa:16:3e:90:a7:3b && inport == "lrp-e56e10", priority 50, uuid 3711664a
xreg0[0..47] = fa:16:3e:90:a7:3b;
next;
1. lr_in_lookup_
reg9[2] = 1;
next;
2. lr_in_learn_
next;
10. lr_in_ip_routing (northd.c:9179): ip4.dst == 0.0.0.0/0, priority 1, uuid 57ef0971
ip.ttl--;
reg8[0..15] = 0;
reg0 = 10.64.245.97;
reg1 = 10.64.245.126;
eth.src = fa:16:3e:22:3f:c0;
outport = "lrp-34ef84";
flags.loopback = 1;
next;
11. lr_in_ip_
next;
12. lr_in_policy (northd.c:10795): 1, priority 0, uuid a68ffb22
reg8[0..15] = 0;
next;
13. lr_in_policy_ecmp (northd.c:10797): reg8[0..15] == 0, priority 150, uuid c2a799d9
next;
14. lr_in_arp_resolve (northd.c:10831): ip4, priority 0, uuid 3dff7cbd
get_
/* MAC binding to 00:1c:73:00:00:11. */
next;
17. lr_in_gw_redirect (northd.c:12774): ip4.src == 192.168.200.21 && outport == "lrp-34ef84" && is_chassis_
eth.src = fa:16:3e:c3:29:10;
reg1 = 10.64.254.128;
next;
18. lr_in_arp_request (northd.c:11488): 1, priority 0, uuid 39a08290
output;
egress(
-------
0. lr_out_undnat (northd.c:12318): ip && ip4.src == 192.168.200.21 && outport == "lrp-34ef84", priority 100, uuid 8d276994
eth.src = fa:16:3e:c3:29:10;
ct_dnat;
ct_dnat /* assuming no un-dnat entry, so no change */
-------
2. lr_out_snat (northd.c:12410): ip && ip4.src == 192.168.200.0/27 && outport == "lrp-34ef84" && is_chassis_
ct_
ct_snat(
-------
4. lr_out_delivery (northd.c:11536): outport == "lrp-34ef84", priority 100, uuid d00dd976
output;
/* output to "lrp-34ef84", type "patch" */
I think what is happening is is_chassis_
tags: | added: ovn |
Changed in neutron: | |
importance: | Undecided → Medium |
I'm currently trying to reproduce this. I first tried without AZs and was not able to reproduce the problem; my results matched the expected output. I will try again using availability zones later today.