[OVN] SNAT only happens for subnets directly connected to a router

Bug #2051935 reported by Giuseppe Petralia
This bug affects 1 person
Affects           Status       Importance  Assigned to  Milestone
neutron           In Progress  High        Brian Haley
neutron (Ubuntu)  New          Undecided   Unassigned

Bug Description

I am trying to achieve the following scenario:

I have a VM attached to a router without an external gateway (called project-router) but with a default route that sends all traffic to another router (the transit router). The transit router has an external gateway with SNAT enabled and is connected to a transit network, 192.168.100.0/24.

My VM is on 172.16.100.0/24. Traffic reaches the project-router and, thanks to the default route, is correctly redirected to the transit router. There it goes out through the external gateway, but without being SNATed.

This is because, in OVN, SNAT on this router is only enabled for logical IPs in 192.168.100.0/24, which is the subnet directly connected to the router:

# ovn-nbctl lr-nat-list neutron-6d1e6bb7-3949-43d1-8dac-dc55155b9ad8
TYPE  EXTERNAL_IP    EXTERNAL_PORT  LOGICAL_IP        EXTERNAL_MAC  LOGICAL_PORT
snat  147.22.16.207                 192.168.100.0/24

But I would like this router to SNAT all the traffic that hits it, even when it comes from a subnet not directly connected to it.

I can achieve this by adding an SNAT rule for 0.0.0.0/0 in OVN:

# ovn-nbctl lr-nat-add neutron-6d1e6bb7-3949-43d1-8dac-dc55155b9ad8 snat 147.22.16.207 0.0.0.0/0

# ovn-nbctl lr-nat-list neutron-6d1e6bb7-3949-43d1-8dac-dc55155b9ad8
TYPE  EXTERNAL_IP    EXTERNAL_PORT  LOGICAL_IP        EXTERNAL_MAC  LOGICAL_PORT
snat  147.22.16.207                 0.0.0.0/0
snat  147.22.16.207                 192.168.100.0/24
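
The effect of the two rule sets can be illustrated with a small check of whether a source address falls inside any LOGICAL_IP range. This is a simplified model, not OVN's actual matching code; the function name and the VM address 172.16.100.10 are hypothetical.

```python
import ipaddress


def snat_rule_for(source_ip, logical_ips):
    """Return the first LOGICAL_IP CIDR that contains source_ip, or None."""
    addr = ipaddress.ip_address(source_ip)
    for cidr in logical_ips:
        if addr in ipaddress.ip_network(cidr):
            return cidr
    return None


# Hypothetical address on the VM subnet 172.16.100.0/24.
vm_ip = "172.16.100.10"

rules_before = ["192.168.100.0/24"]
rules_after = ["0.0.0.0/0", "192.168.100.0/24"]

print(snat_rule_for(vm_ip, rules_before))  # no rule covers the VM subnet
print(snat_rule_for(vm_ip, rules_after))   # the catch-all rule covers it
```

With only the 192.168.100.0/24 rule, the lookup finds nothing for the VM address, which matches the observed un-NATed traffic.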

But this workaround can be wiped if I run neutron-ovn-db-sync-util on any of the neutron-api units.
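
A rough sketch of why a manually added rule does not survive a sync, assuming the utility treats the Neutron database as the source of truth and removes anything in the OVN northbound database that Neutron does not know about. That assumption, the function name, and the tuple layout are illustrative only; this is not the sync utility's code.

```python
def plan_sync(expected, actual):
    """Compare the rules Neutron expects with the rules present in OVN.

    Returns (to_add, to_remove) as sorted lists of rule tuples.
    """
    expected, actual = set(expected), set(actual)
    return sorted(expected - actual), sorted(actual - expected)


# Rules are modelled as (type, external_ip, logical_ip).
expected = {("snat", "147.22.16.207", "192.168.100.0/24")}
actual = {
    ("snat", "147.22.16.207", "192.168.100.0/24"),
    ("snat", "147.22.16.207", "0.0.0.0/0"),  # added by hand with ovn-nbctl
}

to_add, to_remove = plan_sync(expected, actual)
print(to_add)
print(to_remove)
```

Under this model the hand-added 0.0.0.0/0 rule is the only entry flagged for removal.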

Is there a way to achieve this via OpenStack? If not, does it make sense to have this as a new feature?

Tags: ovn
description: updated
summary: - [OVN] SNAT only happens for subnets directly connected to the router
+ [OVN] SNAT only happens for subnets directly connected to a router
Giuseppe Petralia (peppepetra) wrote :

The same scenario described in the bug works out of the box on an OpenStack environment using ML2/OVS instead of OVN.

tags: added: ovn
Brian Haley (brian-haley) wrote :

I tried this on the master branch and see the same issue. It seems to be a gap between ML2/OVS and OVN.

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Brian Haley (brian-haley) wrote :

Just for completeness, if I add a second subnet on the network and add a router interface in it, an additional SNAT rule does get added.

$ sudo ovn-nbctl lr-nat-list neutron-013a394e-66ad-4895-a352-e7a934d4db32
TYPE  EXTERNAL_IP   EXTERNAL_PORT  LOGICAL_IP    EXTERNAL_MAC  LOGICAL_PORT
snat  172.24.4.187                 10.0.0.64/26
snat  172.24.4.187                 10.0.0.0/26

It's only the case of a "nested" router that is missed.

Giuseppe raised a good question: why don't we just install a single rule with a logical_ip of 0.0.0.0/0 instead of adding specific ones for each subnet?

Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/907504

Changed in neutron:
status: Confirmed → In Progress
Brian Haley (brian-haley) wrote :

So the one problem here is that this "nested" router does not have a gateway port, i.e. external_gateway_info == null in router show API call. That is why the functional tests are failing in the patch I proposed.

So I'm not sure this is the correct fix, will have to get more input.

Brian Haley (brian-haley) wrote :

Just to provide more info to my last comment. The external_gateway_info field contains an element called 'enable_snat', which OVN uses to add SNAT rules for attached subnets. For example (sorry for wrap):

| external_gateway_info | {"network_id": "dbfb3168-85d5-4577-b221-5168f29760f7", "external_fixed_ips": [{"subnet_id": "c6594685-dfec-4497-8b22-b78f066cb5e4", "ip_address": "172.24.4.187"}, {"subnet_id": "67f8c7c8-7215-4ffb-aa98-f34ca8780efc", |
| | "ip_address": "2001:db8::1"}], "enable_snat": true}

On a "nested" router, this field is empty:

| external_gateway_info | null |

So OVN assumes it should not provide SNAT for any subnets.
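
The decision described here can be modelled as a small predicate over the external_gateway_info field. This is a simplified sketch, not Neutron's code; the function name is hypothetical, the gateway JSON is trimmed from the example above, and treating a missing enable_snat key as disabled is an assumption of the sketch.

```python
import json


def snat_requested(external_gateway_info):
    """Return True only if a gateway exists and its enable_snat flag is true."""
    if not external_gateway_info:
        # A "nested" router: external_gateway_info is null, so no SNAT.
        return False
    # Assumption of this sketch: a missing key counts as disabled.
    return bool(external_gateway_info.get("enable_snat"))


# Trimmed version of the gateway router's field (fixed IPs omitted).
gateway_router = json.loads(
    '{"network_id": "dbfb3168-85d5-4577-b221-5168f29760f7", "enable_snat": true}'
)
# The nested router's field as returned by the API.
nested_router = json.loads("null")

print(snat_requested(gateway_router))
print(snat_requested(nested_router))
```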

So overriding the list of returned subnet CIDRs downstream of a router does make the code add more lr-nat-list entries, but more surgery could be required to make it work properly.

Liu Xie (liushy) wrote :

@Brian
It may cause an issue where stateless FIP traffic gets SNATed by the '0.0.0.0/0' SNAT rule, right?
I have raised an issue[1] on the OVN GitHub.

[1]https://github.com/ovn-org/ovn/issues/116

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :
Brian Haley (brian-haley) wrote :

Hi Rodolfo,

That issue seems related to floating IP, but this one is just default SNAT. I will look into it further though.

Thanks, Brian

Brian Haley (brian-haley) wrote :

I just tested on master branch, making sure I had that change, and it does not fix the issue. Trying to use default SNAT from a VM on a nested network does not work. Applying my proposed patch does help, although I am seeing duplicate packets, which I'll need to figure out.

I'm on the console via horizon so can't copy paste, but was just doing a ping to 8.8.8.8.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

So if I'm not wrong, the goal is to cascade two routers and, from the lower one, be able to reach the upper router's external GW. Please provide a reproducer with the steps to create the networks, subnets, and routers, attach the subnets to the routers, add the external GW to the router, and add the router/subnet routes.

What is the external network type you are using?

BTW, we have a test that checks the communication between two nested routers and the VMs on the opposite networks [1]. It's worth mentioning that none of these networks are external GW networks in any of these routers; please check how the router routes are added.

Regards.

[1]https://github.com/openstack/neutron-tempest-plugin/blob/b9681a0284501801ba09939a6a577537e11e0a9d/neutron_tempest_plugin/scenario/test_connectivity.py#L73

Brian Haley (brian-haley) wrote :

Rodolfo - I will post a reproducer. The devstack I was testing on went sideways, but it clearly showed the issue. I only had to add a single route on the external gateway router, and a default route on the nested router going to the internal interface of that router.

Cascading routers like this is perfectly normal, and something customers do today. Everything works fine if you attach a floating IP; it's only default SNAT that is broken.

The other thing to note is that this works fine with ML2/OVS, as the router with the external gateway will SNAT everything that arrives on its internal interface, regardless of the source IP. OVN changes this by only programming these SNAT rules for subnets directly attached to the router. I didn't feel like programming a SNAT rule for 0.0.0.0/0 was a good idea, which is why I proposed the patch.

Brian Haley (brian-haley) wrote :

Ok, here are the reproducer steps, Rodolfo.

Starting with a fresh devstack install from master, and assuming something like:

1) router1 - which is the external gateway for 'private-subnet'

| external_gateway_info | {"network_id": "35f9b888-da3b-42c6-bc73-29395f7e2afe", "external_fixed_ips": [{"subnet_id": "61762e92-9684-4317-ab12-a362feae78a6", "ip_address": "172.24.4.128"}, {"subnet_id": "5f95efdf-6415-43a2-8164-7a9f97a6cbcc", "ip_address": "2001:db8::1"}], "enable_snat": |
| | true} |
| external_gateways | [{'network_id': '35f9b888-da3b-42c6-bc73-29395f7e2afe', 'external_fixed_ips': [{'ip_address': '172.24.4.128', 'subnet_id': '61762e92-9684-4317-ab12-a362feae78a6'}, {'ip_address': '2001:db8::1', 'subnet_id': '5f95efdf-6415-43a2-8164-7a9f97a6cbcc'}]}] |
| flavor_id | None |
| id | 56dd6de7-524e-4850-a5d7-57dbff1e7f7e |
| interfaces_info | [{"port_id": "1fbbfa1d-215b-443b-8464-8ec6cd5749c2", "ip_address": "fd8e:4fac:f388::1", "subnet_id": "88c913ef-69db-4684-ab0f-1de717044065"}, {"port_id": "407f32f2-fb96-43b2-8930-79752c4a2c26", "ip_address": "10.0.0.1", "subnet_id": |
| | "c49e2e09-d54a-4010-a84c-de646f6128f1"}]

2) private-subnet on network 'private'

| allocation_pools | 10.0.0.2-10.0.0.62 |
| cidr | 10.0.0.0/26

We can then create resources to demonstrate the issue.

Create a new private network

$ openstack network create private-network-nested

Create a subnet on it using the default IPv4 subnet pool

$ openstack subnet create --subnet-pool shared-default-subnetpool-v4 --network private-network-nested private-subnet-nested

Create a router that will act as the gateway to this network

$ openstack router create router-nested

| id | a8e69f7b-e8dc-40e1-a944-14bf3b0308dc |

Add an interface on the previously created private subnet

$ openstack router add subnet router-nested private-subnet-nested

Create a port on the initial private subnet/network for the nested router and add it to that router

$ openstack port create --network private --fixed-ip subnet=private-subnet,ip-address=10.0.0.62 private-port
$ openstack router add port router-nested private-port

Resultant interfaces

| interfaces_info | [{"port_id": "41b643a1-583c-4909-8cc8-0d2dba994633", "ip_address": "10.0.0.62"...


Brian Haley (brian-haley) wrote :

I've attached a picture from Horizon when all the steps were complete, just so it's clear what it looks like.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Brian:

I've replicated this configuration in an ML2/OVN environment and I can't ping google.com from inside the VM. I can ping the router-nested port IPs and the router-nested network port IP, but I can't ping the external network GW or any other IP.

Regards.

Brian Haley (brian-haley) wrote :

I just verified this works fine with ML2/OVS, using master branch with this commit:

commit a8fe0cb369da7312cff2abb8f3e5902d359a6642 (HEAD -> master, origin/master, origin/HEAD)
Merge: 2d74a93d68 d55c591ecd
Author: Zuul <email address hidden>
Date: Wed Feb 14 15:59:24 2024 +0000

    Merge "[OVN] A LRP in an external tunnelled network has no chassis"

I used the instructions in comment #13 with the same IP addresses, etc.

So it is clearly a regression in ML2/OVN.

Brian Haley (brian-haley) wrote :

Just wanted to add info on my ml2/ovs deployment, here are the two routers and the instance info. Sorry for the wrapping.

$ openstack router show router1
+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | nova |
| created_at | 2024-02-16T16:00:30Z |
| description | |
| enable_ndp_proxy | None |
| external_gateway_info | {"network_id": "f472eb6f-c386-4e21-b601-45af4f30d0f1", "external_fixed_ips": [{"subnet_id": "eeb80f5c-79f2-4097-b5e7-54362a45e3cc", "ip_address": "172.24.4.172"}, {"subnet_id": "80f89407-a55a-43c2-9057-5742e79074b4", "ip_address": "2001:db8::3b9"}], |
| | "enable_snat": true} ...

Brian Haley (brian-haley) wrote :

These are the settings I used for my ml2/ovs devstack:

Q_AGENT=openvswitch
Q_ML2_PLUGIN_MECHANISM_DRIVERS=openvswitch
Q_ML2_TENANT_NETWORK_TYPE=vxlan

enable_service q-agt
enable_service q-l3
enable_service q-dhcp
enable_service q-meta
disable_service ovn-controller
disable_service ovn-northd
disable_service ovs-vswitch
disable_service ovsdb-server
disable_service q-ovn-metadata-agent
enable_service placement
enable_service placement-api
enable_service placement-client

I am currently testing an additional nested router/network, will post the OSC commands when I'm done.

Brian Haley (brian-haley) wrote :

Ok, as I was asked about the case of 3 nested routers (i.e. a network on a private subnet behind 3 total routers, 2 nested on their own private networks), I've tested that as well. Same results - shows a clear regression from ML2/OVS to OVN.

Again, I used devstack. This was the latest commit in the neutron tree, as these deployments were already running from the last try:

$ git log -1
commit a8fe0cb369da7312cff2abb8f3e5902d359a6642
Merge: 2d74a93d68 d55c591ecd
Author: Zuul <email address hidden>
Date: Wed Feb 14 15:59:24 2024 +0000

    Merge "[OVN] A LRP in an external tunnelled network has no chassis"

# Create nested network

$ openstack network create private-network-nested
$ openstack subnet create --subnet-pool shared-default-subnetpool-v4 --network private-network-nested private-subnet-nested
$ openstack router create router-nested
$ openstack router add subnet router-nested private-subnet-nested
$ openstack port create --network private --fixed-ip subnet=private-subnet,ip-address=10.0.0.62 private-port
$ openstack router add port router-nested private-port
$ openstack router add route --route destination=10.0.0.64/26,gateway=10.0.0.62 router1
$ openstack router add route --route destination=0.0.0.0/0,gateway=10.0.0.1 router-nested

# Create nested network, 3-layers deep

$ openstack network create private-network-nested-3
$ openstack subnet create --subnet-pool shared-default-subnetpool-v4 --network private-network-nested-3 private-subnet-nested-3
$ openstack router create router-nested-3
$ openstack router add subnet router-nested-3 private-subnet-nested-3
$ openstack port create --network private-network-nested --fixed-ip subnet=private-subnet-nested,ip-address=10.0.0.126 private-port-2
$ openstack router add port router-nested-3 private-port-2
$ openstack router add route --route destination=0.0.0.0/0,gateway=10.0.0.65 router-nested-3
$ openstack router add route --route destination=10.0.0.128/26,gateway=10.0.0.62 router1
$ openstack router add route --route destination=10.0.0.128/26,gateway=10.0.0.126 router-nested

# Launch an instance on doubly-nested network

$ openstack server create --flavor 1 --image cirros-0.6.2-x86_64-disk --key-name devstackkeypair --network private-network-nested-3 test_server1

# Open console of test_server1
# ping 8.8.8.8 (fail)

# Does not work with OVN

$ sudo ovn-nbctl lr-nat-list neutron-034efa05-5717-4e77-b131-b79920ec2a24
TYPE  EXTERNAL_IP   EXTERNAL_PORT  LOGICAL_IP     EXTERNAL_MAC  LOGICAL_PORT
snat  172.24.4.122                 10.0.0.0/26

# Does work with OVN with the proposed patch
# ping 8.8.8.8 (success)

$ sudo ovn-nbctl lr-nat-list neutron-034efa05-5717-4e77-b131-b79920ec2a24
TYPE  EXTERNAL_IP   EXTERNAL_PORT  LOGICAL_IP     EXTERNAL_MAC  LOGICAL_PORT
snat  172.24.4.122                 10.0.0.0/26
snat  172.24.4.122                 10.0.0.128/26
snat  172.24.4.122                 10.0.0.64/26

# Does work with ML2/OVS, running same exact commands as above.
# ping 8.8.8.8 (success)
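
The extra lr-nat-list entries seen with the patch line up with the static routes added on router1. A minimal sketch of one way such a list could be derived, assuming a route destination qualifies when its next hop sits on a directly attached subnet. This is not the logic of the proposed patch; the function name and the route tuple format are illustrative, and only IPv4 is modelled.

```python
import ipaddress


def snat_cidrs(attached_cidrs, routes):
    """Return attached subnets plus route destinations reachable through them.

    routes is a list of (destination_cidr, nexthop_ip) tuples.
    """
    attached = [ipaddress.ip_network(c) for c in attached_cidrs]
    cidrs = set(attached_cidrs)
    for destination, nexthop in routes:
        hop = ipaddress.ip_address(nexthop)
        # Keep the destination only if its next hop is on an attached subnet.
        if any(hop in net for net in attached):
            cidrs.add(destination)
    return sorted(cidrs, key=ipaddress.ip_network)


# router1 as configured by the commands above (IPv4 only).
router1_attached = ["10.0.0.0/26"]
router1_routes = [
    ("10.0.0.64/26", "10.0.0.62"),
    ("10.0.0.128/26", "10.0.0.62"),
]

print(snat_cidrs(router1_attached, router1_routes))
```

For router1 this yields the same three CIDRs that appear as LOGICAL_IP values in the patched output.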

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Brian:

First of all, I don't have a devstack deployed environment. I have a multinode setup with 3 controllers and 2 compute nodes, so I needed to change the commands provided.

10.0.0.0/24 is the external network configured in my deployment. The other resources are:
* router_ext:
** ext_gw: 10.0.0.221
** net_int2: 10.20.0.1
* router_int:
** net_int2: 10.20.0.100
** net_int1: 10.10.0.1

The routes added:
* router_ext: destination=10.10.0.0/24,gateway=10.20.0.100 (net_int1, router_int net_int2 interface)
* router_int: destination=0.0.0.0/0,gateway=10.20.0.1 (*all*, router_ext net_int2 interface)
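
The two static routes can be checked with a small longest-prefix-match lookup. This is only an illustration of how the routes steer traffic in each direction, not a model of any router implementation; connected routes are left out, the function name is hypothetical, and 10.10.0.5 is a made-up VM address on net_int1.

```python
import ipaddress


def next_hop(routes, destination_ip):
    """Longest-prefix match over (cidr, gateway) routes; None if nothing matches."""
    addr = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), gateway)
        for cidr, gateway in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


router_ext_routes = [("10.10.0.0/24", "10.20.0.100")]
router_int_routes = [("0.0.0.0/0", "10.20.0.1")]

# Outbound: router_int sends external traffic to router_ext on net_int2.
print(next_hop(router_int_routes, "8.8.8.8"))
# Return path: router_ext sends replies for net_int1 back to router_int.
print(next_hop(router_ext_routes, "10.10.0.5"))
```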

I tried with different routers:
* dvr: no ping
* ha: no ping
* legacy: I can ping

This should probably not be considered a feature, but rather a bug or at least a side effect of how the legacy router is implemented. In any case, I've contacted the core OVN developers to find out whether this functionality is possible and how it should be implemented in Neutron.

Regards.

Brian Haley (brian-haley) wrote :

So just some additional information.

The reporter confirmed their cloud is running HA routers, but not DVR.

And talking with Rodolfo on irc reminded me of a proposed change that I finally found:

https://review.opendev.org/c/openstack/neutron/+/890459

And the bug for that is:

https://bugs.launchpad.net/neutron/+bug/2029722 (Routed subnets cannot use snat)

So this scenario works for "legacy" routers, but not for DVR. It should work for HA although Rodolfo tried and could not get it to work.

So in my opinion, this is a bug in DVR routers and a regression with OVN routers.

As Rodolfo mentioned, he has reached out to the OVN cores for advice.

Brian Haley (brian-haley) wrote :

Just adding the issue Rodolfo raised with the OVN team at Red Hat:

https://issues.redhat.com/browse/FDP-448

Brian Haley (brian-haley) wrote :

BTW, Terry Wilson found the original neutron bug where this behavior was introduced, allowing all subnets indirectly connected to a router to use the default SNAT address.

https://bugs.launchpad.net/neutron/+bug/1386041

Wanted to make sure that was documented.

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/917904
