Port forwarding only works between VMs in the same neutron network

Bug #1927691 reported by Lars Erik Pedersen
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
neutron   Expired   Undecided    Unassigned

Bug Description

First of all, I'm not really sure whether this is a bug or some sort of configuration error on our side, but I'm having issues with port forwarding in neutron.

OpenStack Ussuri, running on Bionic
neutron-l3-agent 2:16.2.0-0ubuntu1~cloud0
openvswitch-switch 2.13.1-0ubuntu0.20.04.1~cloud0

My scenario:
- Create two networks (net1 and net2), and attach a router to each of them
- Create two VMs in net1, one in net2
- Attach a "plain" FIP to VM-1 and VM-3
- Create a FIP for the port forwarding, and create a port forwarding rule pointing to VM-2 (i.e. map FIP:80 to VM-2:8000)
- Log in to VM-2 and start listening on TCP port 8000 with "python3 -m http.server 8000"

What I expect:
curl http://FIP:80 should give a response from VM-2:8000 when run from VM-1, from VM-3, and externally

What happens:
The port forwarding only works for VM-1. In other words, only between VMs in the same neutron network.
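
The check above can be repeated from each vantage point (VM-1, VM-3, an external host) with a small probe. This is only a sketch that mirrors the "curl http://FIP:80" test; the function name `probe` and the 3-second timeout are assumptions made for illustration, not part of the original report.

```python
import urllib.error
import urllib.request


def probe(url, timeout=3.0):
    """Return the HTTP status code, or None if no connection could be made."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status, so the path works.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, unreachable, or timed out (e.g. packets dropped).
        return None
```

Calling `probe("http://10.212.141.76:80/")` from VM-1 would be expected to return 200 in the setup described here, while a silently dropped connection from VM-3 or an external host would show up as `None` after the timeout.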

--

I've done some debugging with tcpdump on my network nodes within the netns of the qrouter. When I try to connect from either VM-3 or externally, I observe the packets arriving on the qrouter's external interface, and then they get dropped "somewhere". I haven't been able to determine where, or by what.

In the dumps, we have the following IP addresses. All FIPs are in 10.212.136.0/21:
VM-1 (net1): 192.168.0.92 (FIP: 10.212.143.126)
VM-2 (net1): 192.168.0.35 (No FIP, but port forwarding rule on 10.212.141.76 80->8000)
VM-3 (net2): 192.168.111.213 (FIP: 10.212.138.184)
Router of net1: 192.168.0.1 / 10.212.140.143

Iptables for the qrouter that hosts the FIP with port forwarding:
http://paste.openstack.org/show/805020/

tcpdump on the qrouter internal interface when doing "curl http://FIP" from VM-1 (this works, but is of course rather useless):
http://paste.openstack.org/show/805021/

tcpdump on the qrouter external interface when doing "curl http://FIP" from VM-3 (this is identical for connections from machines outside of our openstack environment - and no packets appear on the internal interface):
http://paste.openstack.org/show/805022/

Tags: l3-ha
Slawek Kaplonski (slaweq) wrote :

This is very strange. I checked the logs you provided, and it seems to me that everything needed is already configured by the Neutron L3 agent (the DNAT rule in the nat table).
I also tested it locally with devstack and it worked fine.
We also run tests for this, see e.g. https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_52e/789709/1/check/neutron-tempest-plugin-scenario-openvswitch-ussuri/52e2150/ and there are no problems there.
The above link to the CI job's logs is for Ussuri, and the job ran on Ubuntu Bionic. Maybe you can compare the versions of the packages installed on your system with those on that CI node; there may be differences, e.g. in the kernel, that point to something.
Also, please check whether e.g. sysctl forwarding is enabled on the qg- interface in that router. I assume it is, since connectivity to the FIP without port forwarding works fine, right?
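
One way to check the per-interface forwarding flag is to read it from procfs. The sketch below assumes the standard Linux path `/proc/sys/net/ipv4/conf/<interface>/forwarding`; the function name and the `proc_root` parameter are hypothetical and exist only to make the example self-contained. It would have to run inside the qrouter's network namespace to see the qg- interface.

```python
from pathlib import Path


def forwarding_enabled(interface, proc_root="/proc/sys"):
    """Return True if IPv4 forwarding is enabled on the given interface."""
    # proc_root is an illustrative parameter; on a real host it stays /proc/sys.
    path = Path(proc_root) / "net" / "ipv4" / "conf" / interface / "forwarding"
    # The kernel exposes the flag as "0" or "1" followed by a newline.
    return path.read_text().strip() == "1"
```

The interface name passed in would be the router's actual qg- device, which is not given in this report.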

tags: added: l3-ha
Changed in neutron:
status: New → Incomplete
Lars Erik Pedersen (pedersen-larserik) wrote :

I'll look into the logs from the CI node. sysctl forwarding on the qg- interface is enabled, and yes: connectivity to the FIP without forwarding works as it should.

I also tried updating to the latest available versions of neutron in UCA. Still the same issue.

Lars Erik Pedersen (pedersen-larserik) wrote :

Did some more debugging. It seems that the packet gets dropped by this rule in the filter table:

-A neutron-l3-agent-scope -o qr-e8bf5ba7-b5 -m mark ! --mark 0x4000000/0xffff0000 -j DROP

I added iptables logging at each step in the packet flow (https://i2.wp.com/rakhesh.com/wp-content/uploads/2020/11/Iptables-Flow.png?ssl=1) and I'm pretty sure this rule is what stops it, because nothing appears in the POSTROUTING chain of the mangle table (where the packet should appear after passing filter FORWARD).
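
To make the rule's condition concrete: the iptables mark match compares only the bits selected by the mask, and the "!" inverts the result. The sketch below models just that comparison for the value and mask in the rule above; it ignores the "-o qr-e8bf5ba7-b5" interface condition, and the function names are made up for illustration.

```python
def mark_matches(packet_mark, value, mask):
    """Model '-m mark --mark value/mask': compare only the masked bits."""
    return (packet_mark & mask) == value


def scope_rule_drops(packet_mark, value=0x4000000, mask=0xFFFF0000):
    """Model '! --mark value/mask -j DROP': drop when the match fails."""
    return not mark_matches(packet_mark, value, mask)


print(scope_rule_drops(0x0))        # True: unmarked packet is dropped
print(scope_rule_drops(0x4000000))  # False: upper 16 bits equal 0x0400
print(scope_rule_drops(0x4000002))  # False: lower 16 bits are masked out
```

In other words, a packet leaving via that qr- interface survives this rule only if the upper 16 bits of its mark are 0x0400, so a forwarded packet that reaches the filter table without that mark would be dropped here.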

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired