native firewall driver - conntrack marks too much traffic as invalid

Bug #1952055 reported by Yusuf Güngör
This bug affects 1 person
Affects: neutron
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi, we are seeing strange behaviour on our Victoria cluster after switching from the hybrid firewall driver to the native Open vSwitch firewall driver.

We have to use the native Open vSwitch firewall driver to get firewall logs. After enabling security group logging, we observed a large number of DROP actions even though the security groups contain any-any ingress and egress rules for all protocols. According to the [Native Open vSwitch firewall driver](https://docs.openstack.org/neutron/latest/admin/config-ovsfwdriver.html#differences-between-ovs-and-iptables-firewall-drivers) document, this appears to be expected.

But we do not understand why conntrack marks this traffic as invalid. We see a lot of traffic marked INVALID by conntrack, especially for services that generate a lot of traffic, for example the etcd heartbeat, which is sent to cluster members every 100 ms (TCP port 2380).

The conntrack statistics also show high counts for "insert_failed" and "search_restart". We have nf_conntrack_buckets=65536 and nf_conntrack_max=262144, and nf_conntrack_count does not reach the maximum.
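The counters mentioned above are reported per CPU, so it helps to total them before comparing hosts. The following is a minimal sketch that assumes the statistics come as `key=value` lines in the style printed by `conntrack -S`; the sample text is hypothetical and is not taken from the attached statistics, and the function names are illustrative only.

```python
from collections import Counter


def sum_conntrack_stats(text: str) -> Counter:
    """Total per-CPU counters from `conntrack -S`-style key=value lines."""
    totals = Counter()
    for line in text.splitlines():
        for token in line.split():
            key, sep, value = token.partition("=")
            # Skip the cpu index itself and anything that is not a number.
            if sep and value.isdigit() and key != "cpu":
                totals[key] += int(value)
    return totals


def table_fill_ratio(count: int, maximum: int) -> float:
    """Fraction of the conntrack table in use (count / nf_conntrack_max)."""
    return count / maximum


# Hypothetical sample output, for illustration only.
SAMPLE = """\
cpu=0 found=0 invalid=120 insert=0 insert_failed=7 drop=0 early_drop=0 error=0 search_restart=31
cpu=1 found=0 invalid=80 insert=0 insert_failed=3 drop=0 early_drop=0 error=0 search_restart=9
"""

if __name__ == "__main__":
    totals = sum_conntrack_stats(SAMPLE)
    for name in ("invalid", "insert_failed", "search_restart"):
        print(name, totals[name])
    # With the values from this report, a full table would average
    # nf_conntrack_max / nf_conntrack_buckets entries per hash bucket.
    print("entries per bucket at max:", 262144 / 65536)
```

On a real host the same function could be fed the captured output of the statistics command; the fill ratio uses the current nf_conntrack_count and nf_conntrack_max values.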

We are seeing random and frequent timeouts on the Kubernetes clusters installed on OpenStack instances in this cluster, and we believe they are related to this issue. In particular, the calico-node pod on the k8s cluster times out on liveness probe checks. We tested Calico in both IPIP and VXLAN mode with no change. We also tested k8s clusters installed on different operating systems (CentOS 7, Debian, etc.), still with no change.

Environment Details:
 OpenStack Victoria cluster installed via kolla-ansible on Ubuntu 20.04.2 LTS hosts (kernel 5.4.0-80-generic).
 There are 5 controller+network nodes.
 The "neutron-openvswitch-agent", "neutron-l3-agent" and "neutron-server" versions are "17.2.2.dev46".
 Open vSwitch is used in DVR mode with router HA configured (l3_ha = true).
 We are using a single centralized neutron router for connecting all tenant networks to provider network.
 We are using bgp_dragent to announce unique tenant networks.
 Tenant network type: vxlan
 External network type: vlan

Conntrack Invalid Logs (After enabling nf_conntrack_log_invalid logging)
...
... For etcd port 2380
...
Nov 24 10:45:47 test-compute-07 kernel: [9666429.466072] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=52384 DF PROTO=TCP SPT=33726 DPT=2380 SEQ=1503741580 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:46:01 test-compute-07 kernel: [9666444.248252] nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=10.168.112.39 DST=10.211.2.97 LEN=60 TOS=0x00 PREC=0x00 TTL=59 ID=0 DF PROTO=TCP SPT=6533 DPT=45832 SEQ=2345805154 ACK=1982320186 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A1ACBE8A518E611E801030309) MARK=0x4010000
Nov 24 10:46:02 test-compute-07 kernel: [9666444.490741] nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=10.168.112.39 DST=10.211.2.97 LEN=60 TOS=0x00 PREC=0x00 TTL=59 ID=0 DF PROTO=TCP SPT=6533 DPT=59862 SEQ=3082071853 ACK=2961225592 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A1ACBE8E218E612DA01030309) MARK=0x4010000
Nov 24 10:46:06 test-compute-07 kernel: [9666448.362730] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.139 DST=10.211.2.98 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=42180 DF PROTO=TCP SPT=42286 DPT=2380 SEQ=3794545871 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:46:11 test-compute-07 kernel: [9666453.465972] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=62831 DF PROTO=TCP SPT=33954 DPT=2380 SEQ=935403626 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:46:19 test-compute-07 kernel: [9666461.590026] nf_ct_proto_6: invalid packet ignored in state SYN_SENT IN= OUT= SRC=162.247.243.149 DST=10.211.2.121 LEN=40 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=TCP SPT=443 DPT=56158 SEQ=1845326009 ACK=4146250693 WINDOW=1198 RES=0x00 ACK URGP=0 MARK=0x4010000
Nov 24 10:46:22 test-compute-07 kernel: [9666464.365487] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.139 DST=10.211.2.168 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=47797 DF PROTO=TCP SPT=46064 DPT=2380 SEQ=4079966865 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:47:07 test-compute-07 kernel: [9666509.467096] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.139 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=13159 DF PROTO=TCP SPT=49816 DPT=2380 SEQ=3428465462 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:47:07 test-compute-07 kernel: [9666509.467658] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.139 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=13160 DF PROTO=TCP SPT=49816 DPT=2380 SEQ=3428465462 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:47:08 test-compute-07 kernel: [9666510.380344] nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=52.84.114.5 DST=10.211.2.89 LEN=60 TOS=0x00 PREC=0x00 TTL=228 ID=16475 PROTO=TCP SPT=443 DPT=49610 SEQ=3672172766 ACK=202854934 WINDOW=1428 RES=0x00 ACK SYN URGP=0 OPT (020405A00402080A65E5498918E687F501030309) MARK=0x4010000
Nov 24 10:47:27 test-compute-07 kernel: [9666529.466842] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=25780 DF PROTO=TCP SPT=34674 DPT=2380 SEQ=1778600979 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:47:27 test-compute-07 kernel: [9666529.467583] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=25781 DF PROTO=TCP SPT=34674 DPT=2380 SEQ=1778600979 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:48:03 test-compute-07 kernel: [9666565.468588] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.139 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=44458 DF PROTO=TCP SPT=50346 DPT=2380 SEQ=179714231 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:48:07 test-compute-07 kernel: [9666569.468069] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=40395 DF PROTO=TCP SPT=35050 DPT=2380 SEQ=1396127788 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
Nov 24 10:48:07 test-compute-07 kernel: [9666569.468408] nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=40396 DF PROTO=TCP SPT=35050 DPT=2380 SEQ=1396127788 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
...
... For other ports
...
Nov 24 10:45:12 test-compute-07 kernel: [9666394.834132] nf_ct_proto_6: invalid packet ignored in state SYN_SENT IN= OUT= SRC=162.247.243.148 DST=10.211.2.246 LEN=40 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=TCP SPT=443 DPT=40824 SEQ=3886318363 ACK=1730529897 WINDOW=1190 RES=0x00 ACK URGP=0 MARK=0x4010000
Nov 24 10:46:01 test-compute-07 kernel: [9666444.248252] nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=10.168.112.39 DST=10.211.2.97 LEN=60 TOS=0x00 PREC=0x00 TTL=59 ID=0 DF PROTO=TCP SPT=6533 DPT=45832 SEQ=2345805154 ACK=1982320186 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A1ACBE8A518E611E801030309) MARK=0x4010000
Nov 24 10:46:02 test-compute-07 kernel: [9666444.490741] nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=10.168.112.39 DST=10.211.2.97 LEN=60 TOS=0x00 PREC=0x00 TTL=59 ID=0 DF PROTO=TCP SPT=6533 DPT=59862 SEQ=3082071853 ACK=2961225592 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A1ACBE8E218E612DA01030309) MARK=0x4010000
Nov 24 10:46:19 test-compute-07 kernel: [9666461.590026] nf_ct_proto_6: invalid packet ignored in state SYN_SENT IN= OUT= SRC=162.247.243.149 DST=10.211.2.121 LEN=40 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=TCP SPT=443 DPT=56158 SEQ=1845326009 ACK=4146250693 WINDOW=1198 RES=0x00 ACK URGP=0 MARK=0x4010000
Nov 24 10:47:08 test-compute-07 kernel: [9666510.380344] nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=52.84.114.5 DST=10.211.2.89 LEN=60 TOS=0x00 PREC=0x00 TTL=228 ID=16475 PROTO=TCP SPT=443 DPT=49610 SEQ=3672172766 ACK=202854934 WINDOW=1428 RES=0x00 ACK SYN URGP=0 OPT (020405A00402080A65E5498918E687F501030309) MARK=0x4010000
Nov 24 10:49:39 test-compute-07 kernel: [9666661.880770] nf_ct_proto_6: invalid packet ignored in state SYN_SENT IN= OUT= SRC=162.247.243.149 DST=10.211.2.246 LEN=40 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=TCP SPT=443 DPT=41428 SEQ=358351623 ACK=2255766346 WINDOW=1212 RES=0x00 ACK URGP=0 MARK=0x4010000
Nov 24 10:50:17 test-compute-07 kernel: [9666699.786127] nf_ct_proto_6: invalid rst IN= OUT= SRC=162.247.243.149 DST=10.211.2.251 LEN=40 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=TCP SPT=443 DPT=50758 SEQ=1139505987 ACK=0 WINDOW=0 RES=0x00 RST URGP=0 MARK=0x4010000
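
To see which destination ports and conntrack verdicts dominate in logs like the ones above, the lines can be parsed and counted. This is a rough sketch that assumes the field layout seen in this excerpt (`nf_ct_proto_6: <reason> IN= ... SRC= DST= ... SPT= DPT= ... RES=0x00 <flags> URGP=`); the regular expression and function names are illustrative, and the two sample lines are taken from the excerpt with the syslog prefix dropped.

```python
import re
from collections import Counter

# Field layout is assumed from the kernel log lines quoted in this report.
LINE_RE = re.compile(
    r"nf_ct_proto_6: (?P<reason>.+?) IN=.*?"
    r"SRC=(?P<src>\S+) DST=(?P<dst>\S+) .*?"
    r"SPT=(?P<spt>\d+) DPT=(?P<dpt>\d+) .*?"
    r"RES=0x[0-9a-fA-F]+ (?P<flags>(?:[A-Z]+ )*)URGP="
)


def parse_line(line):
    """Return the interesting fields of one log line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "reason": m["reason"],
        "src": m["src"],
        "dst": m["dst"],
        "spt": int(m["spt"]),
        "dpt": int(m["dpt"]),
        "flags": frozenset(m["flags"].split()),
    }


def summarise(lines):
    """Count log lines per (reason, destination port)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["reason"], rec["dpt"])] += 1
    return counts


SAMPLE = [
    "nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 "
    "LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=52384 DF PROTO=TCP SPT=33726 "
    "DPT=2380 SEQ=1503741580 ACK=0 WINDOW=0 RES=0x00 RST URGP=0",
    "nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= "
    "SRC=10.168.112.39 DST=10.211.2.97 LEN=60 TOS=0x00 PREC=0x00 TTL=59 ID=0 "
    "DF PROTO=TCP SPT=6533 DPT=45832 SEQ=2345805154 ACK=1982320186 "
    "WINDOW=28960 RES=0x00 ACK SYN URGP=0 "
    "OPT (020405B40402080A1ACBE8A518E611E801030309) MARK=0x4010000",
]

if __name__ == "__main__":
    for (reason, dpt), n in summarise(SAMPLE).most_common():
        print(f"{n:5d}  dpt={dpt:<6d} {reason}")
```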

Conntrack Statistics logs from compute node (root namespace) attached.

Yusuf Güngör (yusuf2) wrote :
description: updated
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Yusuf:

You should maybe ask this question in the netfilter support channel.

According to the logs, if an unexpected TCP packet arrives at a host, the host responds by sending a RST packet. You should find out why those 10.211.2.* IPs are sending those unexpected TCP packets.

If those are the etcd heartbeats, then you can try disabling security on the ports receiving the heartbeats. If I'm not wrong, the etcd heartbeats are TCP packets, so there should be a listener expecting them. If those packets are not expected, then you need to debug this issue.

Sorry if I didn't provide you enough help.

Regards.

Changed in neutron:
importance: Undecided → Medium
Yusuf Güngör (yusuf2) wrote :

Hi Rodolfo,

Thanks for your reply, it helped a lot. We are now focusing on the unexpected TCP packets that cause the TCP RSTs.

We also see these TCP RSTs on k8s clusters hosted on VMware infrastructure.

This seems to be a k8s-related problem (maybe CNI-related). The client closes the connection with FIN,ACK, and the socket is probably closed on the client side. After this FIN,ACK packet, however, the server side still tries to send a data packet and then its own FIN,ACK. Both of these packets get a TCP RST from the client side, probably because the client has already closed the connection.

conntrack logs these TCP RSTs as "Invalid RST", and the native Open vSwitch firewall driver logs them as DROP. We will ask the k8s team about this behaviour, but if it is normal, does the Open vSwitch firewall driver need an update, or how can we prevent these packets from being logged as DROP? Our firewall team will use these firewall logs, and this situation is confusing for them.

You can see the dump below:

529 2021-11-26 22:11:58.807319 192.168.166.53 192.168.166.51 TCP 76 0 1 0 50 43526 → 2380 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3258810342 TSecr=0 WS=128
530 2021-11-26 22:11:58.807372 192.168.166.51 192.168.166.53 TCP 76 0 1 1 50 2380 → 43526 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=3258809359 TSecr=3258810342 WS=128
531 2021-11-26 22:11:58.807483 192.168.166.53 192.168.166.51 TCP 68 1 1 1 50 43526 → 2380 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=3258810343 TSecr=3258809359
532 2021-11-26 22:11:58.807559 192.168.166.53 192.168.166.51 TLSv1.2 255 1 188 1 50 Client Hello
533 2021-11-26 22:11:58.807568 192.168.166.51 192.168.166.53 TCP 68 1 1 188 50 2380 → 43526 [ACK] Seq=1 Ack=188 Win=30080 Len=0 TSval=3258809360 TSecr=3258810343
624 2021-11-26 22:11:58.812160 192.168.166.51 192.168.166.53 TLSv1.2 1443 1 1376 188 50 Server Hello, Certificate, Server Key Exchange, Certificate Request, Server Hello Done
625 2021-11-26 22:11:58.812253 192.168.166.53 192.168.166.51 TCP 68 188 188 1376 50 43526 → 2380 [ACK] Seq=188 Ack=1376 Win=32128 Len=0 TSval=3258810347 TSecr=3258809364
631 2021-11-26 22:11:58.816296 192.168.166.53 192.168.166.51 TLSv1.2 1383 188 1503 1376 50 Certificate, Client Key Exchange, Certificate Verify, Change Cipher Spec, Encrypted Handshake Message
639 2021-11-26 22:11:58.816992 192.168.166.51 192.168.166.53 TLSv1.2 119 1376 1427 1503 50 Change Cipher Spec, Encrypted Handshake Message
644 2021-11-26 22:11:58.817236 192.168.166.53 192.168.166.51 TLSv1.2 209 1503 1644 1427 50 Application Data
646 2021-11-26 22:11:58.817388 192.168.166.51 192.168.166.53 TLSv1.2 264 1427 1623 1644 50 Application Data
653 2021-11-26 22:11:58.817586 192.168.166.53 192.168.166.51 TLSv1.2 99 1644 1675 1623 50 Encrypted Alert
656 2021-11-26 22:11:58.817597 192.168.166.53 192.168.166.51 TCP 68 1675 1676 1623 50 43526 → 2380 [FIN, ACK] Seq=1675 Ack=1623 Win=34944 Len=0 TSval=3258810353 TSecr=3258809369
664 2021-11-26 22:11:58.817742 192.168.166.51 192.168.166.53 TLSv1.2 99 1623 1654 1676 50 Encrypted Alert
665 2021-11-26 22:11:58.817758 192.168.166.51 192.168.166.53 TCP 68 1654 1655 1676 50 2380 → 43526 [FIN, ACK] Seq=1654 Ack=1676 Win=35584 Len=0 TSval=3258809370 TSecr=32...


Lajos Katona (lajos-katona) wrote :

Hi Yusuf, I am closing this bug report for now, but if you have any news from the k8s team, please feel free to reopen it. If we need to change how the firewall driver(s) work(s), perhaps we have to open an RFE and do more analysis (neutron has multiple firewall drivers, and we have to keep the user experience the same across them).

Changed in neutron:
status: New → Invalid
Yusuf Güngör (yusuf2) wrote :

Hi Lajos, it seems this is simply how some applications close connections, with a "FIN,ACK" packet. Many applications do this, and I do not think it is possible to change the behaviour of all of them. It would be nice to have an option to ignore this kind of invalid RST packet. We are going to try to find a way to ignore them before storing the logs (if we can see all the TCP flags in the logs).
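
One possible shape for such a pre-storage filter is sketched below. It assumes each log line carries the TCP flags in the same `RES=0x00 <flags> URGP=` form as the kernel conntrack lines quoted in the bug description; the security group log format produced by the firewall driver may differ, in which case the pattern would need adjusting. The names `tcp_flags` and `drop_bare_rst` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumed layout: "... RES=0x00 RST URGP=0" or "... RES=0x00 ACK SYN URGP=0".
FLAGS_RE = re.compile(r"RES=0x[0-9a-fA-F]+ ((?:[A-Z]+ )*)URGP=")


def tcp_flags(line: str) -> frozenset:
    """Return the set of TCP flag names found in a log line (may be empty)."""
    m = FLAGS_RE.search(line)
    return frozenset(m.group(1).split()) if m else frozenset()


def drop_bare_rst(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except those whose only TCP flag is RST."""
    for line in lines:
        if tcp_flags(line) == {"RST"}:
            continue
        yield line


if __name__ == "__main__":
    sample = [
        "nf_ct_proto_6: invalid rst IN= OUT= SRC=10.211.2.168 DST=10.211.2.98 "
        "PROTO=TCP SPT=33726 DPT=2380 WINDOW=0 RES=0x00 RST URGP=0",
        "nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= "
        "SRC=10.168.112.39 DST=10.211.2.97 PROTO=TCP SPT=6533 DPT=45832 "
        "WINDOW=28960 RES=0x00 ACK SYN URGP=0",
    ]
    for kept in drop_bare_rst(sample):
        print(kept)
```

Filtering on the flag alone also hides resets that might deserve attention, so a narrower rule (for example, restricting it to known ports such as 2380) may be preferable.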
