Periodic message loss seen between VIM and OpenStack REST APIs

Bug #1817936 reported by Bart Wensley
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
High
Austin Sun

Bug Description

Title
-----
Periodic message loss seen between VIM and OpenStack REST APIs

Brief Description
-----------------
I am seeing periodic message loss when the VIM (on bare metal) is sending messages to the REST APIs for the OpenStack services (e.g. nova, neutron, glance). The VIM sends a message and never gets a reply. I see this happening every 10 or 20 minutes - the logs look something like this (nfv_vim.log):
              2019-02-27T01:07:37.411 controller-0 VIM_Thread[4494] INFO _task_worker_pool.py.73 Timeout worker Image-Worker-0
              2019-02-27T01:07:38.417 controller-0 VIM_Thread[4494] ERROR _task.py.200 Task(get_images) work (get_images) timed out, id=1061.
              2019-02-27T01:07:38.417 controller-0 VIM_Thread[4494] ERROR _vim_nfvi_audits.py.550 Audit-Images callback, not completed, responses={'completed': False, 'reason': '', 'page-request-id': '37c04013-ccc8-49d1-93ba-99e903753c27'}.

Severity
--------
Minor

Steps to Reproduce
------------------
This seems to happen in some labs during steady state operation.

Expected Behavior
------------------
There should not be any message loss between the VIM and the OpenStack REST APIs.

Actual Behavior
----------------
See above

Reproducibility
---------------
Intermittent - only see this occasionally.

System Configuration
--------------------
2 + 2 system (kubernetes)

Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="f/stein"

JOB="STX_build_stein_master"
<email address hidden>"
BUILD_NUMBER="54"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-02-25 19:13:50 +0000"

Timestamp/Logs
--------------
See above.

Changed in starlingx:
assignee: nobody → Bart Wensley (bartwensley)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating until further investigation. medium priority.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.2019.05 stx.containers
Revision history for this message
Bart Wensley (bartwensley) wrote :

I have made some progress on the periodic message loss issue.

First, I believe the messages are being sent by the VIM, but are not being received by the nova-api-osapi pod. As an example, in one failure case, the VIM would have sent two GET requests to the nova-api here:
2019-02-27T16:54:05.222 controller-1 VIM_Thread[192368] DEBUG _vim_nfvi_audits.py.974 Audit instance called, timer_id=16.
2019-02-27T16:54:05.223 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.1003 Auditing instance a5d7d490-5e12-4161-9949-b400f79237a5.
2019-02-27T16:54:05.223 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.1003 Auditing instance c572d839-19f5-4aec-9f57-f34b8bcf2a2d.

I can see that the second message is processed by the nova-api-osapi pod on controller-1 (co-located with the VIM):
2019-02-27 16:54:05,634.634 1 INFO nova.osapi_compute.wsgi.server [req-64ca04ca-684e-4b82-9cde-cd0f4dc8e4f9 3609cdfff9cb486c92732d162375fb9f 9f0e8cacb37b42c7a5b3c7bc99863582 - default default] 128.224.151.84 "GET /v2.1/9f0e8cacb37b42c7a5b3c7bc99863582/servers/c572d839-19f5-4aec-9f57-f34b8bcf2a2d HTTP/1.1" status: 200 len: 2708 time: 0.4034779

The VIM handles the reply:
2019-02-27T16:54:05.636 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.941 Audit-Instance callback for c572d839-19f5-4aec-9f57-f34b8bcf2a2d
2019-02-27T16:54:05.640 controller-1 VIM_Thread[192368] DEBUG _instance_director.py.1579 Notify other directors that an instance centos-1 audit is inprogress.

But the first message doesn’t appear in the nova-api-osapi logs and the VIM eventually times out waiting for the reply:
2019-02-27T16:54:15.441 controller-1 VIM_Thread[192368] DEBUG _vim_nfvi_audits.py.974 Audit instance called, timer_id=16.
2019-02-27T16:54:15.441 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.984 Audit instance queries still outstanding, outstanding=OrderedDict([(u'a5d7d490-5e12-4161-9949-b400f79237a5', u'centos-2')])
2019-02-27T16:54:25.234 controller-1 VIM_Thread[192368] INFO _task_worker_pool.py.73 Timeout worker Compute-Worker-0
2019-02-27T16:54:26.239 controller-1 VIM_Thread[192368] ERROR _task.py.200 Task(get_instance) work (get_server) timed out, id=3901.
2019-02-27T16:54:26.240 controller-1 VIM_Thread[192368] ERROR _vim_nfvi_audits.py.961 Audit-Instance callback, not completed, response={'completed': False, 'reason': ''}.

I next determined that this problem is not specific to the VIM. I wrote a simple script to send GET requests directly to the nova API using curl (run it from a shell with the OpenStack credentials set up):

#!/bin/bash

TENANT_ID=`openstack token issue | grep "| project_id |" | cut -f3 -d'|' | tr -d '[[:space:]]'`
TOKEN_ID=`openstack token issue | grep "| id |" | cut -f3 -d'|' | tr -d '[[:space:]]'`

counter=1
while [ $counter -le 10000 ]
do
  echo
  echo "Request: $counter"
  date +"%D %T %N"
  curl -g -i -X GET http://nova-api.openstack.svc.cluster.local:8774/v2.1/$TENANT_ID/flavors/detail -H "Accept: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: $TOKEN_ID" -H "X-OpenStack-Nova-API-Version: 2.1"
  #curl -g -i -X GET http://nova.openstack.svc.cluster.local:80/v2.1/$TENANT_ID/flavors/de...


Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Bart Wensley (bartwensley) → Matt Peters (mpeters-wrs)
Revision history for this message
Matt Peters (mpeters-wrs) wrote :

The connection fails only under specific conditions and is directly related to a conflict when setting up the NAT connection tracking information. This issue occurs in roughly 1 in 500 connections and only impacts that one connection; all other and subsequent connections are fine.

There seems to be a race condition or a source port conflict when performing the NAT operation when connecting to the local pod. When there is a failed connection, the conntrack entry shows the use of a generated port number in the reverse path translation. This seems to imply it could not use the original source port due to the conflict, but then subsequent packets fail to match the flow entry, causing the connection to fail.

tcp 6 86399 ESTABLISHED src=10.10.10.13 dst=10.103.101.91 sport=60171 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60171 [ASSURED] mark=0 use=1
tcp 6 114 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59721 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59721 [ASSURED] mark=0 use=1
tcp 6 115 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59753 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59753 [ASSURED] mark=0 use=1
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60053 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60053 [ASSURED] mark=0 use=1
tcp 6 117 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59961 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59961 [ASSURED] mark=0 use=2
tcp 6 114 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59717 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59717 [ASSURED] mark=0 use=1
tcp 6 115 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59739 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59739 [ASSURED] mark=0 use=1
tcp 6 117 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59985 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59985 [ASSURED] mark=0 use=2
tcp 6 119 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60147 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60147 [ASSURED] mark=0 use=2
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60047 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60047 [ASSURED] mark=0 use=1
tcp 6 117 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60009 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60009 [ASSURED] mark=0 use=1
tcp 6 119 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60161 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60161 [ASSURED] mark=0 use=1
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60057 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60057 [ASSURED] mark=0 use=1
tcp 6 116 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59833 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59833 [ASSURED] mark=0 use=1
tcp 6 115 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59791 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59791 [ASSURED] mark=0 use=1
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sp...


Revision history for this message
Matt Peters (mpeters-wrs) wrote :

The statistics also show the "invalid" counts being bumped every time the issue occurs.

controller-0:~$ sudo conntrack -S
cpu=0 searched=2278720 found=26257506 new=925893 invalid=1205 ignore=4451231 delete=1094225 delete_list=1094226 insert=925895 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=1 searched=2192697 found=26227245 new=929531 invalid=1392 ignore=4293678 delete=1069473 delete_list=1069464 insert=929527 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=2 searched=3038601 found=35074053 new=932092 invalid=1197 ignore=4435206 delete=1042689 delete_list=1042690 insert=932090 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=3 searched=12913387 found=154903356 new=2980111 invalid=5441 ignore=3586824 delete=2559266 delete_list=2428654 insert=2849496 insert_failed=0 drop=0 early_drop=0 error=5 search_restart=0

Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Revision history for this message
Austin Sun (sunausti) wrote :

Bart and Matt:
I have tried to reproduce this issue in a virtual environment on the latest version.
I ran the script Bart mentioned more than 10000 times, but could not reproduce it.
Could you try the latest version to see whether the issue is still there?

Revision history for this message
Matt Peters (mpeters-wrs) wrote :

This issue only occurs when the services (or test client) are co-located on the same host. Can you confirm that your reproduction attempts were all localized to a single host?

Revision history for this message
Austin Sun (sunausti) wrote :

Hi Matt:
    Do you mean run this in an AIO (All-In-One) Simplex setup?

Revision history for this message
Austin Sun (sunausti) wrote :

I think you are right. In an AIO Simplex setup, the script stopped at:
Request: 2083
06/11/19 03:03:48 523488226

curl: (7) Failed connect to nova-api.openstack.svc.cluster.local:8774; Connection timed out
7

Revision history for this message
Austin Sun (sunausti) wrote :

From the mytestnossh.cap15 tcpdump: packet No. 13523 is very odd. It is a response from nova-api, but no corresponding request sent from the host can be found at any earlier time.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning to Austin since he is actively working on this bug

Changed in starlingx:
assignee: Matt Peters (mpeters-wrs) → Austin Sun (sunausti)
Revision history for this message
Austin Sun (sunausti) wrote :

In kern_2019-07-15_15_04_50.log, lines 24000 and 24001 show something very odd:
Line 24000: mangle:cali-PREROUTING:rule:1 172.16.192.109:8774->10.10.10.3:33727
Line 24001: mangle:INPUT:policy:1 172.16.192.109:8774->10.10.10.3:53902

The destination port was suddenly changed from 33727 to 53902.

After that, at line 24014, the packet hits filter:cali-th-ens6:rule:2, which matches because the ctstate is INVALID, and the packet then goes to error.

Frank Miller (sensfan22)
tags: added: stx.networking
removed: stx.containers
Revision history for this message
Austin Sun (sunausti) wrote :

Here are some findings. The readiness and liveness probes in the nova-api-osapi (and other *-api) deployments use a TCP socket check on the service port to determine pod readiness and liveness.
For example, the barbican-api readiness probe:
readinessProbe:
          failureThreshold: 10
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 9311

Results of ‘curl --connect-timeout 10 http://barbican-api.openstack.svc.cluster.local:9311’:

Host          readinessProbe periodSeconds   Errors in 50000 attempts   Comments
R22           10                             29                         only barbican applied; no cinder, glance, or compute-toolkit
R22           10                             28
R22           300                            0
R22           300                            0
R22           300                            0
Edgetest901   10                             8                          default helm chart applied
Edgetest901   300                            2
Edgetest901   300                            1

It seems increasing the readiness probe period improves the result.

I then changed the nova-api-osapi deployment's readiness and liveness probes from a TCP socket check to an exec command:
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/nova-api.sh
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1

The result is very positive: 0 errors in 2*50000 attempts.
I will check how to fix this issue officially.

Revision history for this message
Austin Sun (sunausti) wrote :

In 1817936_iptrace.zip, kern_2019-07-15_15_04_50.log:
line 23963 shows a SYN packet from the kernel with SPT=53902, but it is not found in the tcpdump capture; the capture only contains the SYN packet with source port 33727.

Revision history for this message
Austin Sun (sunausti) wrote :

Some more tests were performed based on the attached tcp_test.zip.
Set up k8s with the calico CNI on Ubuntu 18.04.1 LTS and on CentOS Linux release 7.6.1810:
'kubectl create ns policy-demo'
'kubectl apply -f tcp_echo.yaml'
then run test.sh from the attached zip.

The Ubuntu 18.04.1 result is good: it connected 10000 times.
On CentOS 7.6.1810, the connection fails within 1000 attempts.
But if the readinessProbe is removed from the pod on CentOS 7.6.1810, it connects 10000 times without any failure.

General info
CentOS Linux release 7.6.1810
docker version:19.03.0
k8s:"v1.15.1"
calico v3.6.4(https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml)

Ubuntu 18.04.1 LTS
Docker version 18.09.8, build 0dd43dd87f
k8s:"v1.15.1"
calico v3.6.4(https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml)

In centos_trace.cap No. 211, 10.10.10.188:1024->192.168.102.134:1237 always retransmits and fails.
In kern.log line 7189, the packet 10.10.10.188:1024->10.101.65.100:1237 cannot be found in the tcpdump capture.
In line 7241 the SYN packet's source port is 42934, but in line 7242 the SYN ACK's destination port changed to 1237:
Line 7241 Jul 22 23:44:17 localhost kernel: [ 8013.941489] TRACE: nat:KUBE-POSTROUTING:rule:1 IN= OUT=calic3dc5321ad4 SRC=10.10.10.188 DST=192.168.102.134 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23843 DF PROTO=TCP SPT=42934 DPT=1237 SEQ=2847795293 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A0075B57E0000000001030307) UID=1000 GID=1000 MARK=0x4000
Line 7242 Jul 22 23:44:17 localhost kernel: [ 8013.941551] TRACE: raw:PREROUTING:policy:4 IN=calic3dc5321ad4 OUT= MAC=ee:ee:ee:ee:ee:ee:e6:bf:5c:54:84:71:08:00 SRC=192.168.102.134 DST=10.10.10.188 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=1237 DPT=1024 SEQ=2976603651 ACK=2847795294 WINDOW=27760 RES=0x00 ACK SYN URGP=0 OPT (020405780402080A0075B57F0075B57E01030307) MARK=0x40000

Revision history for this message
Austin Sun (sunausti) wrote :

Another interesting test was performed: when using the command
"curl --local-port 47250-67500 --connect-timeout 10 http://barbican-api.openstack.svc.cluster.local:9311", the issue is gone. It seems there is a port conflict in some cases.

Revision history for this message
Frank Miller (sensfan22) wrote :

Changed the priority to High based on the outcome of discussions with PLs and TLs (Ghada, Brent, Frank). Numerous LPs have been reported over the past few months related to openstack commands intermittently failing. Many of these look like message loss between openstack pods and could be due to this LP.

tags: added: stx.containers
Changed in starlingx:
importance: Medium → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672387

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
Austin Sun (sunausti) wrote :

It seems the curl command on CentOS is not constrained by net.ipv4.ip_local_port_range.
When running the curl command, the source port range observed in the tcpdump log is 11728 to 65508.

So the valid test approach should be 'curl --local-port 49216-61000 xxxx'; 49216-61000 is the stx ip_local_port_range.

Revision history for this message
Austin Sun (sunausti) wrote :

Although the curl command was an invalid test case, the issue also reproduces with the following Python test code:
import urllib
import datetime
for i in range(1,20000):
    now = datetime.datetime.now()
    print("start %s at: %s" % (i,now))
    try:
        response = urllib.urlopen("http://glance-api.openstack.svc.cluster.local:9292")
        print(response.read())
        response.close()
    except:
        now = datetime.datetime.now()
        print("Error %s at: %s" % (i,now))
        exit()
    now = datetime.datetime.now()
    print("end %s at: %s" % (i,now))

This issue is still reproducible even with a local port range of 49216-61000, 45000-65500, or 32768-60999.
More investigation is needed.

Revision history for this message
Austin Sun (sunausti) wrote :

If the glance-api readiness and liveness probes are removed, the test mentioned above runs 50000 times without failure. There appears to be some conflict/interference between the application connections and the TCP probe.

Revision history for this message
Austin Sun (sunausti) wrote :

From the attached tcpdump log:
@26151 there is a packet 10.10.10.3:18794->172.16.192.89:9292, but port 18794 should not be valid because 'net.ipv4.ip_local_port_range = 49216 61000'; need to check where these packets come from.

Revision history for this message
Austin Sun (sunausti) wrote :

The attachment is a long trace monitoring port 9292; there are a lot of ports outside ip_local_port_range (49217 65500),
for example No. 233525, port 10325: 10.10.10.3:10325->172.16.192.130:9292

The ephemeral ports outside ip_local_port_range:
10187, 10325, 11283, 11702, 11891, 12797, 13324, 13336, 13341, 13447, 13470, 13553, 13664,
13685, 13864, 13934, 14393, 15440, 15496, 15520, 16558, 16695, 16794, 16909, 16932, 17017,
17110, 17117, 17269, 17343, 17589, 18024, 18166, 18213, 18274, 18342, 18959, 19275, 19845,
20230, 20499, 21086, 21121, 21141, 21302, 21477, 21816, 21987, 22117, 22296, 23081, 23233,
23621, 24344, 24877, 25243, 25297, 25634, 25637, 25909, 26443, 26454, 27055, 27428, 27597,
27703, 27786, 27860, 27935, 28010, 28105, 28224, 28403, 28847, 28922, 29385, 29714, 29912,
30493, 30547, 30641, 30736, 31264, 31479, 31734, 32003, 32070, 32234, 32531, 32670, 32987,
33032, 33486, 33507, 33509, 33611, 33806, 33926, 34073, 34505, 34522, 34574, 34589, 34819,
35228, 35442, 35779, 36030, 36036, 37493, 37538, 37782, 38567, 38756, 38872, 39561, 39642,
39700, 39903, 39989, 3exmp, 40624, 40858, 40874, 41165, 41231, 41343, 41511, 41687, 41701,
41709, 41773, 41799, 41864, 42103, 42525, 42698, 42837, 4334, 43705, 4388, 44435, 44726,
45212, 45313, 45613, 45752, 45789, 45925, 45927, 46201, 46225, 47263, 47679, 47830, 47945,
47985, 48007, 4826, 48431, 48473, 48646, 48729, 4873, 48852, 4921, 49216, 49217, 49218

Revision history for this message
Austin Sun (sunausti) wrote :

When the issue occurred:
in echo-glance-api.cap (tcpdump -i any -vv port 9292 -w echo-glance-api.cap):
No. 468: 10.10.10.3:13550 ---> 172.16.192.99:9292
in echo-glance-api.con (conntrack -o timestamp -E -p tcp --dport 9292):
[1564902343.325301] [NEW] tcp 6 120 SYN_SENT src=10.10.10.3 dst=10.98.65.211 sport=53720 dport=9292 [UNREPLIED] src=172.16.192.99 dst=10.10.10.3 sport=9292 dport=13550
[1564902343.325305] [UPDATE] tcp 6 60 SYN_RECV src=10.10.10.3 dst=10.98.65.211 sport=53720 dport=9292 src=172.16.192.99 dst=10.10.10.3 sport=9292 dport=13550
The sport was changed from 53720 to 13550; it seems NAT changed the port. Will check the netfilter code.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by Austin Sun (<email address hidden>) on branch: master
Review: https://review.opendev.org/672387

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/674719

Revision history for this message
Austin Sun (sunausti) wrote :

From this log, the root cause is found.
The probe connection was in TIME_WAIT (10.10.10.3:50538->172.16.192.101:9292).
The client then opens a new connection 10.10.10.3:50538->10.109.43.235:9292; after NAT, the outgoing SYN packet becomes
10.10.10.3:24479->172.16.192.101:9292. The SYN ACK 172.16.192.101:9292->10.10.10.3:24479 is rewritten by mangle to 172.16.192.101:9292->10.10.10.3:50538, but that socket is still in TIME_WAIT, so the packet is marked INVALID and then dropped.

==========================================================================================
              controller       service           endpoint
              (10.10.10.3)     (10.109.43.235)   (172.16.192.101)    STATE
Probe         *(50538) --------------------------> *(9292)           TIME_WAIT
Client (new)  *(50538) ------> *(9292)
After NAT
Client        *(24479) --------------------------> *(9292)           SYN_SENT
IN PREROUTING *(24479) <-------------------------- *(9292)           SYN_RECV
INPUT chain   *(50538) <-------------------------- *(9292)           SYN_RECV

==========================================================================================
tcpdump_port_9292.cap:
No.24288 10.10.10.3:24479->172.16.192.101:9292
No.24289 172.16.192.101:9292->10.10.10.3:24479

conntrack_tcp_port9292:

[1565008856.898709] [UPDATE] tcp 6 120 TIME_WAIT src=10.10.10.3 dst=172.16.192.101 sport=50538 dport=9292 src=172.16.192.101 dst=10.10.10.3 sport=9292 dport=50538 [ASSURED]
[1565008911.774492] [NEW] tcp 6 120 SYN_SENT src=10.10.10.3 dst=10.109.43.235 sport=50538 dport=9292 [UNREPLIED] src=172.16.192.101 dst=10.10.10.3 sport=9292 dport=24479
[1565008911.774522] [UPDATE] tcp 6 60 SYN_RECV src=10.10.10.3 dst=10.109.43.235 sport=50538 dport=9292 src=172.16.192.101 dst=10.10.10.3 sport=9292 dport=24479

Because port 50538 is still in TIME_WAIT waiting to be released,
the new connection's NAT cannot reuse that conntrack entry, so NAT remaps it:
10.10.10.3:50538->10.109.43.235:9292 is NATed to 10.10.10.3:24479->172.16.192.101:9292.
kern_9292.log:
mangle rewrites the reply 172.16.192.101:9292->10.10.10.3:24479 to 172.16.192.101:9292->10.10.10.3:50538,
but the conntrack entry for 172.16.192.101:9292->10.10.10.3:50538 is still waiting to be destroyed,
so the packet is marked INVALID and then dropped.

2019-08-05T12:41:51.775 controller-0 kernel: warning [41907.116655] TRACE: mangle:cali-PREROUTING:rule:1 IN=calif380bc70786 OUT= MAC=ee:ee:ee:ee:ee:ee:c6:6d:0e:f5:a5:88:08:00 SRC=172.16.192.101 DST=10.10.10.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=9292 DPT=24479 SEQ=2742336307 ACK=2917708675 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A027ADF54027ADF5401030307) MARK=0x40000
2019-08-05T12:41:51.775 controller-0 kernel: warning [41907.116661] TRACE: mangle:INPUT:policy:1 IN=calif380bc70786 OUT= MAC=ee:ee:ee:ee:ee:ee:c6:6d:0e:f5:a5:88:08:00 SRC=172.16.192.101 DST=10.10.10.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=9292 DPT=50538 SEQ=2742336307 ACK=2917708675 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A027ADF54027ADF5401030307) MARK=0x40000
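The port-substitution mechanism described above can be sketched as a toy model (an illustration only, not the kernel's actual conntrack code; all names and the port-search strategy are mine):

```python
# Toy model of the DNAT source-port conflict described above (illustrative
# only -- not the kernel's actual conntrack code). conntrack indexes each
# connection by its reply tuple; when a new connection's reply tuple collides
# with an existing entry (e.g. the probe's TIME_WAIT entry), SNAT must pick a
# substitute source port. The SYN ACK then returns to the substitute port, is
# rewritten back to the original port, and dies against the client's
# lingering TIME_WAIT socket.

def nat_source_port(occupied_reply_tuples, pod_ip, pod_port,
                    client_ip, client_port):
    """Pick the source port NAT will use toward the pod (simplified)."""
    def reply_tuple(port):
        return (pod_ip, pod_port, client_ip, port)

    if reply_tuple(client_port) not in occupied_reply_tuples:
        return client_port              # no conflict: keep the original port
    for port in range(1024, 65536):     # conflict: pick another free port
        if reply_tuple(port) not in occupied_reply_tuples:
            return port
    raise RuntimeError("no free source port")

# The probe left its reply tuple for port 50538 in TIME_WAIT:
occupied = {("172.16.192.101", 9292, "10.10.10.3", 50538)}

# A new connection from the same source port gets remapped (24479 in the
# captured trace; any substitute port triggers the same failure):
remapped = nat_source_port(occupied, "172.16.192.101", 9292,
                           "10.10.10.3", 50538)
assert remapped != 50538
```

The model shows why only the colliding connection fails: any client source port whose reply tuple is free passes through NAT unchanged.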

Revision history for this message
Austin Sun (sunausti) wrote :

The probe connection action before going to time_wait state.
Probe connection
controller service endpoint Chain TCP FLAG SEQ ACK
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 raw:OUTPUT:policy:4 SYN 2707980036 0
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 SYN ACK 1599414185 2707980037
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 filter:OUTPUT:rule:1 ACK 2707980037 1599414186
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 raw:OUTPUT:policy:4 FIN ACK 2707980037 1599414186
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 ACK 1599414186 2707980038
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 FIN ACK 1599414186 2707980038
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 raw:OUTPUT:policy:4 ACK 2707980038 1599414187

And for the curl command connection with same port 50538: it will be like
controller service endpoint Chain TCP FLAG SEQ ACK
10.10.10.3:50538 --> 10.109.43.235:9292 raw:OUTPUT:policy:4 SYN 2917708674 0
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 filter:OUTPUT:rule:1 SYN 2917708674 0
10.10.10.3:24479 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 SYN ACK 2742336307 2917708675
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 mangle:INPUT:policy:1 SYN ACK 2742336307 2917708675
10.10.10.3:50538 --> 10.109.43.235:9292 raw:OUTPUT:policy:4 ACK 2707980038 1599414187
10.10.10.3:50538 --> 10.109.43.235:9292 filter:OUTPUT:rule:1 ACK 2707980038 1599414187
10.10.10.3:50538 --> 10.109.43.235:9292 filter:cali-th-ens6:rule:2 ACK(DROP) 2707980038 1599414187

The last ACK (10.10.10.3:50538 --> 10.109.43.235:9292) carries the same SEQ and ACK as the probe's latest ACK in TIME_WAIT.
From https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 , the TIME_WAIT lookup only checks (dest IP, dest port, src IP, src port).
Because the SEQ/ACK is not correct for that socket, the packet is marked invalid and then dropped.

If enabling tcp_tw_recycle, the previous socket should be already closed , then the issu...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/674719
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=171c43dca8b419c5ac579e289002ff4fdb1aee3f
Submitter: Zuul
Branch: master

commit 171c43dca8b419c5ac579e289002ff4fdb1aee3f
Author: Sun Austin <email address hidden>
Date: Mon Aug 5 16:46:43 2019 +0800

    Fix Periodic message loss between VIM and Openstack REST APIs

    set net.ipv4.tcp_tw_recycle=1 to avoid dnat conntrack invalid

    The probe connection action before going to time_wait state.
    Probe connection
    controller pod TCP FLAG SEQ ACK
    controller:50538 ---> endpoint:9292 SYN 2707980036 0
    controller:50538 <--- endpoint:9292 SYN ACK 1599414185
    2707980037
    controller:50538 ---> endpoint:9292 ACK 2707980037
    1599414186
    controller:50538 ---> endpoint:9292 FIN ACK 2707980037
    1599414186
    controller:50538 <--- endpoint:9292 ACK 1599414186
    2707980038
    controller:50538 <--- endpoint:9292 FIN ACK 1599414186
    2707980038
    controller:50538 ---> endpoint:9292 ACK 2707980038
    1599414187

    And for the curl command connection with same port 50538: it will be
    like
    controller pod TCP FLAG SEQ ACK
    controller:50538 --> service:9292 SYN 2917708674 0
    controller:50538 --> endpoint:9292 SYN 2917708674 0
    controller:24479 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK(DROP) 2707980038
    1599414187

    The last ACK(controller:50538-->service:9292) SEQ and ACK is same as
    Probe TIME_WAIT latest ACK’s.
    from
    https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 ,
    it only check (des ip , des port, src ip, and src port).Because this is
    not
     a correct SEQ/ACK , then it is set invalid and then dropped.

    If enabling tcp_tw_recycle, the previous socket should be already
    closed, then the issue should be gone.

    Closes-Bug: 1817936

    Change-Id: If6e66d85f08fc99022946fd2e9f4e5756bfb7b2f
    Signed-off-by: Sun Austin <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/678944

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (r/stx.2.0)

Change abandoned by Austin Sun (<email address hidden>) on branch: r/stx.2.0
Review: https://review.opendev.org/678944

Revision history for this message
Cindy Xie (xxie1) wrote :

The change needs to be reverted due to https://bugs.launchpad.net/starlingx/+bug/1841833

Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Austin Sun (sunausti) wrote :

Once the revert change is merged, I will submit a new change for another parameter, net.ipv4.tcp_tw_reuse=0.
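For checking which of these settings a node is actually running with, the sysctls involved can be read straight from /proc; this is a quick inspection helper of my own, not part of the fix (note that tcp_tw_recycle was removed upstream in kernel 4.12):

```python
from pathlib import Path

def read_sysctl(name, default=None):
    """Return the current value of a sysctl such as
    'net.ipv4.tcp_tw_reuse', or `default` if it does not exist
    on this kernel (sysctls live under /proc/sys)."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return default

if __name__ == "__main__":
    for name in ("net.ipv4.tcp_tw_reuse",
                 "net.ipv4.tcp_tw_recycle",      # absent on kernels >= 4.12
                 "net.ipv4.ip_local_port_range"):
        print(name, "=", read_sysctl(name, "<not available>"))
```

Reading /proc directly avoids depending on the sysctl binary and makes the "setting does not exist" case explicit, which matters here since the two tw_* knobs differ across kernel versions.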

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config-files (master)

Fix proposed to branch: master
Review: https://review.opendev.org/681363

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/681363
Committed: https://git.openstack.org/cgit/starlingx/config-files/commit/?id=fbc09b8db8a14fbf24976a0f7d8924af7a330f85
Submitter: Zuul
Branch: master

commit fbc09b8db8a14fbf24976a0f7d8924af7a330f85
Author: Sun Austin <email address hidden>
Date: Sat Sep 7 08:31:13 2019 +0800

    Fix Periodic message loss between VIM and Openstack REST APIs

    set net.ipv4.tcp_tw_reuse=0 to avoid dnat conntrack invalid
    and remove customizing ephemeral port range

    The probe connection action before going to time_wait state.
    Probe connection
    controller pod TCP FLAG SEQ ACK
    controller:50538 ---> endpoint:9292 SYN 2707980036 0
    controller:50538 <--- endpoint:9292 SYN ACK 1599414185
    2707980037
    controller:50538 ---> endpoint:9292 ACK 2707980037
    1599414186
    controller:50538 ---> endpoint:9292 FIN ACK 2707980037
    1599414186
    controller:50538 <--- endpoint:9292 ACK 1599414186
    2707980038
    controller:50538 <--- endpoint:9292 FIN ACK 1599414186
    2707980038
    controller:50538 ---> endpoint:9292 ACK 2707980038
    1599414187

    And for the curl command connection with same port 50538: it will be
    like
    controller pod TCP FLAG SEQ ACK
    controller:50538 --> service:9292 SYN 2917708674 0
    controller:50538 --> endpoint:9292 SYN 2917708674 0
    controller:24479 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK(DROP) 2707980038
    1599414187

    The last ACK(controller:50538-->service:9292) SEQ and ACK is same as
    Probe TIME_WAIT latest ACK’s.
    from
    https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 ,
    it only check (des ip , des port, src ip, and src port).Because this is
    not
     a correct SEQ/ACK , then it is set invalid and then dropped.

    If disable tcp_tw_reuse, the port nova-api will be always not same as
     pod probe using, then the issue should be gone.
    set back default(centos) ephemeral port range to avoid ephemeral port
    exhaustion .

    Closes-Bug: 1817936

    Change-Id: I0b37e9829ac5d3bc9ca1a0b8f55abc632c79f446
    Signed-off-by: Sun Austin <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Austin, when the change has sufficient soak in master and sanity is green, please cherrypick to r/stx.2.0

Revision history for this message
Austin Sun (sunausti) wrote :

Hi Ghada,
    The previous change was merged while sanity stayed green, so I think sanity did not cover some cases. I will check with the test team whether they can perform some regression tests after the daily sanity is green.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/683818

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (r/stx.2.0)

Reviewed: https://review.opendev.org/683818
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=457b4747e92255fa99340fa7cb1d7204873fb3c5
Submitter: Zuul
Branch: r/stx.2.0

commit 457b4747e92255fa99340fa7cb1d7204873fb3c5
Author: Sun Austin <email address hidden>
Date: Sun Sep 22 19:33:04 2019 +0800

    Fix Periodic message loss between VIM and Openstack REST APIs

    Set net.ipv4.tcp_tw_reuse=0 to avoid DNAT conntrack marking packets
    invalid, and remove the customized ephemeral port range.

    The probe connection completes a normal handshake and close before the
    socket enters TIME_WAIT:

    controller            pod             TCP FLAG   SEQ          ACK
    controller:50538 ---> endpoint:9292   SYN        2707980036   0
    controller:50538 <--- endpoint:9292   SYN ACK    1599414185   2707980037
    controller:50538 ---> endpoint:9292   ACK        2707980037   1599414186
    controller:50538 ---> endpoint:9292   FIN ACK    2707980037   1599414186
    controller:50538 <--- endpoint:9292   ACK        1599414186   2707980038
    controller:50538 <--- endpoint:9292   FIN ACK    1599414186   2707980038
    controller:50538 ---> endpoint:9292   ACK        2707980038   1599414187

    A later curl command connection that reuses the same source port 50538
    looks like this:

    controller            pod             TCP FLAG   SEQ          ACK
    controller:50538 -->  service:9292    SYN        2917708674   0
    controller:50538 -->  endpoint:9292   SYN        2917708674   0
    controller:24479 <--  endpoint:9292   SYN ACK    2742336307   2917708675
    controller:50538 <--  endpoint:9292   SYN ACK    2742336307   2917708675
    controller:50538 -->  service:9292    ACK        2707980038   1599414187
    controller:50538 -->  service:9292    ACK        2707980038   1599414187
    controller:50538 -->  service:9292    ACK(DROP)  2707980038   1599414187

    The last ACK (controller:50538 --> service:9292) carries the same SEQ and
    ACK numbers as the probe's final ACK in TIME_WAIT. Per
    https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 ,
    the socket lookup checks only the 4-tuple (destination IP, destination
    port, source IP, source port). Because the packet's SEQ/ACK numbers are
    not correct for the connection state that lookup finds, it is marked
    invalid and then dropped.

    With tcp_tw_reuse disabled, the port nova-api uses will never be the same
    one the pod probe is using, so the issue should be gone. The ephemeral
    port range is set back to the CentOS default to avoid ephemeral port
    exhaustion.

    Closes-Bug: 1817936

    Change-Id: I9a39e3b439633ffb82ff05814935eba9ae892eda
    Signed-off-by: Sun Austin <email address hidden>
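
In sysctl terms, the fix amounts to pinning one setting and dropping the port-range override. A sketch of the resulting configuration (the exact file StarlingX manages may differ):

```ini
# Disable reuse of TIME_WAIT sockets for new outgoing connections, so an
# ephemeral port still held by a probe connection cannot be picked up by
# nova-api while DNAT conntrack is tracking the old flow.
net.ipv4.tcp_tw_reuse = 0

# No net.ipv4.ip_local_port_range override: the CentOS default range applies,
# which is wide enough to avoid ephemeral port exhaustion.
```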

Ghada Khalil (gkhalil)
tags: added: in-r-stx20
Revision history for this message
Ghada Khalil (gkhalil) wrote :

The following testing was done by Elio Martinez with an stx.2.0 designer build with this fix

          Sanity   Horizon   VM operation   Ping 10 Hrs   Script from bug
AIOX      GREEN    PASS      PASS           PASS          PASS
MULTI     GREEN    PASS      PASS           PASS          PASS

Important highlights:

1. All scenarios were executed on AIOX and MULTI (2+2) systems.
2. Sanity execution was successful without any kind of problem (GREEN).
3. Horizon tests were executed as well (pause, suspend, unpause, create, delete instances).
4. VM operations were also driven through Horizon; no problem detected.
5. Ping between instances ran for at least 10 hours on each configuration.
6. I decided to execute the script mentioned in https://bugs.launchpad.net/starlingx/+bug/1817936; no issues were found using that script (executed 5 times since the reproducibility was intermittent).

Conclusion:

No problem was found with requested testing on http://dcp-dev.intel.com/pub/starlingx/stx-eng-build/270/outputs/
