Periodic message loss seen between VIM and OpenStack REST APIs

Bug #1817936 reported by Bart Wensley
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
High
Austin Sun

Bug Description

Title
-----
Periodic message loss seen between VIM and OpenStack REST APIs

Brief Description
-----------------
I am seeing periodic message loss when the VIM (on bare metal) is sending messages to the REST APIs for the OpenStack services (e.g. nova, neutron, glance). The VIM sends a message and never gets a reply. I see this happening every 10 or 20 minutes - the logs look something like this (nfv_vim.log):
              2019-02-27T01:07:37.411 controller-0 VIM_Thread[4494] INFO _task_worker_pool.py.73 Timeout worker Image-Worker-0
              2019-02-27T01:07:38.417 controller-0 VIM_Thread[4494] ERROR _task.py.200 Task(get_images) work (get_images) timed out, id=1061.
              2019-02-27T01:07:38.417 controller-0 VIM_Thread[4494] ERROR _vim_nfvi_audits.py.550 Audit-Images callback, not completed, responses={'completed': False, 'reason': '', 'page-request-id': '37c04013-ccc8-49d1-93ba-99e903753c27'}.

Severity
--------
Minor

Steps to Reproduce
------------------
This seems to happen in some labs during steady state operation.

Expected Behavior
------------------
There should not be any message loss between the VIM and the OpenStack REST APIs.

Actual Behavior
----------------
See above

Reproducibility
---------------
Intermittent - only see this occasionally.

System Configuration
--------------------
2 + 2 system (kubernetes)

Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="f/stein"

JOB="STX_build_stein_master"
<email address hidden>"
BUILD_NUMBER="54"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-02-25 19:13:50 +0000"

Timestamp/Logs
--------------
See above.

Changed in starlingx:
assignee: nobody → Bart Wensley (bartwensley)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating until further investigation. medium priority.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.2019.05 stx.containers
Revision history for this message
Bart Wensley (bartwensley) wrote :

I have made some progress on the periodic message loss issue.

First, I believe the messages are being sent by the VIM, but are not being received by the nova-api-osapi pod. As an example, in one failure case, the VIM would have sent two GET requests to the nova-api here:
2019-02-27T16:54:05.222 controller-1 VIM_Thread[192368] DEBUG _vim_nfvi_audits.py.974 Audit instance called, timer_id=16.
2019-02-27T16:54:05.223 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.1003 Auditing instance a5d7d490-5e12-4161-9949-b400f79237a5.
2019-02-27T16:54:05.223 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.1003 Auditing instance c572d839-19f5-4aec-9f57-f34b8bcf2a2d.

I can see that the second message is processed by the nova-api-osapi pod on controller-1 (co-located with the VIM):
2019-02-27 16:54:05,634.634 1 INFO nova.osapi_compute.wsgi.server [req-64ca04ca-684e-4b82-9cde-cd0f4dc8e4f9 3609cdfff9cb486c92732d162375fb9f 9f0e8cacb37b42c7a5b3c7bc99863582 - default default] 128.224.151.84 "GET /v2.1/9f0e8cacb37b42c7a5b3c7bc99863582/servers/c572d839-19f5-4aec-9f57-f34b8bcf2a2d HTTP/1.1" status: 200 len: 2708 time: 0.4034779

The VIM handles the reply:
2019-02-27T16:54:05.636 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.941 Audit-Instance callback for c572d839-19f5-4aec-9f57-f34b8bcf2a2d
2019-02-27T16:54:05.640 controller-1 VIM_Thread[192368] DEBUG _instance_director.py.1579 Notify other directors that an instance centos-1 audit is inprogress.

But the first message doesn’t appear in the nova-api-osapi logs and the VIM eventually times out waiting for the reply:
2019-02-27T16:54:15.441 controller-1 VIM_Thread[192368] DEBUG _vim_nfvi_audits.py.974 Audit instance called, timer_id=16.
2019-02-27T16:54:15.441 controller-1 VIM_Thread[192368] INFO _vim_nfvi_audits.py.984 Audit instance queries still outstanding, outstanding=OrderedDict([(u'a5d7d490-5e12-4161-9949-b400f79237a5', u'centos-2')])
2019-02-27T16:54:25.234 controller-1 VIM_Thread[192368] INFO _task_worker_pool.py.73 Timeout worker Compute-Worker-0
2019-02-27T16:54:26.239 controller-1 VIM_Thread[192368] ERROR _task.py.200 Task(get_instance) work (get_server) timed out, id=3901.
2019-02-27T16:54:26.240 controller-1 VIM_Thread[192368] ERROR _vim_nfvi_audits.py.961 Audit-Instance callback, not completed, response={'completed': False, 'reason': ''}.

I next determined that this problem is not specific to the VIM. I wrote a simple script to send GET requests directly to the nova API using curl (run it from a shell with the OpenStack credentials set up):

#!/bin/bash

TENANT_ID=`openstack token issue | grep "| project_id |" | cut -f3 -d'|' | tr -d '[[:space:]]'`
TOKEN_ID=`openstack token issue | grep "| id |" | cut -f3 -d'|' | tr -d '[[:space:]]'`

counter=1
while [ $counter -le 10000 ]
do
  echo
  echo "Request: $counter"
  date +"%D %T %N"
  curl -g -i -X GET http://nova-api.openstack.svc.cluster.local:8774/v2.1/$TENANT_ID/flavors/detail -H "Accept: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: $TOKEN_ID" -H "X-OpenStack-Nova-API-Version: 2.1"
  #curl -g -i -X GET http://nova.openstack.svc.cluster.local:80/v2.1/$TENANT_ID/flavors/de...


Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Bart Wensley (bartwensley) → Matt Peters (mpeters-wrs)
Revision history for this message
Matt Peters (mpeters-wrs) wrote :

The connection fails only under specific conditions and is directly related to a conflict when setting up the NAT connection tracking information. This issue occurs in roughly 1 in 500 connections and only impacts that one connection; all other and subsequent connections are fine.

There seems to be a race condition or a source port conflict when performing the NAT operation when connecting to the local pod. When there is a failed connection, the conntrack entry shows the use of a generated port number in the reverse path translation. This seems to imply it could not use the original source port due to the conflict, but then subsequent packets fail to match the flow entry, causing the connection to fail.

tcp 6 86399 ESTABLISHED src=10.10.10.13 dst=10.103.101.91 sport=60171 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60171 [ASSURED] mark=0 use=1
tcp 6 114 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59721 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59721 [ASSURED] mark=0 use=1
tcp 6 115 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59753 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59753 [ASSURED] mark=0 use=1
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60053 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60053 [ASSURED] mark=0 use=1
tcp 6 117 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59961 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59961 [ASSURED] mark=0 use=2
tcp 6 114 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59717 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59717 [ASSURED] mark=0 use=1
tcp 6 115 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59739 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59739 [ASSURED] mark=0 use=1
tcp 6 117 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59985 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59985 [ASSURED] mark=0 use=2
tcp 6 119 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60147 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60147 [ASSURED] mark=0 use=2
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60047 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60047 [ASSURED] mark=0 use=1
tcp 6 117 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60009 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60009 [ASSURED] mark=0 use=1
tcp 6 119 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60161 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60161 [ASSURED] mark=0 use=1
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=60057 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=60057 [ASSURED] mark=0 use=1
tcp 6 116 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59833 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59833 [ASSURED] mark=0 use=1
tcp 6 115 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sport=59791 dport=8774 src=172.16.0.164 dst=10.10.10.13 sport=8774 dport=59791 [ASSURED] mark=0 use=1
tcp 6 118 TIME_WAIT src=10.10.10.13 dst=10.103.101.91 sp...


Revision history for this message
Matt Peters (mpeters-wrs) wrote :

The statistics also show the "invalid" counts being bumped every time the issue occurs.

controller-0:~$ sudo conntrack -S
cpu=0 searched=2278720 found=26257506 new=925893 invalid=1205 ignore=4451231 delete=1094225 delete_list=1094226 insert=925895 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=1 searched=2192697 found=26227245 new=929531 invalid=1392 ignore=4293678 delete=1069473 delete_list=1069464 insert=929527 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=2 searched=3038601 found=35074053 new=932092 invalid=1197 ignore=4435206 delete=1042689 delete_list=1042690 insert=932090 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=3 searched=12913387 found=154903356 new=2980111 invalid=5441 ignore=3586824 delete=2559266 delete_list=2428654 insert=2849496 insert_failed=0 drop=0 early_drop=0 error=5 search_restart=0

Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Revision history for this message
Austin Sun (sunausti) wrote :

Bart and Matt:
I have tried to reproduce this issue in a virtual environment on the latest version.
I ran the script Bart mentioned more than 10000 times, but could not reproduce it.
Could you try the latest version to see whether the issue is still there?

Revision history for this message
Matt Peters (mpeters-wrs) wrote :

This issue only occurs when the services (or test client) are co-located on the same host. Can you confirm that your reproduction attempts were all localized to a single host?

Revision history for this message
Austin Sun (sunausti) wrote :

Hi Matt:
    Do you mean run this in an AIO (All-In-One) Simplex setup?

Revision history for this message
Austin Sun (sunausti) wrote :

I think you are right. In an AIO Simplex setup, the script stopped at:
Request: 2083
06/11/19 03:03:48 523488226

curl: (7) Failed connect to nova-api.openstack.svc.cluster.local:8774; Connection timed out
7

Revision history for this message
Austin Sun (sunausti) wrote :

From the mytestnossh.cap15 tcpdump: packet No. 13523 is very odd. It is a response from nova-api, but no corresponding request sent from the host can be found at any earlier time.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning to Austin since he is actively working on this bug

Changed in starlingx:
assignee: Matt Peters (mpeters-wrs) → Austin Sun (sunausti)
Revision history for this message
Austin Sun (sunausti) wrote :

In kern_2019-07-15_15_04_50.log, lines 24000 and 24001 show something very odd:
Line 24000: mangle:cali-PREROUTING:rule:1 172.16.192.109:8774->10.10.10.3:33727
Line 24001: mangle:INPUT:policy:1 172.16.192.109:8774->10.10.10.3:53902

The destination port was suddenly changed from 33727 to 53902.

After that, at line 24014, the packet hits filter:cali-th-ens6:rule:2, which matches because the ctstate is INVALID, and the packet then goes to error.

Frank Miller (sensfan22)
tags: added: stx.networking
removed: stx.containers
Revision history for this message
Austin Sun (sunausti) wrote :

Here are some findings. The readiness and liveness probes in the nova-api-osapi (and other *-api) deployments use a TCP socket check on the service port to determine pod readiness and liveness.
For example, the barbican-api readiness probe:
readinessProbe:
          failureThreshold: 10
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 9311

Results of ‘curl --connect-timeout 10 http://barbican-api.openstack.svc.cluster.local:9311’:

Host          readinessProbe periodSeconds   Errors in 50000 attempts   Comments
R22           10                             29                         only barbican applied; no cinder, glance, or compute-toolkit
R22           10                             28
R22           300                            0
R22           300                            0
R22           300                            0
Edgetest901   10                             8                          default helm chart applied
Edgetest901   300                            2
Edgetest901   300                            1

It seems increasing the readiness probe period improves the result.

I then changed the nova-api-osapi deployment's readiness and liveness probes from a TCP socket check to an exec command:
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/nova-api.sh
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1

The result is very positive: 0 errors in 2*50000 attempts.
I will check how to fix this issue officially.

Revision history for this message
Austin Sun (sunausti) wrote :

In 1817936_iptrace.zip, kern_2019-07-15_15_04_50.log:
line 23963 shows a SYN packet from the kernel with SPT=53902, but it is not found in the tcpdump capture; the capture only contains the SYN packet with source port 33727.

Revision history for this message
Austin Sun (sunausti) wrote :

Some more tests were performed based on the attached tcp_test.zip.
Set up k8s with the calico CNI on Ubuntu 18.04.1 LTS and on CentOS Linux release 7.6.1810:
'kubectl create ns policy-demo'
'kubectl apply -f tcp_echo.yaml'
then run test.sh from the attached zip.

The Ubuntu 18.04.1 result is good: it connected 10000 times.
On CentOS 7.6.1810, the connection fails within 1000 attempts.
But if the readinessProbe is removed from the pod on CentOS 7.6.1810, it connects 10000 times without any failure.

General info
CentOS Linux release 7.6.1810
docker version:19.03.0
k8s:"v1.15.1"
calico v3.6.4(https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml)

Ubuntu 18.04.1 LTS
Docker version 18.09.8, build 0dd43dd87f
k8s:"v1.15.1"
calico v3.6.4(https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml)

In centos_trace.cap No. 211, 10.10.10.188:1024->192.168.102.134:1237 always retransmits and fails.
In kern.log line 7189, the packet 10.10.10.188:1024->10.101.65.100:1237 cannot be found in the tcpdump capture.
In line 7241 the SYN packet's source port is 42934, but in line 7242 the SYN ACK's destination port changed to 1237:
Line 7241 Jul 22 23:44:17 localhost kernel: [ 8013.941489] TRACE: nat:KUBE-POSTROUTING:rule:1 IN= OUT=calic3dc5321ad4 SRC=10.10.10.188 DST=192.168.102.134 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23843 DF PROTO=TCP SPT=42934 DPT=1237 SEQ=2847795293 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A0075B57E0000000001030307) UID=1000 GID=1000 MARK=0x4000
Line 7242 Jul 22 23:44:17 localhost kernel: [ 8013.941551] TRACE: raw:PREROUTING:policy:4 IN=calic3dc5321ad4 OUT= MAC=ee:ee:ee:ee:ee:ee:e6:bf:5c:54:84:71:08:00 SRC=192.168.102.134 DST=10.10.10.188 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=1237 DPT=1024 SEQ=2976603651 ACK=2847795294 WINDOW=27760 RES=0x00 ACK SYN URGP=0 OPT (020405780402080A0075B57F0075B57E01030307) MARK=0x40000

Revision history for this message
Austin Sun (sunausti) wrote :

Another interesting test was performed: when using the command
"curl --local-port 47250-67500 --connect-timeout 10 http://barbican-api.openstack.svc.cluster.local:9311", the issue is gone. It seems there is a port conflict in some cases.

Revision history for this message
Frank Miller (sensfan22) wrote :

Changed the priority to High based on the outcome of discussions with PLs and TLs (Ghada, Brent, Frank). Numerous LPs have been reported over the past few months related to openstack commands intermittently failing. Many of these look like message loss between openstack pods and could be due to this LP.

tags: added: stx.containers
Changed in starlingx:
importance: Medium → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672387

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
Austin Sun (sunausti) wrote :

It seems the curl command on CentOS is not constrained by net.ipv4.ip_local_port_range.
When running the curl command, the source port range observed in the tcpdump log is 11728 to 65508.

So the valid test approach should be 'curl --local-port 49216-61000 xxxx'; 49216-61000 is the stx ip_local_port_range.

Revision history for this message
Austin Sun (sunausti) wrote :

Although the curl command was an invalid test case, the issue also reproduces with the following Python test code:
import urllib
import datetime
for i in range(1,20000):
    now = datetime.datetime.now()
    print("start %s at: %s" % (i,now))
    try:
        response = urllib.urlopen("http://glance-api.openstack.svc.cluster.local:9292")
        print(response.read())
        response.close()
    except:
        now = datetime.datetime.now()
        print("Error %s at: %s" % (i,now))
        exit()
    now = datetime.datetime.now()
    print("end %s at: %s" % (i,now))

This issue is still reproducible even with a local port range of 49216-61000, 45000-65500, or 32768-60999.
More investigation is needed.

Revision history for this message
Austin Sun (sunausti) wrote :

If the glance-api readiness and liveness probes are removed, the test mentioned above runs 50000 times without failure. There appears to be some conflict/interference between the application connections and the TCP probe.

Revision history for this message
Austin Sun (sunausti) wrote :

From the attached tcpdump log:
@26151 there is a packet 10.10.10.3:18794->172.16.192.89:9292, but port 18794 should not be valid because 'net.ipv4.ip_local_port_range = 49216 61000'; need to check where these packets come from.

Revision history for this message
Austin Sun (sunausti) wrote :

The attachment is a long trace monitoring port 9292; there are a lot of ports outside ip_local_port_range (49217 65500),
for example No. 233525, port 10325: 10.10.10.3:10325->172.16.192.130:9292

The ephemeral ports outside ip_local_port_range:
10187, 10325, 11283, 11702, 11891, 12797, 13324, 13336, 13341, 13447, 13470, 13553, 13664,
13685, 13864, 13934, 14393, 15440, 15496, 15520, 16558, 16695, 16794, 16909, 16932, 17017,
17110, 17117, 17269, 17343, 17589, 18024, 18166, 18213, 18274, 18342, 18959, 19275, 19845,
20230, 20499, 21086, 21121, 21141, 21302, 21477, 21816, 21987, 22117, 22296, 23081, 23233,
23621, 24344, 24877, 25243, 25297, 25634, 25637, 25909, 26443, 26454, 27055, 27428, 27597,
27703, 27786, 27860, 27935, 28010, 28105, 28224, 28403, 28847, 28922, 29385, 29714, 29912,
30493, 30547, 30641, 30736, 31264, 31479, 31734, 32003, 32070, 32234, 32531, 32670, 32987,
33032, 33486, 33507, 33509, 33611, 33806, 33926, 34073, 34505, 34522, 34574, 34589, 34819,
35228, 35442, 35779, 36030, 36036, 37493, 37538, 37782, 38567, 38756, 38872, 39561, 39642,
39700, 39903, 39989, 3exmp, 40624, 40858, 40874, 41165, 41231, 41343, 41511, 41687, 41701,
41709, 41773, 41799, 41864, 42103, 42525, 42698, 42837, 4334, 43705, 4388, 44435, 44726,
45212, 45313, 45613, 45752, 45789, 45925, 45927, 46201, 46225, 47263, 47679, 47830, 47945,
47985, 48007, 4826, 48431, 48473, 48646, 48729, 4873, 48852, 4921, 49216, 49217, 49218

Revision history for this message
Austin Sun (sunausti) wrote :

When the issue occurred:
in echo-glance-api.cap (tcpdump -i any -vv port 9292 -w echo-glance-api.cap):
No. 468: 10.10.10.3:13550 ---> 172.16.192.99:9292
in echo-glance-api.con (conntrack -o timestamp -E -p tcp --dport 9292):
[1564902343.325301] [NEW] tcp 6 120 SYN_SENT src=10.10.10.3 dst=10.98.65.211 sport=53720 dport=9292 [UNREPLIED] src=172.16.192.99 dst=10.10.10.3 sport=9292 dport=13550
[1564902343.325305] [UPDATE] tcp 6 60 SYN_RECV src=10.10.10.3 dst=10.98.65.211 sport=53720 dport=9292 src=172.16.192.99 dst=10.10.10.3 sport=9292 dport=13550
The sport was changed from 53720 to 13550; it seems NAT changed the port. Will check the netfilter code.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by Austin Sun (<email address hidden>) on branch: master
Review: https://review.opendev.org/672387

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/674719

Revision history for this message
Austin Sun (sunausti) wrote :

From this log, the root cause is found.
The probe connection was in TIME_WAIT (10.10.10.3:50538->172.16.192.101:9292).
The client then opens a new connection 10.10.10.3:50538->10.109.43.235:9292; after NAT, the outgoing SYN packet becomes
10.10.10.3:24479->172.16.192.101:9292. The SYN ACK 172.16.192.101:9292->10.10.10.3:24479 is rewritten by mangle to 172.16.192.101:9292->10.10.10.3:50538, but that socket is still in TIME_WAIT, so the packet is marked INVALID and then dropped.

==========================================================================================
              controller       service           endpoint
              (10.10.10.3)     (10.109.43.235)   (172.16.192.101)    STATE
Probe         *(50538) --------------------------> *(9292)           TIME_WAIT
Client (new)  *(50538) ------> *(9292)
After NAT
Client        *(24479) --------------------------> *(9292)           SYN_SENT
IN PREROUTING *(24479) <-------------------------- *(9292)           SYN_RECV
INPUT chain   *(50538) <-------------------------- *(9292)           SYN_RECV

==========================================================================================
tcpdump_port_9292.cap:
No.24288 10.10.10.3:24479->172.16.192.101:9292
No.24289 172.16.192.101:9292->10.10.10.3:24479

conntrack_tcp_port9292:

[1565008856.898709] [UPDATE] tcp 6 120 TIME_WAIT src=10.10.10.3 dst=172.16.192.101 sport=50538 dport=9292 src=172.16.192.101 dst=10.10.10.3 sport=9292 dport=50538 [ASSURED]
[1565008911.774492] [NEW] tcp 6 120 SYN_SENT src=10.10.10.3 dst=10.109.43.235 sport=50538 dport=9292 [UNREPLIED] src=172.16.192.101 dst=10.10.10.3 sport=9292 dport=24479
[1565008911.774522] [UPDATE] tcp 6 60 SYN_RECV src=10.10.10.3 dst=10.109.43.235 sport=50538 dport=9292 src=172.16.192.101 dst=10.10.10.3 sport=9292 dport=24479

Because port 50538 is still in TIME_WAIT waiting to be released,
the new connection's NAT cannot reuse that conntrack entry, so NAT remaps it:
10.10.10.3:50538->10.109.43.235:9292 is NATed to 10.10.10.3:24479->172.16.192.101:9292.
kern_9292.log:
mangle rewrites the reply 172.16.192.101:9292->10.10.10.3:24479 to 172.16.192.101:9292->10.10.10.3:50538,
but the conntrack entry for 172.16.192.101:9292->10.10.10.3:50538 is still waiting to be destroyed,
so the packet is marked INVALID and then dropped.

2019-08-05T12:41:51.775 controller-0 kernel: warning [41907.116655] TRACE: mangle:cali-PREROUTING:rule:1 IN=calif380bc70786 OUT= MAC=ee:ee:ee:ee:ee:ee:c6:6d:0e:f5:a5:88:08:00 SRC=172.16.192.101 DST=10.10.10.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=9292 DPT=24479 SEQ=2742336307 ACK=2917708675 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A027ADF54027ADF5401030307) MARK=0x40000
2019-08-05T12:41:51.775 controller-0 kernel: warning [41907.116661] TRACE: mangle:INPUT:policy:1 IN=calif380bc70786 OUT= MAC=ee:ee:ee:ee:ee:ee:c6:6d:0e:f5:a5:88:08:00 SRC=172.16.192.101 DST=10.10.10.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=9292 DPT=50538 SEQ=2742336307 ACK=2917708675 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A027ADF54027ADF5401030307) MARK=0x40000
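The port-substitution mechanism described above can be sketched as a toy model (an illustration only, not the kernel's actual conntrack code; all names and the port-search strategy are mine):

```python
# Toy model of the DNAT source-port conflict described above (illustrative
# only -- not the kernel's actual conntrack code). conntrack indexes each
# connection by its reply tuple; when a new connection's reply tuple collides
# with an existing entry (e.g. the probe's TIME_WAIT entry), SNAT must pick a
# substitute source port. The SYN ACK then returns to the substitute port, is
# rewritten back to the original port, and dies against the client's
# lingering TIME_WAIT socket.

def nat_source_port(occupied_reply_tuples, pod_ip, pod_port,
                    client_ip, client_port):
    """Pick the source port NAT will use toward the pod (simplified)."""
    def reply_tuple(port):
        return (pod_ip, pod_port, client_ip, port)

    if reply_tuple(client_port) not in occupied_reply_tuples:
        return client_port              # no conflict: keep the original port
    for port in range(1024, 65536):     # conflict: pick another free port
        if reply_tuple(port) not in occupied_reply_tuples:
            return port
    raise RuntimeError("no free source port")

# The probe left its reply tuple for port 50538 in TIME_WAIT:
occupied = {("172.16.192.101", 9292, "10.10.10.3", 50538)}

# A new connection from the same source port gets remapped (24479 in the
# captured trace; any substitute port triggers the same failure):
remapped = nat_source_port(occupied, "172.16.192.101", 9292,
                           "10.10.10.3", 50538)
assert remapped != 50538
```

The model shows why only the colliding connection fails: any client source port whose reply tuple is free passes through NAT unchanged.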

Revision history for this message
Austin Sun (sunausti) wrote :

The probe connection action before going to time_wait state.
Probe connection
controller service endpoint Chain TCP FLAG SEQ ACK
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 raw:OUTPUT:policy:4 SYN 2707980036 0
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 SYN ACK 1599414185 2707980037
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 filter:OUTPUT:rule:1 ACK 2707980037 1599414186
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 raw:OUTPUT:policy:4 FIN ACK 2707980037 1599414186
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 ACK 1599414186 2707980038
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 FIN ACK 1599414186 2707980038
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 raw:OUTPUT:policy:4 ACK 2707980038 1599414187

And for the curl command connection with same port 50538: it will be like
controller service endpoint Chain TCP FLAG SEQ ACK
10.10.10.3:50538 --> 10.109.43.235:9292 raw:OUTPUT:policy:4 SYN 2917708674 0
10.10.10.3:50538 -------------------------------> 172.16.192.101:9292 filter:OUTPUT:rule:1 SYN 2917708674 0
10.10.10.3:24479 <------------------------------- 172.16.192.101:9292 raw:PREROUTING:policy:4 SYN ACK 2742336307 2917708675
10.10.10.3:50538 <------------------------------- 172.16.192.101:9292 mangle:INPUT:policy:1 SYN ACK 2742336307 2917708675
10.10.10.3:50538 --> 10.109.43.235:9292 raw:OUTPUT:policy:4 ACK 2707980038 1599414187
10.10.10.3:50538 --> 10.109.43.235:9292 filter:OUTPUT:rule:1 ACK 2707980038 1599414187
10.10.10.3:50538 --> 10.109.43.235:9292 filter:cali-th-ens6:rule:2 ACK(DROP) 2707980038 1599414187

The last ACK (10.10.10.3:50538 --> 10.109.43.235:9292) carries the same SEQ and ACK as the probe's latest ACK in TIME_WAIT.
From https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 , the TIME_WAIT lookup only checks (dest IP, dest port, src IP, src port).
Because the SEQ/ACK is not correct for that socket, the packet is marked invalid and then dropped.

If enabling tcp_tw_recycle, the previous socket should be already closed , then the issu...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/674719
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=171c43dca8b419c5ac579e289002ff4fdb1aee3f
Submitter: Zuul
Branch: master

commit 171c43dca8b419c5ac579e289002ff4fdb1aee3f
Author: Sun Austin <email address hidden>
Date: Mon Aug 5 16:46:43 2019 +0800

    Fix Periodic message loss between VIM and Openstack REST APIs

    set net.ipv4.tcp_tw_recycle=1 to avoid dnat conntrack invalid

    The probe connection action before going to time_wait state.
    Probe connection
    controller pod TCP FLAG SEQ ACK
    controller:50538 ---> endpoint:9292 SYN 2707980036 0
    controller:50538 <--- endpoint:9292 SYN ACK 1599414185
    2707980037
    controller:50538 ---> endpoint:9292 ACK 2707980037
    1599414186
    controller:50538 ---> endpoint:9292 FIN ACK 2707980037
    1599414186
    controller:50538 <--- endpoint:9292 ACK 1599414186
    2707980038
    controller:50538 <--- endpoint:9292 FIN ACK 1599414186
    2707980038
    controller:50538 ---> endpoint:9292 ACK 2707980038
    1599414187

    And for the curl command connection with same port 50538: it will be
    like
    controller pod TCP FLAG SEQ ACK
    controller:50538 --> service:9292 SYN 2917708674 0
    controller:50538 --> endpoint:9292 SYN 2917708674 0
    controller:24479 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK(DROP) 2707980038
    1599414187

    The last ACK(controller:50538-->service:9292) SEQ and ACK is same as
    Probe TIME_WAIT latest ACK’s.
    from
    https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 ,
    it only check (des ip , des port, src ip, and src port).Because this is
    not
     a correct SEQ/ACK , then it is set invalid and then dropped.

    If enabling tcp_tw_recycle, the previous socket should be already
    closed, then the issue should be gone.

    Closes-Bug: 1817936

    Change-Id: If6e66d85f08fc99022946fd2e9f4e5756bfb7b2f
    Signed-off-by: Sun Austin <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/678944

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (r/stx.2.0)

Change abandoned by Austin Sun (<email address hidden>) on branch: r/stx.2.0
Review: https://review.opendev.org/678944

Revision history for this message
Cindy Xie (xxie1) wrote :

The change needs to be reverted due to https://bugs.launchpad.net/starlingx/+bug/1841833

Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Austin Sun (sunausti) wrote :

Once the revert change is merged, I will submit a new change for another parameter, net.ipv4.tcp_tw_reuse=0.
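For checking which of these settings a node is actually running with, the sysctls involved can be read straight from /proc; this is a quick inspection helper of my own, not part of the fix (note that tcp_tw_recycle was removed upstream in kernel 4.12):

```python
from pathlib import Path

def read_sysctl(name, default=None):
    """Return the current value of a sysctl such as
    'net.ipv4.tcp_tw_reuse', or `default` if it does not exist
    on this kernel (sysctls live under /proc/sys)."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return default

if __name__ == "__main__":
    for name in ("net.ipv4.tcp_tw_reuse",
                 "net.ipv4.tcp_tw_recycle",      # absent on kernels >= 4.12
                 "net.ipv4.ip_local_port_range"):
        print(name, "=", read_sysctl(name, "<not available>"))
```

Reading /proc directly avoids depending on the sysctl binary and makes the "setting does not exist" case explicit, which matters here since the two tw_* knobs differ across kernel versions.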

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config-files (master)

Fix proposed to branch: master
Review: https://review.opendev.org/681363

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/681363
Committed: https://git.openstack.org/cgit/starlingx/config-files/commit/?id=fbc09b8db8a14fbf24976a0f7d8924af7a330f85
Submitter: Zuul
Branch: master

commit fbc09b8db8a14fbf24976a0f7d8924af7a330f85
Author: Sun Austin <email address hidden>
Date: Sat Sep 7 08:31:13 2019 +0800

    Fix Periodic message loss between VIM and Openstack REST APIs

    set net.ipv4.tcp_tw_reuse=0 to avoid dnat conntrack invalid
    and remove customizing ephemeral port range

    The probe connection action before going to time_wait state.
    Probe connection
    controller pod TCP FLAG SEQ ACK
    controller:50538 ---> endpoint:9292 SYN 2707980036 0
    controller:50538 <--- endpoint:9292 SYN ACK 1599414185
    2707980037
    controller:50538 ---> endpoint:9292 ACK 2707980037
    1599414186
    controller:50538 ---> endpoint:9292 FIN ACK 2707980037
    1599414186
    controller:50538 <--- endpoint:9292 ACK 1599414186
    2707980038
    controller:50538 <--- endpoint:9292 FIN ACK 1599414186
    2707980038
    controller:50538 ---> endpoint:9292 ACK 2707980038
    1599414187

    And for the curl command connection with same port 50538: it will be
    like
    controller pod TCP FLAG SEQ ACK
    controller:50538 --> service:9292 SYN 2917708674 0
    controller:50538 --> endpoint:9292 SYN 2917708674 0
    controller:24479 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 <-- endpoint:9292 SYN ACK 2742336307
    2917708675
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK 2707980038
    1599414187
    controller:50538 --> service:9292 ACK(DROP) 2707980038
    1599414187

    The last ACK(controller:50538-->service:9292) SEQ and ACK is same as
    Probe TIME_WAIT latest ACK’s.
    from
    https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 ,
    it only check (des ip , des port, src ip, and src port).Because this is
    not
     a correct SEQ/ACK , then it is set invalid and then dropped.

    If disable tcp_tw_reuse, the port nova-api will be always not same as
     pod probe using, then the issue should be gone.
    set back default(centos) ephemeral port range to avoid ephemeral port
    exhaustion .

    Closes-Bug: 1817936

    Change-Id: I0b37e9829ac5d3bc9ca1a0b8f55abc632c79f446
    Signed-off-by: Sun Austin <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Austin, when the change has sufficient soak in master and sanity is green, please cherrypick to r/stx.2.0

Revision history for this message
Austin Sun (sunausti) wrote :

Hi Ghada,
    The previous change was merged while sanity stayed green, so I think sanity did not cover some cases. I will check with the test team whether they can perform some regression tests after the daily sanity is green.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/683818

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (r/stx.2.0)

Reviewed: https://review.opendev.org/683818
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=457b4747e92255fa99340fa7cb1d7204873fb3c5
Submitter: Zuul
Branch: r/stx.2.0

commit 457b4747e92255fa99340fa7cb1d7204873fb3c5
Author: Sun Austin <email address hidden>
Date: Sun Sep 22 19:33:04 2019 +0800

    Fix Periodic message loss between VIM and Openstack REST APIs

    Set net.ipv4.tcp_tw_reuse=0 to avoid DNAT conntrack marking packets
    invalid, and remove the customized ephemeral port range.

    The probe connection completes a normal handshake and close before the
    socket enters TIME_WAIT:

    controller            pod             TCP FLAG   SEQ          ACK
    controller:50538 ---> endpoint:9292   SYN        2707980036   0
    controller:50538 <--- endpoint:9292   SYN ACK    1599414185   2707980037
    controller:50538 ---> endpoint:9292   ACK        2707980037   1599414186
    controller:50538 ---> endpoint:9292   FIN ACK    2707980037   1599414186
    controller:50538 <--- endpoint:9292   ACK        1599414186   2707980038
    controller:50538 <--- endpoint:9292   FIN ACK    1599414186   2707980038
    controller:50538 ---> endpoint:9292   ACK        2707980038   1599414187

    A later curl command connection that reuses the same source port 50538
    looks like this:

    controller            pod             TCP FLAG   SEQ          ACK
    controller:50538 -->  service:9292    SYN        2917708674   0
    controller:50538 -->  endpoint:9292   SYN        2917708674   0
    controller:24479 <--  endpoint:9292   SYN ACK    2742336307   2917708675
    controller:50538 <--  endpoint:9292   SYN ACK    2742336307   2917708675
    controller:50538 -->  service:9292    ACK        2707980038   1599414187
    controller:50538 -->  service:9292    ACK        2707980038   1599414187
    controller:50538 -->  service:9292    ACK(DROP)  2707980038   1599414187

    The last ACK (controller:50538 --> service:9292) carries the same SEQ and
    ACK numbers as the probe's final ACK in TIME_WAIT. Per
    https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_ipv4.c#L2002 ,
    the socket lookup checks only the 4-tuple (destination IP, destination
    port, source IP, source port). Because the packet's SEQ/ACK numbers are
    not correct for the connection state that lookup finds, it is marked
    invalid and then dropped.

    With tcp_tw_reuse disabled, the port nova-api uses will never be the same
    one the pod probe is using, so the issue should be gone. The ephemeral
    port range is set back to the CentOS default to avoid ephemeral port
    exhaustion.

    Closes-Bug: 1817936

    Change-Id: I9a39e3b439633ffb82ff05814935eba9ae892eda
    Signed-off-by: Sun Austin <email address hidden>
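
In sysctl terms, the fix amounts to pinning one setting and dropping the port-range override. A sketch of the resulting configuration (the exact file StarlingX manages may differ):

```ini
# Disable reuse of TIME_WAIT sockets for new outgoing connections, so an
# ephemeral port still held by a probe connection cannot be picked up by
# nova-api while DNAT conntrack is tracking the old flow.
net.ipv4.tcp_tw_reuse = 0

# No net.ipv4.ip_local_port_range override: the CentOS default range applies,
# which is wide enough to avoid ephemeral port exhaustion.
```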

Ghada Khalil (gkhalil)
tags: added: in-r-stx20
Revision history for this message
Ghada Khalil (gkhalil) wrote :

The following testing was done by Elio Martinez with an stx.2.0 designer build with this fix

          Sanity   Horizon   VM operation   Ping 10 Hrs   Script from bug
AIOX      GREEN    PASS      PASS           PASS          PASS
MULTI     GREEN    PASS      PASS           PASS          PASS

Important highlights:

1. All scenarios were executed on AIOX and MULTI (2+2) systems.
2. Sanity execution was successful without any kind of problem (GREEN).
3. Horizon tests were executed as well (pause, suspend, unpause, create, delete instances).
4. VM operations were also driven through Horizon; no problem detected.
5. Ping between instances ran for at least 10 hours on each configuration.
6. I decided to execute the script mentioned in https://bugs.launchpad.net/starlingx/+bug/1817936; no issues were found using that script (executed 5 times since the reproducibility was intermittent).

Conclusion:

No problem was found with requested testing on http://dcp-dev.intel.com/pub/starlingx/stx-eng-build/270/outputs/
