210K MAC : SSL connection between TOR Agent and QFX is flapping

Bug #1464312 reported by chhandak
Affects            Status                   Importance  Assigned to           Milestone
Juniper Openstack  Status tracked in Trunk
  R2.20            Fix Committed            High        Prabhjot Singh Sethi
  Trunk            Fix Committed            High        Prabhjot Singh Sethi

Bug Description

Added 210K MAC addresses to one QFX. These MACs are spread across 8 physical interfaces and 4000 logical interfaces (LIFs).

In this scenario, the connection between the QFX and the TOR agent is flapping.

root@bng-contrail-qfx51-2> show ovsdb controller
VTEP controller information:
Controller IP address: 192.168.22.210
Controller protocol: ssl
Controller port: 4321
Controller connection: up
Controller seconds-since-connect: 19
Controller seconds-since-disconnect: 20
Controller last-error: Broken pipe
Controller connection status: active

{master:0}
root@bng-contrail-qfx51-2> show ethernet-switching table | grep Contrail-| count
Count: 210000 lines

root@nodei6:~# netstat -anp | grep 4321
tcp 0 0 0.0.0.0:4321 0.0.0.0:* LISTEN 8467/haproxy
tcp 0 0 192.168.22.1:43217 192.168.22.210:5673 ESTABLISHED 31464/python
tcp 0 0 192.168.22.1:47390 192.168.22.4:4321 ESTABLISHED 8467/haproxy >>> connection between QFX and
tcp 0 0 192.168.22.1:43211 192.168.22.210:5673 ESTABLISHED 31436/python
tcp 0 0 192.168.22.210:4321 192.168.11.1:65408 ESTABLISHED 8467/haproxy
tcp 0 0 192.168.22.210:5673 192.168.22.1:43211 ESTABLISHED 8467/haproxy
tcp 0 0 192.168.22.210:5673 192.168.22.1:43217 ESTABLISHED 8467/haproxy
root@nodei6:~# date
Wed Jun 10 19:02:44 IST 2015
root@nodei6:~# netstat -anp | grep 4321
tcp 0 0 0.0.0.0:4321 0.0.0.0:* LISTEN 8467/haproxy
tcp 0 0 192.168.22.1:43217 192.168.22.210:5673 ESTABLISHED 31464/python >>> connection broken
tcp 0 0 192.168.22.1:43211 192.168.22.210:5673 ESTABLISHED 31436/python
tcp 0 0 192.168.22.210:4321 192.168.11.1:56585 ESTABLISHED 8467/haproxy
tcp 0 0 192.168.22.210:5673 192.168.22.1:43211 ESTABLISHED 8467/haproxy
tcp 0 0 192.168.22.210:5673 192.168.22.1:43217 ESTABLISHED 8467/haproxy
root@nodei6:~# date
Wed Jun 10 19:05:10 IST 2015
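The two netstat snapshots above can be compared mechanically. The following is a minimal sketch, not part of the original report: the function names `established` and `diff_snapshots` are hypothetical, and the sample data is trimmed from the output above. It matches port 4321 exactly, so the ephemeral ports 43211 and 43217 that `grep 4321` also picks up are ignored.

```python
# Hypothetical helper for comparing two `netstat -anp` snapshots.
# Sample lines are trimmed from the output in this report.
BEFORE = """\
tcp 0 0 0.0.0.0:4321 0.0.0.0:* LISTEN 8467/haproxy
tcp 0 0 192.168.22.1:43217 192.168.22.210:5673 ESTABLISHED 31464/python
tcp 0 0 192.168.22.1:47390 192.168.22.4:4321 ESTABLISHED 8467/haproxy
tcp 0 0 192.168.22.210:4321 192.168.11.1:65408 ESTABLISHED 8467/haproxy
"""

AFTER = """\
tcp 0 0 0.0.0.0:4321 0.0.0.0:* LISTEN 8467/haproxy
tcp 0 0 192.168.22.1:43217 192.168.22.210:5673 ESTABLISHED 31464/python
tcp 0 0 192.168.22.210:4321 192.168.11.1:56585 ESTABLISHED 8467/haproxy
"""


def established(snapshot, port):
    """Return the set of (local, remote) ESTABLISHED pairs that use `port`."""
    suffix = ":" + str(port)
    conns = set()
    for line in snapshot.splitlines():
        fields = line.split()
        # Expected fields: proto, recv-q, send-q, local, remote, state, pid/program
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local, remote = fields[3], fields[4]
        if local.endswith(suffix) or remote.endswith(suffix):
            conns.add((local, remote))
    return conns


def diff_snapshots(before, after, port=4321):
    """Return (dropped, added) connection lists between two snapshots."""
    old, new = established(before, port), established(after, port)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    dropped, added = diff_snapshots(BEFORE, AFTER)
    print("dropped:", dropped)
    print("added:  ", added)
```

A connection that appears in `dropped` while a new one with a different ephemeral port appears in `added` is consistent with a reconnect between the two samples.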

chhandak (chhandak) wrote :

Hit the problem even with 64K MACs.

After increasing the keepalive interval to 5 minutes, the issue was no longer seen.

# OVS keep alive timer interval in milliseconds
tor_keepalive_interval=300000

information type: Proprietary → Public
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Prabhjot Singh Sethi (prabhjot)
tags: added: blocker
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11546
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11547
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11546
Committed: http://github.org/Juniper/contrail-controller/commit/184b557c7f6cbee1578dabed4ce51a0678e2635e
Submitter: Zuul
Branch: master

commit 184b557c7f6cbee1578dabed4ce51a0678e2635e
Author: Prabhjot Singh Sethi <email address hidden>
Date: Fri Jun 12 14:46:31 2015 +0530

Fix KeepAlive Mechanism for partial packets

Issue:
------
In a scaled scenario, the response to the initial monitor request,
which carries the entire contents of the OVSDB server, spans
multiple chunked packets that keep arriving for longer than the
keepalive time. Because packet receipt was accounted for in the
keepalive state machine only once the parser had finished receiving
a packet, the state machine got no trigger during this period,
causing the connection to close.

Fix:
----
Trigger the keepalive state machine on every packet receive,
irrespective of whether the parser has seen the end marker.

Closes-Bug: 1464312
Change-Id: Ibc7bdb200a0022a97cd1f25c16688f05f33bb718
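The behaviour described in this commit message can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the contrail-controller code: `KeepAliveMonitor`, `simulate`, and the timing values are hypothetical, and a real keepalive would also send probes before giving up. The model only tracks when the last accounted receive happened and declares the connection dead once the keepalive interval elapses without one.

```python
class KeepAliveMonitor:
    """Toy keepalive tracker (hypothetical); all times are in milliseconds."""

    def __init__(self, interval_ms, count_partial):
        self.interval_ms = interval_ms
        # False models the old behaviour: only complete messages count.
        # True models the fix: every received chunk counts as activity.
        self.count_partial = count_partial
        self.last_activity_ms = 0
        self.closed_at_ms = None

    def on_receive(self, now_ms, message_complete):
        """Account for one received chunk arriving at time now_ms."""
        expired = now_ms - self.last_activity_ms > self.interval_ms
        if self.closed_at_ms is None and expired:
            # The timer ran out before this chunk arrived.
            self.closed_at_ms = self.last_activity_ms + self.interval_ms
        if self.closed_at_ms is not None:
            return
        if message_complete or self.count_partial:
            self.last_activity_ms = now_ms


def simulate(count_partial, interval_ms=10_000, chunks=30, gap_ms=1_000):
    """Feed one large response split into `chunks` pieces, one every gap_ms.

    Only the final chunk completes the message. Returns the time at which
    the connection was closed, or None if it stayed up.
    """
    monitor = KeepAliveMonitor(interval_ms, count_partial)
    for i in range(1, chunks + 1):
        monitor.on_receive(i * gap_ms, message_complete=(i == chunks))
    return monitor.closed_at_ms


if __name__ == "__main__":
    print("old behaviour closed at:", simulate(count_partial=False))
    print("fixed behaviour closed at:", simulate(count_partial=True))
```

With these assumed numbers, a 30-second transfer under a 10-second keepalive closes the connection when only complete messages count, and stays up when every chunk counts, which matches the flapping and the fix described above.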

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11547
Committed: http://github.org/Juniper/contrail-controller/commit/27fdd80d1e9f15cb7fd596c75112aa12ea5b81d7
Submitter: Zuul
Branch: R2.20

commit 27fdd80d1e9f15cb7fd596c75112aa12ea5b81d7
Author: Prabhjot Singh Sethi <email address hidden>
Date: Fri Jun 12 14:46:31 2015 +0530

Fix KeepAlive Mechanism for partial packets

Issue:
------
In a scaled scenario, the response to the initial monitor request,
which carries the entire contents of the OVSDB server, spans
multiple chunked packets that keep arriving for longer than the
keepalive time. Because packet receipt was accounted for in the
keepalive state machine only once the parser had finished receiving
a packet, the state machine got no trigger during this period,
causing the connection to close.

Fix:
----
Trigger the keepalive state machine on every packet receive,
irrespective of whether the parser has seen the end marker.

Closes-Bug: 1464312
Change-Id: Ibc7bdb200a0022a97cd1f25c16688f05f33bb718
(cherry picked from commit 184b557c7f6cbee1578dabed4ce51a0678e2635e)
