[Service connections]: Control and agent connection becomes unstable after editing an entry in agent.conf

Bug #1682080 reported by Pulkit Tandon
This bug affects 3 people
Affects              Status          Importance  Assigned to  Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.21.x            Fix Committed   Critical    Nipa
  R3.1               Fix Committed   Critical    Nipa
  R3.2               Fix Committed   Critical    Nipa
  R4.0               Fix Released    Critical    Nipa
  Trunk              Fix Committed   Critical    Nipa

Bug Description

Build 3054
Mainline
R4.0

Testbed file attached.
Containerised setup with 3 agents and 3 control nodes.

Description:
Suppose agent A1 is connected to control nodes C1 and C2.
In A1's agent.conf, the [CONTROL-NODE] section lists C1, C2 and C3 as the applicable control nodes.
I removed the C1 entry from agent.conf and sent a SIGHUP to the agent.
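The config edit can be sketched in Python. This assumes the [CONTROL-NODE] section holds a space-separated `servers` list of host:port entries; the `servers` key name and the C1/C2/C3 placeholder hosts are assumptions for illustration (only port 5269 appears in the logs). The helper just diffs the two server lists and does not touch a running agent.

```python
import configparser


def control_nodes(conf_text: str) -> list:
    """Return the space-separated server list from the [CONTROL-NODE] section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # 'servers' is an assumed key name; a missing key or section yields []
    return parser.get("CONTROL-NODE", "servers", fallback="").split()


# Hypothetical agent.conf contents before and after removing C1
BEFORE = """
[CONTROL-NODE]
servers = C1:5269 C2:5269 C3:5269
"""
AFTER = """
[CONTROL-NODE]
servers = C2:5269 C3:5269
"""

old, new = control_nodes(BEFORE), control_nodes(AFTER)
removed = [s for s in old if s not in new]
print("removed:", removed)  # removed: ['C1:5269']
```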

Observation:
The agent's connections to the control nodes started toggling.
For example:
root@nodem8:~# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent initializing (XMPP:control-node:10.10.10.5, XMPP:control-node:10.10.10.7 connection down)
contrail-vrouter-nodemgr active

root@nodem8:~# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent initializing (XMPP:control-node:10.10.10.6, XMPP:control-node:10.10.10.7 connection down)
contrail-vrouter-nodemgr active

root@nodem8:~# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent initializing (XMPP:control-node:10.10.10.5, XMPP:control-node:10.10.10.6 connection down)
contrail-vrouter-nodemgr active
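To confirm the toggling without eyeballing repeated contrail-status runs, the down peers can be extracted from each snapshot and compared. This sketch assumes only the line format shown above; how snapshots are collected (for example, polling in a loop) is left out.

```python
import re

# Matches the peer address in tokens such as "XMPP:control-node:10.10.10.5"
XMPP_PEER = re.compile(r"XMPP:control-node:(\d+\.\d+\.\d+\.\d+)")


def down_peers(agent_status: str) -> frozenset:
    """Return the control-node addresses reported down in one contrail-status line."""
    return frozenset(XMPP_PEER.findall(agent_status))


# The three contrail-vrouter-agent lines captured above, in order
snapshots = [
    "contrail-vrouter-agent initializing (XMPP:control-node:10.10.10.5, "
    "XMPP:control-node:10.10.10.7 connection down)",
    "contrail-vrouter-agent initializing (XMPP:control-node:10.10.10.6, "
    "XMPP:control-node:10.10.10.7 connection down)",
    "contrail-vrouter-agent initializing (XMPP:control-node:10.10.10.5, "
    "XMPP:control-node:10.10.10.6 connection down)",
]

history = [down_peers(line) for line in snapshots]
toggling = len(set(history)) > 1  # the set of down peers changes between polls
for peers in history:
    print(sorted(peers))
print("toggling:", toggling)
```

Every poll reports a different pair of peers down, which is the toggling described above.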

Some connection logs:

2017-04-12 15:19:50.343 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 0 server = 10.10.10.7 5269 controller/src/vnsw/agent/controller/controller_init.cc 582
2017-04-12 15:19:50.343 AgentXmppDiscoveryConnection: XMPP ReConfig Server is NOT_READY, ignore index = 0 server = 10.10.10.7 controller/src/vnsw/agent/controller/controller_init.cc 595
2017-04-12 15:19:50.343 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 1 server = 10.10.10.5 5269 controller/src/vnsw/agent/controller/controller_init.cc 582
2017-04-12 15:19:50.343 AgentXmppDiscoveryConnection: ReSet Xmpp ReConfig Channel index = 0 server = 10.10.10.6 10.10.10.5 controller/src/vnsw/agent/controller/controller_init.cc 629
2017-04-12 15:19:50.344 AgentXmppDiscoveryConnection: XMPP Server is already present, ignore reconfig response index = 1 server = 10.10.10.7 controller/src/vnsw/agent/controller/controller_init.cc 159
2017-04-12 15:19:57.016 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 0 server = 10.10.10.7 5269 controller/src/vnsw/agent/controller/controller_init.cc 582
2017-04-12 15:19:57.016 AgentXmppDiscoveryConnection: XMPP ReConfig Server is NOT_READY, ignore index = 0 server = 10.10.10.7 controller/src/vnsw/agent/controller/controller_init.cc 595
2017-04-12 15:19:57.016 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 1 server = 10.10.10.6 5269 controller/src/vnsw/agent/controller/controller_init.cc 582
2017-04-12 15:19:57.016 AgentXmppDiscoveryConnection: ReSet Xmpp ReConfig Channel index = 0 server = 10.10.10.5 10.10.10.6 controller/src/vnsw/agent/controller/controller_init.cc 629
2017-04-12 15:19:57.017 AgentXmppDiscoveryConnection: XMPP Server is already present, ignore reconfig response index = 1 server = 10.10.10.7 controller/src/vnsw/agent/controller/controller_init.cc 159
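In these logs, channel index 0 stays on 10.10.10.7 (reported NOT_READY) while index 1 is reassigned from 10.10.10.5 to 10.10.10.6 roughly seven seconds later. A small parser like the one below, which assumes only the message text visible in these lines, makes that pattern easier to spot in longer logs.

```python
import re
from collections import defaultdict

# Matches "XMPP ReConfig Apply Server Ip index = <n> server = <ip> <port>"
APPLY = re.compile(r"ReConfig Apply Server Ip index = (\d+) server = (\S+)")


def applied_servers(log_lines):
    """Map each XMPP channel index to the servers applied to it, in log order."""
    applied = defaultdict(list)
    for line in log_lines:
        match = APPLY.search(line)
        if match:
            applied[int(match.group(1))].append(match.group(2))
    return dict(applied)


log = [
    "2017-04-12 15:19:50.343 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 0 server = 10.10.10.7 5269 controller/src/vnsw/agent/controller/controller_init.cc 582",
    "2017-04-12 15:19:50.343 AgentXmppDiscoveryConnection: XMPP ReConfig Server is NOT_READY, ignore index = 0 server = 10.10.10.7 controller/src/vnsw/agent/controller/controller_init.cc 595",
    "2017-04-12 15:19:50.343 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 1 server = 10.10.10.5 5269 controller/src/vnsw/agent/controller/controller_init.cc 582",
    "2017-04-12 15:19:57.016 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 0 server = 10.10.10.7 5269 controller/src/vnsw/agent/controller/controller_init.cc 582",
    "2017-04-12 15:19:57.016 AgentXmppDiscoveryConnection: XMPP ReConfig Apply Server Ip index = 1 server = 10.10.10.6 5269 controller/src/vnsw/agent/controller/controller_init.cc 582",
]

print(applied_servers(log))
# {0: ['10.10.10.7', '10.10.10.7'], 1: ['10.10.10.5', '10.10.10.6']}
```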

Tags: vrouter
Pulkit Tandon (pulkitt) wrote:

Core kept at:
bhushana@mayamruga

Path:
/home/bhushana/Documents/technical/bugs/1682080

Pulkit Tandon (pulkitt) wrote:

Also note that the issue is reproducible but might require 3-4 attempts.

Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Manish Singh (manishs)
Nischal Sheth (nsheth)
tags: added: contrail-control
Nischal Sheth (nsheth)
tags: removed: contrail-control
OpenContrail Admin (ci-admin-f) wrote: [Review update] master

Review in progress for https://review.opencontrail.org/31218
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote: [Review update] R4.0

Review in progress for https://review.opencontrail.org/31219
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote: A change has been merged

Reviewed: https://review.opencontrail.org/31219
Committed: http://github.com/Juniper/contrail-controller/commit/8d834648089d16e70cf50724d90db202e16c8f16
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 8d834648089d16e70cf50724d90db202e16c8f16
Author: Manish <email address hidden>
Date: Tue May 9 14:34:55 2017 +0530

SIGHUP on agent makes its CN connection unstable

Analysis (courtesy nipak):
The issue happens because we make-before-break, i.e. establish new XMPP
connections while a background task cleans up routes etc. for the older
connections.

We apply changes only if the new set is different.

With the following configs applied back-to-back, each followed by a SIGHUP:
(Server-A, Server-B)
(Server-C, Server-D)
(Server-A, Server-E, Server-D)

While the connection to Server-A (older incarnation) is still being cleaned
up, the agent applies the new config (which contains Server-A) and tries a
new connection to Server-A.

The control node sends TcpClose on both the new and old connections, and the
agent tries to connect endlessly.

Solution:
For deleted channels, unregister from XMPP as soon as the channel is moved out
of the agent channel list. This ensures that the old connection is cleaned up
before a new one is tried. The old channel also remains in the deleted list
until some new channel becomes stable.

Change-Id: I7305432f8ddc093fdcf5a7945b0ad70aa64caa67
Closes-bug: #1682080
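The race and the fix can be illustrated with a toy model of the agent's channel list. Everything here (`Channel`, `AgentModel`, the `unregister_on_delete` switch, the `registered` flag) is hypothetical and is not the code in controller_init.cc or controller_peer.cc. The model assumes background cleanup of a deleted channel has not finished when the next config arrives, which back-to-back SIGHUPs make likely, and it records a "conflict" where the commit message says the control node closes both the old and new connections.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    server: str
    registered: bool = True  # stands in for the channel's XMPP registration


class AgentModel:
    """Toy model of the agent's control-node channel list (hypothetical)."""

    def __init__(self, unregister_on_delete: bool):
        self.unregister_on_delete = unregister_on_delete
        self.channels = {}   # server -> live Channel
        self.deleted = []    # old channels whose background cleanup is still pending
        self.conflicts = []  # servers reconnected while an old incarnation was still registered

    def apply(self, servers):
        if set(servers) == set(self.channels):
            return  # changes are applied only if the new set is different
        for server in [s for s in self.channels if s not in servers]:
            old = self.channels.pop(server)
            if self.unregister_on_delete:
                old.registered = False  # the fix: unregister as soon as it leaves the list
            self.deleted.append(old)
        for server in servers:
            if server in self.channels:
                continue  # unchanged server keeps its existing channel
            if any(d.server == server and d.registered for d in self.deleted):
                self.conflicts.append(server)  # old and new incarnations coexist
            self.channels[server] = Channel(server)  # make-before-break: connect at once

    def mark_stable(self, server):
        # Old channels stay in the deleted list until some new channel is stable.
        if server in self.channels:
            self.deleted.clear()


# Configs from the commit message, applied back-to-back (one SIGHUP each)
configs = [["Server-A", "Server-B"], ["Server-C", "Server-D"],
           ["Server-A", "Server-E", "Server-D"]]
for fixed in (False, True):
    agent = AgentModel(unregister_on_delete=fixed)
    for cfg in configs:
        agent.apply(cfg)
    print("fixed" if fixed else "unfixed", agent.conflicts)
# unfixed ['Server-A']
# fixed []
```

Only Server-A is flagged, and only when deleted channels stay registered: Server-D is kept across the last change, so it never gets a second incarnation.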

Pulkit Tandon (pulkitt) wrote:

Tested on Build 2 - R4.0

Edited agent.conf and issued SIGHUP 20 times.
The issue was not observed.

The router remained stable and no toggling occurred.

Hence closing the bug.

OpenContrail Admin (ci-admin-f) wrote:

Reviewed: https://review.opencontrail.org/31218
Committed: http://github.com/Juniper/contrail-controller/commit/f10f57e5c30ac3156e4c7952554829b084a162e3
Submitter: Zuul (<email address hidden>)
Branch: master

commit f10f57e5c30ac3156e4c7952554829b084a162e3
Author: Manish <email address hidden>
Date: Tue May 9 14:34:55 2017 +0530

Same commit message as commit 8d834648089d16e70cf50724d90db202e16c8f16 (R4.0) above.

Change-Id: I7305432f8ddc093fdcf5a7945b0ad70aa64caa67
Closes-bug: #1682080

information type: Proprietary → Private
information type: Private → Public Security
information type: Public Security → Proprietary
information type: Proprietary → Public Security
OpenContrail Admin (ci-admin-f) wrote: [Review update] R3.2

Review in progress for https://review.opencontrail.org/36976
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote: [Review update] R3.1

Review in progress for https://review.opencontrail.org/36977
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote:

Review in progress for https://review.opencontrail.org/37171
Submitter: Nipa Kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote: A change has been merged

Reviewed: https://review.opencontrail.org/37171
Committed: http://github.com/Juniper/contrail-controller/commit/d256421d191ee3beb67aba4576dcc2c9332fc59e
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit d256421d191ee3beb67aba4576dcc2c9332fc59e
Author: Nipa Kumar <email address hidden>
Date: Fri Nov 3 14:51:50 2017 -0700

Cleanup of stale XMPP connections.

Ensure cleanup of older XMPP connections by halting the
state machine and unregistering any rx/tx calls.

Change-Id: Ifb64d896458cdcb1c8272a2224c357e339811ba3
Closes-Bug:#1682080
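The idea behind this change can be pictured with a small model: once a stale connection's state machine is halted and its rx/tx handlers are dropped, a late TcpClose or message on it no longer triggers a reconnect or reaches the agent. The class below is a hypothetical sketch of that idea only; it is not the agent's or the XMPP library's API, and "Server-A" is a placeholder.

```python
class StaleConnectionModel:
    """Hypothetical XMPP connection with a reconnecting state machine and rx/tx hooks."""

    def __init__(self, server: str):
        self.server = server
        self.halted = False
        self.reconnect_attempts = 0
        self.received = []
        self.sent = []
        # Registered receive/transmit callbacks (stand-ins for the real rx/tx hooks)
        self.rx_callback = self.received.append
        self.tx_callback = self.sent.append

    def on_tcp_close(self) -> None:
        """State-machine reaction to a TcpClose from the control node."""
        if self.halted:
            return  # a halted state machine does not schedule another connect
        self.reconnect_attempts += 1

    def deliver(self, msg: str) -> None:
        if self.rx_callback is not None:
            self.rx_callback(msg)

    def send(self, msg: str) -> None:
        if self.tx_callback is not None:
            self.tx_callback(msg)

    def cleanup(self) -> None:
        """Halt the state machine and unregister rx/tx so the stale connection goes quiet."""
        self.halted = True
        self.rx_callback = None
        self.tx_callback = None


live = StaleConnectionModel("Server-A")
stale = StaleConnectionModel("Server-A")
stale.cleanup()
for conn in (live, stale):
    conn.on_tcp_close()
    conn.deliver("route update")
    conn.send("subscribe")
    print(conn.reconnect_attempts, conn.received, conn.sent)
# 1 ['route update'] ['subscribe']
# 0 [] []
```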

OpenContrail Admin (ci-admin-f) wrote: [Review update] R3.2

Review in progress for https://review.opencontrail.org/37324
Submitter: Nipa Kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote:

Review in progress for https://review.opencontrail.org/37325
Submitter: Nipa Kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote: A change has been merged

Reviewed: https://review.opencontrail.org/37325
Committed: http://github.com/Juniper/contrail-controller/commit/28c5cabf6b57971fb220e12b8917d88afcccd0f6
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 28c5cabf6b57971fb220e12b8917d88afcccd0f6
Author: Nipa Kumar <email address hidden>
Date: Wed Nov 8 14:40:30 2017 -0800

Cleanup of stale XMPP connections.

Ensure cleanup of older XMPP connections by halting the
state machine and unregistering any rx/tx calls.

Change-Id: I954b74399d6b31137c600d7c76586a0eeea997a4
Closes-Bug: #1682080

OpenContrail Admin (ci-admin-f) wrote: [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/41912
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote: A change has been merged

Reviewed: https://review.opencontrail.org/41912
Committed: http://github.com/Juniper/contrail-controller/commit/199afe8977416b79182be37c9058cb30163fcdaa
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit 199afe8977416b79182be37c9058cb30163fcdaa
Author: Nipa Kumar <email address hidden>
Date: Wed Nov 8 14:40:30 2017 -0800

Cleanup of stale XMPP connections.

Ensure cleanup of older XMPP connections by halting the
state machine and unregistering any rx/tx calls.

Conflicts:
 src/vnsw/agent/controller/controller_peer.cc
 src/vnsw/agent/controller/controller_peer.h

Change-Id: I954b74399d6b31137c600d7c76586a0eeea997a4
Closes-Bug: #1682080
