Agent connects to only one xmpp server even after node failover

Bug #1457243 reported by venu kolli
This bug affects 1 person
Affects              Status                       Importance  Assigned to  Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.0               Fix Committed                High        Nipa
  R2.1               Fix Committed                High        Nipa
  R2.20              Fix Released                 High        Nipa
  Trunk              Fix Committed                High        Nipa

Bug Description

The agent connects to only one XMPP server even though the discovery client lists two XMPP servers.

The issue was observed on an HA setup with 3 nodes (control node + cfgm + OpenStack) and 2 computes.

On one of the computes, the XMPP connection is established to only one XMPP server.

The sandesh discovery and controller logs are at:

/cs-shared/bugs/venu_bugs/xmpp_bug/

The workaround is to restart the controllers.

Tags: vrouter
venu kolli (vkolli)
Changed in juniperopenstack:
assignee: nobody → Nipa (nipak)
importance: Undecided → High
information type: Proprietary → Public
tags: added: xmpp
tags: added: vrouter
removed: xmpp
OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/10870
Submitter: Nipa Kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10870
Committed: http://github.org/Juniper/contrail-controller/commit/7811f4eab136ed45751e4a04d6d7710a78109d8a
Submitter: Zuul
Branch: master

commit 7811f4eab136ed45751e4a04d6d7710a78109d8a
Author: Nipa Kumar <email address hidden>
Date: Tue May 26 14:50:42 2015 -0700

Agent: XMPP connection not attempted by agent

The agent marks a connection to an XMPP server DOWN after several failed attempts and rediscovers controllers.
If a controller in the list sent by Discovery is already marked DOWN, the agent does not apply
the connection to that controller. This happens in the window where Discovery has not
yet marked the controller down (it does so after 3 missed heartbeats) and still sends the stale list.
Additionally, the discovery client keeps a checksum of the response, so callbacks are throttled at the source;
the clients are never called again when the controllers are UP, and the information is lost.

The solution is to honor the response from Discovery irrespective of the state of the publisher
(both the XMPP server advertised by the controller and the DNS daemon).

Test cases added.

Change-Id: I5c6695f04d3dd4a384f0ea7d0da912475966c6eb
Closes-Bug:1446463
Closes-Bug:1457243
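
The failure mode described in the commit message above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the agent's actual code (the agent is not written in Python): the class names `DiscoveryClient` and `Agent`, the `honor_down_servers` flag, the MD5 checksum, and the IP addresses are all hypothetical and chosen for illustration.

```python
import hashlib
import json


class DiscoveryClient:
    """Hypothetical discovery client: it calls the subscriber back only
    when the checksum of the response differs from the previous one."""

    def __init__(self, callback):
        self._callback = callback
        self._last_checksum = None

    def on_response(self, servers):
        payload = json.dumps(sorted(servers)).encode()
        checksum = hashlib.md5(payload).hexdigest()
        if checksum == self._last_checksum:
            return False              # unchanged list: callback is throttled
        self._last_checksum = checksum
        self._callback(servers)
        return True


class Agent:
    """Hypothetical agent tracking which servers it will try to connect to."""

    def __init__(self, honor_down_servers):
        self.honor_down_servers = honor_down_servers  # True models the fix
        self.down = set()             # servers the agent itself marked DOWN
        self.targets = set()          # servers the agent will connect to

    def mark_down(self, server):
        self.down.add(server)
        self.targets.discard(server)

    def apply_discovery_response(self, servers):
        for server in servers:
            if server in self.down and not self.honor_down_servers:
                continue              # pre-fix: entry for a DOWN server is ignored
            self.down.discard(server)
            self.targets.add(server)


def simulate(honor_down_servers):
    agent = Agent(honor_down_servers)
    client = DiscoveryClient(agent.apply_discovery_response)
    stale_list = ["10.0.0.1", "10.0.0.2"]     # made-up addresses

    agent.mark_down("10.0.0.2")               # agent gave up after several attempts
    client.on_response(stale_list)            # Discovery still advertises the server
    delivered_again = client.on_response(stale_list)  # server is UP again, same list
    return sorted(agent.targets), delivered_again
```

In this model, `simulate(False)` leaves the agent with a single target because the second, identical response never reaches it, while `simulate(True)` keeps both servers as targets after the first response.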

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10929
Submitter: Nipa Kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10929
Committed: http://github.org/Juniper/contrail-controller/commit/6d845c15ca2bd114ded81d4092aa134b929ec39e
Submitter: Zuul
Branch: R2.20

commit 6d845c15ca2bd114ded81d4092aa134b929ec39e
Author: Nipa Kumar <email address hidden>
Date: Tue May 26 14:50:42 2015 -0700

Agent: XMPP connection not attempted by agent

The agent marks a connection to an XMPP server DOWN after several failed attempts and rediscovers controllers.
If a controller in the list sent by Discovery is already marked DOWN, the agent does not apply
the connection to that controller. This happens in the window where Discovery has not
yet marked the controller down (it does so after 3 missed heartbeats) and still sends the stale list.
Additionally, the discovery client keeps a checksum of the response, so callbacks are throttled at the source;
the clients are never called again when the controllers are UP, and the information is lost.

The solution is to honor the response from Discovery irrespective of the state of the publisher
(both the XMPP server advertised by the controller and the DNS daemon).

Test cases added.

Change-Id: I5c6695f04d3dd4a384f0ea7d0da912475966c6eb
Closes-Bug:1446463
Closes-Bug:1457243

venu kolli (vkolli) wrote :

Verified on R2.20 build 32

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.0

Review in progress for https://review.opencontrail.org/12160
Submitter: Nipa Kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.1

Review in progress for https://review.opencontrail.org/12161
Submitter: Nipa Kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12160
Committed: http://github.org/Juniper/contrail-controller/commit/65b5285125950d168dbc651354587725cc28d268
Submitter: Zuul
Branch: R2.0

commit 65b5285125950d168dbc651354587725cc28d268
Author: Nipa Kumar <email address hidden>
Date: Tue May 26 14:50:42 2015 -0700

Agent: XMPP connection not attempted by agent

The agent marks a connection to an XMPP server DOWN after several failed attempts and rediscovers controllers.
If a controller in the list sent by Discovery is already marked DOWN, the agent does not apply
the connection to that controller. This happens in the window where Discovery has not
yet marked the controller down (it does so after 3 missed heartbeats) and still sends the stale list.
Additionally, the discovery client keeps a checksum of the response, so callbacks are throttled at the source;
the clients are never called again when the controllers are UP, and the information is lost.

The solution is to honor the response from Discovery irrespective of the state of the publisher
(both the XMPP server advertised by the controller and the DNS daemon).

Test cases added.

Change-Id: I5c6695f04d3dd4a384f0ea7d0da912475966c6eb
Closes-Bug:1457243

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/12161
Committed: http://github.org/Juniper/contrail-controller/commit/89232854f4847c2941252eb6de756e162a5a23cb
Submitter: Zuul
Branch: R2.1

commit 89232854f4847c2941252eb6de756e162a5a23cb
Author: Nipa Kumar <email address hidden>
Date: Tue May 26 14:50:42 2015 -0700

Agent: XMPP connection not attempted by agent

The agent marks a connection to an XMPP server DOWN after several failed attempts and rediscovers controllers.
If a controller in the list sent by Discovery is already marked DOWN, the agent does not apply
the connection to that controller. This happens in the window where Discovery has not
yet marked the controller down (it does so after 3 missed heartbeats) and still sends the stale list.
Additionally, the discovery client keeps a checksum of the response, so callbacks are throttled at the source;
the clients are never called again when the controllers are UP, and the information is lost.

The solution is to honor the response from Discovery irrespective of the state of the publisher
(both the XMPP server advertised by the controller and the DNS daemon).

Test cases added.

Change-Id: I5c6695f04d3dd4a384f0ea7d0da912475966c6eb
Closes-Bug:1457243
