2.21.1-B22: TOR-Agent deletes virtual network/virtual machine interfaces when one of the config nodes fails.

Bug #1567775 reported by Sandeep Sridhar
This bug affects 2 people
Affects                Status          Importance   Assigned to         Milestone
Juniper Openstack      (status tracked in Trunk)
  R2.20                Fix Committed   High         Hari Prasad Killi
  R2.21.x              Fix Committed   High         Hari Prasad Killi
  R2.22.x              Fix Committed   High         Hari Prasad Killi
  R3.0                 Fix Committed   High         Hari Prasad Killi
  Trunk                Fix Committed   High         Hari Prasad Killi

Bug Description

Setup Details :

10.204.74.76, 10.204.74.78 and 10.204.74.79 are cfgm/control nodes. ( root/contrail )
10.204.74.64, 10.204.74.65 and 10.204.74.77 are compute nodes. ( root/contrail )
10.204.74.61 and 10.204.74.66 are TSN nodes in HA ( root/contrail )
10.204.74.229 and 10.204.74.230 are TOR Switches ( root/Juniper123 )

The tor-agent in question was active on 74.61. Its XMPP connections were with 74.78 and 74.79, and the ifmap-server connection was with 74.76. After running supervisorctl -c /etc/contrail/supervisord_config.conf restart all on 74.76, the tor-agent appears to delete some interfaces/networks on the QFX. Please see the messages below:

contrail-control logs
================
2016-04-08 Fri 11:36:53:703.043 IST contrail78 [Thread 140089780348672, Pid 1302]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: short read SendPoll PollResponseWait ifsm::EvWriteSuccess controller/src/ifmap/client/ifmap_state_machine.cc 915
2016-04-08 Fri 11:36:56:769.092 IST contrail78 [Thread 140089763555072, Pid 1302]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Connection refused ServerResolve SsrcConnect ifsm::EvResolveSuccess controller/src/ifmap/client/ifmap_state_machine.cc 915
2016-04-08 Fri 11:37:00:799.925 IST contrail78 [Thread 140089771951872, Pid 1302]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Connection refused ServerResolve SsrcConnect ifsm::EvResolveSuccess controller/src/ifmap/client/ifmap_state_machine.cc 915
2016-04-08 Fri 11:37:15:247.579 IST contrail78 [Thread 140089767753472, Pid 1302]: BGP [SYS_WARN]: XmppPeerInstanceLog: XMPP Peer contrail78:10.204.74.61 in instance default-domain:admin:QFX5100-229_48_1401_vn:QFX5100-229_48_1401_vn : Bad inet address /32 controller/src/bgp/bgp_xmpp_channel.cc 1469
2016-04-08 Fri 11:37:16:947.487 IST contrail78 [Thread 140089763555072, Pid 1302]: BGP [SYS_WARN]: XmppPeerInstanceLog: XMPP Peer contrail78:10.204.74.61 in instance default-domain:admin:QFX5100-229_48_1169_vn:QFX5100-229_48_1169_vn : Bad inet address /32 controller/src/bgp/bgp_xmpp_channel.cc 1469

contrail-tor-agent logs (see converting to delete messages)
================================================
2016-04-08 Fri 11:37:09:979.065 IST contrail61 [Thread 140454342690560, Pid 10037]: ID-PERM not set for object <default-global-system-config:QFX5100-229:ge-0/0/48:ge-0/0/48.2003> Table <__ifmap__.logical_interface.0>. Converting to DELETE
2016-04-08 Fri 11:37:15:152.681 IST contrail61 [Thread 140454325896960, Pid 10037]: ID-PERM not set for object <default-domain:admin:QFX5100-229_48_1401_vn> Table <__ifmap__.virtual_network.0>. Converting to DELETE
2016-04-08 Fri 11:37:16:924.638 IST contrail61 [Thread 140454330095360, Pid 10037]: ID-PERM not set for object <default-domain:admin:QFX5100-229_48_1169_vn> Table <__ifmap__.virtual_network.0>. Converting to DELETE
2016-04-08 Fri 11:37:19:733.434 IST contrail61 [Thread 140454317500160, Pid 10037]: ID-PERM not set for object <default-domain:admin:db58fa22-b8a6-4c26-9112-f1c9facbd5b5> Table <__ifmap__.virtual_machine_interface.0>. Converting to DELETE
2016-04-08 Fri 11:37:21:331.304 IST contrail61 [Thread 140453752076032, Pid 10037]: ID-PERM not set for object <default-global-system-config:QFX5100-229:ge-0/0/48:ge-0/0/48.1207> Table <__ifmap__.logical_interface.0>. Converting to DELETE
2016-04-08 Fri 11:37:25:334.688 IST contrail61 [Thread 140453752076032, Pid 10037]: ID-PERM not set for object <default-domain:admin:QFX5100-229_48_1669_vn> Table <__ifmap__.virtual_network.0>. Converting to DELETE
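
The "Converting to DELETE" lines above can be grouped per IF-MAP table to see which object types were affected. The following is a minimal sketch, not part of the original report; the regular expression is inferred only from the sample lines quoted here, and the function name summarize_deletes is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the tor-agent log lines quoted in this report.
# It is an assumption about the message format, not taken from agent source.
PATTERN = re.compile(
    r"ID-PERM not set for object <(?P<obj>[^>]+)> "
    r"Table <(?P<table>[^>]+)>\. Converting to DELETE"
)


def summarize_deletes(lines):
    """Group objects that were converted to DELETE by their table name."""
    by_table = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            by_table[match.group("table")].append(match.group("obj"))
    return dict(by_table)


# Three of the log lines from this report, used as sample input.
sample = [
    "2016-04-08 Fri 11:37:09:979.065 IST contrail61 [Thread 140454342690560, Pid 10037]: "
    "ID-PERM not set for object <default-global-system-config:QFX5100-229:ge-0/0/48:ge-0/0/48.2003> "
    "Table <__ifmap__.logical_interface.0>. Converting to DELETE",
    "2016-04-08 Fri 11:37:15:152.681 IST contrail61 [Thread 140454325896960, Pid 10037]: "
    "ID-PERM not set for object <default-domain:admin:QFX5100-229_48_1401_vn> "
    "Table <__ifmap__.virtual_network.0>. Converting to DELETE",
    "2016-04-08 Fri 11:37:16:924.638 IST contrail61 [Thread 140454330095360, Pid 10037]: "
    "ID-PERM not set for object <default-domain:admin:QFX5100-229_48_1169_vn> "
    "Table <__ifmap__.virtual_network.0>. Converting to DELETE",
]

for table, objects in summarize_deletes(sample).items():
    print(table, len(objects))
```

An empty result from the same scan on a patched agent's log would match the "no delete messages" check used later in this thread.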

I had set traceoptions flag to all on ovsdb.log. There are no interesting messages there.

I don't think this is expected as deleting interfaces on QFX will impact BMS traffic. Please investigate.

Greetings,
Sandeep.

Tags: vrouter
information type: Proprietary → Public
Sandeep Sridhar (ssandeep) wrote :

The delete messages could be generated when the control node reconnects to another ifmap-server after the ifmap-server is restarted on the config node.

Ashish Ranjan (aranjan-n) wrote :

@sandeep that is not expected. The control node doesn't accept any connection from the vrouter until all config is downloaded from the ifmap-server.

Hari Prasad Killi (haripk) wrote :

vrouter-agent received the config node without ID_PERMS. The agent treats this as a delete, which is why the case above is seen. The agent should simply ignore nodes without ID_PERMS and should not treat them as deletes.
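
To illustrate the difference between the behaviour described above and the proposed one, here is a small sketch. It is not the agent's code (the agent is not written in Python); ConfigNode, Action, old_action and fixed_action are hypothetical names used only to show the decision logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ADD_CHANGE = "add_change"
    DELETE = "delete"
    IGNORE = "ignore"


@dataclass
class ConfigNode:
    """Hypothetical stand-in for a config node received by the agent."""
    name: str
    id_perms: Optional[dict] = None  # None models a node without ID_PERMS
    deleted: bool = False            # True models an explicit delete


def old_action(node: ConfigNode) -> Action:
    # Behaviour described in this bug: missing ID_PERMS is treated as delete.
    if node.deleted or node.id_perms is None:
        return Action.DELETE
    return Action.ADD_CHANGE


def fixed_action(node: ConfigNode) -> Action:
    # Proposed behaviour: only an explicit delete removes the oper object;
    # a node without ID_PERMS is ignored.
    if node.deleted:
        return Action.DELETE
    if node.id_perms is None:
        return Action.IGNORE
    return Action.ADD_CHANGE


partial = ConfigNode(name="default-domain:admin:QFX5100-229_48_1401_vn")
print(old_action(partial).value, fixed_action(partial).value)
```

Under this sketch, a node that arrives without ID_PERMS while the config is still being re-downloaded leaves the existing oper object untouched instead of removing it.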

tags: added: vrouter
Sandeep Sridhar (ssandeep) wrote :

Hi Hari - Thanks for the confirmation. In which release/build can we get a fix for this? This is a common scenario for many customers during an upgrade, as they would upgrade the control/cfgm nodes first and then upgrade the vRouters one by one on production systems. When all cfgm/control nodes are rebooted in the course of the upgrade, this scenario is bound to happen.

Greetings,
Sandeep.

Changed in juniperopenstack:
assignee: nobody → Hari Prasad Killi (haripk)
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/19440
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/19442
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/19443
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/19444
Submitter: Hari Prasad Killi (<email address hidden>)

Sandeep Sridhar (ssandeep) wrote :

Hi Hari - The binary provided ( vrouter-agent ) fixed the issue. I repeated my tests locally with the binary and did not see any "Converting to DELETE" messages in the tor-agent logs. This was tested on 2.21.2-33~juno.

- Sandeep.

Changed in juniperopenstack:
importance: Undecided → High
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19440
Committed: http://github.org/Juniper/contrail-controller/commit/f08ec1ecdf3619de5144621e07072a80d5025293
Submitter: Zuul
Branch: R3.0

commit f08ec1ecdf3619de5144621e07072a80d5025293
Author: Hari <email address hidden>
Date: Tue Apr 19 16:04:26 2016 +0530

Do not consider absence of id-perms as a trigger to delete config.

When a config node is received without id-perms, agent is deleting
the oper object. This could cause unwanted churn in the oper data
and can be avoided. Changing agent config processing to ignore
such config.

Change-Id: Id646de8d63b3529699149fba0d99d751772b989d
closes-bug: #1567775

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/19510
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19444
Committed: http://github.org/Juniper/contrail-controller/commit/86b795ee753001b8d8179ca855b53bd5df0a92dd
Submitter: Zuul
Branch: R2.20

commit 86b795ee753001b8d8179ca855b53bd5df0a92dd
Author: Hari <email address hidden>
Date: Tue Apr 19 17:12:47 2016 +0530

Do not consider absence of id-perms as a trigger to delete config.

When a config node is received without id-perms, agent is deleting
the oper object. This could cause unwanted churn in the oper data
and can be avoided. Changing agent config processing to ignore
such config.

Change-Id: I7fc2221a67080d007201d1d74a375835a56ae246
closes-bug: #1567775

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/19442
Committed: http://github.org/Juniper/contrail-controller/commit/a1729f8d99751e9fc021885a87de0169b6dbe7c2
Submitter: Zuul
Branch: R2.21.x

commit a1729f8d99751e9fc021885a87de0169b6dbe7c2
Author: Hari <email address hidden>
Date: Tue Apr 19 17:06:52 2016 +0530

Do not consider absence of id-perms as a trigger to delete config.

When a config node is received without id-perms, agent is deleting
the oper object. This could cause unwanted churn in the oper data
and can be avoided. Changing agent config processing to ignore
such config.

Change-Id: Iad7e027190ef1850038a50e70227239565aae3ba
closes-bug: #1567775

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/19443
Committed: http://github.org/Juniper/contrail-controller/commit/cb857c2aba07f40fa6150490103e244676063fc4
Submitter: Zuul
Branch: R2.22.x

commit cb857c2aba07f40fa6150490103e244676063fc4
Author: Hari <email address hidden>
Date: Tue Apr 19 17:10:01 2016 +0530

Do not consider absence of id-perms as a trigger to delete config.

When a config node is received without id-perms, agent is deleting
the oper object. This could cause unwanted churn in the oper data
and can be avoided. Changing agent config processing to ignore
such config.

Change-Id: Id646de8d63b3529699149fba0d99d751772b989d
closes-bug: #1567775

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/19510
Committed: http://github.org/Juniper/contrail-controller/commit/3d6a918b4f933cad466fb79955b69ac225aba4d1
Submitter: Zuul
Branch: master

commit 3d6a918b4f933cad466fb79955b69ac225aba4d1
Author: Hari <email address hidden>
Date: Tue Apr 19 16:04:26 2016 +0530

Do not consider absence of id-perms as a trigger to delete config.

When a config node is received without id-perms, agent is deleting
the oper object. This could cause unwanted churn in the oper data
and can be avoided. Changing agent config processing to ignore
such config.

Change-Id: Id646de8d63b3529699149fba0d99d751772b989d
closes-bug: #1567775
(cherry picked from commit f08ec1ecdf3619de5144621e07072a80d5025293)
