[R2.20.49] Agent Crash@ IFMapAgentLinkTable::EvalDefLink

Bug #1465190 reported by chhandak
This bug affects 1 person
Affects            | Status                  | Importance | Assigned to           | Milestone
Juniper Openstack  | Status tracked in Trunk |            |                       |
  R2.0             | Fix Committed           | Undecided  | Divakar Dharanalakota |
  R2.20            | Fix Committed           | High       | Divakar Dharanalakota |
  Trunk            | Fix Committed           | High       | Divakar Dharanalakota |

Bug Description

The crash was observed after 2 of the 3 config nodes were rebooted. The Galera cluster broke during the process, and all MySQL instances had to be restarted to restore it.

Backtrace
-----------------
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007ff6471fccc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ff6471fccc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ff6472000d8 in __GI_abort () at abort.c:89
#2 0x00007ff6471f5b86 in __assert_fail_base (fmt=0x7ff647346830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x10677d8 "link_def_map_.end() != right_defmap_it", file=file@entry=0x10677a8 "controller/src/ifmap/ifmap_agent_table.cc",
    line=line@entry=507, function=function@entry=0x1067ca0 "void IFMapAgentLinkTable::EvalDefLink(IFMapTable::RequestKey*)") at assert.c:92
#3 0x00007ff6471f5c32 in __GI___assert_fail (assertion=0x10677d8 "link_def_map_.end() != right_defmap_it",
    file=0x10677a8 "controller/src/ifmap/ifmap_agent_table.cc", line=507, function=0x1067ca0 "void IFMapAgentLinkTable::EvalDefLink(IFMapTable::RequestKey*)")
    at assert.c:101
#4 0x0000000000e5c6cb in IFMapAgentLinkTable::EvalDefLink(IFMapTable::RequestKey*) ()
#5 0x0000000000e5e013 in IFMapAgentTable::Input(DBTablePartition*, DBClient*, DBRequest*) ()
#6 0x0000000000e9b126 in DBPartition::QueueRunner::Run() ()
#7 0x0000000000f98d00 in TaskImpl::execute() ()
#8 0x00007ff647dcbb3a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007ff647dc7816 in ?? () from /usr/lib/libtbb.so.2
#10 0x00007ff647dc6f4b in ?? () from /usr/lib/libtbb.so.2
#11 0x00007ff647dc30ff in ?? () from /usr/lib/libtbb.so.2
#12 0x00007ff647dc32f9 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007ff647fe7182 in start_thread (arg=0x7ff6397f5700) at pthread_create.c:312
#14 0x00007ff6472c047d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Tags: bms scale vrouter
chhandak (chhandak)
summary: - [R2.20.43] Agent Crash@ IFMapAgentLinkTable::EvalDefLink
+ [R2.20.49] Agent Crash@ IFMapAgentLinkTable::EvalDefLink
Changed in juniperopenstack:
assignee: nobody → Divakar Dharanalakota (ddivakar)
milestone: none → r2.30-fcs
chhandak (chhandak) wrote :
Changed in juniperopenstack:
importance: Undecided → High
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11794
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11861
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.0

Review in progress for https://review.opencontrail.org/11862
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11861
Committed: http://github.org/Juniper/contrail-controller/commit/55debc26904da93ad276d4eb691f75356e81e90d
Submitter: Zuul
Branch: master

commit 55debc26904da93ad276d4eb691f75356e81e90d
Author: Divakar <email address hidden>
Date: Fri Jun 19 23:30:48 2015 +0530

Update IFNode seq number even for Delete operation

Suppose Node A, Node B and a link L1 between them exist, each with
seq number 1. After an XMPP channel flap, a new seq number is generated
for the subsequent config. If an update is then received for Node B,
its seq number becomes 2 while A and L1 remain at seq 1. If Node A's
delete is received before L1's delete, L1 is pushed to the defer list.
While pushing to the defer list, the seq number of defer entry A->B is
taken as 2 (as B's seq is 2) and that of B->A as 1 (as A's seq is 1).
This leads to dissimilar seq numbers for A->B and B->A. When an add of
Node A is subsequently received, B->A is not considered while evaluating
the defer list because its seq is old, so defer entry A->B is deleted
but B->A is not. This leaves a single defer entry present, which is
invalid.
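
The scenario in this paragraph can be reproduced with a small model. This is
only a sketch under stated assumptions: Node, DeferList,
DeferLinkUsingEndNodeSeq, EvalDeferList and StrandedEntriesAfterFlap are
hypothetical names chosen for illustration, not the contrail-controller
classes, and the defer list is reduced to a map from a (from, to) direction
to a seq number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified model; not the real IFMapAgentLinkTable.
using Direction = std::pair<std::string, std::string>;  // (from, to)
using DeferList = std::map<Direction, uint64_t>;        // direction -> seq

struct Node {
    std::string name;
    uint64_t seq;
};

// Pre-fix behaviour as described in the commit message: each direction
// takes its seq number from the node at the far end of that direction.
void DeferLinkUsingEndNodeSeq(DeferList &defer, const Node &a, const Node &b) {
    assert(a.name != b.name);
    defer[std::make_pair(a.name, b.name)] = b.seq;  // A->B gets B's seq
    defer[std::make_pair(b.name, a.name)] = a.seq;  // B->A gets A's seq
}

// Pre-fix evaluation in this model: only entries carrying the current seq
// are considered; entries with an older seq are skipped.
// Returns the number of entries still deferred afterwards.
std::size_t EvalDeferList(DeferList &defer, uint64_t current_seq) {
    for (auto it = defer.begin(); it != defer.end();) {
        if (it->second == current_seq) {
            it = defer.erase(it);
        } else {
            ++it;
        }
    }
    return defer.size();
}

// Replays the sequence of events from the commit message.
std::size_t StrandedEntriesAfterFlap() {
    Node a{"A", 1};  // untouched since before the flap
    Node b{"B", 2};  // updated after the flap
    DeferList defer;
    DeferLinkUsingEndNodeSeq(defer, a, b);  // A deleted before L1
    return EvalDeferList(defer, 2);         // A re-added at seq 2
}
```

In this model StrandedEntriesAfterFlap() returns 1: the A->B entry is removed
and the B->A entry stays behind, which is the kind of lone defer entry the
commit message calls invalid.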

As a fix, when Node A's delete is received, the seq number of A is
updated before pushing its links to the defer list. While adding to the
defer list, the seq number of the defer entry is taken from the IFMapLink
rather than from the end nodes. While accepting an update to a node in
the DB, an assert is added to ensure that the update's seq number is the
same as or higher than that of the existing node. While evaluating the
defer list, it is ensured that the entries for both directions are
removed rather than a single one.

Change-Id: Ibb5a7469b80a596961baa3f11d52ee52ff9b1f10
closes-bug: #1465190
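
The fix described in the commit message above can be sketched in the same
spirit. The names LinkRecord, DeferTable, DeferLinkUsingLinkSeq,
ResolveDeferredLink and AcceptUpdate are assumptions made for this
illustration and do not come from the contrail-controller source. The sketch
models three of the points: the seq number comes from the link, an update
must not carry an older seq than the stored node, and both directions are
removed together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified model; not the real IFMapLink or defer list.
struct LinkRecord {
    std::string left;
    std::string right;
    uint64_t seq;  // seq number carried by the link itself
};

using DeferTable = std::map<std::pair<std::string, std::string>, uint64_t>;

// Both directions take the seq number of the link, so they cannot diverge.
void DeferLinkUsingLinkSeq(DeferTable &table, const LinkRecord &link) {
    table[std::make_pair(link.left, link.right)] = link.seq;
    table[std::make_pair(link.right, link.left)] = link.seq;
}

// Removes the deferred link in both directions at once.
// Returns how many entries were erased (0, 1 or 2).
std::size_t ResolveDeferredLink(DeferTable &table, const std::string &node,
                                const std::string &peer) {
    std::size_t erased = table.erase(std::make_pair(node, peer));
    erased += table.erase(std::make_pair(peer, node));
    return erased;
}

// Models the added guard: an update must carry the same or a higher seq
// than the stored node. The stored seq is then moved forward.
void AcceptUpdate(uint64_t &stored_seq, uint64_t incoming_seq) {
    assert(incoming_seq >= stored_seq);
    stored_seq = incoming_seq;
}
```

With a link {"A", "B", 1}, DeferLinkUsingLinkSeq stores two entries with the
same seq, and ResolveDeferredLink(table, "A", "B") erases both, so no single
direction is left behind in this model.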

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11794
Committed: http://github.org/Juniper/contrail-controller/commit/ac464cda5b161e4d09cce2bfe3a781038efb9331
Submitter: Zuul
Branch: R2.20

commit ac464cda5b161e4d09cce2bfe3a781038efb9331
Author: Divakar <email address hidden>
Date: Thu Jun 18 21:26:51 2015 +0530

Update IFNode seq number even for Delete operation

Suppose Node A, Node B and a link L1 between them exist, each with
seq number 1. After an XMPP channel flap, a new seq number is generated
for the subsequent config. If an update is then received for Node B,
its seq number becomes 2 while A and L1 remain at seq 1. If Node A's
delete is received before L1's delete, L1 is pushed to the defer list.
While pushing to the defer list, the seq number of defer entry A->B is
taken as 2 (as B's seq is 2) and that of B->A as 1 (as A's seq is 1).
This leads to dissimilar seq numbers for A->B and B->A. When an add of
Node A is subsequently received, B->A is not considered while evaluating
the defer list because its seq is old, so defer entry A->B is deleted
but B->A is not. This leaves a single defer entry present, which is
invalid.

As a fix, when Node A's delete is received, the seq number of A is
updated before pushing its links to the defer list. While adding to the
defer list, the seq number of the defer entry is taken from the IFMapLink
rather than from the end nodes. While accepting an update to a node in
the DB, an assert is added to ensure that the update's seq number is the
same as or higher than that of the existing node. While evaluating the
defer list, it is ensured that the entries for both directions are
removed rather than a single one.

Change-Id: I2cb8229304ac75fde8c9647e3567970725040559
closes-bug: #1465190

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11862
Committed: http://github.org/Juniper/contrail-controller/commit/6a134b01c15b562f519aabc1bff12d79779549b5
Submitter: Zuul
Branch: R2.0

commit 6a134b01c15b562f519aabc1bff12d79779549b5
Author: Divakar <email address hidden>
Date: Fri Jun 19 23:39:06 2015 +0530

Update IFNode seq number even for Delete operation

Suppose Node A, Node B and a link L1 between them exist, each with
seq number 1. After an XMPP channel flap, a new seq number is generated
for the subsequent config. If an update is then received for Node B,
its seq number becomes 2 while A and L1 remain at seq 1. If Node A's
delete is received before L1's delete, L1 is pushed to the defer list.
While pushing to the defer list, the seq number of defer entry A->B is
taken as 2 (as B's seq is 2) and that of B->A as 1 (as A's seq is 1).
This leads to dissimilar seq numbers for A->B and B->A. When an add of
Node A is subsequently received, B->A is not considered while evaluating
the defer list because its seq is old, so defer entry A->B is deleted
but B->A is not. This leaves a single defer entry present, which is
invalid.

As a fix, when Node A's delete is received, the seq number of A is
updated before pushing its links to the defer list. While adding to the
defer list, the seq number of the defer entry is taken from the IFMapLink
rather than from the end nodes. While accepting an update to a node in
the DB, an assert is added to ensure that the update's seq number is the
same as or higher than that of the existing node. While evaluating the
defer list, it is ensured that the entries for both directions are
removed rather than a single one.

Change-Id: Ic9ff72a51f0aa683266792a147e6b0a7a8be7699
closes-bug: #1465190

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22-dev

Review in progress for https://review.opencontrail.org/13927
Submitter: Vinay Vithal Mahuli (<email address hidden>)
