tor-agent crash at KSyncEntry::KSyncState KSyncSM_DelAckWait

Bug #1518899 reported by Vedamurthy Joshi
This bug affects 2 people
Affects                    Status          Importance   Assigned to            Milestone
Juniper Openstack (status tracked in Trunk)
  R2.20                    Fix Committed   Medium       Prabhjot Singh Sethi
  R2.21.x                  Fix Committed   Medium       Prabhjot Singh Sethi
  Trunk                    Fix Committed   Medium       Prabhjot Singh Sethi

Bug Description

R2.20 Ubuntu 14.04 Juno

The tor-agent crash below was seen on my testbed.

Core will be in http://10.204.216.50/Docs/bugs/#

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-tor-agent --config_file /etc/contrail/contrail-tor-agent-1.co'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f7d78e73cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f7d78e73cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f7d78e770d8 in __GI_abort () at abort.c:89
#2 0x00007f7d78e6cb86 in __assert_fail_base (fmt=0x7f7d78fbd830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xe64df5 "0",
    file=file@entry=0xead098 "controller/src/ksync/ksync_object.cc", line=line@entry=1067,
    function=function@entry=0xeae440 "KSyncEntry::KSyncState KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent)") at assert.c:92
#3 0x00007f7d78e6cc32 in __GI___assert_fail (assertion=0xe64df5 "0", file=0xead098 "controller/src/ksync/ksync_object.cc", line=1067,
    function=0xeae440 "KSyncEntry::KSyncState KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent)") at assert.c:101
#4 0x0000000000a96d82 in KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent) ()
#5 0x0000000000a98f25 in KSyncObject::NotifyEvent(KSyncEntry*, KSyncEntry::KSyncEvent) ()
#6 0x0000000000a9914d in KSyncObject::SafeNotifyEvent(KSyncEntry*, KSyncEntry::KSyncEvent) ()
#7 0x00000000009a39f8 in OVSDB::HaStaleDevVnTable::StaleClearTimerCb() ()
#8 0x0000000000e34f79 in Timer::TimerTask::Run() ()
#9 0x0000000000e2e970 in TaskImpl::execute() ()
#10 0x00007f7d79a42b3a in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f7d79a3e816 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f7d79a3df4b in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f7d79a3a0ff in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f7d79a3a2f9 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007f7d79c5e182 in start_thread (arg=0x7f7d714f6700) at pthread_create.c:312
#16 0x00007f7d78f3747d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Tags: bms vrouter
Prabhjot Singh Sethi (prabhjot) wrote :

This is a side effect of the fix for bug 1503124.

The stale timer should have been cancelled when the Add/Change/Delete request itself was handled, not on the Ack event. Cancelling it late lets the timer deliver an inappropriate state machine event.

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/15347
Submitter: Prabhjot Singh Sethi (<email address hidden>)

Changed in juniperopenstack:
status: New → In Progress
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/15358
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/15359
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/15347
Committed: http://github.org/Juniper/contrail-controller/commit/cd6bbe51d96d0ca31f48984a20946fb866004527
Submitter: Zuul
Branch: master

commit cd6bbe51d96d0ca31f48984a20946fb866004527
Author: Prabhjot Singh Sethi <email address hidden>
Date: Mon Nov 23 22:18:15 2015 +0530

fix ToR-agent crash for stale timer cb

Issue:
------
The stale entry timer triggered a delete on an already deleted entry,
causing an invalid state machine event. Ideally, an Add/Change/Delete
event should remove the entry from the stale entry tree. However, the
last change, which syncs DB requests from a workqueue, also moved this
removal from the stale entry tree into the workqueue context, causing
the issue.

Fix:
----
Move stopping of the stale entry timer to the Add/Change/Delete
request context instead of the workqueue context.

Change-Id: I820c25d0462e2459aa7f0d6a84aee5626e8da4f2
Closes-Bug: 1518899

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/15359
Committed: http://github.org/Juniper/contrail-controller/commit/cf701031e11fe69e422af10877073384ca9e0b22
Submitter: Zuul
Branch: R2.21.x

commit cf701031e11fe69e422af10877073384ca9e0b22
Author: Prabhjot Singh Sethi <email address hidden>
Date: Mon Nov 23 22:18:15 2015 +0530

fix ToR-agent crash for stale timer cb

Change-Id: I820c25d0462e2459aa7f0d6a84aee5626e8da4f2
Closes-Bug: 1518899
(cherry picked from commit cd6bbe51d96d0ca31f48984a20946fb866004527)
(cherry picked from commit bb2c1200ef01ce92bc8e2377b0875ee1d3698b5b)

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/15358
Committed: http://github.org/Juniper/contrail-controller/commit/bb2c1200ef01ce92bc8e2377b0875ee1d3698b5b
Submitter: Zuul
Branch: R2.20

commit bb2c1200ef01ce92bc8e2377b0875ee1d3698b5b
Author: Prabhjot Singh Sethi <email address hidden>
Date: Mon Nov 23 22:18:15 2015 +0530

fix ToR-agent crash for stale timer cb

Change-Id: I820c25d0462e2459aa7f0d6a84aee5626e8da4f2
Closes-Bug: 1518899
(cherry picked from commit cd6bbe51d96d0ca31f48984a20946fb866004527)
