[Build R2.20.10 Juno] TOR Scale: collector crash @ void WorkQueue<QueueEntryT>::ProcessLowWaterMarks(size_t)

Bug #1453236 reported by chhandak
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series  Status         Importance  Assigned to  Milestone
R2.20   Fix Committed  High        Megh Bhatt
Trunk   Fix Committed  High        Megh Bhatt

Bug Description

Back trace
------------------
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f6aaa9fbcc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#0 0x00007f6aaa9fbcc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f6aaa9ff0d8 in __GI_abort () at abort.c:89
#2 0x00007f6aaa9f4b86 in __assert_fail_base (
    fmt=0x7f6aaab45830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x756bef "count <= wm_info.count_",
    file=file@entry=0x756c50 "controller/src/base/queue_task.h",
    line=line@entry=448,
    function=function@entry=0x7957a0 "void WorkQueue<QueueEntryT>::ProcessLowWaterMarks(size_t) [with QueueEntryT = CdbIf::CdbIfColList; size_t = long unsigned int]") at assert.c:92
#3 0x00007f6aaa9f4c32 in __GI___assert_fail (
    assertion=0x756bef "count <= wm_info.count_",
    file=0x756c50 "controller/src/base/queue_task.h", line=448,
    function=0x7957a0 "void WorkQueue<QueueEntryT>::ProcessLowWaterMarks(size_t) [with QueueEntryT = CdbIf::CdbIfColList; size_t = long unsigned int]")
    at assert.c:101
#4 0x000000000066d034 in ?? ()
#5 0x00000000006fe240 in ?? ()
#6 0x00007f6aabf86b3a in ?? () from /usr/lib/libtbb.so.2
#7 0x00007f6aabf82816 in ?? () from /usr/lib/libtbb.so.2
#8 0x00007f6aabf81f4b in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f6aabf7e0ff in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f6aabf7e2f9 in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f6aac1a2182 in start_thread (arg=0x7f6a9d7f5700)
    at pthread_create.c:312
#12 0x00007f6aaaabf47d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Trigger
------------
The exact trigger is unknown, but the crash was observed after restarting the network service once.

Scale Config
-------------------
1K VN
16K LIF
32K VMI

Tags: analytics bms
chhandak (chhandak) wrote :

Logs saved at http://mayamruga.englab.juniper.net/bugs/1453236

Observed in Ubuntu 14.04

Raj Reddy (rajreddy)
Changed in juniperopenstack:
assignee: nobody → Megh Bhatt (meghb)
importance: Undecided → Medium
Megh Bhatt (meghb)
Changed in juniperopenstack:
milestone: none → r2.20-fcs
milestone: r2.20-fcs → none
OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10578
Submitter: Megh Bhatt (<email address hidden>)

information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10578
Committed: http://github.org/Juniper/contrail-controller/commit/8905174c55361f8dc5cf8caaa634d4229ae273d4
Submitter: Zuul
Branch: R2.20

commit 8905174c55361f8dc5cf8caaa634d4229ae273d4
Author: Megh Bhatt <email address hidden>
Date: Tue May 19 16:24:57 2015 -0700

In cases when cdb connection drops for a generator and it reconnects
we can end up in a situation where the cdb queue watermark info is
duplicated since the cdb queue watermark set is called multiple times
and this can cause issues when processing watermarks in WorkQueue.
Fix is to uniquify the watermarks when setting them in WorkQueue.
Closes-Bug: #1453236

Change-Id: I166a65c371b1815fc9fca43f8b56cc7e2d891a0c
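
For readers without the source at hand, the idea in the commit message can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: WaterMark, UniquifyWaterMarks and SetWaterMarks are hypothetical names, and the real watermark type in controller/src/base/queue_task.h carries more state than a bare count. The sketch shows how a second "set" call with the same values (as could happen after a reconnect) would duplicate entries, and how sorting and removing duplicates at set time keeps one entry per count.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a queue watermark entry; the real type in
// queue_task.h is not reproduced here.
struct WaterMark {
    std::size_t count;
    bool operator<(const WaterMark &rhs) const { return count < rhs.count; }
    bool operator==(const WaterMark &rhs) const { return count == rhs.count; }
};

// Sort the watermarks and drop entries with equal counts, so that each
// count appears exactly once in ascending order.
inline std::vector<WaterMark> UniquifyWaterMarks(std::vector<WaterMark> marks) {
    std::sort(marks.begin(), marks.end());
    marks.erase(std::unique(marks.begin(), marks.end()), marks.end());
    return marks;
}

// Hypothetical "set" path: append the incoming watermarks to the current
// ones, then uniquify. Calling this twice with the same input leaves the
// set unchanged instead of duplicating it.
inline void SetWaterMarks(std::vector<WaterMark> &current,
                          const std::vector<WaterMark> &incoming) {
    current.insert(current.end(), incoming.begin(), incoming.end());
    current = UniquifyWaterMarks(current);
}
```

With this shape, calling SetWaterMarks twice with the same two watermarks leaves two entries rather than four, which is the property the commit message describes as the fix.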

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/10879
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10879
Committed: http://github.org/Juniper/contrail-controller/commit/ff67ee2137eb9b81651b2981787a3aab44d2e057
Submitter: Zuul
Branch: master

commit ff67ee2137eb9b81651b2981787a3aab44d2e057
Author: Megh Bhatt <email address hidden>
Date: Tue May 19 16:24:57 2015 -0700

In cases when cdb connection drops for a generator and it reconnects
we can end up in a situation where the cdb queue watermark info is
duplicated since the cdb queue watermark set is called multiple times
and this can cause issues when processing watermarks in WorkQueue.
Fix is to uniquify the watermarks when setting them in WorkQueue.
Closes-Bug: #1453236

Change-Id: I166a65c371b1815fc9fca43f8b56cc7e2d891a0c
(cherry picked from commit 8905174c55361f8dc5cf8caaa634d4229ae273d4)
