[ubuntu-havana-R1.10-#34] control-node cored at RoutePathReplicator::DeleteSecondaryPath

Bug #1375226 reported by Prashant Shetty
This bug affects 1 person
Affects            Status        Importance  Assigned to       Milestone
Juniper Openstack  Status tracked in Trunk
  R1.1             Fix Released  High        Prakash Bailkeri
  Trunk            Fix Released  High        Prakash Bailkeri

Bug Description

Upgraded an Ubuntu Havana system from R1.05 to R1.10. The upgrade went through fine.
After a few restarts of the supervisor-config and control-node processes, we are seeing multiple control-node cores.

Can someone look into this?

Copied the logs and cores at http://mayamruga.englab.juniper.net/bugs/<bug-ID>

Setup:

host1 = 'root@10.204.217.118'
host2 = 'root@10.204.217.119'
host3 = 'root@10.204.217.120'
host4 = 'root@10.204.217.121'
host5 = 'root@10.204.217.122'
host6 = 'root@10.204.217.128'
host7 = 'root@10.204.217.129'
host8 = 'root@10.204.217.130'
host9 = 'root@10.204.217.131'
host10 = 'root@10.204.217.132'

env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6, host7, host8, host9, host10],
    'cfgm': [host1, host2, host3],
    'openstack': [host2],
    'webui': [host3],
    'control': [host1, host3],
    'compute': [host4, host5, host6, host7, host8, host9, host10],
    'collector': [host1, host3],
    'database': [host1, host2, host3],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei6', 'nodei7', 'nodei8', 'nodei9', 'nodei10', 'nodei16', 'nodei17', 'nodei18', 'nodei19', 'nodei20']
}

Crash-Decode:

Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal 6, Aborted.
#0 0x00007ff1cd5da425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ff1cd5da425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ff1cd5ddb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ff1cd5d30ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ff1cd5d3192 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x000000000074c593 in RoutePathReplicator::DeleteSecondaryPath (this=0x2885830, table=0x7ff1a010b620, rt=0x7ff15801f3a0, rtinfo=...)
    at controller/src/bgp/routing-instance/routepath_replicator.cc:536
#5 0x000000000074f905 in RoutePathReplicator::DBStateSync (this=0x2885830, table=0x7ff1a010b620, rt=0x7ff15801f3a0, id=0, dbstate=0x7ff1b4005c20, current=...)
    at controller/src/bgp/routing-instance/routepath_replicator.cc:326
#6 0x000000000074ffc5 in RoutePathReplicator::BgpTableListener (this=0x2885830, root=<optimized out>, entry=0x7ff15801f3a0)
    at controller/src/bgp/routing-instance/routepath_replicator.cc:513
#7 0x00000000009bbe72 in operator() (a1=0x7ff15801f3a0, a0=0x7ff1a010cd20, this=0x7ff197ffebb0) at build/include/boost/function/function_template.hpp:763
#8 DBTableBase::ListenerInfo::RunNotify (this=0x7ff1a010b6f0, tpart=0x7ff1a010cd20, entry=0x7ff15801f3a0) at controller/src/db/db_table.cc:85
#9 0x00000000009bd01a in DBTablePartBase::RunNotify (this=0x7ff1a010cd20) at controller/src/db/db_table_partition.cc:37
#10 0x00000000009ba2fb in DBPartition::QueueRunner::Run (this=0x7ff16414f0f0) at controller/src/db/db_partition.cc:178
#11 0x00000000009fccc0 in TaskImpl::execute (this=0x7ff1640169c0) at controller/src/base/task.cc:224
#12 0x00007ff1ce83cece in ?? () from /usr/lib/libtbb_debug.so.2
#13 0x00007ff1ce833e0b in ?? () from /usr/lib/libtbb_debug.so.2
#14 0x00007ff1ce8326f2 in ?? () from /usr/lib/libtbb_debug.so.2
#15 0x00007ff1ce82d3ce in ?? () from /usr/lib/libtbb_debug.so.2
#16 0x00007ff1ce82d270 in ?? () from /usr/lib/libtbb_debug.so.2
#17 0x00007ff1ce384e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007ff1cd697ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()
(gdb)

tags: added: blocker
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/3667
Committed: http://github.org/Juniper/contrail-controller/commit/ecbf2b8a1f0a67903715d716668e62de326fe8bb
Submitter: Zuul
Branch: master

commit ecbf2b8a1f0a67903715d716668e62de326fe8bb
Author: Prakash Bailkeri <email address hidden>
Date: Sun Oct 12 23:41:07 2014 -0700

Fix concurrency issue in updating DBEntry flags

Fixes bug: #1375226

Cause:
The DBEntry flags are updated from two tasks that are not mutually exclusive. This corrupts the flags and leads to an unexpected free/delete of the entry.
The problem occurs when setting the "OnRemoveQ" flag on a DBEntry, since that can be done from any task context.
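
To illustrate the cause described above, here is a minimal sketch of how two unsynchronized read-modify-write updates to a shared flags byte can lose a bit. It replays one unlucky interleaving in single-threaded code so the outcome is deterministic. The flag names, bit positions, and the function `InterleavedUpdate` are assumptions made for illustration; they are not taken from the actual DBEntry code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real DBEntry layout may differ.
enum : std::uint8_t { kDeleted = 1 << 0, kOnRemoveQ = 1 << 1 };

// Replays one unlucky interleaving of two read-modify-write updates.
// Each "task" performs load, modify, store, which is what a plain
// `flags |= bit` amounts to when no synchronization is in place.
std::uint8_t InterleavedUpdate(std::uint8_t flags) {
    std::uint8_t a = flags;  // task A loads the flags
    std::uint8_t b = flags;  // task B loads the same (soon stale) value
    a |= kDeleted;           // task A marks the entry deleted
    b |= kOnRemoveQ;         // task B marks the entry as queued for removal
    flags = a;               // task A stores its result
    flags = b;               // task B stores and overwrites task A's bit
    return flags;
}
```

Calling `InterleavedUpdate(0)` returns a value with only the hypothetical `kOnRemoveQ` bit set: the `kDeleted` update is lost, which is the kind of flag corruption the commit message describes.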

Fix:
Move OnRemoveQ out of the DBEntry flags and make it an atomic bool variable.
Add an assert to catch the case where a route gets deleted with active paths.
Add an assert to catch the case where a path is inserted into a deleted route.
Add a new test for repeated route updates (from an agent with a different nexthop), which reproduces this bug.
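
The shape of the fix listed above can be sketched roughly as follows. This is not the actual contrail-controller code: the classes `Entry` and `Route`, their members, and the exact assert conditions are hypothetical stand-ins that only mirror the three points in the list (a separate atomic bool for OnRemoveQ, and two asserts guarding route deletion and path insertion).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a DB entry; not the real DBEntry class.
class Entry {
public:
    // Flag bits are meant to be touched only from the owning task context.
    void MarkDeleted() { flags_ |= kDeleted; }
    bool IsDeleted() const { return (flags_ & kDeleted) != 0; }

    // OnRemoveQ may be set from any task context, so it lives in its own
    // atomic member (a separate memory location) instead of sharing the
    // flags byte with bits that other tasks modify.
    void SetOnRemoveQ(bool v) { on_remove_q_.store(v); }
    bool IsOnRemoveQ() const { return on_remove_q_.load(); }

private:
    enum : std::uint8_t { kDeleted = 1 << 0 };
    std::uint8_t flags_ = 0;
    std::atomic<bool> on_remove_q_{false};
};

// Hypothetical route that tracks only how many paths it holds.
class Route : public Entry {
public:
    void InsertPath() {
        assert(!IsDeleted());      // no path may be added to a deleted route
        ++path_count_;
    }
    void RemovePath() {
        assert(path_count_ > 0);
        --path_count_;
    }
    void Delete() {
        assert(path_count_ == 0);  // a route with active paths must not be deleted
        MarkDeleted();
    }
    int path_count() const { return path_count_; }

private:
    int path_count_ = 0;
};
```

With this layout, a writer that sets OnRemoveQ no longer performs a read-modify-write on the byte holding the deleted bit, so the two updates cannot clobber each other. The asserts turn a silent corruption into an immediate abort at the point where the invariant is first violated.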

Change-Id: I5df04f89f00959799a921e9db37388c0eb56334e

Changed in juniperopenstack:
assignee: nobody → Prakash Bailkeri (prakashmb)
status: New → Fix Committed
Nischal Sheth (nsheth)
Changed in juniperopenstack:
status: Fix Committed → Fix Released