vrouter-agent crash at PhysicalDevice::Copy in tor-scale setup

Bug #1462605 reported by Vedamurthy Joshi
This bug affects 1 person
Affects              Status                     Importance  Assigned to  Milestone
Juniper Openstack    Status tracked in Trunk
  R2.20              Fix Committed              High        Praveen
  Trunk              Fix Committed              High        Praveen

Bug Description

R2.20 Build 30 (with latest vrouter-agent and tor-agent), Ubuntu 14.04, Juno, multi-node setup

The crash was seen on one of the TSN nodes.
Core will be in http://10.204.216.50/Docs/bugs/#

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fcccbdb3863 in std::string::size() const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0 0x00007fcccbdb3863 in std::string::size() const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x000000000136913d in std::operator==<char> (__lhs=..., __rhs=...) at /usr/include/c++/4.8/bits/basic_string.h:2495
#2 0x00000000013691b6 in std::operator!=<char, std::char_traits<char>, std::allocator<char> > (__lhs=..., __rhs=...) at /usr/include/c++/4.8/bits/basic_string.h:2534
#3 0x000000000145aad3 in PhysicalDevice::Copy (this=0x7fcc9a179d00, table=0x7fccac029510, data=0x7fcc9afdaaf0) at controller/src/vnsw/agent/oper/physical_device.cc:77
#4 0x000000000145aeac in PhysicalDeviceTable::OperDBAdd (this=0x7fccac029510, req=0x7fcc9afdab30) at controller/src/vnsw/agent/oper/physical_device.cc:137
#5 0x00000000013d68de in AgentOperDBTable::Add (this=0x7fccac029510, req=0x7fcc9afdab30) at controller/src/vnsw/agent/oper/oper_db.h:156
#6 0x0000000001b5db4d in DBTable::Input (this=0x7fccac029510, tbl_partition=0x7fccac029660, client=0x0, req=0x7fcc9afdab30) at controller/src/db/db_table.cc:314
#7 0x00000000014c401a in AgentDBTable::Input (this=0x7fccac029510, partition=0x7fccac029660, client=0x0, req=0x7fcc9afdab30) at controller/src/vnsw/agent/cmn/agent_db.cc:105
#8 0x0000000001b63656 in DBTablePartition::Process (this=0x7fccac029660, client=0x0, req=0x7fcc9afdab30) at controller/src/db/db_table_partition.cc:92
#9 0x0000000001b5a250 in DBPartition::QueueRunner::Run (this=0x7fcc9afdab50) at controller/src/db/db_partition.cc:187
#10 0x0000000001cb2d72 in TaskImpl::execute (this=0x7fccc4bea140) at controller/src/base/task.cc:232
#11 0x00007fcccc01eb3a in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fcccc01a816 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007fcccc019f4b in ?? () from /usr/lib/libtbb.so.2
#14 0x00007fcccc0160ff in ?? () from /usr/lib/libtbb.so.2
#15 0x00007fcccc0162f9 in ?? () from /usr/lib/libtbb.so.2
#16 0x00007fcccc23a182 in start_thread (arg=0x7fccbdbf6700) at pthread_create.c:312
#17 0x00007fcccb51347d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

env.roledefs = {
    'all': [host2, host3, host4, host5, host6, host7, host8, host9],
    'cfgm': [host2, host3, host4],
    'openstack': [host2, host3, host4],
    'webui': [host3],
    'control': [host2, host3, host4],
    'compute': [host5, host6, host7, host8, host9],
    'collector': [host2, host3, host4],
    'database': [host2, host3, host4],
    'toragent': [host5, host6, host7, host9],
    'tsn': [host5, host6, host7, host9],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38', 'nodei28', 'nodei27', 'nodei30']
}

Tags: bms scale vrouter
Praveen (praveen-karadakal) wrote :

The multicast module enqueues an ADD_CHANGE request when a physical-device is deleted. PhysicalDeviceTable::OperDBAdd() does not handle data of type PhysicalDeviceTsnManagedData, which results in the crash.

Fix: Multicast module must enqueue RESYNC instead of ADD_CHANGE
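
The difference between the two operations can be illustrated with a minimal, self-contained sketch. It is not the contrail-controller code: Table, Entry, FullData, MasterData, Op and Result are hypothetical stand-ins, and the dynamic_cast checks are an assumption made for illustration. It only shows the behaviour described above: an add/change request may create an entry and therefore needs a full payload, while a resync request carrying a partial payload is dropped when the entry no longer exists.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not the real agent classes.
enum class Op { AddChange, Resync };
enum class Result { Added, Updated, Ignored, Rejected };

struct Data {
    virtual ~Data() = default;
};

// Full payload: enough information to create an entry.
struct FullData : Data {
    explicit FullData(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Partial payload: only carries the mastership flag.
struct MasterData : Data {
    explicit MasterData(bool m) : master(m) {}
    bool master;
};

struct Entry {
    std::string name;
    bool master = false;
};

class Table {
public:
    Result Input(Op op, const std::string &key, const Data &data) {
        auto it = entries_.find(key);
        if (op == Op::Resync) {
            // Resync never creates an entry: a request for a missing
            // (for example, already deleted) key is simply dropped.
            if (it == entries_.end()) return Result::Ignored;
            if (auto *m = dynamic_cast<const MasterData *>(&data)) {
                it->second.master = m->master;
                return Result::Updated;
            }
            return Result::Rejected;
        }
        // AddChange may create the entry, so it needs the full payload.
        // Treating a partial payload as a full one without a check is the
        // kind of type confusion that can end in a crash.
        auto *full = dynamic_cast<const FullData *>(&data);
        if (full == nullptr) return Result::Rejected;
        bool existed = (it != entries_.end());
        entries_[key].name = full->name;
        return existed ? Result::Updated : Result::Added;
    }

    void Delete(const std::string &key) { entries_.erase(key); }

    const Entry *Find(const std::string &key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Entry> entries_;
};
```

In this sketch, a mastership update sent with Op::Resync after the entry has been deleted returns Result::Ignored and leaves the table untouched, which mirrors the intent of the fix. The same update sent with Op::AddChange reaches the add path with a payload it was not written for.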

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11344
Submitter: Praveen K V (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11344
Committed: http://github.org/Juniper/contrail-controller/commit/754f067f2c3c4b57f7a5585a8921129589fe2c06
Submitter: Zuul
Branch: R2.20

commit 754f067f2c3c4b57f7a5585a8921129589fe2c06
Author: Praveen K V <email address hidden>
Date: Sun Jun 7 09:30:12 2015 +0530

Use RESYNC operation to update physical-device mastership

Bug:
When the multicast module identifies a change in physical-device mastership,
it enqueues a request with op=DB_ENTRY_ADD_CHANGE to change the master_ flag
in the physical-device entry. The method PhysicalDeviceTable::OperDBAdd
does not handle req.data of type PhysicalDeviceTsnManagedData,
resulting in a crash.

Fix:
Multicast module must enqueue request with RESYNC operation so that the
request is ignored if physical-device entry is not present

Change-Id: If8b29c1c2a1df239e130f326643436e927486ae1
Closes-Bug: #1462605

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11583
Submitter: Praveen K V (<email address hidden>)
