Contrail Multicloud GW (AWS) : frequent vrouter agent cores on the AWS MC-GW node

Bug #1795090 reported by vivekananda shenoy
This bug affects 2 people
Affects: Juniper Openstack (status tracked in Trunk)

Series   Status          Importance   Assigned to
R5.0     Fix Committed   Critical     Pranavadatta DN
Trunk    In Progress     Critical     Pranavadatta DN

Bug Description

Sanju is already aware of these cores. To reproduce the issue or for more information, please get in touch with him or his team.

The following is the gdb output from the core dump:

warning: Could not load shared library symbols for 15 libraries, e.g. /lib64/libtcmalloc.so.4.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f48bc630d5b in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /lib64/libstdc++.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libcurl-7.29.0-46.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-12.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-28.el7_5.1.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-5.el7_5.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.el7_5.x86_64 openldap-2.4.44-15.el7_5.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f48bc630d5b in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /lib64/libstdc++.so.6
#1 0x0000000000d11df1 in VmInterfaceKey::VmInterfaceKey(AgentKey::DBSubOperation, boost::uuids::uuid const&, std::string const&) ()
#2 0x0000000000cd9e23 in RouteLeakState::AddReceiveRoute(AgentRoute const*) ()
#3 0x0000000000cdb6e5 in RouteLeakState::AddInterfaceRoute(AgentRoute const*, AgentPath const*) ()
#4 0x0000000000cdbad9 in RouteLeakState::AddCompositeRoute(AgentRoute const*) ()
#5 0x0000000000cdbdf7 in RouteLeakState::AddRoute(AgentRoute const*) ()
#6 0x0000000000cdc880 in RouteLeakVrfState::Notify(DBTablePartBase*, DBEntryBase*) ()
#7 0x0000000000ebb70a in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#8 0x0000000000ebe358 in DBTablePartBase::RunNotify() ()
#9 0x0000000000eb9e2e in DBPartition::QueueRunner::Run() ()
#10 0x0000000000e9608f in TaskImpl::execute() ()
#11 0x00007f48bc89a8ca in ?? ()
#12 0x01017f4800000000 in ?? ()
#13 0x00007f48b633bf40 in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x00007f48b5970c08 in ?? ()
#16 0x00007f48b633bf28 in ?? ()
#17 0x0000000000000000 in ?? ()
(gdb)

vivekananda shenoy (vshenoy83) wrote :

Core file is copied to /cs-shared/bugs/1795090 on the build server.

Jeba Paulaiyan (jebap)
tags: added: vrouter
vivekananda shenoy (vshenoy83) wrote :

Saw another core on one of the on-prem (K8S) nodes, and the stack trace looks similar. However, no operations (adding or removing workloads) were performed on this compute node.

yuvarajamariappan [4:01 PM]
(gdb) bt
#0 0x00000000012173ba in NHKSyncEntry::Sync(DBEntry*) ()
#1 0x00000000015d6105 in KSyncDBObject::Notify(DBTablePartBase*, DBEntryBase*) ()
#2 0x0000000000ebb70a in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#3 0x0000000000ebe358 in DBTablePartBase::RunNotify() ()
#4 0x0000000000eb9e2e in DBPartition::QueueRunner::Run() ()
#5 0x0000000000e9608f in TaskImpl::execute() ()
#6 0x00007fca416618ca in ?? ()
#7 0x0101000000000022 in ?? ()
#8 0x00007fca3b0f3f40 in ?? ()
#9 0x0000000000000001 in ?? ()
#10 0x00007fca39b34c08 in ?? ()
#11 0x00007fca3b0f3f28 in ?? ()
#12 0x0000000000000000 in ?? ()

The core file is copied to the same location as above.

information type: Proprietary → Public
Jeba Paulaiyan (jebap)
tags: added: blocker
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/47288
Submitter: Pranavadatta DN (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/47288
Committed: http://github.com/Juniper/contrail-controller/commit/879df7e1c20f19adefcb69c7ed42d341117b3137
Submitter: Mithun Mistry (<email address hidden>)
Branch: R5.0

commit 879df7e1c20f19adefcb69c7ed42d341117b3137
Author: Pranavadatta D N <email address hidden>
Date: Thu Oct 25 19:16:30 2018 +0530

Added a defensive check in casting NH in AddReceiveRoute

AddReceiveRoute assumes the active path to have an NH with interface. This
resulted in agent coring when the active path pointed to composite NH. Adding a
defensive check to proceed only if NH is INTERFACE or RECEIVE.

Change-Id: I7c53a8bad56d004562c75ae9bdb9b2c47431f28c
Closes-Bug: #1795090
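
For readers without the change open, the kind of defensive check the commit message describes can be sketched as follows. This is only an illustration under stated assumptions, not the code from the actual change: the types NextHop, InterfaceBackedNH and CompositeNH and the helper GetReceiveRouteInterface are hypothetical stand-ins, and the real agent classes are considerably richer. The idea is to inspect the next-hop type before downcasting, so that a composite NH on the active path is skipped instead of being treated as an interface NH.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the agent's next-hop base class.
struct NextHop {
    enum class Type { kInterface, kReceive, kComposite };
    explicit NextHop(Type t) : type(t) {}
    virtual ~NextHop() = default;
    Type type;
};

// Hypothetical next hop that carries an interface (covers both the
// INTERFACE and RECEIVE cases in this sketch).
struct InterfaceBackedNH : NextHop {
    InterfaceBackedNH(Type t, std::string name)
        : NextHop(t), interface_name(std::move(name)) {
        assert(t == Type::kInterface || t == Type::kReceive);
    }
    std::string interface_name;
};

// Hypothetical composite next hop: it has no single interface of its own.
struct CompositeNH : NextHop {
    CompositeNH() : NextHop(Type::kComposite) {}
};

// Copies the interface name into *name and returns true only when the
// next hop is INTERFACE or RECEIVE. For any other type (for example
// composite) it returns false without casting, which is the defensive
// check in this sketch.
bool GetReceiveRouteInterface(const NextHop* nh, std::string* name) {
    if (nh == nullptr || name == nullptr) {
        return false;
    }
    if (nh->type != NextHop::Type::kInterface &&
        nh->type != NextHop::Type::kReceive) {
        return false;  // Unsafe to treat this next hop as interface-backed.
    }
    const InterfaceBackedNH* intf_nh =
        static_cast<const InterfaceBackedNH*>(nh);
    *name = intf_nh->interface_name;
    return true;
}
```

Without the type check, the static_cast on a composite next hop would read a string member that does not exist in that object, which is consistent with the crash inside the std::string copy constructor seen in frame #0 of the first backtrace.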

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/48681
Submitter: Pranavadatta DN (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/48681
Committed: http://github.com/Juniper/contrail-controller/commit/3eeb95e87bbce79db3853046129a255c39d5546a
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 3eeb95e87bbce79db3853046129a255c39d5546a
Author: Pranavadatta D N <email address hidden>
Date: Thu Oct 25 19:16:30 2018 +0530

Added a defensive check in casting NH in AddReceiveRoute

AddReceiveRoute assumes the active path to have an NH with interface. This
resulted in agent coring when the active path pointed to composite NH. Adding a
defensive check to proceed only if NH is INTERFACE or RECEIVE.

Change-Id: I7c53a8bad56d004562c75ae9bdb9b2c47431f28c
Closes-Bug: #1795090
(cherry picked from commit 879df7e1c20f19adefcb69c7ed42d341117b3137)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/48877
Submitter: Arun RS (<email address hidden>)
