[k8s-R5.0] Multiple vrouter crashes observed during k8s sanity run.

Bug #1771170 reported by Pulkit Tandon
This bug affects 1 person
Affects              Status          Importance   Assigned to     Milestone
Juniper Openstack    (status tracked in Trunk)
  R5.0               Fix Released    High         Nagendra E S
  Trunk              Fix Released    High         Nagendra E S

Bug Description

Configuration:
K8s 1.9.2
ocata-5.0-53
Centos-7.4

Setup:
5-node setup:
1 Kube master, 3 Controllers.
2 Agent + K8s slave nodes.

During the k8s sanity run, 28 agent crashes were observed on each agent node.
All crashes have the same backtrace:

(gdb) bt full
#0 0x00000000014989f2 in std::_Rb_tree<boost::asio::ip::address, std::pair<boost::asio::ip::address const, IgmpInfo::IgmpSubnetState*>, std::_Select1st<std::pair<boost::asio::ip::address const, IgmpInfo::IgmpSubnetState*> >, std::less<boost::asio::ip::address>, std::allocator<std::pair<boost::asio::ip::address const, IgmpInfo::IgmpSubnetState*> > >::find(boost::asio::ip::address const&) ()
No symbol table info available.
#1 0x00000000014972c3 in IgmpProto::IncrSendStats(VmInterface const*, bool) ()
No symbol table info available.
#2 0x000000000149854d in IgmpProto::SendIgmpPacket(GmpIntf*, GmpPacket*) ()
No symbol table info available.
#3 0x000000000154137e in GmpProto::SendPacket(GmpIntf*, unsigned char*, unsigned int, boost::asio::ip::address) ()
No symbol table info available.
#4 0x00000000015413e2 in gmp_send_one_packet ()
No symbol table info available.
#5 0x00000000015421de in igmp_send_one_packet(gmp_intf_handle_*) ()
No symbol table info available.
#6 0x0000000001542248 in gmp_xmit_ready ()
No symbol table info available.
#7 0x00000000015425a6 in gmpp_start_xmit ()
No symbol table info available.
#8 0x000000000154d4df in gmpr_intf_query_timer_expiry ()
No symbol table info available.
#9 0x00000000015545dd in task_timer_callback(void*) ()
No symbol table info available.
#10 0x0000000000e94959 in Timer::TimerTask::Run() ()
No symbol table info available.
#11 0x0000000000e8c72f in TaskImpl::execute() ()
No symbol table info available.
#12 0x00007f9e25d9c8ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
No symbol table info available.
#13 0x00007f9e25d985b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
No symbol table info available.
#14 0x00007f9e25d97c8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
No symbol table info available.
#15 0x00007f9e25d9567f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
No symbol table info available.
#16 0x00007f9e25d95879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
No symbol table info available.
#17 0x00007f9e25fb7e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#18 0x00007f9e2529034d in clone () from /lib64/libc.so.6
No symbol table info available.
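Frame #0 shows the fault inside std::map::find on a map from boost::asio::ip::address to IgmpInfo::IgmpSubnetState*, reached from IgmpProto::IncrSendStats. The backtrace alone does not say why the lookup faulted. One plausible reading, given that the eventual fix targets the vhost0 interface, is that the lookup ran against per-interface IGMP state that was missing or not valid for that interface. The sketch below is a simplified, hypothetical model of that lookup: the struct layouts, the field names (igmp_info, subnet, tx_count), the helper IncrSendStatsSafe, and the use of std::string in place of boost::asio::ip::address are all assumptions, not the agent's real definitions. It shows a defensive version that validates every pointer before calling find().

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the agent types named in the
// backtrace. std::string replaces boost::asio::ip::address here.
using Address = std::string;

struct IgmpSubnetState {
    std::uint64_t tx_count = 0;  // hypothetical per-subnet send counter
};

struct IgmpInfo {
    // Same shape as the map in frame #0: address -> IgmpSubnetState*.
    std::map<Address, IgmpSubnetState*> subnet_state;
};

struct VmInterface {
    IgmpInfo* igmp_info = nullptr;  // may be absent, e.g. for vhost0 (assumption)
    Address subnet;                 // hypothetical subnet key for this interface
};

// Defensive version of the counter update: returns false instead of
// dereferencing state that does not exist.
bool IncrSendStatsSafe(const VmInterface* vm_itf) {
    if (vm_itf == nullptr || vm_itf->igmp_info == nullptr) {
        return false;  // no IGMP state attached to this interface
    }
    auto& states = vm_itf->igmp_info->subnet_state;
    auto it = states.find(vm_itf->subnet);
    if (it == states.end() || it->second == nullptr) {
        return false;  // subnet not tracked for this interface
    }
    ++it->second->tx_count;  // safe: both pointers validated above
    return true;
}
```

The ordering of the checks is the point of the sketch: a missing entry degrades to a skipped counter update rather than a crash.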

Path for sanity report:
http://10.204.216.50/Docs/logs/5.0-53_2018_05_14_18_17_40_1526311817.36/junit-noframes.html

Tags: vrouter
Pulkit Tandon (pulkitt) wrote :

A few core dumps are kept at the following path:
bhushana@mayamruga:/home/bhushana/Documents/technical/bugs/1771170

Pulkit Tandon (pulkitt)
tags: removed: sanityblocker
Pulkit Tandon (pulkitt) wrote :

This issue was observed in only 1 or 2 builds and has not been observed since.
Hence, lowering the priority and removing the sanityblocker tag.

OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/43979
Submitter: Nagendra E S (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/43979
Committed: http://github.com/Juniper/contrail-controller/commit/dd2daebe1374f659e280f6f7915e8ca2548149fb
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit dd2daebe1374f659e280f6f7915e8ca2548149fb
Author: Nagendra E S <email address hidden>
Date: Wed Jun 20 09:31:16 2018 +0530

Fix for agent crash in IGMP for VHOST0 interface.

Putting check for ignoring vhost and when vm's vrf differs
from native vrf.

Change-Id: If8b22969e94f027676fd20d6f7006428e803d75c
Partial-Bug: #1771170
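The commit message names two guards: ignore the vhost interface, and skip the case where the VM's VRF differs from the native VRF. A minimal sketch of such a predicate follows. The struct VmInterfaceView, its fields (is_vhost, vrf_name, native_vrf_name), and the function ShouldHandleIgmp are hypothetical names chosen for illustration; the sketch assumes IGMP handling is simply skipped when either condition holds, and the actual change in review 43979 may be structured differently.

```cpp
#include <string>

// Hypothetical, simplified view of the interface fields the guard needs.
struct VmInterfaceView {
    bool is_vhost = false;        // true for vhost0 (assumption)
    std::string vrf_name;         // VRF the VM interface is currently in
    std::string native_vrf_name;  // native VRF of the interface's network
};

// Returns true only when IGMP processing should proceed for this
// interface, applying the two checks named in the commit message.
bool ShouldHandleIgmp(const VmInterfaceView& itf) {
    if (itf.is_vhost) {
        return false;  // ignore the vhost interface
    }
    if (itf.vrf_name != itf.native_vrf_name) {
        return false;  // VM's VRF differs from the native VRF
    }
    return true;
}
```

Placing such a check before any per-interface IGMP state is touched would keep the send path from ever reaching the map lookup seen in frame #0 for those interfaces.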

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/44112
Submitter: Nagendra E S (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/44112
Committed: http://github.com/Juniper/contrail-controller/commit/eae0682b05560fb167d236a8b11154150aad319a
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit eae0682b05560fb167d236a8b11154150aad319a
Author: Nagendra E S <email address hidden>
Date: Wed Jun 20 09:31:16 2018 +0530

Fix for agent crash in IGMP for VHOST0 interface.

Putting check for ignoring vhost and when vm's vrf differs
from native vrf.

Change-Id: If8b22969e94f027676fd20d6f7006428e803d75c
Partial-Bug: #1771170

Pulkit Tandon (pulkitt) wrote :

Multiple sanity runs have been done since the fix, and the issue has not been observed again.
The latest builds on which sanity was run are:
ocata-5.0-152
ocata-master-196
Hence, closing the bug.
