[k8s-R5.0] Multiple vrouter crashes observed during k8s sanity run.

Bug #1771170 reported by Pulkit Tandon on 2018-05-14
This bug affects 1 person
Affects            Status                    Importance  Assigned to   Milestone
Juniper Openstack  Status tracked in Trunk
  R5.0             Fix Released              High        Nagendra E S
  Trunk            Fix Released              High        Nagendra E S

Bug Description

Configuration:
K8s 1.9.2
ocata-5.0-53
Centos-7.4

Setup:
5 node setup.
1 Kube master. 3 Controller.
2 Agent+ K8s slaves

During the k8s sanity run, 28 agent crashes were observed on each agent node.
All crashes have the same backtrace:

(gdb) bt full
#0 0x00000000014989f2 in std::_Rb_tree<boost::asio::ip::address, std::pair<boost::asio::ip::address const, IgmpInfo::IgmpSubnetState*>, std::_Select1st<std::pair<boost::asio::ip::address const, IgmpInfo::IgmpSubnetState*> >, std::less<boost::asio::ip::address>, std::allocator<std::pair<boost::asio::ip::address const, IgmpInfo::IgmpSubnetState*> > >::find(boost::asio::ip::address const&) ()
No symbol table info available.
#1 0x00000000014972c3 in IgmpProto::IncrSendStats(VmInterface const*, bool) ()
No symbol table info available.
#2 0x000000000149854d in IgmpProto::SendIgmpPacket(GmpIntf*, GmpPacket*) ()
No symbol table info available.
#3 0x000000000154137e in GmpProto::SendPacket(GmpIntf*, unsigned char*, unsigned int, boost::asio::ip::address) ()
No symbol table info available.
#4 0x00000000015413e2 in gmp_send_one_packet ()
No symbol table info available.
#5 0x00000000015421de in igmp_send_one_packet(gmp_intf_handle_*) ()
No symbol table info available.
#6 0x0000000001542248 in gmp_xmit_ready ()
No symbol table info available.
#7 0x00000000015425a6 in gmpp_start_xmit ()
No symbol table info available.
#8 0x000000000154d4df in gmpr_intf_query_timer_expiry ()
No symbol table info available.
#9 0x00000000015545dd in task_timer_callback(void*) ()
No symbol table info available.
#10 0x0000000000e94959 in Timer::TimerTask::Run() ()
No symbol table info available.
#11 0x0000000000e8c72f in TaskImpl::execute() ()
No symbol table info available.
#12 0x00007f9e25d9c8ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
No symbol table info available.
#13 0x00007f9e25d985b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
No symbol table info available.
#14 0x00007f9e25d97c8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
No symbol table info available.
#15 0x00007f9e25d9567f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
No symbol table info available.
#16 0x00007f9e25d95879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
No symbol table info available.
#17 0x00007f9e25fb7e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#18 0x00007f9e2529034d in clone () from /lib64/libc.so.6
No symbol table info available.

Path for sanity report:
http://10.204.216.50/Docs/logs/5.0-53_2018_05_14_18_17_40_1526311817.36/junit-noframes.html

Pulkit Tandon (pulkitt) wrote :

A few core dumps are kept at the following path:
bhushana@mayamruga:/home/bhushana/Documents/technical/bugs/1771170

Pulkit Tandon (pulkitt) on 2018-06-08
tags: removed: sanityblocker
Pulkit Tandon (pulkitt) wrote :

This issue was observed on only 1/2 builds.
It has not been observed since then.
Hence lowering the priority and removing the sanityblocker tag.

Review in progress for https://review.opencontrail.org/43979
Submitter: Nagendra E S (<email address hidden>)

Reviewed: https://review.opencontrail.org/43979
Committed: http://github.com/Juniper/contrail-controller/commit/dd2daebe1374f659e280f6f7915e8ca2548149fb
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit dd2daebe1374f659e280f6f7915e8ca2548149fb
Author: Nagendra E S <email address hidden>
Date: Wed Jun 20 09:31:16 2018 +0530

Fix for agent crash in IGMP for VHOST0 interface.

Putting check for ignoring vhost and when vm's vrf differs
from native vrf.

Change-Id: If8b22969e94f027676fd20d6f7006428e803d75c
Partial-Bug: #1771170

Review in progress for https://review.opencontrail.org/44112
Submitter: Nagendra E S (<email address hidden>)

Reviewed: https://review.opencontrail.org/44112
Committed: http://github.com/Juniper/contrail-controller/commit/eae0682b05560fb167d236a8b11154150aad319a
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit eae0682b05560fb167d236a8b11154150aad319a
Author: Nagendra E S <email address hidden>
Date: Wed Jun 20 09:31:16 2018 +0530

Fix for agent crash in IGMP for VHOST0 interface.

Putting check for ignoring vhost and when vm's vrf differs
from native vrf.

Change-Id: If8b22969e94f027676fd20d6f7006428e803d75c
Partial-Bug: #1771170
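
The commit message above describes the fix only in words: skip the vhost interface, and skip interfaces whose VRF differs from the native VRF, before the per-subnet state is looked up. A minimal C++ sketch of that kind of guard follows. It is not the actual agent code: every type and name in it (Vrf, SubnetState, IgmpInfoSketch, InterfaceSketch, IncrSendStatsSketch) is a hypothetical stand-in, and the subnet key is a plain string rather than boost::asio::ip::address.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; these are NOT the real vrouter agent classes.
struct Vrf {
    std::string name;
};

struct SubnetState {
    unsigned long tx_count = 0;  // packets sent for this subnet
};

struct IgmpInfoSketch {
    // Per-VRF IGMP state keyed by gateway address (a string here for brevity).
    std::map<std::string, SubnetState*> subnet_state_map;
};

struct InterfaceSketch {
    bool is_vhost = false;            // true for the vhost0 interface
    const Vrf* vrf = nullptr;         // VRF the interface currently uses
    const Vrf* native_vrf = nullptr;  // VRF the interface was created in
    std::string gateway;              // key into subnet_state_map
};

// Returns true if a counter was incremented, false if the call was ignored.
// All guards run before the map lookup, so find() is never reached with
// missing or mismatched state.
bool IncrSendStatsSketch(const InterfaceSketch* intf, IgmpInfoSketch* info) {
    if (intf == nullptr || info == nullptr) {
        return false;  // nothing to account against
    }
    if (intf->is_vhost) {
        return false;  // assumption: no IGMP stats are kept for vhost0
    }
    if (intf->vrf == nullptr || intf->vrf != intf->native_vrf) {
        return false;  // VM's VRF differs from its native VRF
    }
    auto it = info->subnet_state_map.find(intf->gateway);
    if (it == info->subnet_state_map.end() || it->second == nullptr) {
        return false;  // no state registered for this subnet
    }
    ++it->second->tx_count;
    return true;
}
```

In this sketch the function reports whether it counted the packet, which makes the ignored cases easy to test; whether the real IncrSendStats returns a value is not stated in this report.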

Pulkit Tandon (pulkitt) wrote :

Multiple sanity runs have happened after the fix and the issue has not been observed again.
The latest builds on which sanity was run are:
ocata-5.0-152
coat-master-196
Hence closing the bug.
