[R5.0-Agent Crash]: Agent crash observed in k8s sanity run

Bug #1760960 reported by Pulkit Tandon
This bug affects 2 people
Affects              Status         Importance  Assigned to   Milestone
Juniper Openstack    (status tracked in Trunk)
  R5.0               Fix Released   Critical    sangarshan p
  Trunk              Fix Released   Critical    sangarshan p

Bug Description

Configuration:
K8s 1.9.2
contrail-5.0.0-50
Centos-7.4

Setup:
5 node setup.
2 Kube master. 3 Controller.
2 Agent+ K8s slaves

During k8s sanity run, agent crash observed.

(gdb) bt full
#0 0x00007f142b82c1f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f142b82d8e8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f142b825266 in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3 0x00007f142b825312 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4 0x0000000000c96d73 in NextHopTable::FreeInterfaceId(unsigned long) ()
No symbol table info available.
#5 0x0000000000c9d24f in NextHop::~NextHop() ()
No symbol table info available.
#6 0x0000000000cabc76 in CompositeNH::~CompositeNH() ()
No symbol table info available.
#7 0x0000000000ec5538 in DBTablePartition::Remove(DBEntryBase*) ()
No symbol table info available.
#8 0x0000000000ec0046 in DBPartition::QueueRunner::Run() ()
No symbol table info available.
#9 0x0000000000e9c1af in TaskImpl::execute() ()
No symbol table info available.
#10 0x00007f142c3fb8ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
No symbol table info available.
#11 0x00007f142c3f75b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
No symbol table info available.
#12 0x00007f142c3f6c8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
No symbol table info available.
#13 0x00007f142c3f467f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
No symbol table info available.
#14 0x00007f142c3f4879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
No symbol table info available.
#15 0x00007f142c616e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#16 0x00007f142b8ef34d in clone () from /lib64/libc.so.6
No symbol table info available.

Pulkit Tandon (pulkitt)
Changed in juniperopenstack:
assignee: Mike (harp) → Hari Prasad Killi (haripk)
Jeba Paulaiyan (jebap)
tags: added: sanityblocker
alok kumar (kalok) wrote :
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/41528
Submitter: sangarshan p (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/41553
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/41553
Committed: http://github.com/Juniper/contrail-controller/commit/bae8746033b1c9bdf15242ead0d935097b624bff
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit bae8746033b1c9bdf15242ead0d935097b624bff
Author: sangarshp <email address hidden>
Date: Fri Apr 6 20:41:28 2018 +0530

Same NH index is allocated for multiple composite nexthops

In the current logic, the nexthop index is allocated from the resource manager
using keys <nhid, label> derived from the composite NH key. This is done by
iterating over the component NH key list and populating <nhid, label> only if the key is not NULL.
This can create a scenario where two composite NHs end up getting the same index,
for the following combination:
composite NH1: component NH keys NULL,B,C (one key is NULL because of deletion of a component NH)
composite NH2: component NH keys B,C
Since NULL component keys are skipped, the same set of keys is derived
from both composite NHs.

Fix:
For an inactive component NH, <nh_id, label> is already populated as 0.
The same logic is now applied to the NULL key case as well, so that the
key combination is unique.

Change-Id: I5a78b64cb346ae934746eb78bd6eb9f695de76dc
Closes-Bug: #1760960
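
The collision described in the commit message can be illustrated with a small sketch. This is not the contrail-controller code: the function names, the `IndexAllocator` class and the example key values are hypothetical, and the allocator is a toy stand-in for the resource manager. It only assumes what the commit message states, namely that NULL component keys used to be skipped and now contribute a <0, 0> entry.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical stand-in for a component NH key: (nh_id, label).
ComponentKey = Tuple[int, int]
DerivedKey = Tuple[ComponentKey, ...]


def derive_keys_skipping_null(components: Sequence[Optional[ComponentKey]]) -> DerivedKey:
    """Behaviour before the fix, as described: NULL component keys are skipped."""
    return tuple(key for key in components if key is not None)


def derive_keys_with_placeholder(components: Sequence[Optional[ComponentKey]]) -> DerivedKey:
    """Behaviour after the fix, as described: a NULL key contributes (0, 0)."""
    return tuple(key if key is not None else (0, 0) for key in components)


class IndexAllocator:
    """Toy allocator (assumption): hands out one index per distinct derived key."""

    def __init__(self) -> None:
        self._index_by_key: Dict[DerivedKey, int] = {}

    def allocate(self, key: DerivedKey) -> int:
        # A key seen before gets the index that was already handed out.
        if key not in self._index_by_key:
            self._index_by_key[key] = len(self._index_by_key) + 1
        return self._index_by_key[key]


# Example component keys; the numeric values are made up for illustration.
B: ComponentKey = (11, 100)
C: ComponentKey = (12, 200)
nh1 = [None, B, C]  # composite NH1: one component NH was deleted
nh2 = [B, C]        # composite NH2

old_alloc = IndexAllocator()
old_indices = (
    old_alloc.allocate(derive_keys_skipping_null(nh1)),
    old_alloc.allocate(derive_keys_skipping_null(nh2)),
)

new_alloc = IndexAllocator()
new_indices = (
    new_alloc.allocate(derive_keys_with_placeholder(nh1)),
    new_alloc.allocate(derive_keys_with_placeholder(nh2)),
)

print("skipping NULL keys:", old_indices)
print("placeholder for NULL keys:", new_indices)
```

With NULL keys skipped, both composite NHs derive the same key tuple and so receive the same index. With the placeholder, the derived tuples differ in length and content, so the two NHs get distinct indices.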

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/41528
Committed: http://github.com/Juniper/contrail-controller/commit/fa48966f3327c0845c6577dcd4be2c36d354c952
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit fa48966f3327c0845c6577dcd4be2c36d354c952
Author: sangarshp <email address hidden>
Date: Fri Apr 6 20:41:28 2018 +0530

Same NH index is allocated for multiple composite nexthops

In the current logic, the nexthop index is allocated from the resource manager
using keys <nhid, label> derived from the composite NH key. This is done by
iterating over the component NH key list and populating <nhid, label> only if the key is not NULL.
This can create a scenario where two composite NHs end up getting the same index,
for the following combination:
composite NH1: component NH keys NULL,B,C (one key is NULL because of deletion of a component NH)
composite NH2: component NH keys B,C
Since NULL component keys are skipped, the same set of keys is derived
from both composite NHs.

Fix:
For an inactive component NH, <nh_id, label> is already populated as 0.
The same logic is now applied to the NULL key case as well, so that the
key combination is unique.

Change-Id: I5a78b64cb346ae934746eb78bd6eb9f695de76dc
Closes-Bug: #1760960

Pulkit Tandon (pulkitt) wrote :

No crash was observed in sanity runs on the Trunk and R5.0 branches:
R5.0-12
trunk R5.1 - Build ID: 59-FB
Hence closing the bug.
