[R4.1 - Build 7 Newton][K8s]: VMIs created after k8s POD creation do not get associated with a VRF and vrouter crash observed. Sporadic occurrence. After ~15 minutes, system recovers

Bug #1735670 reported by Pulkit Tandon
This bug affects 1 person
Affects             Status   Importance   Assigned to         Milestone
Juniper Openstack   Status tracked in Trunk
  R4.1              New      Critical     Hari Prasad Killi
  Trunk             New      Critical     Hari Prasad Killi

Bug Description

R4.1 - Build 7 - Newton

HA K8s setup with control data interfaces provisioned.
3 controllers and 2 computes
1 Kube master and 2 Slaves

Description:
Multiple sanity test cases failed on a check that verifies that, after a POD is created, the corresponding VMI object is created at the agent and is in Active state.

Observed that in random cases, the VMI object is in Inactive state because it does not get a vrf_name.
After running the test manually, observed that a vrouter crash happens at that time.
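
For reference, the failing check can be sketched roughly as a polling loop. This is only an illustration under assumptions, not the actual sanity test code: fetch_vmi is a hypothetical callable that returns the agent's view of the VMI as a dict, and the "active" key is an assumed field name (only vrf_name comes from this report).

```python
import time
from typing import Callable, Optional


def wait_for_active_vmi(
    fetch_vmi: Callable[[], Optional[dict]],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the VMI is Active and has a vrf_name, or give up.

    fetch_vmi is hypothetical: it should return the agent's VMI record
    as a dict, or None if the agent does not know the VMI yet.
    """
    attempts = max(1, int(timeout_s // interval_s) + 1)
    for attempt in range(attempts):
        vmi = fetch_vmi()
        # "active" is an assumed key name; vrf_name is the field from the report.
        if vmi and vmi.get("active") == "Active" and vmi.get("vrf_name"):
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

Since the report says the system recovers after about 15 minutes, a timeout well below that would still flag the failure, while a very long timeout could hide it.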

Back Trace:
(gdb) bt full
#0 0x00007f37409cc428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
        resultvar = 0
        pid = 26081
        selftid = 27864
#1 0x00007f37409ce02a in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x4, sa_sigaction = 0x4}, sa_mask = {__val = {139875329642223, 0, 139874527070272, 0, 139875331780608, 25784246, 411, 25870320, 139874527074752, 0,
              139875284268348, 139875285365264, 139875285378912, 0, 139875285365264, 25784246}}, sa_flags = 1131851776, sa_restorer = 0x1896fb6}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f37409c4bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x1896fb6 "0", file=file@entry=0x18ab618 "controller/src/vnsw/agent/oper/vrf.cc", line=line@entry=411,
    function=function@entry=0x18abff0 "bool VrfEntry::DeleteTimeout()") at assert.c:92
        str = 0x7f36fc0e7220 " \221\v\374\066\177"
        total = 4096
#3 0x00007f37409c4c82 in __GI___assert_fail (assertion=0x1896fb6 "0", file=0x18ab618 "controller/src/vnsw/agent/oper/vrf.cc", line=411, function=0x18abff0 "bool VrfEntry::DeleteTimeout()")
    at assert.c:101
No locals.
#4 0x0000000000d49f10 in VrfEntry::DeleteTimeout() ()
No symbol table info available.
#5 0x0000000001835f85 in Timer::TimerTask::Run() ()
No symbol table info available.
#6 0x000000000182b8bd in TaskImpl::execute() ()
No symbol table info available.
#7 0x00007f3741628fdd in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
No symbol table info available.
#8 0x00007f37416220dc in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
No symbol table info available.
#9 0x00007f3741620fd3 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
No symbol table info available.
#10 0x00007f374161ca91 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
No symbol table info available.
#11 0x00007f374161ccf9 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
No symbol table info available.
#12 0x00007f37418466ba in start_thread (arg=0x7f37137fd700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7f37137fd700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139874527074048, -5327299189382462843, 0, 139874803903823, 139874527074752, 0, 5224042953409938053, 5223932111829582469}, mask_was_saved = 0}},
          priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#13 0x00007f3740a9e3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
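
When triaging several cores like this one, it can help to pull the assertion site out of the backtrace text automatically. The following is a minimal sketch, assuming the gdb output has the same __assert_fail frame layout as frame #3 above; find_assert_site and the regular expression are illustrative helpers, not part of any Contrail tooling.

```python
import re
from typing import Optional

# Matches the glibc __assert_fail frame as printed by gdb in this report.
# __assert_fail_base is deliberately not matched (its arguments differ).
ASSERT_RE = re.compile(
    r'__assert_fail\s*\(assertion=\S+\s+"(?P<expr>[^"]*)",\s*'
    r'file=\S+\s+"(?P<file>[^"]*)",\s*'
    r'line=(?P<line>\d+),\s*'
    r'function=\S+\s+"(?P<func>[^"]*)"\)'
)


def find_assert_site(backtrace: str) -> Optional[dict]:
    """Return the failed assertion's expression, file, line and function."""
    match = ASSERT_RE.search(backtrace)
    if match is None:
        return None
    return {
        "expr": match["expr"],
        "file": match["file"],
        "line": int(match["line"]),
        "function": match["func"],
    }
```

Applied to the backtrace above, this should point at the assert in VrfEntry::DeleteTimeout() at controller/src/vnsw/agent/oper/vrf.cc line 411.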

The core file was copied to the following location:
mayamruga.englab.juniper.net
Path:
/home/bhushana/Documents/technical/bugs/core_vrouter_k8s
Core file name:
core.contrail-vroute.26081.testbed-1-vm2.1512114415

More logs can be found in the following sanity run:
http://10.204.216.50/Docs/logs/4.1.0.0-7_2017_12_01_00_05_21/logs/

Pulkit Tandon (pulkitt)
information type: Proprietary → Public
summary: [R4.1 - Build 7 Newton][K8s]: VMIs created after k8s POD creation do not
- get associated with a VRF and router crash observed. Sporadic
+ get associated with a VRF and vrouter crash observed. Sporadic
occurrence. After ~15 minutes, system recovers
Pulkit Tandon (pulkitt)
tags: added: blocker