Control Node crashing on HA setup

Bug #1484784 reported by Anoop Kumar Sahu
This bug affects 3 people
Affects               Status                    Importance   Assigned to   Milestone
Juniper Openstack     Status tracked in Trunk
  R2.0                Fix Committed             High         Tapan Karwa
  R2.1                Won't Fix                 High         Tapan Karwa
  R2.20               Fix Committed             High         Tapan Karwa
  Trunk               Fix Committed             High         Tapan Karwa

Bug Description

This is a 3 CN and 3 TSN setup. The core files can be found at 192.168.61.1 (/var/www/html/pub).

root@Host1-CN1:~# contrail-version
Package Version Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-analytics 2.20-79 79
contrail-config 2.20-79 79
contrail-config-openstack 2.20-79 79
contrail-control 2.20-79 79
contrail-dns 2.20-79 79
contrail-f5 2.20-79 79
contrail-fabric-utils 2.20-79 79
contrail-heat 2.20-79 79
contrail-install-packages 2.20-79~icehouse 79
contrail-lib 2.20-79 79
contrail-nodemgr 2.20-79 79
contrail-nova-networkapi 2.20-79 79
contrail-nova-vif 2.20-79 79
contrail-openstack 2.20-79 79
contrail-openstack-analytics 2.20-79 79
contrail-openstack-config 2.20-79 79
contrail-openstack-control 2.20-79 79
contrail-openstack-dashboard 2.20-79 79
contrail-openstack-database 2.20-79 79
contrail-openstack-ha 2.20-79 79
contrail-openstack-vrouter 2.20-79 79
contrail-openstack-webui 2.20-79 79
contrail-setup 2.20-79 79
contrail-utils 2.20-79 79
contrail-vrouter-agent 2.20-79 79
contrail-vrouter-common 2.20-79 79
contrail-vrouter-dkms 2.20-79 79
contrail-vrouter-init 2.20-79 79
contrail-vrouter-utils 2.20-79 79
contrail-web-controller 2.20-79 79
contrail-web-core 2.20-79 79
ifmap-python-client 0.1-2 79
ifmap-server 0.3.2-1contrail1 79
neutron-plugin-contrail 2.20-79 79
nova-api 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-common 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-compute 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-compute-kvm 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-compute-libvirt 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-conductor 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-console 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-consoleauth 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-novncproxy 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-objectstore 1:2014.1.3-0ubuntu1~cloud0.3contrail79
nova-scheduler 1:2014.1.3-0ubuntu1~cloud0.3contrail79
python-contrail 2.20-79 79
python-contrail-vrouter-api 2.20-79 79
python-neutronclient 2:2.3.4-0ubuntu1.2contrail 79
python-nova 1:2014.1.3-0ubuntu1~cloud0.3contrail79
python-opencontrail-vrouter-netns 2.20-79 79
root@Host1-CN1:~# contrail-status
vRouter is NOT PRESENT

== Contrail vRouter ==
supervisor-vrouter: inactive (disabled on boot)
unix:///tmp/supervisord_vrouter.sockno

== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager active
contrail-discovery:0 active
contrail-schema backup
contrail-svc-monitor active
ifmap active

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

========Run time service failures=============
/var/crashes/core.contrail-contro.2267.Host1-CN1.1439507792
========Run time service failures=============
/var/crashes/core.contrail-contro.2267.Host1-CN1.1439507792
root@Host1-CN1:~# gdb contrail-control /var/crashes/core.contrail-contro.2267.Host1-CN1.1439507792
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from contrail-control...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 5387]
[New LWP 5390]
[New LWP 5384]
[New LWP 5396]
[New LWP 5391]
[New LWP 5388]
[New LWP 5395]
[New LWP 5401]
[New LWP 5400]
[New LWP 5393]
[New LWP 5371]
[New LWP 5377]
[New LWP 5374]
[New LWP 2267]
[New LWP 5403]
[New LWP 5399]
[New LWP 5376]
[New LWP 5392]
[New LWP 5382]
[New LWP 5381]
[New LWP 5373]
[New LWP 5402]
[New LWP 5383]
[New LWP 5386]
[New LWP 5380]
[New LWP 5398]
[New LWP 5394]
[New LWP 5397]
[New LWP 5372]
[New LWP 5385]
[New LWP 5379]
[New LWP 5375]
[New LWP 5389]
[New LWP 5378]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fd5391a8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0 0x00007fd5391a8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 2267
        selftid = 5387
#1 0x00007fd5391ac0d8 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7fff6f055e1c, sa_sigaction = 0x7fff6f055e1c}, sa_mask = {
            __val = {140553764138268, 11498000, 559, 215838300, 140553762780387, 4294967296, 140553505167536,
              3425094040, 140553772651449, 140551750658120, 0, 0, 0, 21474836480, 140553809707008,
              140553764153392}}, sa_flags = 11497677, sa_restorer = 0xaf7b20}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007fd5391a1b86 in __assert_fail_base (fmt=0x7fd5392f2830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0xaf70cd "state->advertised().empty()",
    file=file@entry=0xaf7210 "controller/src/ifmap/ifmap_exporter.cc", line=line@entry=559,
    function=function@entry=0xaf7b20 "void IFMapExporter::StateUpdateOnDequeue(IFMapUpdate*, const BitSet&, bool)")
    at assert.c:92
        str = 0x7fd4c096ab40 "\340\303d\301\324\177"
        total = 4096
#3 0x00007fd5391a1c32 in __GI___assert_fail (assertion=0xaf70cd "state->advertised().empty()",
    file=0xaf7210 "controller/src/ifmap/ifmap_exporter.cc", line=559,
    function=0xaf7b20 "void IFMapExporter::StateUpdateOnDequeue(IFMapUpdate*, const BitSet&, bool)")
    at assert.c:101
No locals.
#4 0x000000000045f776 in ?? ()
No symbol table info available.
#5 0x0000000000491804 in ?? ()
No symbol table info available.
#6 0x000000000049204b in ?? ()
No symbol table info available.
#7 0x0000000000acd040 in ?? ()
No symbol table info available.
#8 0x00007fd539f7fb3a in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#9 0x00007fd539f7b816 in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#10 0x00007fd539f7af4b in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#11 0x00007fd539f770ff in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#12 0x00007fd539f772f9 in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#13 0x00007fd53a19b182 in start_thread (arg=0x7fd529bf6700) at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7fd529bf6700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140553505171200, 7524229258567665609, 0, 0, 140553505171904,
                140553505171200, -7511654688496367671, -7511693471273662519}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#14 0x00007fd53926c47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
(gdb) quit
root@Host1-CN1:~# gdb /usr/bin/contrail-control /var/crashes/core.contrail-contro.2267.Host1-CN1.1439507792
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/contrail-control...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 5387]
[New LWP 5390]
[New LWP 5384]
[New LWP 5396]
[New LWP 5391]
[New LWP 5388]
[New LWP 5395]
[New LWP 5401]
[New LWP 5400]
[New LWP 5393]
[New LWP 5371]
[New LWP 5377]
[New LWP 5374]
[New LWP 2267]
[New LWP 5403]
[New LWP 5399]
[New LWP 5376]
[New LWP 5392]
[New LWP 5382]
[New LWP 5381]
[New LWP 5373]
[New LWP 5402]
[New LWP 5383]
[New LWP 5386]
[New LWP 5380]
[New LWP 5398]
[New LWP 5394]
[New LWP 5397]
[New LWP 5372]
[New LWP 5385]
[New LWP 5379]
[New LWP 5375]
[New LWP 5389]
[New LWP 5378]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fd5391a8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fd5391a8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fd5391ac0d8 in __GI_abort () at abort.c:89
#2 0x00007fd5391a1b86 in __assert_fail_base (fmt=0x7fd5392f2830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0xaf70cd "state->advertised().empty()", file=file@entry=0xaf7210 "controller/src/ifmap/ifmap_exporter.cc",
    line=line@entry=559, function=function@entry=0xaf7b20 "void IFMapExporter::StateUpdateOnDequeue(IFMapUpdate*, const BitSet&, bool)")
    at assert.c:92
#3 0x00007fd5391a1c32 in __GI___assert_fail (assertion=0xaf70cd "state->advertised().empty()",
    file=0xaf7210 "controller/src/ifmap/ifmap_exporter.cc", line=559,
    function=0xaf7b20 "void IFMapExporter::StateUpdateOnDequeue(IFMapUpdate*, const BitSet&, bool)") at assert.c:101
#4 0x000000000045f776 in ?? ()
#5 0x0000000000491804 in ?? ()
#6 0x000000000049204b in ?? ()
#7 0x0000000000acd040 in ?? ()
#8 0x00007fd539f7fb3a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007fd539f7b816 in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fd539f7af4b in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fd539f770ff in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fd539f772f9 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007fd53a19b182 in start_thread (arg=0x7fd529bf6700) at pthread_create.c:312
#14 0x00007fd53926c47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) quit
root@Host1-CN1:~# ls -altr /var/crashes
total 1621292
drwxr-xr-x 14 root root 4096 Aug 12 09:45 ..
drwxrwxrwx 2 root root 4096 Aug 13 16:16 .
-rw------- 1 contrail contrail 1834303488 Aug 13 16:16 core.contrail-contro.2267.Host1-CN1.1439507792
root@Host1-CN1:~# Write failed: Broken pipe
anoops-mbp:~ anoops$
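
The core file name above appears to follow a `core.<comm>.<pid>.<host>.<epoch>` layout. That layout is an assumption inferred from this one file name, not something confirmed in the report, and `parse_core_name` is a hypothetical helper. Under that assumption, a minimal sketch for pulling out the PID and crash time looks like this:

```python
from datetime import datetime, timezone


def parse_core_name(name):
    """Split 'core.<comm>.<pid>.<host>.<epoch>' into its fields.

    Assumed layout (inferred from the file name in this report). The host
    part may itself contain dots, so the fixed fields are peeled off from
    both ends of the name.
    """
    prefix, comm, pid, rest = name.split(".", 3)
    if prefix != "core":
        raise ValueError("not a core file name: " + name)
    host, epoch = rest.rsplit(".", 1)
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return {"comm": comm, "pid": int(pid), "host": host, "time_utc": when}


info = parse_core_name("core.contrail-contro.2267.Host1-CN1.1439507792")
print(info["pid"], info["host"], info["time_utc"].isoformat())
# prints: 2267 Host1-CN1 2015-08-13T23:16:32+00:00
```

The PID matches the `pid = 2267` local in frame #0 of the backtrace. The decoded time, 23:16 UTC, would be consistent with the `Aug 13 16:16` listing if the host clock is seven hours behind UTC.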

Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Nischal Sheth (nsheth)
tags: added: contrail-control
removed: qfx
Nischal Sheth (nsheth)
Changed in juniperopenstack:
assignee: Nischal Sheth (nsheth) → Tapan Karwa (tkarwa)
Vedamurthy Joshi (vedujoshi) wrote :

This looks the same as Bug 1453369, which is still open.

Tapan Karwa (tkarwa) wrote :

root@Host1-CN1:/var/crashes# more /etc/issue
Ubuntu 14.04.2 LTS \n \l

contrail-install-packages 2.20-79~icehouse 79

/github-build/R2.20/79/ubuntu-14-04/icehouse/store/sandbox

Tapan Karwa (tkarwa) wrote :

FTP to 192.168.61.1 as anonymous with no password.
Then cd to pub to see all the files.

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13710
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/13711
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.1

Review in progress for https://review.opencontrail.org/13716
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.0

Review in progress for https://review.opencontrail.org/13717
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13711
Committed: http://github.org/Juniper/contrail-controller/commit/8c331af5256b30ec7d4aadd041286099e8ee0580
Submitter: Zuul
Branch: R2.20

commit 8c331af5256b30ec7d4aadd041286099e8ee0580
Author: Tapan Karwa <email address hidden>
Date: Wed Sep 9 09:40:35 2015 -0700

Delete state only after processing a DELETE IFMapUpdate.

There can be cases where, while processing an UPDATE-IFMapUpdate dequeue event,
we find that the state is ready to be deleted. We want to wait until we have
processed the 'DELETE' IFMapUpdate event before finally deleting the state.

E.g.: a node add, a link add with that node, a link delete, a node delete. The
exporter processes the node delete before the link delete, i.e. the dependency
is still set and the exporter does not enqueue a DELETE-UPDATE-IFMapUpdate. Now,
if we dequeue the UPDATE-IFMapUpdate, the state is ready to be deleted. But we
want to wait until the exporter processes the link delete and triggers a node
change via MaybeNotifyOnLinkDelete() to delete the state.

Change-Id: I42ebbd5a9cff824c9732c52a773b6bc9ee36dc1a
Closes-Bug: #1484784
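
The ordering described in the commit message can be modeled with a small toy. This is not the contrail-controller code: `NodeState`, `on_dequeue`, and every field name are hypothetical, and the model tracks only one node and a set of client names. It is a sketch of the idea that dequeuing an UPDATE never frees the state, even when the node already looks deletable.

```python
class NodeState:
    """Toy stand-in for the exporter's per-node state (hypothetical fields)."""

    def __init__(self):
        self.advertised = set()  # clients that have been sent this node
        self.deleted = False     # node has been marked deleted
        self.released = False    # True once the state has been freed

    def can_delete(self):
        return self.deleted


def on_dequeue(state, clients, is_delete):
    """Record that an update was sent; free the state only on a DELETE."""
    if is_delete:
        state.advertised -= clients
    else:
        state.advertised |= clients
    # Deferral from the commit message: an UPDATE never frees the state,
    # even if the node is already deletable at this point.
    if is_delete and state.can_delete():
        assert not state.advertised, "state freed while still advertised"
        state.released = True


state = NodeState()
state.deleted = True  # node delete was seen before the link delete
on_dequeue(state, {"client-1"}, is_delete=False)
print(state.released)  # False: the UPDATE leaves the state in place
on_dequeue(state, {"client-1"}, is_delete=True)
print(state.released)  # True: the DELETE clears advertised, then frees
```

If the `is_delete` guard were dropped from the final check, the first call would trip the toy assertion, because `advertised` has just been set for `client-1`. That is analogous in spirit to the `state->advertised().empty()` assertion in the backtrace, though the real code path may differ.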

Nischal Sheth (nsheth)
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/13710
Committed: http://github.org/Juniper/contrail-controller/commit/6621afcfda3795f429161c0554d870e1ec33b664
Submitter: Zuul
Branch: master

commit 6621afcfda3795f429161c0554d870e1ec33b664
Author: Tapan Karwa <email address hidden>
Date: Wed Sep 9 09:40:35 2015 -0700

Delete state only after processing a DELETE IFMapUpdate.

There can be cases where, while processing an UPDATE-IFMapUpdate dequeue event,
we find that the state is ready to be deleted. We want to wait until we have
processed the 'DELETE' IFMapUpdate event before finally deleting the state.

E.g.: a node add, a link add with that node, a link delete, a node delete. The
exporter processes the node delete before the link delete, i.e. the dependency
is still set and the exporter does not enqueue a DELETE-UPDATE-IFMapUpdate. Now,
if we dequeue the UPDATE-IFMapUpdate, the state is ready to be deleted. But we
want to wait until the exporter processes the link delete and triggers a node
change via MaybeNotifyOnLinkDelete() to delete the state.

Change-Id: I42ebbd5a9cff824c9732c52a773b6bc9ee36dc1a
Closes-Bug: #1484784

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.0

Review in progress for https://review.opencontrail.org/13717
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13717
Committed: http://github.org/Juniper/contrail-controller/commit/2d9567062f5131a1515820ee9d19f73e9718ce5e
Submitter: Zuul
Branch: R2.0

commit 2d9567062f5131a1515820ee9d19f73e9718ce5e
Author: Tapan Karwa <email address hidden>
Date: Wed Sep 9 09:40:35 2015 -0700

Delete state only after processing a DELETE IFMapUpdate.

There can be cases where, while processing an UPDATE-IFMapUpdate dequeue event,
we find that the state is ready to be deleted. We want to wait until we have
processed the 'DELETE' IFMapUpdate event before finally deleting the state.

E.g.: a node add, a link add with that node, a link delete, a node delete. The
exporter processes the node delete before the link delete, i.e. the dependency
is still set and the exporter does not enqueue a DELETE-UPDATE-IFMapUpdate. Now,
if we dequeue the UPDATE-IFMapUpdate, the state is ready to be deleted. But we
want to wait until the exporter processes the link delete and triggers a node
change via MaybeNotifyOnLinkDelete() to delete the state.

Change-Id: I42ebbd5a9cff824c9732c52a773b6bc9ee36dc1a
Closes-Bug: #1484784
