contrail-collector core during ToR QFX BMS testing

Bug #1456853 reported by Megh Bhatt
This bug affects 2 people
Affects            Status         Importance  Assigned to  Milestone
Juniper Openstack  Fix Committed  High        Megh Bhatt
  R2.20            Fix Committed  High        Megh Bhatt

Bug Description

Gnats Link:

https://gnats.juniper.net/web/default/1087395-1

Detailed logs and crash files are at /volume/labcores/PR/PR-1087395

05-19-2015

root@ubuntu-cn:~# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent active (Collector connection down)
contrail-vrouter-nodemgr EXITED

== Contrail Control ==
supervisor-control: active
contrail-control initializing (Collector connection down)
contrail-control-nodemgr initializing (NTP state unsynchronized.)
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: inactive
unix:///tmp/supervisord_analytics.sockno

== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing (Collector connection down)
contrail-config-nodemgr initializing (NTP state unsynchronized.)
contrail-device-manager initializing (Collector connection down)
contrail-discovery:0 active
contrail-schema initializing (Collector connection down)
contrail-svc-monitor initializing (Collector connection down)
ifmap active

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active

== Contrail Database ==
supervisor-database: active
contrail-database active
contrail-database-nodemgr initializing (NTP state unsynchronized.)
kafka active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

========Run time service failures=============
/var/crashes/core.contrail-collec.3485.ubuntu-cn.1431587887
/var/crashes/core.contrail-contro.682.ubuntu-cn.1431716896
/var/crashes/core.contrail-collec.2201.ubuntu-cn.1431587885
/var/crashes/core.contrail-contro.2725.ubuntu-cn.1431929188
/var/crashes/core.contrail-collec.2768.ubuntu-cn.1431399756
/var/crashes/core.contrail-contro.2644.ubuntu-cn.1431597955
/var/crashes/core.contrail-contro.1079.ubuntu-cn.1431931116
/var/crashes/core.contrail-contro.6299.ubuntu-cn.1431996053
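To see which service is crashing and when, the core file names above can be decoded. A minimal sketch, assuming the names follow `core.<process>.<pid>.<host>.<epoch>` (the layout is inferred from the listing, not confirmed by the report, and `parse_core_name` and `CoreFile` are hypothetical helper names):

```python
from collections import Counter
from datetime import datetime, timezone
from typing import NamedTuple


class CoreFile(NamedTuple):
    process: str         # executable name as it appears in the file name (truncated)
    pid: int
    host: str
    dumped_at: datetime  # assumed to be the dump time, in UTC


def parse_core_name(path: str) -> CoreFile:
    """Split 'core.<process>.<pid>.<host>.<epoch>' into its fields."""
    name = path.rsplit("/", 1)[-1]
    prefix, _, rest = name.partition(".")
    if prefix != "core" or not rest:
        raise ValueError(f"not a core file name: {path!r}")
    # Split from the right so a dot in the process name is tolerated;
    # the host name is assumed to contain no dots.
    process, pid, host, epoch = rest.rsplit(".", 3)
    dumped_at = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return CoreFile(process, int(pid), host, dumped_at)


def cores_per_process(paths):
    """Count core files per (truncated) process name."""
    return Counter(parse_core_name(p).process for p in paths)


if __name__ == "__main__":
    for core in sorted(
        map(parse_core_name, [
            "/var/crashes/core.contrail-collec.3485.ubuntu-cn.1431587887",
            "/var/crashes/core.contrail-collec.2768.ubuntu-cn.1431399756",
        ]),
        key=lambda c: c.dumped_at,
    ):
        print(core.process, core.pid, core.dumped_at.isoformat())
```

Sorting by the decoded timestamp puts the cores in the order they were written, which helps when matching them against service restarts in the logs.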

Megh Bhatt (meghb) wrote :

It seems that we cannot get a proper backtrace of the assert from the contrail-collector core core.contrail-collec.2768.ubuntu-cn.1431399756:

(gdb) info threads
  Id Target Id Frame
  9 Thread 0x7f55c1f5e700 (LWP 3226) 0x00007f55c83bcf45 in readdir64 () from /lib/x86_64-linux-gnu/libc.so.6
  8 Thread 0x7f55c235f700 (LWP 3220) 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
  7 Thread 0x7f55c1b5d700 (LWP 3227) 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
  6 Thread 0x7f55c2760700 (LWP 3219) 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
  5 Thread 0x7f55c135b700 (LWP 3229) 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
  4 Thread 0x7f55c2b61700 (LWP 3218) 0x00007f55c9ae43bd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
  3 Thread 0x7f55c175c700 (LWP 3228) 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
  2 Thread 0x7f55c2f62700 (LWP 3217) 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
* 1 Thread 0x7f55cb41d100 (LWP 2768) 0x00007f55c8335cc9 in __open_catalog () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) thread apply all bt

Thread 9 (Thread 0x7f55c1f5e700 (LWP 3226)):
#0 0x00007f55c83bcf45 in readdir64 () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f55c83bcc96 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f55c1f5cd70 in ?? ()
#3 0x00007f55a80010a0 in ?? ()
#4 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f55c235f700 (LWP 3220)):
#0 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f55c1b5d700 (LWP 3227)):
#0 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f55c2760700 (LWP 3219)):
#0 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f55c135b700 (LWP 3229)):
#0 0x00007f55c83f43e9 in __vsyslog_chk () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f55c2b61700 (LWP 3218)):
#0 0x00007f55c9ae43bd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000073ee10 in read (__nbytes=16384, __buf=0x7f55c2b5b230, __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2 redisBufferRead (c=c@entry=0x7f55bc01a480) at build/third_party/hiredis/src/hiredis.c:1078
#3 0x000000000073f265 in redisGetReply (c=0x7f55bc01a480, reply=0x7f55c2b5f2a0) at build/third_party/hiredis/src/hiredis.c:1164
#4 0x000000000073f94f in __redisBlockForReply (c=0x7f55bc01a480) at build/third_party/hiredis/src/hiredis.c:1259
#5 redisvCommand (ap=0x7f55c2b5f2a8, format=<optimized out>, c=0x7f55bc01a480) at build/third_party/hiredis/src/hiredis.c:1269
#6 redisCommand (c=0x7f55bc01a480, format=<optimized out>) at build/third_party/hiredis/src/hiredis.c:1276
#7 0x00000000004e3d50 in RedisProcessorExec::SyncGetSeq (redis_ip=..., redis_port=<optimized out>, redis_password=..., source=..., node_type=..., module=..., instance_id=..., seqReply=...)...

Changed in juniperopenstack:
assignee: nobody → Megh Bhatt (meghb)
importance: Undecided → High
Megh Bhatt (meghb) wrote :

Anoop, can you please upload the other contrail-collector core and also /var/log/contrail-collector.log*?

Changed in juniperopenstack:
assignee: Megh Bhatt (meghb) → Anoop Kumar Sahu (anoops)
milestone: none → r2.30-fcs
status: New → Incomplete
Megh Bhatt (meghb) wrote :

Provided access to the box.

root@ubuntu-cn:~# gdb /var/tmp/vizd /var/crashes/core.contrail-collec.2768.ubuntu-cn.1431399756
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /var/tmp/vizd...done.

warning: core file may not match specified executable file.
[New LWP 2768]
[New LWP 3217]
[New LWP 3228]
[New LWP 3218]
[New LWP 3229]
[New LWP 3219]
[New LWP 3227]
[New LWP 3220]
[New LWP 3226]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f55c8335cc9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f55c8335cc9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f55c83390d8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f55c832eb86 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f55c832ec32 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000529aea in OpServerProxy::OpServerImpl::toConnectCallbackProcess (this=<optimized out>, c=<optimized out>, r=<optimized out>, privdata=<optimized out>)
    at controller/src/analytics/OpServerProxy.cc:242
#5 0x00000000004d707a in operator() (a2=0x0, a1=0x2947050, a0=0x294b030, this=0x7fffe1727120) at /usr/include/boost/function/function_template.hpp:767
#6 RedisAsyncConnection::RAC_AsyncCmdCallback (c=0x294b030, r=0x2947050, privdata=0x0) at controller/src/analytics/redis_connection.cc:239
#7 0x00000000007423f3 in __redisRunCallback (cb=0x7fffe1727240, cb=0x7fffe1727240, reply=<optimized out>, ac=0x294b030) at build/third_party/hiredis/src/async.c:219
#8 redisProcessCallbacks (ac=0x294b030) at build/third_party/hiredis/src/async.c:417
#9 0x0000000000743819 in redisBoostClient::handle_read (this=0x294afe0, ec=...) at build/third_party/hiredis/hiredis-boostasio-adapter/boostasio.cpp:62
#10 0x0000000000743ee4 in call<boost::shared_ptr<redisBoostClient>, boost::system::error_code> (b1=<synthetic pointer>, u=..., this=<optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:156
#11 operator()<boost::shared_ptr<redisBoostClient> > (a1=..., u=..., this=<optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:171
#12 operator()<boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<const boost::system::error_code&, long unsigned int const&> > (a=<synthetic pointer>, f=...,
    this=<optimized out>) at /usr/include/boost/bind/bind.hpp...


Megh Bhatt (meghb)
Changed in juniperopenstack:
status: Incomplete → New
assignee: Anoop Kumar Sahu (anoops) → Megh Bhatt (meghb)
tags: added: blocker
information type: Proprietary → Public
Sundaresan Rajangam (srajanga) wrote :

By default, snapshotting is enabled in redis.conf.

On a loaded setup, when redis-server restarts, loading dump.rdb into memory takes a long time (227 to 507 seconds in the logs below), and this causes the issue.
Should we consider disabling snapshots in redis.conf?

[5141] 21 May 10:44:18.907 # Server started, Redis version 2.8.4
[5141] 21 May 10:44:18.907 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[5141] 21 May 10:52:46.450 * DB loaded from disk: 507.543 seconds <<<<<<
.....
.....
[7348] 21 May 13:32:48.114 # Server started, Redis version 2.8.4
[7348] 21 May 13:32:48.114 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[7348] 21 May 13:36:36.040 * DB loaded from disk: 227.926 seconds <<<<<<
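The load times flagged above can be pulled out of a redis-server log automatically. A minimal sketch, assuming the Redis 2.8 log line layout shown in these excerpts; `load_times`, `slow_loads`, and the 60-second threshold are hypothetical and chosen only for illustration:

```python
import re

# Matches lines such as:
# [5141] 21 May 10:52:46.450 * DB loaded from disk: 507.543 seconds
LOAD_RE = re.compile(
    r"^\[(?P<pid>\d+)\]\s+"
    r"(?P<stamp>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\* DB loaded from disk: (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def load_times(log_lines):
    """Return (pid, timestamp string, load seconds) for each 'DB loaded' line."""
    found = []
    for line in log_lines:
        match = LOAD_RE.match(line)
        if match:
            found.append(
                (int(match.group("pid")),
                 match.group("stamp"),
                 float(match.group("seconds")))
            )
    return found


def slow_loads(log_lines, threshold_seconds=60.0):
    """Keep only the loads that took longer than the (assumed) threshold."""
    return [entry for entry in load_times(log_lines)
            if entry[2] > threshold_seconds]


if __name__ == "__main__":
    sample = [
        "[5141] 21 May 10:44:18.907 # Server started, Redis version 2.8.4",
        "[5141] 21 May 10:52:46.450 * DB loaded from disk: 507.543 seconds <<<<<<",
    ]
    for pid, stamp, seconds in slow_loads(sample):
        print(f"pid {pid} at {stamp}: {seconds:.1f}s to load dump.rdb")
```

The first load time agrees with the timestamps in the log: the server started at 10:44:18.907 and finished loading at 10:52:46.450, which is 507.543 seconds later.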

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/11012
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11012
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/532a93bdbae8929a5d35ed50d23287f0612a517d
Submitter: Zuul
Branch: R2.20

commit 532a93bdbae8929a5d35ed50d23287f0612a517d
Author: Megh Bhatt <email address hidden>
Date: Thu May 28 16:05:53 2015 -0700

Disable redis server persistence since that is not used by analytics
or webui and causes issues with slow redis bootup and subsequent
collector core. Refactor redis server setup code and create a
setup_redis_server_node task.
Closes-Bug: #1456853

Change-Id: Ida4e0374f3e8360e64cb3ac4ba6edadb57388523
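For readers who want to apply the same idea by hand: RDB snapshotting is controlled by the `save` directives in redis.conf, and the stock redis.conf documents `save ""` as the way to clear all save points. The sketch below is not the code from the commit above (the actual change is in contrail-fabric-utils and is not reproduced here); `disable_rdb_snapshots` is a hypothetical helper that rewrites the configuration text.

```python
def disable_rdb_snapshots(conf_text: str) -> str:
    """Comment out active 'save' directives and append 'save ""'.

    Only RDB snapshot save points are touched. Append-only-file
    persistence ('appendonly') is a separate setting and is left as-is.
    """
    kept = []
    for line in conf_text.splitlines():
        parts = line.split()
        is_save = bool(parts) and parts[0].lower() == "save"
        if is_save and parts[1:] == ['""']:
            # Drop an existing 'save ""' so that reruns do not accumulate
            # duplicates; it is appended again below.
            continue
        if is_save:
            kept.append("#" + line)  # disable this save point
        else:
            kept.append(line)
    kept.append('save ""')  # explicitly clear all save points
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    original = "port 6379\nsave 900 1\nsave 300 10\ndbfilename dump.rdb\n"
    print(disable_rdb_snapshots(original), end="")
```

Working on the text rather than the file keeps the function easy to test and safe to run more than once. Writing the result back to redis.conf and restarting redis-server are left to the caller. An existing dump.rdb is still loaded at startup unless it is removed or `dbfilename` points elsewhere.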

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/11075
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11075
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/4d83d527fb9f0508769bf0ee1998c3504c524f6a
Submitter: Zuul
Branch: master

commit 4d83d527fb9f0508769bf0ee1998c3504c524f6a
Author: Megh Bhatt <email address hidden>
Date: Thu May 28 16:05:53 2015 -0700

Disable redis server persistence since that is not used by analytics
or webui and causes issues with slow redis bootup and subsequent
collector core. Refactor redis server setup code and create a
setup_redis_server_node task.
Closes-Bug: #1456853

Change-Id: Ida4e0374f3e8360e64cb3ac4ba6edadb57388523
(cherry picked from commit 532a93bdbae8929a5d35ed50d23287f0612a517d)

Changed in juniperopenstack:
status: New → Fix Committed