tor-agents and control-node crashed multiple times on scale setup (at __GI___open_catalog)

Bug #1431297 reported by Vedamurthy Joshi
This bug affects 1 person
Affects            Status  Importance  Assigned to        Milestone
Juniper Openstack  New     High        Hari Prasad Killi
R2.1               New     Undecided   Hari Prasad Killi

Bug Description

R2.1 Build 40, Ubuntu 14.04, multi-node, Icehouse

nodei38 has 128 tor-agents (110 ToRs), 11K VMIs, 100 VMIs per ToR.

On supervisor-vrouter restart, multiple tor-agents crashed repeatedly for about 30 minutes, after which the crashes stopped.

Later restarts of supervisor-vrouter did not hit this issue.

A few cores will be in http://10.204.216.50/Docs/bugs/#

Backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-tor-agent --config_file /etc/contrail/contrail-tor-agent-56.c'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f3216078bb9 in __GI___open_catalog (cat_name=0x20746e756f635f5f <error: Cannot access memory at address 0x20746e756f635f5f>,
    nlspath=<optimized out>, env_var=0x203d3c2029372026 <error: Cannot access memory at address 0x203d3c2029372026>, catalog=0x70616d2820666f65)
    at open_catalog.c:151
151 open_catalog.c: No such file or directory.
(gdb) bt
#0 0x00007f3216078bb9 in __GI___open_catalog (cat_name=0x20746e756f635f5f <error: Cannot access memory at address 0x20746e756f635f5f>,
    nlspath=<optimized out>, env_var=0x203d3c2029372026 <error: Cannot access memory at address 0x203d3c2029372026>, catalog=0x70616d2820666f65)
    at open_catalog.c:151
#1 0x6e672d78756e696c in ?? ()
#2 0x2f766e6f63672f75 in ?? ()

chhandak (chhandak) wrote :

Also seen in R2.1 Build 49

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-tor-agent --config_file /etc/contrail/contrail-tor-agent-1.co'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007ff6c3d51bb9 in __GI___open_catalog (cat_name=0x20746e756f635f5f <error: Cannot access memory at address 0x20746e756f635f5f>,
    nlspath=<optimized out>, env_var=0x203d3c2029372026 <error: Cannot access memory at address 0x203d3c2029372026>, catalog=0x70616d2820666f65)
    at open_catalog.c:151
151 open_catalog.c: No such file or directory.
(gdb) bt
#0 0x00007ff6c3d51bb9 in __GI___open_catalog (cat_name=0x20746e756f635f5f <error: Cannot access memory at address 0x20746e756f635f5f>,
    nlspath=<optimized out>, env_var=0x203d3c2029372026 <error: Cannot access memory at address 0x203d3c2029372026>, catalog=0x70616d2820666f65)
    at open_catalog.c:151
#1 0x6e672d78756e696c in ?? ()
#2 0x2f766e6f63672f75 in ?? ()
#3 0x6f6d2d766e6f6367 in ?? ()

Vedamurthy Joshi (vedujoshi) wrote :

Similar control-node crashes were also seen on R2.1 Build 44. The core will be in http://10.204.216.50/Docs/bugs/#

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f1277f19bb9 in __GI___open_catalog (cat_name=0x20746e756f635f5f <error: Cannot access memory at address 0x20746e756f635f5f>, nlspath=<optimized out>,
    env_var=0x203d3c2029372026 <error: Cannot access memory at address 0x203d3c2029372026>, catalog=0x70616d2820666f65) at open_catalog.c:151
151 open_catalog.c: No such file or directory.
(gdb) bt
#0 0x00007f1277f19bb9 in __GI___open_catalog (cat_name=0x20746e756f635f5f <error: Cannot access memory at address 0x20746e756f635f5f>, nlspath=<optimized out>,
    env_var=0x203d3c2029372026 <error: Cannot access memory at address 0x203d3c2029372026>, catalog=0x70616d2820666f65) at open_catalog.c:151
#1 0x6e672d78756e696c in ?? ()
#2 0x2f766e6f63672f75 in ?? ()
#3 0x6f6d2d766e6f6367 in ?? ()
#4 0x61632e73656c7564 in ?? ()
#5 0x0000000000656863 in ?? ()
#6 0x657a6973202f2029 in ?? ()
#7 0x5f6c6e5f2820666f in ?? ()
#8 0x79745f65756c6176 in ?? ()
#9 0x554e5f434c5f6570 in ?? ()
#10 0x5d305b434952454d in ?? ()
#11 0x0000000000002929 in ?? ()
#12 0x7328203c20746e63 in ?? ()

summary:
- tor-agents crashed multiple times on scale setup (at __GI___open_catalog)
+ tor-agents and control-node crashed multiple times on scale setup (at __GI___open_catalog)
tags: added: bms contrail-control scale
Hari Prasad Killi (haripk) wrote :

The initial tor-agent core file gave the following backtrace when checked with the correct image:

#0 0x00007fb959675bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fb959678fc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fb95966ea76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fb95966eb22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00000000008ce33a in OVSDB::OvsdbDBEntry::NotifyAdd (this=<optimized out>, row=<optimized out>)
    at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_entry.cc:135
#5 0x00000000008dafe6 in OVSDB::VrfOvsdbObject::OvsdbRouteNotify (this=0x7fb94c079d20, op=OVSDB::OvsdbClientIdl::OVSDB_ADD, row=0x7fb9440266c0)
    at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/unicast_mac_remote_ovsdb.cc:376
#6 0x00000000008eb302 in ovsdb_idl_insert_row ()
#7 0x00000000008ea57e in ovsdb_idl_process_update ()
#8 0x00000000008ea35a in ovsdb_idl_parse_update__ ()
#9 0x00000000008e9fe1 in ovsdb_idl_parse_update ()
#10 0x00000000008e984e in ovsdb_idl_msg_process ()
#11 0x00000000008ca604 in OVSDB::OvsdbClientIdl::ProcessMessage (this=<optimized out>, msg=0x7fb910005e60)
    at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_idl.cc:230
#12 0x00000000008cda8a in operator() (a0=0x7fb910005e60, this=0x7fb952cfaa90) at /usr/include/boost/function/function_template.hpp:767
#13 RunQueue (this=0x7fb9100464f0) at controller/src/base/queue_task.h:53
#14 QueueTaskRunner<OVSDB::OvsdbClientIdl::OvsdbMsg*, WorkQueue<OVSDB::OvsdbClientIdl::OvsdbMsg*> >::Run (this=0x7fb9100464f0)
    at controller/src/base/queue_task.h:36

This issue is the same as https://bugs.launchpad.net/juniperopenstack/+bug/1426513 and has since been resolved.
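
As a side note on the earlier `__GI___open_catalog` backtraces: the `?? ()` frame values and the argument pointers look like ASCII text rather than code addresses, which is consistent with the cores having been opened against a mismatched image. The following is a minimal sketch, assuming little-endian x86_64 words; the helper name `decode_word` is hypothetical, and the hex values are copied from the backtraces in this report.

```python
def decode_word(value: int) -> str:
    """Read a 64-bit value as 8 little-endian bytes and return it as ASCII.

    Trailing NUL bytes are dropped; any other non-printable byte becomes '.'.
    """
    raw = value.to_bytes(8, "little").rstrip(b"\x00")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Frames #1 to #5 from the control-node backtrace in this report.
frames = [
    0x6e672d78756e696c,
    0x2f766e6f63672f75,
    0x6f6d2d766e6f6367,
    0x61632e73656c7564,
    0x0000000000656863,
]
text = "".join(decode_word(f) for f in frames)
print(text)  # linux-gnu/gconv/gconv-modules.cache

# The "pointer" arguments of frame #0 decode to text as well.
args = {
    "cat_name": 0x20746e756f635f5f,
    "env_var": 0x203d3c2029372026,
    "catalog": 0x70616d2820666f65,
}
for name, value in args.items():
    print(name, repr(decode_word(value)))
```

Values that decode to path fragments and assertion text suggest that gdb was walking string data, so these frames say nothing about where the process actually aborted. The backtrace taken with the correct image is the one to rely on.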

Hari Prasad Killi (haripk) wrote :

The control-node backtrace is as follows. It is the same as bug 1430091.

#0 0x00007f1277f19bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f1277f1cfc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f1277f12a76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f1277f12b22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x000000000045a996 in IFMapExporter::StateUpdateOnDequeue (this=0x16a8d90, update=update@entry=0x7f11e8bdd2f0, dequeue_set=...,
    is_delete=<optimized out>) at controller/src/ifmap/ifmap_exporter.cc:548
#5 0x0000000000489002 in IFMapUpdateSender::ProcessUpdate (this=this@entry=0x16aa090, update=update@entry=0x7f11e8bdd2f0, base_send_set=...)
    at controller/src/ifmap/ifmap_update_sender.cc:225
#6 0x0000000000489544 in IFMapUpdateSender::Send (this=0x16aa090, imarker=<optimized out>) at controller/src/ifmap/ifmap_update_sender.cc:184
#7 0x0000000000489c4b in IFMapUpdateSender::SendTask::Run (this=0x7f121db16fe0) at controller/src/ifmap/ifmap_update_sender.cc:41
#8 0x0000000000a5e930 in TaskImpl::execute (this=0x7f12716df940) at controller/src/base/task.cc:243
