collector crashes because it tries to delete generator that is not in redis

Bug #1670908 reported by Arvind
This bug affects 3 people
Affects              Status           Importance   Assigned to   Milestone
Juniper Openstack    (status tracked in Trunk)
  R3.0               Fix Committed    Medium       Arvind
  R3.0.3.x           Fix Committed    Medium       Arvind
  R3.1               Fix Committed    Medium       Arvind
  R3.2               Fix Committed    Medium       Arvind
  Trunk              Fix Committed    Medium       Arvind

Bug Description

The collector received a DisconnectSession from a generator, which results in a DeleteUVEs call.
Redis was unable to locate the generator in the NGENERATORS set, and the collector
crashed because of that.
The generator was missing from the NGENERATORS set because it had been added
to the set and the Redis UVE DB was then flushed.
The following was observed in the redis log:
root@a6s9:/tmp/var/crashes# grep -r "contrail-database-nodemgr\|Flush" /tmp/redis-server.log.1
[7662] 02 Mar 11:58:12.264 * GetSeq for a6s1:Database:contrail-database-nodemgr:0
[7662] 02 Mar 11:58:12.264 * GetSeq for a6s1:Database:contrail-database-nodemgr:0 done
[7662] 02 Mar 11:58:12.265 * WARNING: Flushing Redis UVE DB
[7662] 02 Mar 11:58:12.265 * WARNING: Flushing Redis UVE DB done
[7662] 02 Mar 11:58:12.273 * DelRequest for a6s1:Database:contrail-database-nodemgr:0
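
The ordering visible in this excerpt (GetSeq, then a flush, then DelRequest for the same generator) can be searched for mechanically. Below is a minimal sketch, assuming every log line follows the layout shown in the excerpt above. The function name `find_flush_races`, the regular expression, and the handling of lines ending in " done" are illustrative assumptions and are not part of Contrail.

```python
import re

# Assumed layout, taken from the excerpt:
#   [7662] 02 Mar 11:58:12.265 * WARNING: Flushing Redis UVE DB
LINE_RE = re.compile(
    r"^\[\d+\] \d{2} \w{3} (?P<ts>\d{2}:\d{2}:\d{2}\.\d{3}) \* (?P<msg>.*)$"
)


def find_flush_races(lines):
    """Return (generator, getseq_ts, flush_ts, del_ts) tuples for generators
    whose GetSeq was followed by a flush and then by a DelRequest."""
    pending = {}   # generator -> timestamp of its GetSeq, no flush seen yet
    flushed = {}   # generator -> (GetSeq timestamp, flush timestamp)
    races = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, msg = m.group("ts"), m.group("msg")
        if msg.endswith(" done"):
            continue  # completion lines repeat the start line; skip them
        if msg.startswith("GetSeq for "):
            gen = msg[len("GetSeq for "):]
            pending[gen] = ts
            flushed.pop(gen, None)  # a fresh GetSeq re-registers the generator
        elif msg == "WARNING: Flushing Redis UVE DB":
            for gen, seq_ts in pending.items():
                flushed[gen] = (seq_ts, ts)
            pending.clear()
        elif msg.startswith("DelRequest for "):
            gen = msg[len("DelRequest for "):]
            if gen in flushed:
                seq_ts, flush_ts = flushed.pop(gen)
                races.append((gen, seq_ts, flush_ts, ts))
    return races


SAMPLE = """\
[7662] 02 Mar 11:58:12.264 * GetSeq for a6s1:Database:contrail-database-nodemgr:0
[7662] 02 Mar 11:58:12.264 * GetSeq for a6s1:Database:contrail-database-nodemgr:0 done
[7662] 02 Mar 11:58:12.265 * WARNING: Flushing Redis UVE DB
[7662] 02 Mar 11:58:12.265 * WARNING: Flushing Redis UVE DB done
[7662] 02 Mar 11:58:12.273 * DelRequest for a6s1:Database:contrail-database-nodemgr:0
""".splitlines()

races = find_flush_races(SAMPLE)
```

Run against the five lines above, the sketch reports a single entry for `a6s1:Database:contrail-database-nodemgr:0`.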

Here is the collector backtrace. It was observed in mainline build 3043.

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f59be6ebc37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f59be6ebc37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f59be6ef028 in __GI_abort () at abort.c:89
#2 0x00007f59be6e4bf6 in __assert_fail_base (
    fmt=0x7f59be8353b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7f3eee "0",
    file=file@entry=0x813318 "controller/src/analytics/redis_processor_vizd.cc",
    line=line@entry=254,
    function=function@entry=0x8133e0 <RedisProcessorExec::SyncDeleteUVEs(std::string const&, unsigned short, std::string const&, std::string const&, std::string const&, std::string const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >&)::__PRETTY_FUNCTION__> "static bool RedisProcessorExec::SyncDeleteUVEs(const string&, short unsigned int, const string&, const string&, const string&, const string&, const string&, std::vector<std::pair<std::basic_string<cha"...) at assert.c:92
#3 0x00007f59be6e4ca2 in __GI___assert_fail (assertion=0x7f3eee "0",
    file=0x813318 "controller/src/analytics/redis_processor_vizd.cc", line=254,
    function=0x8133e0 <RedisProcessorExec::SyncDeleteUVEs(std::string const&, unsigned short, std::string const&, std::string const&, std::string const&, std::string const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >&)::__PRETTY_FUNCTION__> "static bool RedisProcessorExec::SyncDeleteUVEs(const string&, short unsigned int, const string&, const string&, const string&, const string&, const string&, std::vector<std::pair<std::basic_string<cha"...) at assert.c:101
#4 0x00000000005ba55d in RedisProcessorExec::SyncDeleteUVEs (redis_ip=...,
    redis_port=<optimized out>, redis_password="", source="a6s1", node_type="Database",
    module="contrail-database-nodemgr", instance_id="0",
    delReply=std::vector of length 0, capacity 0)
    at controller/src/analytics/redis_processor_vizd.cc:254
#5 0x0000000000618dfd in OpServerProxy::DeleteUVEs (this=0x141bf40, source="a6s1",
    module="contrail-database-nodemgr", node_type="Database", instance_id="0")
    at controller/src/analytics/OpServerProxy.cc:985
#6 0x00000000005ad1e1 in SandeshGenerator::DisconnectSession (
    this=this@entry=0x7f593c000fe0, vsession=vsession@entry=0x1437cd0)
    at controller/src/analytics/generator.cc:157
#7 0x000000000059ca7f in Collector::DisconnectSession (this=<optimized out>,
    session=<optimized out>) at controller/src/analytics/collector.cc:286
#8 0x00000000007545dd in SandeshServerConnection::ProcessDisconnect (this=0x14c2780, sess=
    0x1437cd0) at tools/sandesh/library/cpp/sandesh_connection.cc:203
#9 0x0000000000751200 in ssm::Established::react (this=this@entry=0x7f593c000990, event=...)
    at tools/sandesh/library/cpp/sandesh_state_machine.cc:348

Arvind (arvindv)
Changed in juniperopenstack:
importance: Undecided → Medium
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/29420
Submitter: Arvind (<email address hidden>)

Revision history for this message
Sarath (nsarath) wrote :

_____________________________________________
From: Sarathbabu Narasimhan
Sent: Friday, March 10, 2017 12:34 PM
To: Arvind Viswanathan <email address hidden>
Cc: Sarathbabu Narasimhan <email address hidden>; Jeba Paulaiyan <email address hidden>
Subject: Bug #1670908 : collector crashes because it tries to delete generator that is not in redis

Hi Arvind,

I observe a similar crash on the latest 4.0.0.0-3045 build on a vCenter-as-compute setup, but there are a few differences.
This is a non-HA setup.
Please find the traceback below and let me know if this needs to be tracked as a separate bug. The setup is still in the problem state; if required, please access “10.84.13.32”.

[New LWP 17731]
[New LWP 19877]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f1678307c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f1678307c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f167830b028 in __GI_abort () at abort.c:89
#2 0x00007f1678300bf6 in __assert_fail_base (fmt=0x7f16784513b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x80346e "0",
    file=file@entry=0x823018 "controller/src/analytics/redis_processor_vizd.cc", line=line@entry=254,
    function=function@entry=0x8230e0 "static bool RedisProcessorExec::SyncDeleteUVEs(const string&, short unsigned int, const string&, const string&, const string&, const string&, const string&, std::vector<std::pair<std::basic_string<cha"...) at assert.c:92
#3 0x00007f1678300ca2 in __GI___assert_fail (assertion=0x80346e "0", file=0x823018 "controller/src/analytics/redis_processor_vizd.cc", line=254,
    function=0x8230e0 "static bool RedisProcessorExec::SyncDeleteUVEs(const string&, short unsigned int, const string&, const string&, const string&, const string&, const string&, std::vector<std::pair<std::basic_string<cha"...) at assert.c:101
#4 0x00000000005c576d in ?? ()
#5 0x000000000062504d in ?? ()
#6 0x00000000005b83f1 in ?? ()
#7 0x00000000005a7d9f in ?? ()
#8 0x000000000076364d in ?? ()
#9 0x0000000000760270 in ?? ()
#10 0x0000000000760598 in ?? ()
#11 0x000000000076007b in ?? ()
#12 0x0000000000758af5 in ?? ()
#13 0x000000000075f515 in ?? ()
#14 0x0000000000474d7f in ?? ()
#15 0x00007f167988fb3a in ?? () from /usr/lib/libtbb.so.2
#16 0x00007f167988b816 in ?? () from /usr/lib/libtbb.so.2
#17 0x00007f167988af4b in ?? () from /usr/lib/libtbb.so.2
#18 0x00007f16798870ff in ?? () from /usr/lib/libtbb.so.2
#19 0x00007f16798872f9 in ?? () from /usr/lib/libtbb.so.2
#20 0x00007f1679aab184 in start_thread (arg=0x7f164edf1700) at pthread_create.c:312
#21 0x00007f16783cb37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Revision history for this message
Pavana (pavanap) wrote :

A similar crash was observed on the latest 4.0.0.0-3045 (CentOS 7.1, Kilo) on a non-HA, single-node WebUI sanity setup.

Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal 6, Aborted.
#0 0x00002af4ab1025d7 in raise () from /lib64/libc.so.6
#0 0x00002af4ab1025d7 in raise () from /lib64/libc.so.6
#1 0x00002af4ab103cc8 in abort () from /lib64/libc.so.6
#2 0x00002af4ab0fb546 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00002af4ab0fb5f2 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000065e355 in RedisProcessorExec::SyncDeleteUVEs(std::string const&, unsigned short, std::string const&, std::string const&, std::string const&, std::string const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >&) ()
#5 0x0000000000716962 in OpServerProxy::DeleteUVEs(std::string const&, std::string const&, std::string const&, std::string const&) ()
#6 0x0000000000647972 in SandeshGenerator::DisconnectSession(VizSession*) ()
#7 0x0000000000628ca5 in Collector::DisconnectSession(SandeshSession*) ()
#8 0x00000000008fa36e in SandeshServerConnection::ProcessDisconnect(SandeshSession*) ()
#9 0x00000000008e82df in ssm::Established::react(ssm::EvTcpClose const&) ()
#10 0x00000000008f762f in boost::statechart::detail::reaction_result boost::statechart::custom_reaction<ssm::EvTcpClose>::react<ssm::Established, boost::statechart::event_base, void const*>(ssm::Established&, boost::statechart::event_base const&, void const* const&) ()

tags: added: sanity
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/29562
Submitter: Arvind (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29562
Committed: http://github.com/Juniper/contrail-controller/commit/409b4f989c2b59e7a4162317102eebd49695b06d
Submitter: Zuul (<email address hidden>)
Branch: master

commit 409b4f989c2b59e7a4162317102eebd49695b06d
Author: arvindvis <email address hidden>
Date: Mon Mar 13 11:34:32 2017 -0700

This fix ensures the entire Redis DB is flushed before any GetSeq calls are sent by the generators. This is needed because the following sequence in the collector can lead to a crash: a generator issues GetSeq; the collector-to-Redis connection post-processing call flushes the database; the generator then sends a DeleteUVE request because of a session disconnect. That sequence tries to delete a UVE that is no longer in Redis. This fix prevents such a case from happening.
Closes-Bug:#1670908
Change-Id: I51dfedacdac75fdf5b271cdcaaafb7d4ddfd9bfc
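
The sequence described in this commit message can be illustrated with a toy model. This is a minimal sketch, not the collector's code: `FakeRedis`, its methods, and `run` are hypothetical names, and the only behaviour modelled is the one stated in the bug description (a generator is added to the NGENERATORS set, a flush empties it, and a delete expects the entry to be present).

```python
class FakeRedis:
    """Hypothetical stand-in for the Redis UVE DB; models only NGENERATORS."""

    def __init__(self):
        self.ngenerators = set()

    def get_seq(self, generator):
        # Per the bug description, the generator is added to the set.
        self.ngenerators.add(generator)

    def flush(self):
        # Models "Flushing Redis UVE DB": every entry is dropped.
        self.ngenerators.clear()

    def del_request(self, generator):
        # Returns False when the generator is missing, which is the
        # condition the collector asserted on in SyncDeleteUVEs.
        if generator not in self.ngenerators:
            return False
        self.ngenerators.remove(generator)
        return True


def run(order):
    """Apply the steps in the given order and return the delete result."""
    redis = FakeRedis()
    gen = "a6s1:Database:contrail-database-nodemgr:0"
    result = None
    for step in order:
        if step == "getseq":
            redis.get_seq(gen)
        elif step == "flush":
            redis.flush()
        elif step == "delete":
            result = redis.del_request(gen)
    return result


crash_order = ["getseq", "flush", "delete"]   # ordering seen in the redis log
fixed_order = ["flush", "getseq", "delete"]   # ordering the fix enforces

crash_result = run(crash_order)
fixed_result = run(fixed_order)
```

In this model the delete fails only when the flush lands between GetSeq and the delete, which is why forcing the flush to complete first removes the failing case.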

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/29763
Submitter: Arvind (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29763
Committed: http://github.com/Juniper/contrail-controller/commit/4faef232f90b3230c4b24b3c1d8877896347b91a
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 4faef232f90b3230c4b24b3c1d8877896347b91a
Author: arvindvis <email address hidden>
Date: Mon Mar 13 11:34:32 2017 -0700

This fix ensures the entire Redis DB is flushed before any GetSeq calls are sent by the generators. This is needed because the following sequence in the collector can lead to a crash: a generator issues GetSeq; the collector-to-Redis connection post-processing call flushes the database; the generator then sends a DeleteUVE request because of a session disconnect. That sequence tries to delete a UVE that is no longer in Redis. This fix prevents such a case from happening.
Closes-Bug:#1670908
(cherry picked from commit 409b4f989c2b59e7a4162317102eebd49695b06d)
Change-Id: I51dfedacdac75fdf5b271cdcaaafb7d4ddfd9bfc

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/29935
Submitter: Arvind (<email address hidden>)

information type: Proprietary → Public
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29935
Committed: http://github.com/Juniper/contrail-controller/commit/778d457130c73101842178fbf51022d3faa3b1d6
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 778d457130c73101842178fbf51022d3faa3b1d6
Author: arvindvis <email address hidden>
Date: Mon Mar 13 11:34:32 2017 -0700

This fix ensures the entire Redis DB is flushed before any GetSeq calls are sent by the generators. This is needed because the following sequence in the collector can lead to a crash: a generator issues GetSeq; the collector-to-Redis connection post-processing call flushes the database; the generator then sends a DeleteUVE request because of a session disconnect. That sequence tries to delete a UVE that is no longer in Redis. This fix prevents such a case from happening.
Closes-Bug:#1670908
(cherry picked from commit 409b4f989c2b59e7a4162317102eebd49695b06d)
Change-Id: I51dfedacdac75fdf5b271cdcaaafb7d4ddfd9bfc
(cherry picked from commit 4faef232f90b3230c4b24b3c1d8877896347b91a)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/30773
Submitter: Arvind (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.3.x

Review in progress for https://review.opencontrail.org/30774
Submitter: Arvind (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/30773
Committed: http://github.com/Juniper/contrail-controller/commit/c9fe6840dbd6eb53cf129ae550dc8bed8b985ce2
Submitter: Zuul (<email address hidden>)
Branch: R3.0

commit c9fe6840dbd6eb53cf129ae550dc8bed8b985ce2
Author: arvindvis <email address hidden>
Date: Mon Mar 13 11:34:32 2017 -0700

This fix ensures the entire Redis DB is flushed before any GetSeq calls are sent by the generators. This is needed because the following sequence in the collector can lead to a crash: a generator issues GetSeq; the collector-to-Redis connection post-processing call flushes the database; the generator then sends a DeleteUVE request because of a session disconnect. That sequence tries to delete a UVE that is no longer in Redis. This fix prevents such a case from happening.
Closes-Bug:#1670908
(cherry picked from commit 409b4f989c2b59e7a4162317102eebd49695b06d)
Change-Id: I51dfedacdac75fdf5b271cdcaaafb7d4ddfd9bfc
(cherry picked from commit 4faef232f90b3230c4b24b3c1d8877896347b91a)
(cherry picked from commit 778d457130c73101842178fbf51022d3faa3b1d6)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/30774
Committed: http://github.com/Juniper/contrail-controller/commit/5773a28d7bc7526a75744bd31ac117bea0689f4e
Submitter: Zuul (<email address hidden>)
Branch: R3.0.3.x

commit 5773a28d7bc7526a75744bd31ac117bea0689f4e
Author: arvindvis <email address hidden>
Date: Mon Mar 13 11:34:32 2017 -0700

This fix ensures the entire Redis DB is flushed before any GetSeq calls are sent by the generators. This is needed because the following sequence in the collector can lead to a crash: a generator issues GetSeq; the collector-to-Redis connection post-processing call flushes the database; the generator then sends a DeleteUVE request because of a session disconnect. That sequence tries to delete a UVE that is no longer in Redis. This fix prevents such a case from happening.
Closes-Bug:#1670908
(cherry picked from commit 409b4f989c2b59e7a4162317102eebd49695b06d)
Change-Id: I51dfedacdac75fdf5b271cdcaaafb7d4ddfd9bfc
(cherry picked from commit 4faef232f90b3230c4b24b3c1d8877896347b91a)
(cherry picked from commit 778d457130c73101842178fbf51022d3faa3b1d6)
