3.1.3-75:Potential bug in restore_cassandra_db

Bug #1706832 reported by Sandeep Sridhar
This bug affects 1 person
Affects              Status                    Importance  Assigned to  Milestone
Juniper Openstack    Status tracked in Trunk
  R3.1               Fix Committed             High        Megh Bhatt
  R3.2               Fix Committed             High        Megh Bhatt
  R4.0               Fix Committed             High        Megh Bhatt
  Trunk              Fix Committed             High        Megh Bhatt

Bug Description

Contrail Version: 3.1.3-75~mitaka

We have noticed that when `fab stop_collector` is run after `fab stop_database`, the task sometimes does not finish, and contrail-alarm-gen-0-stdout.log contains the following messages:

Connect attempt to <BrokerConnection host=192.168.0.131 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.131 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.133 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.133 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.132 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.132 port=9092>
FailedPayloadsError for -uve-5:0

Two functions in our code use this order:

def restore_cassandra_db():
def stop_contrail_control_services():

Whenever either function is run, manual intervention is required to get things back up and running. We do not see this when we change the order as below:

def restore_cassandra_db():
    <snip>
    try:
        execute(stop_cfgm)
        execute(stop_collector)
        execute(stop_database)
        #execute(stop_collector)
        execute(restore_cassandra, backup_data_path, store_db, cassandra_backup)
        execute(start_cfgm)
        execute(start_database)
        execute(start_collector)
root@sv-31:/opt/contrail/utils# fab restore_cassandra_db
(Finished without manual intervention.)
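
The reordering can be expressed as a simple ordering check. This is only a sketch: the "before" sequence is inferred from the commented-out line in the snippet above (the full original function is not shown in this report), and `collector_stops_first` is a hypothetical helper, not part of the fab utilities.

```python
# Task order before the change. This is an assumption inferred from the
# commented-out execute(stop_collector) line in the report.
ORDER_BEFORE = ["stop_cfgm", "stop_database", "stop_collector"]

# Task order after the change, as given in the report.
ORDER_AFTER = ["stop_cfgm", "stop_collector", "stop_database"]


def collector_stops_first(tasks):
    """Return True if stop_collector is scheduled before stop_database."""
    return tasks.index("stop_collector") < tasks.index("stop_database")


if __name__ == "__main__":
    print(collector_stops_first(ORDER_BEFORE))  # False
    print(collector_stops_first(ORDER_AFTER))   # True
```

Stopping the collector first means contrail-alarm-gen is already down when the database services go away, so it never enters the reconnect loop seen in the log above.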

I am in discussion with Megh Bhatt from BU on this. He is aware of this and has the relevant logs and info. This bug is filed for tracking purposes.

Tags: analytics
information type: Proprietary → Public
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Megh Bhatt (meghb)
milestone: none → r3.1.4.0
Jeba Paulaiyan (jebap)
tags: added: analytics
Revision history for this message
Sandeep Sridhar (ssandeep) wrote :

Analytics Team - We need to speed up the progress on this. Can I please get an update?

-Sandeep.

Megh Bhatt (meghb) wrote :

Looked at the customer setup:

The issue seems to be that the "service supervisor-analytics stop" command issued by fab stop_collector hangs. "service supervisor-analytics stop" in turn calls "supervisorctl -s unix:///var/run/supervisord_analytics.sock stop all", and that command hangs. The reason for the hang seems to be that once fab stop_database is run, contrail-alarm-gen loses its connection to kafka and then writes a large number of messages to stdout:

FailedPayloadsError for -uve-19:0
Connect attempt to <BrokerConnection host=192.168.0.138 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.138 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.137 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.137 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.139 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.139 port=9092>
FailedPayloadsError for -uve-19:0
Connect attempt to <BrokerConnection host=192.168.0.138 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.138 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.137 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.137 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.139 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.139 port=9092>
FailedPayloadsError for -uve-19:0
Connect attempt to <BrokerConnection host=192.168.0.139 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.139 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.138 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.138 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.137 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.137 port=9092>
FailedPayloadsError for -uve-19:0
Connect attempt to <BrokerConnection host=192.168.0.137 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.137 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.139 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.139 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.138 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.138 port=9092>
FailedPayloadsError for -uve-19:0
Connect attempt to <BrokerConnection host=192.168.0.138 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.138 port=9092>
Connect attempt to <BrokerConnection host=192.168.0.139 port=9092> returned error 111. Disconnecting.
Skipping unconnected connection: <BrokerConnection host=192.168.0.139 port=9092>
Connect at...


OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/34412
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/34412
Committed: http://github.com/Juniper/contrail-sandesh/commit/640747f4e16d435dee9f89a2281f9353dcdd8afb
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 640747f4e16d435dee9f89a2281f9353dcdd8afb
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 13:31:04 2017 -0700

Add SandeshLogger API to configure user specified logger

Add a SandeshLogger API - set_logger_params() that allows to configure
user specified logger as per the logging parameters similar to how
sandesh logger is configured. contrail-alarm-gen will use the API
to configure kafka and kazoo loggers.

Change-Id: I33ddf504010b27e9ac5726ff596608e5f8f05f68
Partial-Bug: #1706832

Sandeep Sridhar (ssandeep) wrote :

Megh - Which build in R3.1 Branch has this fix?

Greetings,
Sandeep.

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/34490
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/34491
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/34492
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/34493
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/34491
Committed: http://github.com/Juniper/contrail-sandesh/commit/06315e74ee87463c8f09f91096e618faff631b79
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 06315e74ee87463c8f09f91096e618faff631b79
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 13:31:04 2017 -0700

Add SandeshLogger API to configure user specified logger

Add a SandeshLogger API - set_logger_params() that allows to configure
user specified logger as per the logging parameters similar to how
sandesh logger is configured. contrail-alarm-gen will use the API
to configure kafka and kazoo loggers.

Change-Id: I33ddf504010b27e9ac5726ff596608e5f8f05f68
Partial-Bug: #1706832
(cherry picked from commit 640747f4e16d435dee9f89a2281f9353dcdd8afb)

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34493
Committed: http://github.com/Juniper/contrail-sandesh/commit/6e6ca0bc59722320ae48e9ecb83fa9ec756c0794
Submitter: Zuul (<email address hidden>)
Branch: master

commit 6e6ca0bc59722320ae48e9ecb83fa9ec756c0794
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 13:31:04 2017 -0700

Add SandeshLogger API to configure user specified logger

Add a SandeshLogger API - set_logger_params() that allows to configure
user specified logger as per the logging parameters similar to how
sandesh logger is configured. contrail-alarm-gen will use the API
to configure kafka and kazoo loggers.

Change-Id: I33ddf504010b27e9ac5726ff596608e5f8f05f68
Partial-Bug: #1706832
(cherry picked from commit 640747f4e16d435dee9f89a2281f9353dcdd8afb)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/34561
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/34562
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/34563
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/34490
Committed: http://github.com/Juniper/contrail-controller/commit/d235608dd6b5084a20b2a80f1ed32286e3e20263
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit d235608dd6b5084a20b2a80f1ed32286e3e20263
Author: Megh Bhatt <email address hidden>
Date: Fri Aug 11 11:14:35 2017 -0700

Configure kafka and kazoo loggers in contrail-alarm-gen

When contrail-alarm-gen disconnects from kafka, it retries
to reconnect to it and spits out bunch of messages. The kafka
python library logger was configured to output those messages
to stdout and due to this supervisor-analytics would get very
busy and not be able to handle other commands like stop all
and hence service supervisor-analytics stop would hang. Fix is
to configure the kafka and kazoo loggers in contrail-alarm-gen
as per the logging parameters and log to the file instead.

Change-Id: I553c53b3b549269a8cc0a87feda137980f008643
Closes-Bug: #1706832
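
The idea behind this fix can be sketched with the Python standard `logging` module alone. This is a minimal illustration, not the committed code: `configure_library_loggers`, its parameters, and the log file name are hypothetical, and it does not reproduce the signature of SandeshLogger's set_logger_params().

```python
import logging
import logging.handlers
import os
import tempfile


def configure_library_loggers(names, log_file, level=logging.INFO,
                              max_bytes=10 * 1024 * 1024, backup_count=5):
    """Send records from the named library loggers to a rotating file.

    Hypothetical helper for illustration only.
    """
    handler = logging.handlers.RotatingFileHandler(
        log_file, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"))
    for name in names:
        logger = logging.getLogger(name)
        # Drop any existing handlers, for example a StreamHandler on stdout.
        for old in list(logger.handlers):
            logger.removeHandler(old)
        logger.addHandler(handler)
        logger.setLevel(level)
        # Keep records away from the root logger's handlers.
        logger.propagate = False
    return handler


if __name__ == "__main__":
    # The file name is an assumption chosen for the demo.
    log_path = os.path.join(tempfile.mkdtemp(), "contrail-alarm-gen.log")
    configure_library_loggers(["kafka", "kazoo"], log_path)
    # Child loggers such as "kafka.conn" inherit the "kafka" configuration.
    logging.getLogger("kafka.conn").warning(
        "Connect attempt returned error 111")
    with open(log_path) as log:
        print(log.read())
```

With the library loggers writing to a file, a reconnect storm no longer passes through the process's stdout, which is the stream the commit message identifies as keeping supervisor-analytics busy.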

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34492
Committed: http://github.com/Juniper/contrail-sandesh/commit/21fca210c72f7f62511bf4caaed78c070a5d4879
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 21fca210c72f7f62511bf4caaed78c070a5d4879
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 13:31:04 2017 -0700

Add SandeshLogger API to configure user specified logger

Add a SandeshLogger API - set_logger_params() that allows to configure
user specified logger as per the logging parameters similar to how
sandesh logger is configured. contrail-alarm-gen will use the API
to configure kafka and kazoo loggers.

Change-Id: I33ddf504010b27e9ac5726ff596608e5f8f05f68
Partial-Bug: #1706832
(cherry picked from commit 640747f4e16d435dee9f89a2281f9353dcdd8afb)

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34563
Committed: http://github.com/Juniper/contrail-controller/commit/e2571c1e6d2dc1f16d3f51cac0b2d48a59edf8ae
Submitter: Zuul (<email address hidden>)
Branch: master

commit e2571c1e6d2dc1f16d3f51cac0b2d48a59edf8ae
Author: Megh Bhatt <email address hidden>
Date: Fri Aug 11 11:14:35 2017 -0700

Configure kafka and kazoo loggers in contrail-alarm-gen

When contrail-alarm-gen disconnects from kafka, it retries
to reconnect to it and spits out bunch of messages. The kafka
python library logger was configured to output those messages
to stdout and due to this supervisor-analytics would get very
busy and not be able to handle other commands like stop all
and hence service supervisor-analytics stop would hang. Fix is
to configure the kafka and kazoo loggers in contrail-alarm-gen
as per the logging parameters and log to the file instead.

Change-Id: I553c53b3b549269a8cc0a87feda137980f008643
Closes-Bug: #1706832

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34562
Committed: http://github.com/Juniper/contrail-controller/commit/6dfd78fca3b1c105b31de8b8cbaefada29d9f042
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 6dfd78fca3b1c105b31de8b8cbaefada29d9f042
Author: Megh Bhatt <email address hidden>
Date: Fri Aug 11 11:14:35 2017 -0700

Configure kafka and kazoo loggers in contrail-alarm-gen

When contrail-alarm-gen disconnects from kafka, it retries
to reconnect to it and spits out bunch of messages. The kafka
python library logger was configured to output those messages
to stdout and due to this supervisor-analytics would get very
busy and not be able to handle other commands like stop all
and hence service supervisor-analytics stop would hang. Fix is
to configure the kafka and kazoo loggers in contrail-alarm-gen
as per the logging parameters and log to the file instead.

Change-Id: I553c53b3b549269a8cc0a87feda137980f008643
Closes-Bug: #1706832

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34561
Committed: http://github.com/Juniper/contrail-controller/commit/01c786e94e5f3ccb360b8d47bd3b3a58a385aec1
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 01c786e94e5f3ccb360b8d47bd3b3a58a385aec1
Author: Megh Bhatt <email address hidden>
Date: Fri Aug 11 11:14:35 2017 -0700

Configure kafka and kazoo loggers in contrail-alarm-gen

When contrail-alarm-gen disconnects from kafka, it retries
to reconnect to it and spits out bunch of messages. The kafka
python library logger was configured to output those messages
to stdout and due to this supervisor-analytics would get very
busy and not be able to handle other commands like stop all
and hence service supervisor-analytics stop would hang. Fix is
to configure the kafka and kazoo loggers in contrail-alarm-gen
as per the logging parameters and log to the file instead.

Change-Id: I553c53b3b549269a8cc0a87feda137980f008643
Closes-Bug: #1706832
