report state failed

Bug #1589880 reported by kaka
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Description:
=================
I set read_only=on on the master database when switching the Nova master database to the slave. After that, I checked the Nova service status:
# nova-manage service list
Binary            Host    Zone      Status   State  Updated_At
nova-consoleauth  11_120  internal  enabled  XXX    2016-06-07 08:28:46
nova-conductor    11_120  internal  enabled  XXX    2016-06-07 08:28:45
nova-cert         11_120  internal  enabled  XXX    2016-05-17 08:12:10
nova-scheduler    11_120  internal  enabled  XXX    2016-05-17 08:12:24
nova-compute      11_121  bx        enabled  XXX    2016-06-07 08:28:49
nova-compute      11_122  bx        enabled  XXX    2016-06-07 08:28:42
=================

Steps to reproduce
=================
# mysql
MariaDB [nova]> set global read_only=on;
=================

Environment
================
Version: Liberty
openstack-nova-conductor-12.0.0-1.el7.noarch

Logs
================

2016-05-12 11:01:20.343 9198 ERROR oslo.service.loopingcall
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.servicegroup.drivers.db.DbDriver._report_state' failed
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall Traceback (most recent call last):
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 113, in _run_loop
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 87, in _report_state
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall service.service_ref.save()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 213, in wrapper
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return fn(self, *args, **kwargs)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/objects/service.py", line 250, in save
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall db_service = db.service_update(self._context, self.id, updates)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 153, in service_update
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return IMPL.service_update(context, service_id, values)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 146, in wrapper
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall ectxt.value = e.inner_exc
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall six.reraise(self.type_, self.value, self.tb)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 136, in wrapper
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return f(*args, **kwargs)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 532, in service_update
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall service_ref.update(values)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 490, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self.rollback()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall compat.reraise(exc_type, exc_value, exc_tb)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 487, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self.commit()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self._prepare_impl()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self.session.flush()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self._flush(objects)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall transaction.rollback(_capture_exception=True)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall compat.reraise(exc_type, exc_value, exc_tb)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall flush_context.execute()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall rec.execute(self)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall uow
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall mapper, table, update)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 672, in _emit_update_statements
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall execute(statement, multiparams)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return meth(self, multiparams, params)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall return connection._execute_clauseelement(self, multiparams, params)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall compiled_sql, distilled_params
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall context)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1335, in _handle_dbapi_exception
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall util.raise_from_cause(newraise, exc_info)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall reraise(type(exception), exception, tb=exc_tb)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall context)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 442, in do_execute
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall cursor.execute(statement, parameters)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall self.errorhandler(self, exc, value)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall raise errorclass, errorvalue
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall OperationalError: (_mysql_exceptions.OperationalError) (1290, 'The MariaDB server is running with the --read-only option so it cannot execute this statement') [SQL: u'UPDATE services SET updated_at=%s, report_count=%s, last_seen_up=%s WHERE services.id = %s'] [parameters: (datetime.datetime(2016, 5, 12, 3, 1, 17, 572767), 1189606, datetime.datetime(2016, 5, 12, 3, 1, 17, 571934), 2L)]

Tags: conductor db oslo
Takashi Natsume (natsume-takashi) wrote :

This is not an issue in Nova but an issue with the HA configuration.
More information is needed:

* nova.conf of nova-conductor

Which products and versions are used?

* MariaDB Galera Cluster?
* HAProxy?

Changed in nova:
status: New → Incomplete
kaka (mingb1989) wrote :

I use MariaDB master-slave replication, version 10.0.16, with haproxy 1.5.14 and keepalived 1.2.13.

I don't think this is an HA configuration issue.
The function "_report_state" runs inside "_run_loop"; when an exception occurs in "_report_state", "_run_loop" breaks out of the loop and the service stops reporting.
Preferably "_run_loop" would have a retry mechanism, as in the sketch below.

ChangBo Guo(gcb) (glongwave) wrote :

Does the error last forever, or only for the duration of switching the database connection?

kaka (mingb1989) wrote :

It happens just from switching the database; the service never reports its state again after that.

Roman Podoliaka (rpodolyaka) wrote :

kaka

I think it's a duplicate of another problem (bug #1517926) that was fixed in:

rpodolyaka@rpodolyaka-pc:~/src/nova$ git tag --contains e0647dd4b2ae9f5f6f908102d2ac447440622785
12.0.1
12.0.2
12.0.3
12.0.4
rpodolyaka@rpodolyaka-pc:~/src/nova$ git show e0647dd4b2ae9f5f6f908102d2ac447440622785
commit e0647dd4b2ae9f5f6f908102d2ac447440622785
Author: Roman Podoliaka <email address hidden>
Date: Thu Nov 19 16:00:01 2015 +0200

    servicegroup: stop zombie service due to exception

    If an exception is raised out of the _report_state call, we find that
    the service no longer reports any updates to the database, so the
    service is considered dead, thus creating a kind of zombie service.

    I55417a5b91282c69432bb2ab64441c5cea474d31 seems to introduce a
    regression, which leads to nova-* services marked as 'down', if an
    error happens in a remote nova-conductor while processing a state
    report: only Timeout errors are currently handled, but other errors
    are possible, e.g. a DBError (wrapped with RemoteError on RPC
    client side), if a DB temporarily goes away. This unhandled exception
    will effectively break the state reporting thread - service will be
    up again only after restart.

    While the intention of I55417a5b91282c69432bb2ab64441c5cea474d31 was
    to avoid catching all possible exceptions, it looks like we must
    do that to avoid creating a zombie.
    The other part of that change was to ensure that during upgrade, we do
    not spam the log server about MessagingTimeouts while the
    nova-conductors are being restarted. This change ensures that still
    happens.

    Closes-Bug: #1517926

    Change-Id: I44f118f82fbb811b790222face4c74d79795fe21
    (cherry picked from commit 49b0d1741c674714fabf24d8409810064b953202)

and you seem to be using version 12.0.0.

Please try to update to the latest stable/liberty version of the code and re-open the bug if it is still reproducible (I did not manage to reproduce it locally, presumably because I use a version with the commit above applied).
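For reference, a rough, hand-written sketch of the shape of the fix described in the commit above (not the exact upstream code; the attribute names and log messages are illustrative): _report_state catches every exception so the looping call stays alive, and repeated MessagingTimeouts while nova-conductor is being restarted are only logged once.

import logging

import oslo_messaging as messaging

LOG = logging.getLogger(__name__)


def _report_state(service):
    # `service` is assumed to expose a saveable service_ref, mirroring the
    # servicegroup DB driver in the traceback above.
    try:
        service.service_ref.report_count += 1
        service.service_ref.save()
        if getattr(service, 'model_disconnected', False):
            service.model_disconnected = False
            LOG.info('Recovered from being unable to report status.')
    except messaging.MessagingTimeout:
        # Expected while nova-conductor is restarted during an upgrade:
        # warn only once instead of on every tick.
        if not getattr(service, 'model_disconnected', False):
            service.model_disconnected = True
            LOG.warning('Lost connection to nova-conductor for reporting '
                        'service status.')
    except Exception:
        # Anything else (e.g. a DBError while the database is read-only)
        # must not escape, otherwise the looping call stops and the
        # service appears down until it is restarted.
        LOG.exception('Unexpected error while reporting service status')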

no longer affects: oslo.service