Retry mechanism does not work on startup when used with MySQL

Bug #1376211 reported by Ihar Hrachyshka
26
This bug affects 5 people
Affects Status Importance Assigned to Milestone
neutron
Invalid
High
Ann Taraday
oslo.db
Fix Released
High
Roman Podoliaka
Juno
Fix Released
High
Roman Podoliaka

Bug Description

This is initially revealed as Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1144181

The problem shows up when Neutron or any other oslo.db based projects start while MySQL server is not up yet. Instead of retrying connection as per max_retries and retry_interval, service just crashes with return code 1.

This is because during engine initialization, "engine.execute("SHOW VARIABLES LIKE 'sql_mode'")" is called, which opens the connection, *before* _test_connection() succeeds. So the server just bail out to sys.exit() at the top of the stack.

This behaviour was checked for both oslo.db 0.4.0 and 1.0.1.

I suspect this is a regression from the original db code from oslo-incubator though I haven't checked it specifically.

The easiest way to reproduce the traceback is:

1. stop MariaDB.
2. execute the following Python script:

'''
import oslo.db.sqlalchemy.session

url = 'mysql://neutron:123456@10.35.161.235/neutron'
engine = oslo.db.sqlalchemy.session.EngineFacade(url)
'''

The following traceback can be seen in service log:

2014-10-01 13:46:10.588 5812 TRACE neutron Traceback (most recent call last):
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/bin/neutron-server", line 10, in <module>
2014-10-01 13:46:10.588 5812 TRACE neutron sys.exit(main())
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/server/__init__.py", line 47, in main
2014-10-01 13:46:10.588 5812 TRACE neutron neutron_api = service.serve_wsgi(service.NeutronApiService)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/service.py", line 105, in serve_wsgi
2014-10-01 13:46:10.588 5812 TRACE neutron LOG.exception(_('Unrecoverable error: please check log '
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 82, in __exit__
2014-10-01 13:46:10.588 5812 TRACE neutron six.reraise(self.type_, self.value, self.tb)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/service.py", line 102, in serve_wsgi
2014-10-01 13:46:10.588 5812 TRACE neutron service.start()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/service.py", line 73, in start
2014-10-01 13:46:10.588 5812 TRACE neutron self.wsgi_app = _run_wsgi(self.app_name)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/service.py", line 168, in _run_wsgi
2014-10-01 13:46:10.588 5812 TRACE neutron app = config.load_paste_app(app_name)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/common/config.py", line 182, in load_paste_app
2014-10-01 13:46:10.588 5812 TRACE neutron app = deploy.loadapp("config:%s" % config_path, name=app_name)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
2014-10-01 13:46:10.588 5812 TRACE neutron return loadobj(APP, uri, name=name, **kw)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
2014-10-01 13:46:10.588 5812 TRACE neutron return context.create()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 710, in create
2014-10-01 13:46:10.588 5812 TRACE neutron return self.object_type.invoke(self)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 144, in invoke
2014-10-01 13:46:10.588 5812 TRACE neutron **context.local_conf)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/util.py", line 56, in fix_call
2014-10-01 13:46:10.588 5812 TRACE neutron val = callable(*args, **kw)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/urlmap.py", line 25, in urlmap_factory
2014-10-01 13:46:10.588 5812 TRACE neutron app = loader.get_app(app_name, global_conf=global_conf)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 350, in get_app
2014-10-01 13:46:10.588 5812 TRACE neutron name=name, global_conf=global_conf).create()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 710, in create
2014-10-01 13:46:10.588 5812 TRACE neutron return self.object_type.invoke(self)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 144, in invoke
2014-10-01 13:46:10.588 5812 TRACE neutron **context.local_conf)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/util.py", line 56, in fix_call
2014-10-01 13:46:10.588 5812 TRACE neutron val = callable(*args, **kw)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/auth.py", line 67, in pipeline_factory
2014-10-01 13:46:10.588 5812 TRACE neutron app = loader.get_app(pipeline[-1])
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 350, in get_app
2014-10-01 13:46:10.588 5812 TRACE neutron name=name, global_conf=global_conf).create()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 710, in create
2014-10-01 13:46:10.588 5812 TRACE neutron return self.object_type.invoke(self)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 146, in invoke
2014-10-01 13:46:10.588 5812 TRACE neutron return fix_call(context.object, context.global_conf, **context.local_conf)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/paste/deploy/util.py", line 56, in fix_call
2014-10-01 13:46:10.588 5812 TRACE neutron val = callable(*args, **kw)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/api/v2/router.py", line 71, in factory
2014-10-01 13:46:10.588 5812 TRACE neutron return cls(**local_config)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/api/v2/router.py", line 75, in __init__
2014-10-01 13:46:10.588 5812 TRACE neutron plugin = manager.NeutronManager.get_plugin()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/manager.py", line 219, in get_plugin
2014-10-01 13:46:10.588 5812 TRACE neutron return weakref.proxy(cls.get_instance().plugin)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/manager.py", line 213, in get_instance
2014-10-01 13:46:10.588 5812 TRACE neutron cls._create_instance()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/openstack/common/lockutils.py", line 249, in inner
2014-10-01 13:46:10.588 5812 TRACE neutron return f(*args, **kwargs)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/manager.py", line 199, in _create_instance
2014-10-01 13:46:10.588 5812 TRACE neutron cls._instance = cls()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/manager.py", line 114, in __init__
2014-10-01 13:46:10.588 5812 TRACE neutron plugin_provider)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/manager.py", line 140, in _get_plugin_instance
2014-10-01 13:46:10.588 5812 TRACE neutron return plugin_class()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 113, in __init__
2014-10-01 13:46:10.588 5812 TRACE neutron self.type_manager.initialize()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 74, in initialize
2014-10-01 13:46:10.588 5812 TRACE neutron driver.obj.initialize()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/type_vlan.py", line 162, in initialize
2014-10-01 13:46:10.588 5812 TRACE neutron self._sync_vlan_allocations()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/type_vlan.py", line 99, in _sync_vlan_allocations
2014-10-01 13:46:10.588 5812 TRACE neutron session = db_api.get_session()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 60, in get_session
2014-10-01 13:46:10.588 5812 TRACE neutron facade = _create_facade_lazily()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 34, in _create_facade_lazily
2014-10-01 13:46:10.588 5812 TRACE neutron _FACADE = session.EngineFacade.from_config(cfg.CONF, sqlite_fk=True)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/session.py", line 795, in from_config
2014-10-01 13:46:10.588 5812 TRACE neutron retry_interval=conf.database.retry_interval)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/session.py", line 711, in __init__
2014-10-01 13:46:10.588 5812 TRACE neutron **engine_kwargs)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/session.py", line 386, in create_engine
2014-10-01 13:46:10.588 5812 TRACE neutron connection_trace=connection_trace
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/utils.py", line 890, in __call__
2014-10-01 13:46:10.588 5812 TRACE neutron self._url_from_target(target), target, arg, kw)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/utils.py", line 927, in _dispatch_on
2014-10-01 13:46:10.588 5812 TRACE neutron return self._dispatch_on_db_driver(dbname, driver, arg, kw)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/utils.py", line 981, in _dispatch_on_db_driver
2014-10-01 13:46:10.588 5812 TRACE neutron if self._invoke_fn(fn, arg, kw) is not None:
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/utils.py", line 930, in _invoke_fn
2014-10-01 13:46:10.588 5812 TRACE neutron return fn(*arg, **kw)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib/python2.7/site-packages/oslo/db/sqlalchemy/session.py", line 456, in _init_events
2014-10-01 13:46:10.588 5812 TRACE neutron realmode = engine.execute("SHOW VARIABLES LIKE 'sql_mode'").fetchone()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1751, in execute
2014-10-01 13:46:10.588 5812 TRACE neutron connection = self.contextual_connect(close_with_result=True)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1799, in contextual_connect
2014-10-01 13:46:10.588 5812 TRACE neutron self.pool.connect(),
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 338, in connect
2014-10-01 13:46:10.588 5812 TRACE neutron return _ConnectionFairy._checkout(self)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 641, in _checkout
2014-10-01 13:46:10.588 5812 TRACE neutron fairy = _ConnectionRecord.checkout(pool)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 440, in checkout
2014-10-01 13:46:10.588 5812 TRACE neutron rec = pool._do_get()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 961, in _do_get
2014-10-01 13:46:10.588 5812 TRACE neutron return self._create_connection()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 285, in _create_connection
2014-10-01 13:46:10.588 5812 TRACE neutron return _ConnectionRecord(self)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 411, in __init__
2014-10-01 13:46:10.588 5812 TRACE neutron self.connection = self.__connect()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 537, in __connect
2014-10-01 13:46:10.588 5812 TRACE neutron connection = self.__pool._creator()
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 96, in connect
2014-10-01 13:46:10.588 5812 TRACE neutron connection_invalidated=invalidated
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
2014-10-01 13:46:10.588 5812 TRACE neutron reraise(type(exception), exception, tb=exc_tb)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 90, in connect
2014-10-01 13:46:10.588 5812 TRACE neutron return dialect.connect(*cargs, **cparams)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 377, in connect
2014-10-01 13:46:10.588 5812 TRACE neutron return self.dbapi.connect(*cargs, **cparams)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
2014-10-01 13:46:10.588 5812 TRACE neutron return Connection(*args, **kwargs)
2014-10-01 13:46:10.588 5812 TRACE neutron File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
2014-10-01 13:46:10.588 5812 TRACE neutron super(Connection, self).__init__(*args, **kwargs2)
2014-10-01 13:46:10.588 5812 TRACE neutron OperationalError: (OperationalError) (2003, "Can't connect to MySQL server on '10.35.161.235' (111)") None None

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

This bug affects neutron since after a reboot it might be possible that the service will fail at startup if it's started before the mysql service.

Changed in oslo.db:
status: New → Triaged
importance: Undecided → High
Changed in oslo.db:
assignee: nobody → Roman Podoliaka (rpodolyaka)
status: Triaged → In Progress
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
importance: Medium → High
Changed in oslo.db:
milestone: none → next-kilo
tags: added: icehouse-backport-potential
tags: added: juno-rc-potential
Revision history for this message
Doug Hellmann (doug-hellmann) wrote :
Changed in oslo.db:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
tags: added: juno-backport-potential
removed: juno-rc-potential
Changed in oslo.db:
status: Fix Committed → Fix Released
Changed in neutron:
assignee: nobody → Ann Kamyshnikova (akamyshnikova)
tags: added: db
Revision history for this message
Ann Taraday (akamyshnikova) wrote :

This bug can be consider as fix committed as it was fixed and released in oslo.db and version of oslo.db is updated in requirements https://review.openstack.org/136296

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

@Ann, are all stable branches using oslo.db?

Revision history for this message
Ann Taraday (akamyshnikova) wrote :

Neutron swicthed to oslo.db since juno https://review.openstack.org/103519 So, stable/juno uses it.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

@Ann, stable/icehouse does not use oslo.db though.

Revision history for this message
Ann Taraday (akamyshnikova) wrote :

@Ihar, yes it does not. Does this bug affect stable/icehouse?

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

@Ann, sorry, I've now browsed the code in stable/icehouse and am pretty sure it should not affect the branch for neutron. The problem seems to be introduced in oslo.db when doing refactoring to introduce init dispatcher for db types.

Revision history for this message
Michael Zhdanov (zhdan) wrote :

I use OpenStack latest Juno release on Ubuntu 14.04.1. The problem is still exist (after reboot scheduler and cinder services can not reconnect to MySql db).

I have fixed it on my own by adding try/catch to /usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py (not sure that this is good variant). You can see example here - https://ask.openstack.org/en/question/53602/services-start-up-priority-at-boot-on-controller/?answer=57270#post-id-57270

Revision history for this message
zk (zack-kassab) wrote :

Hello,
I am using OpenStack Juno release and I am experiencing this problem in my environment

Revision history for this message
Yuki Nishiwaki (uckey-1067) wrote :

As far as I know , this bug is solved with oslo.db 1.0.2 that is discussed at here ( https://bugs.launchpad.net/oslo.db/+bug/1374497 ) .
If you encountered this problem , please check your version of oslo.db .
This may be lower than 1.0.2. If it is so , you must update it to 1.0.2 .

And if you use ubunutu as operating system , there is not deb package of oslo.db 1.0.2. So you have to create deb package from source by using py2dsc.

Revision history for this message
Kyle Mestery (mestery) wrote :

Per discussion above, marking this bug as "Invalid" for Neutron. It looks to be fixed in oslo.db and the bug was in fact in oslo.db.

Changed in neutron:
status: Confirmed → Invalid
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

The fix was ported to Juno here - https://review.openstack.org/#/c/126732/

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.