Nova cells can die unexpectedly on boot due to db failure

Bug #1342257 reported by Christopher Lefelhocz
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Christopher Lefelhocz

Bug Description

We have seen the nova-cells service crash during startup with the following traceback:

2014-07-15 01:00:07.688 3070 CRITICAL nova [req-badc12a2-4ad9-4209-bcd4-f2429e134820 None] DBError: (1030, 'Got error 28 from storage engine')
2014-07-15 01:00:07.688 3070 TRACE nova Traceback (most recent call last):
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/current/nova/bin/nova-cells", line 13, in <module>
2014-07-15 01:00:07.688 3070 TRACE nova sys.exit(main())
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/cmd/cells.py", line 45, in main
2014-07-15 01:00:07.688 3070 TRACE nova manager=CONF.cells.manager)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/service.py", line 275, in create
2014-07-15 01:00:07.688 3070 TRACE nova db_allowed=db_allowed)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/service.py", line 148, in __init__
2014-07-15 01:00:07.688 3070 TRACE nova self.manager = manager_class(host=self.host, *args, **kwargs)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/cells/manager.py", line 90, in __init__
2014-07-15 01:00:07.688 3070 TRACE nova self.state_manager = cell_state_manager()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/cells/state.py", line 161, in __new__
2014-07-15 01:00:07.688 3070 TRACE nova return CellStateManagerDB(cell_state_cls)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/cells/state.py", line 174, in __init__
2014-07-15 01:00:07.688 3070 TRACE nova self._cell_data_sync(force=True)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/openstack/common/lockutils.py", line 325, in inner
2014-07-15 01:00:07.688 3070 TRACE nova return f(*args, **kwargs)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/cells/state.py", line 436, in _cell_data_sync
2014-07-15 01:00:07.688 3070 TRACE nova db_cells = self.db.cell_get_all(ctxt)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/api.py", line 1599, in cell_get_all
2014-07-15 01:00:07.688 3070 TRACE nova return IMPL.cell_get_all(context)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/api.py", line 93, in __getattr__
2014-07-15 01:00:07.688 3070 TRACE nova return getattr(self._db_api, key)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/api.py", line 85, in _db_api
2014-07-15 01:00:07.688 3070 TRACE nova backend_mapping=_BACKEND_MAPPING)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/openstack/common/db/api.py", line 128, in __init__
2014-07-15 01:00:07.688 3070 TRACE nova self._load_backend()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/openstack/common/db/api.py", line 143, in _load_backend
2014-07-15 01:00:07.688 3070 TRACE nova self._backend = backend_mod.get_backend()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/api.py", line 42, in get_backend
2014-07-15 01:00:07.688 3070 TRACE nova return API()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/api.py", line 75, in __init__
2014-07-15 01:00:07.688 3070 TRACE nova self._launch_monitor()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/api.py", line 89, in _launch_monitor
2014-07-15 01:00:07.688 3070 TRACE nova self._check_schema()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/api.py", line 56, in inner
2014-07-15 01:00:07.688 3070 TRACE nova result = f(*args, **kwargs)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/api.py", line 81, in _check_schema
2014-07-15 01:00:07.688 3070 TRACE nova schema = conn.get_schema()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/connection.py", line 186, in get_schema
2014-07-15 01:00:07.688 3070 TRACE nova tables = self._get_tables()
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/connection.py", line 169, in _get_tables
2014-07-15 01:00:07.688 3070 TRACE nova columns = self._get_columns(table_name)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/connection.py", line 159, in _get_columns
2014-07-15 01:00:07.688 3070 TRACE nova cursor = self.execute('DESCRIBE %s' % name)
2014-07-15 01:00:07.688 3070 TRACE nova File "/opt/rackstack/863.0/nova/lib/python2.6/site-packages/nova/db/mysqldb/connection.py", line 79, in inner
2014-07-15 01:00:07.688 3070 TRACE nova raise db_exc.DBError(e)
2014-07-15 01:00:07.688 3070 TRACE nova DBError: (1030, 'Got error 28 from storage engine')
2014-07-15 01:00:07.688 3070 TRACE nova

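Since this is a DB issue, it seems the service should at the very least retry rather than exit. (Error 28 in the message is the OS errno ENOSPC, "No space left on device", reported by the database server.)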
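For readers decoding the error tuple in the traceback: MySQL error 1030 (ER_GET_ERRNO) wraps a lower-level code in its message, and `perror 28` reports code 28 as "No space left on device". A disk-full condition on the DB server may clear once space is freed, which is part of why retrying seems reasonable. The helper below is a hypothetical sketch (not nova code) that pulls the inner code out of a `(code, message)` tuple shaped like the one in the log, assuming the inner code is an OS errno value as it is for 28.

```python
import errno
import os
import re

# MySQL error 1030 (ER_GET_ERRNO): "Got error %d from storage engine".
ER_GET_ERRNO = 1030
_ENGINE_ERR_RE = re.compile(r"Got error (\d+) from storage engine")


def describe_engine_error(db_error_args):
    """Hypothetical helper: explain a (code, message) MySQL error tuple.

    Assumes the tuple looks like the one in the traceback, e.g.
    (1030, 'Got error 28 from storage engine'), and that the inner
    code is an OS errno value (true for 28, i.e. ENOSPC).
    """
    code, message = db_error_args
    if code != ER_GET_ERRNO:
        return None
    match = _ENGINE_ERR_RE.search(message)
    if match is None:
        return None
    inner = int(match.group(1))
    # Map the inner code to its errno symbol and the OS error text.
    return inner, errno.errorcode.get(inner, "unknown"), os.strerror(inner)


print(describe_engine_error((1030, 'Got error 28 from storage engine')))
```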

Tags: cells
description: updated
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/107168

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (master)

Reviewed: https://review.openstack.org/107168
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9afab4d44e91e87634d85a0867664876baf880a
Submitter: Jenkins
Branch: master

commit b9afab4d44e91e87634d85a0867664876baf880a
Author: Christopher Lefelhocz <email address hidden>
Date: Tue Jul 15 15:05:58 2014 -0500

    Fix nova cells exiting on db failure at launch

    We have seen cases where db errors at launch can cause cell
    services to exit without retrying. The service shouldn't
    exit in __init__, just as it doesn't later on; it should
    handle this type of exception instead. We'll retry for up
    to an hour and then give up.

    Change-Id: I24c9eb811d50d1fa6a5e4a5f595ebf68ded3b7b5
    Closes-Bug: 1342257
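The commit message describes the behavior (keep retrying on DB errors at launch, give up after about an hour) but not the code. The following is a minimal sketch of that kind of bounded retry around a startup sync call, not the merged nova patch. `DBError` here is a stand-in class, and `sync_with_retry`, `max_attempts=120` and `interval=30.0` are hypothetical names and values chosen so the total wait comes to roughly one hour.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class DBError(Exception):
    """Stand-in for the DB exception type seen in the traceback (assumption)."""


def sync_with_retry(sync, max_attempts=120, interval=30.0, sleep=time.sleep):
    """Call sync() until it succeeds, retrying on DBError.

    Hypothetical defaults: up to 120 calls spaced 30 s apart, i.e. about
    one hour of waiting before the last DBError is re-raised.
    """
    attempt = 0
    while True:
        try:
            return sync()
        except DBError:
            attempt += 1
            if attempt >= max_attempts:
                # Out of attempts: propagate the error so the service exits.
                LOG.error("giving up after %d attempts", attempt)
                raise
            LOG.exception("DB error during startup sync (attempt %d), "
                          "retrying in %.0f s", attempt, interval)
            sleep(interval)
```

Passing `sleep` in as a parameter is only a testing convenience: a test can substitute a no-op and check the retry count without actually waiting an hour.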

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-3 → 2014.2