Build requests can be orphaned without instance mappings

Bug #1784093 reported by melanie witt on 2018-07-28
This bug affects 2 people
Affects                     Importance   Assigned to
OpenStack Compute (nova)    Medium       melanie witt
  Queens                    Medium       Unassigned
  Rocky                     Medium       Unassigned
  Stein                     Medium       Unassigned

Bug Description

Mohammed reported this in the nova channel today [1], and the RDO cloud people have run into the same issue. The deployment got into a state where instances showed up in 'nova list' in BUILD/scheduling state but could not be deleted. (They show up in 'nova list' because 'nova list' lists build requests as well as all instances in all cells.)

Inspecting the database showed that the "instance" had a build request but *no* instance mapping and *no* instance record in any cell. The instance could not be deleted, even though it appeared in 'nova list', because the delete API first calls compute API().get to obtain the instance object that it passes to compute API().delete. compute API().get fails with InstanceNotFound because _get_instance raises InstanceNotFound when there is no instance mapping for the instance.
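The failure mode described above can be modeled with a few dictionaries. This is a toy sketch, not nova code: `build_requests`, `instance_mappings`, `cell_instances`, `list_servers`, `get_instance` and `delete_server` are hypothetical stand-ins that only mimic the behavior described in the report (list merges build requests and cell instances; the lookup fails without a mapping; delete looks up first).

```python
class InstanceNotFound(Exception):
    """Stand-in for nova's InstanceNotFound exception."""


# Hypothetical in-memory stand-ins for the state described in the report.
build_requests = {"uuid-1": {"vm_state": "building"}}  # build request exists
instance_mappings = {}  # instance_mapping.create() never succeeded
cell_instances = {}     # no instance record in any cell


def list_servers():
    # Like 'nova list': show build requests plus instances from all cells.
    return sorted(set(build_requests) | set(cell_instances))


def get_instance(uuid):
    # Like _get_instance as described: no instance mapping means not found.
    if uuid not in instance_mappings:
        raise InstanceNotFound(uuid)
    return cell_instances.get(uuid)


def delete_server(uuid):
    # Like the delete API: the lookup runs first, so deletion is never reached.
    instance = get_instance(uuid)
    cell_instances.pop(uuid, None)
    build_requests.pop(uuid, None)
    return instance


print(list_servers())  # ['uuid-1']
try:
    delete_server("uuid-1")
except InstanceNotFound as exc:
    print("delete failed, instance not found:", exc)
```

The server is visible in the list but every delete attempt raises before anything is removed, which matches the stuck BUILD/scheduling entries.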

Mohammed was able to share this trace [2], which shows instance_mapping.create() failing with a database error right after build_request.create() succeeded:

2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/nova/compute/api.py", line 937, in _provision_instances
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi inst_mapping.create()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi return fn(self, *args, **kwargs)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/nova/objects/instance_mapping.py", line 92, in create
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi db_mapping = self._create_in_db(self._context, changes)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 986, in wrapper
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi return fn(*args, **kwargs)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.gen.next()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1036, in _transaction_scope
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi yield resource
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.gen.next()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 646, in _session
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.session.rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 907, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.transaction.rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 532, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi util.reraise(*rollback_err)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 497, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi t[1].rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1632, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._do_rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1670, in _do_rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.connection._rollback_impl()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 706, in _rollback_impl
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._handle_dbapi_exception(e, None, None, None, None)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._autorollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 76, in __exit__
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi compat.reraise(type_, value, traceback)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._autorollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 827, in _autorollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._root._rollback_impl()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 706, in _rollback_impl
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._handle_dbapi_exception(e, None, None, None, None)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1334, in _handle_dbapi_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi exc_info
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi reraise(type(exception), exception, tb=exc_tb, cause=cause)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 704, in _rollback_impl
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.engine.dialect.do_rollback(self.connection)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1773, in do_rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi dbapi_connection.rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 786, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._read_ok_packet()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 760, in _read_ok_packet
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi pkt = self._read_packet()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 1018, in _read_packet
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi packet.check_error()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 384, in check_error
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi err.raise_mysql_exception(self._data)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi raise errorclass(errno, errval)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi InternalError: (pymysql.err.InternalError) (1047, u'WSREP has not yet prepared node for application use') (Background on this error at: http://sqlalche.me/e/2j85)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi

and the API request returned with a 500:

"HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible."
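The sequence in the trace (build request committed, instance mapping create fails, API returns 500) can be reproduced in miniature with Python's sqlite3 module. This is a sketch under stated assumptions: the in-memory database and the one-column tables are hypothetical simplifications of the API database, and the `RuntimeError` merely simulates the WSREP error hitting the second write.

```python
import sqlite3


def make_api_db():
    # Hypothetical, heavily simplified stand-ins for two API database tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE build_requests (instance_uuid TEXT PRIMARY KEY);"
        "CREATE TABLE instance_mappings (instance_uuid TEXT PRIMARY KEY);"
    )
    return conn


def provision_separately(conn, uuid, fail_mapping=False):
    # Each create is committed on its own, like two independent writes.
    conn.execute("INSERT INTO build_requests VALUES (?)", (uuid,))
    conn.commit()
    if fail_mapping:
        # Simulates the database error hitting instance_mapping.create().
        raise RuntimeError("simulated database error")
    conn.execute("INSERT INTO instance_mappings VALUES (?)", (uuid,))
    conn.commit()


def orphaned_build_requests(conn):
    # Build requests that have no matching instance mapping.
    rows = conn.execute(
        "SELECT b.instance_uuid FROM build_requests AS b "
        "LEFT JOIN instance_mappings AS m ON m.instance_uuid = b.instance_uuid "
        "WHERE m.instance_uuid IS NULL"
    )
    return [row[0] for row in rows]


conn = make_api_db()
try:
    provision_separately(conn, "uuid-1", fail_mapping=True)
except RuntimeError:
    pass  # in the API, this is where the request fails with a 500
print(orphaned_build_requests(conn))  # ['uuid-1']
```

Because the first insert was already committed, the failure of the second leaves a build request with nothing pointing at it.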

Mohammed is going to try a fix that performs the build request and instance mapping creates in a single database transaction, so that the build request cannot be orphaned.
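The single-transaction idea can be illustrated with the same kind of toy database. This is a minimal sketch assuming sqlite3 as a stand-in: in nova the writes would go through oslo.db's enginefacade against MySQL/Galera, which this does not model, and `provision_atomically` is a hypothetical name, not the actual fix.

```python
import sqlite3


def make_api_db():
    # Hypothetical, heavily simplified stand-ins for two API database tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE build_requests (instance_uuid TEXT PRIMARY KEY);"
        "CREATE TABLE instance_mappings (instance_uuid TEXT PRIMARY KEY);"
    )
    return conn


def provision_atomically(conn, uuid, fail_mapping=False):
    # The connection context manager commits if the block finishes and
    # rolls back if it raises, so both rows land together or not at all.
    with conn:
        conn.execute("INSERT INTO build_requests VALUES (?)", (uuid,))
        if fail_mapping:
            raise RuntimeError("simulated database error")
        conn.execute("INSERT INTO instance_mappings VALUES (?)", (uuid,))


def row_count(conn, table):
    # `table` is a trusted constant here; never interpolate user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = make_api_db()
try:
    provision_atomically(conn, "uuid-1", fail_mapping=True)
except RuntimeError:
    pass
print(row_count(conn, "build_requests"))  # 0
```

The request still fails, but the rollback removes the build request, so nothing undeletable shows up in the server list.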

Another way to handle it would be to leave the creates as they are and make the API handle deletion of orphaned build requests. However, that would open another avenue for instances to end up in ERROR state, which doing the build request and instance mapping creates in a single transaction would avoid.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/latest.log.html#t2018-07-28T00:27:59
[2] http://paste.openstack.org/show/726772

Fix proposed to branch: master
Review: https://review.openstack.org/586742

Changed in nova:
status: Confirmed → In Progress
Matt Riedemann (mriedem) wrote :

Shouldn't delete go through _delete_while_booting and delete the build request and not barf on the instance mapping not being found?

Another simple change before a big transactional insert would be to create the instance mapping before the build request, but I'm not entirely sure that will fix the delete issue you're pointing out if *both* of the instance mapping and build request don't exist.
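A toy model of the reordering suggestion, assuming the two creates stay non-transactional and that a failure stops everything after the first write. `provision` and `listed` are hypothetical helpers; per the report, a server list shows build requests and cell instances (the latter omitted here), not bare instance mappings.

```python
def provision(order, fail_second, uuid="uuid-1"):
    """Run two non-transactional creates in `order`, optionally failing the second."""
    tables = {"build_request": set(), "instance_mapping": set()}
    for step, name in enumerate(order):
        if fail_second and step == 1:
            break  # simulated database error: the second create never lands
        tables[name].add(uuid)
    return tables


def listed(tables):
    # A server list shows build requests (cell instances are omitted here).
    return sorted(tables["build_request"])


current = provision(["build_request", "instance_mapping"], fail_second=True)
reordered = provision(["instance_mapping", "build_request"], fail_second=True)
print(listed(current), sorted(current["instance_mapping"]))      # ['uuid-1'] []
print(listed(reordered), sorted(reordered["instance_mapping"]))  # [] ['uuid-1']
```

With the current order the failure leaves a listed but undeletable build request; with the reversed order it leaves an instance mapping that nothing lists. The sketch does not model what other code paths would do with that leftover mapping, which is part of the uncertainty in the comment above.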

Mohammed Naser (mnaser) wrote :

Now that you say it, we might be able to work around it this way:

https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2351

Add a clause or edit that same one if inst_mapping is None.

I’ll try to push up a quick patch.
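The delete-side alternative (handle a missing instance mapping by removing the build request directly) can be sketched as follows. This is illustrative only: `delete_with_orphan_fallback` and the dictionaries are hypothetical, and it is not the patch proposed in review 586824 or nova's _delete_while_booting.

```python
class InstanceNotFound(Exception):
    """Stand-in for nova's InstanceNotFound exception."""


# Hypothetical state: an orphaned build request with no instance mapping.
build_requests = {"uuid-1": {"vm_state": "building"}}
instance_mappings = {}


def delete_with_orphan_fallback(uuid):
    """Hypothetical delete path that tolerates a missing instance mapping.

    Returns True if an orphaned build request was removed, or False if a
    mapping exists and the normal delete path should continue.
    """
    if uuid not in instance_mappings:
        # No mapping: fall back to deleting the build request directly.
        if build_requests.pop(uuid, None) is not None:
            return True
        raise InstanceNotFound(uuid)
    return False


print(delete_with_orphan_fallback("uuid-1"))  # True
print("uuid-1" in build_requests)             # False
```

This lets users clean up after the fact, but unlike a single transaction it does not stop the orphan from being created in the first place.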

Fix proposed to branch: master
Review: https://review.openstack.org/586824

Mohammed Naser (mnaser) wrote :

I pushed up a patch that does this alongside tests ^

melanie witt (melwitt) wrote :

That's what I was saying in the last paragraph of comment 0 -- that we could make delete handle the case where only a build request exists and no instance mapping.

However, if that were the *only* thing we did to fix this bug, then whenever build_request.create() succeeds and instance_mapping.create() fails, we'd get an instance that the user needs to delete (is it in ERROR state?). If we do the single transaction thing instead, we avoid leaving an instance around when this scenario happens, and users don't have to delete instances in this case.

Change abandoned by Mohammed Naser (<email address hidden>) on branch: master
Review: https://review.openstack.org/586742

Change abandoned by Mohammed Naser (<email address hidden>) on branch: master
Review: https://review.openstack.org/586824

Changed in nova:
assignee: Mohammed Naser (mnaser) → melanie witt (melwitt)
melanie witt (melwitt) wrote :

NOTE: If you run into this issue, orphaned 'build_requests' table records cause servers that have no 'instance_mappings' or 'instances' table records to show up in 'openstack server list' ('openstack server show' does not work on them). You can clean up the situation by manually deleting the matching records from the 'request_specs' and 'build_requests' tables in the API database.
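A sketch of that cleanup, using Python's sqlite3 against an in-memory stand-in. The one-column schemas are assumptions: the real API database is MySQL and its tables have more columns, so adapt the queries and back up first. The query only checks 'instance_mappings'; confirm there is no instance record in any cell before deleting, and avoid running it while boot requests are in flight, since in the normal flow the build request is created just before the instance mapping.

```python
import sqlite3


def find_orphaned_build_requests(conn):
    # Build requests whose instance UUID has no row in instance_mappings.
    rows = conn.execute(
        "SELECT b.instance_uuid FROM build_requests AS b "
        "LEFT JOIN instance_mappings AS m ON m.instance_uuid = b.instance_uuid "
        "WHERE m.instance_uuid IS NULL ORDER BY b.instance_uuid"
    )
    return [row[0] for row in rows]


def delete_orphans(conn, uuids):
    # Delete the matching request_specs and build_requests rows in one
    # transaction, so a failure part-way leaves the tables unchanged.
    with conn:
        for uuid in uuids:
            conn.execute("DELETE FROM request_specs WHERE instance_uuid = ?", (uuid,))
            conn.execute("DELETE FROM build_requests WHERE instance_uuid = ?", (uuid,))


# Demo against an in-memory database with simplified, hypothetical schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE build_requests (instance_uuid TEXT PRIMARY KEY);
    CREATE TABLE request_specs (instance_uuid TEXT PRIMARY KEY);
    CREATE TABLE instance_mappings (instance_uuid TEXT PRIMARY KEY);
    INSERT INTO build_requests VALUES ('orphan-uuid'), ('healthy-uuid');
    INSERT INTO request_specs VALUES ('orphan-uuid'), ('healthy-uuid');
    INSERT INTO instance_mappings VALUES ('healthy-uuid');
""")
orphans = find_orphaned_build_requests(conn)
print(orphans)  # ['orphan-uuid']
delete_orphans(conn, orphans)
```

Listing the candidates before deleting them gives a chance to cross-check each UUID against the cell databases.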
