Build requests can be orphaned without instance mappings

Bug #1784093 reported by melanie witt
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
melanie witt
Queens
Fix Committed
Medium
melanie witt
Rocky
Fix Committed
Medium
melanie witt
Stein
Fix Committed
Medium
melanie witt

Bug Description

Mohammed reported this in the nova channel today [1] and the RDO cloud people have run into the same issue too. The deployment got into a situation where instances would show up in a 'nova list' in BUILD/scheduling state but were unable to be deleted. (They show up in 'nova list' because 'nova list' lists build requests and all instances in all cells).

Inspection of the database showed that the "instance" had a build request but *no* instance mapping and *no* instance record in any cell. And the instance could not be deleted even though it appeared in the 'nova list' because the delete API first does a compute API().get in order to get the instance object to pass down to the compute API().delete method. The compute API().get fails with InstanceNotFound because the _get_instance method raises InstanceNotFound if there is no instance mapping for the instance.

Mohammed was able to share this trace [2] which shows the instance_mapping.create() failing due to database errors, right after the build_request.create() succeeded:

2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/nova/compute/api.py", line 937, in _provision_instances
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi inst_mapping.create()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi return fn(self, *args, **kwargs)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/nova/objects/instance_mapping.py", line 92, in create
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi db_mapping = self._create_in_db(self._context, changes)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 986, in wrapper
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi return fn(*args, **kwargs)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.gen.next()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1036, in _transaction_scope
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi yield resource
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.gen.next()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 646, in _session
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.session.rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 907, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.transaction.rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 532, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi util.reraise(*rollback_err)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 497, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi t[1].rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1632, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._do_rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1670, in _do_rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.connection._rollback_impl()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 706, in _rollback_impl
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._handle_dbapi_exception(e, None, None, None, None)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._autorollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 76, in __exit__
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi compat.reraise(type_, value, traceback)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._autorollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 827, in _autorollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._root._rollback_impl()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 706, in _rollback_impl
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._handle_dbapi_exception(e, None, None, None, None)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1334, in _handle_dbapi_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi exc_info
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi reraise(type(exception), exception, tb=exc_tb, cause=cause)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 704, in _rollback_impl
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self.engine.dialect.do_rollback(self.connection)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1773, in do_rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi dbapi_connection.rollback()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 786, in rollback
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi self._read_ok_packet()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 760, in _read_ok_packet
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi pkt = self._read_packet()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 1018, in _read_packet
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi packet.check_error()
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/connections.py", line 384, in check_error
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi err.raise_mysql_exception(self._data)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi File "/openstack/venvs/nova-17.0.3/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi raise errorclass(errno, errval)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi InternalError: (pymysql.err.InternalError) (1047, u'WSREP has not yet prepared node for application use') (Background on this error at: http://sqlalche.me/e/2j85)
2018-07-25 04:20:12.946 7926 ERROR nova.api.openstack.wsgi

and the API request returned with a 500:

"HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible."

Mohammed is going to try a fix to do the build request and instance mapping creates in a single database transaction, so that the build request cannot be orphaned.

Another way to handle it would be to leave the creates as-is and make the API handle deletion of orphaned build requests, but doing that would allow another avenue for instances in ERROR state, whereas doing the build request and instance mapping creates in a single transaction would avoid that.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/latest.log.html#t2018-07-28T00:27:59
[2] http://paste.openstack.org/show/726772

Tags: cells
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/586742

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

Shouldn't delete go through _delete_while_booting and delete the build request and not barf on the instance mapping not being found?

Another simple change before a big transactional insert would be to create the instance mapping before the build request, but I'm not entirely sure that will fix the delete issue you're pointing out if *both* of the instance mapping and build request don't exist.

Revision history for this message
Mohammed Naser (mnaser) wrote :

Now that you say it, we might be able to work around it this way:

https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2351

Add a clause or edit that same one if inst_mapping is None.

I’ll try to push up a quick patch..

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/586824

Revision history for this message
Mohammed Naser (mnaser) wrote :

I pushed up a patch that does this alongside tests ^

Revision history for this message
melanie witt (melwitt) wrote :

That's what I was saying in the last paragraph of comment 0 -- that we could make delete handle the case where only a build request exists and no instance mapping.

However, if that's the *only* thing we did to fix this bug, if build_request.create() succeeds and instance_mapping.create() fails, we get an instance that the user needs to delete (is it in ERROR state?). Whereas if we do the single transaction thing, we can avoid leaving an instance around if the scenario happens and avoid having to get users to delete instances in this case.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Mohammed Naser (<email address hidden>) on branch: master
Review: https://review.openstack.org/586742

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Mohammed Naser (<email address hidden>) on branch: master
Review: https://review.openstack.org/586824

Changed in nova:
assignee: Mohammed Naser (mnaser) → melanie witt (melwitt)
Revision history for this message
melanie witt (melwitt) wrote :

NOTE: If you run into this issue and have orphaned build_requests table records causing servers without 'instance_mappings' table or 'instances' table records ('openstack server show' does not work on them) to show up in 'openstack server list', you can clean up the situation by manually deleting the matching records from the 'request_specs' and 'build_requests' tables in the API database.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/586742
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=85f8d033d27b31a6398529e0a25da74eae523b08
Submitter: Zuul
Branch: master

commit 85f8d033d27b31a6398529e0a25da74eae523b08
Author: Mohammed Naser <email address hidden>
Date: Fri Jul 27 21:09:22 2018 -0400

    Create request spec, build request and mappings in one transaction

    The transaction context is currently not shared when creating the
    RequestSpec, BuildRequest and InstanceMapping. Because of this,
    it is possible that the database ends in an inconsistent state
    due to the fact that one of these was created and the system
    crashed afterwards.

    This patch adds a function which handles the creation of all those
    resources in a single transaction.

    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: #1784093

    Change-Id: If897a0d721180152ebdceb7a0c23e8f283ce6d10

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/656018

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/656030

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/656031

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/656018
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=870e5bcfb6a5243a785b66353dda299286274e59
Submitter: Zuul
Branch: stable/stein

commit 870e5bcfb6a5243a785b66353dda299286274e59
Author: Mohammed Naser <email address hidden>
Date: Fri Jul 27 21:09:22 2018 -0400

    Create request spec, build request and mappings in one transaction

    The transaction context is currently not shared when creating the
    RequestSpec, BuildRequest and InstanceMapping. Because of this,
    it is possible that the database ends in an inconsistent state
    due to the fact that one of these was created and the system
    crashed afterwards.

    This patch adds a function which handles the creation of all those
    resources in a single transaction.

    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: #1784093

    Change-Id: If897a0d721180152ebdceb7a0c23e8f283ce6d10
    (cherry picked from commit 85f8d033d27b31a6398529e0a25da74eae523b08)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/656030
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dc0ded897b5e5df89166c318f8bed3dbb6553fce
Submitter: Zuul
Branch: stable/rocky

commit dc0ded897b5e5df89166c318f8bed3dbb6553fce
Author: Mohammed Naser <email address hidden>
Date: Fri Jul 27 21:09:22 2018 -0400

    Create request spec, build request and mappings in one transaction

    The transaction context is currently not shared when creating the
    RequestSpec, BuildRequest and InstanceMapping. Because of this,
    it is possible that the database ends in an inconsistent state
    due to the fact that one of these was created and the system
    crashed afterwards.

    This patch adds a function which handles the creation of all those
    resources in a single transaction.

    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: #1784093

     Conflicts:
     nova/tests/unit/compute/test_compute_api.py

    NOTE(melwitt): The conflict is because change
    Iaffbb019fef7779e7fa44306aacca954512b6970 is not in Rocky.
    The difference in the test is because change
    I7f5f08691ca3f73073c66c29dddb996fb2c2b266 is not in Rocky.

    Change-Id: If897a0d721180152ebdceb7a0c23e8f283ce6d10
    (cherry picked from commit 85f8d033d27b31a6398529e0a25da74eae523b08)
    (cherry picked from commit 870e5bcfb6a5243a785b66353dda299286274e59)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/656031
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=22cd12dbb7fa5a7e0b98e4c42a73b4e7d34b0851
Submitter: Zuul
Branch: stable/queens

commit 22cd12dbb7fa5a7e0b98e4c42a73b4e7d34b0851
Author: Mohammed Naser <email address hidden>
Date: Fri Jul 27 21:09:22 2018 -0400

    Create request spec, build request and mappings in one transaction

    The transaction context is currently not shared when creating the
    RequestSpec, BuildRequest and InstanceMapping. Because of this,
    it is possible that the database ends in an inconsistent state
    due to the fact that one of these was created and the system
    crashed afterwards.

    This patch adds a function which handles the creation of all those
    resources in a single transaction.

    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: #1784093

    Change-Id: If897a0d721180152ebdceb7a0c23e8f283ce6d10
    (cherry picked from commit 85f8d033d27b31a6398529e0a25da74eae523b08)
    (cherry picked from commit 870e5bcfb6a5243a785b66353dda299286274e59)
    (cherry picked from commit dc0ded897b5e5df89166c318f8bed3dbb6553fce)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.1

This issue was fixed in the openstack/nova 19.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.1

This issue was fixed in the openstack/nova 18.2.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.