Failed instances stuck in BUILD state after Rocky upgrade

Bug #1832860 reported by Mark Goddard
Affects                    Status         Importance   Assigned to    Milestone
OpenStack Compute (nova)   Fix Released   Undecided    Mark Goddard
kolla                      Fix Released   High         Mark Goddard
kolla/rocky                Fix Committed  High         Mark Goddard
kolla/stein                Fix Released   High         Mark Goddard
kolla/train                Fix Released   High         Mark Goddard

Bug Description

Steps to reproduce
==================

Starting with a cloud running the Queens release, upgrade to Rocky.

Create a flavor that cannot fit on any compute node, e.g.

openstack flavor create --ram 100000000 --disk 2147483647 --vcpus 10000 huge

Then create an instance using that flavor:

openstack server create huge --flavor huge --image cirros --network demo-net

Expected
========

The instance fails to boot and ends up in the ERROR state.

Actual
======

The instance fails to boot and gets stuck in the BUILD state.
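
The stuck state can be confirmed with, for example:

openstack server show huge -c status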

From nova-conductor.log:

2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1244, in schedule_and_build_instances
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server tags=tags)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1193, in _bury_in_cell0
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server instance.create()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return fn(self, *args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 600, in create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server db_inst = db.instance_create(self._context, updates)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 748, in instance_create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return IMPL.instance_create(context, values)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server ectxt.value = e.inner_exc
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(context, *args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1774, in instance_create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server ec2_instance_create(context, instance_ref['uuid'])
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(context, *args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 5286, in ec2_instance_create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server ec2_instance_ref.save(context.session)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server session.flush()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2254, in flush
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server self._flush(objects)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2380, in _flush
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server transaction.rollback(_capture_exception=True)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server compat.reraise(exc_type, exc_value, exc_tb)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2344, in _flush
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server flush_context.execute()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 391, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server rec.execute(self)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 556, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server uow
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server mapper, table, insert)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 866, in _emit_insert_statements
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server execute(statement, params)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return meth(self, multiparams, params)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return connection._execute_clauseelement(self, multiparams, params)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server compiled_sql, distilled_params
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server context)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server util.raise_from_cause(newraise, exc_info)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server reraise(type(exception), exception, tb=exc_tb, cause=cause)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server context)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server cursor.execute(statement, parameters)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 170, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server result = self._query(query)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 328, in _query
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server conn.query(q)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 516, in query
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 727, in _read_query_result
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server result.read()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1066, in read
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server first_packet = self.connection._read_packet()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 683, in _read_packet
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server packet.check_error()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/protocol.py", line 220, in check_error
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server err.raise_mysql_exception(self._data)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server raise errorclass(errno, errval)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server DBError: (pymysql.err.InternalError) (1054, u"Unknown column 'trusted_certs' in 'field list'") [SQL: u'INSERT INTO instance_extra (created_at, updated_at, deleted_at, deleted, instance_uuid, device_metadata, numa_topology, pci_requests, flavor, vcpu_model, migration_context, keypairs, trusted_certs) VALUES (%(created_at)s, %(updated_at)s, %(deleted_at)s, %(deleted)s, %(instance_uuid)s, %(device_metadata)s, %(numa_topology)s, %(pci_requests)s, %(flavor)s, %(vcpu_model)s, %(migration_context)s, %(keypairs)s, %(trusted_certs)s)'] [parameters: {'instance_uuid': u'df1bd38c-67cb-4eb0-b2d2-ac08233dadae', 'keypairs': '{"nova_object.version": "1.3", "nova_object.name": "KeyPairList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}', 'pci_requests': '[]', 'vcpu_model': None, 'device_metadata': None, 'created_at': datetime.datetime(2019, 6, 12, 15, 0, 24, 430084), 'updated_at': None, 'numa_topology': None, 'trusted_certs': None, 'deleted': 0, 'migration_context': None, 'flavor': '{"new": null, "old": null, "cur": {"nova_object.version": "1.2", "nova_object.name": "Flavor", "nova_object.data": {"disabled": false, "root_gb": 214 ... (234 characters truncated) ... , "swap": 0, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": 0, "id": 6, "name": "huge"}, "nova_object.namespace": "nova"}}', 'deleted_at': None}] (Background on this error at: http://sqlalche.me/e/2j85)

Workaround
==========

On the controller, perform a nova DB sync:

docker exec -it nova_api nova-manage db sync

Although this makes no changes to the database (verified with mysqldump), it appears to 'fix' nova: new instances created using the 'huge' flavor now go to the ERROR state as expected.

Revision history for this message
Mark Goddard (mgoddard) wrote :

Some things to note:

I'm pretty confident that the DB sync had been run using the rocky nova-api container prior to the upgrade.

The 'missing' trusted_certs column did exist in the instance_extra table in the nova DB prior to performing the workaround DB sync.

No restart of services was necessary.

Revision history for this message
Mark Goddard (mgoddard) wrote :

Digging into this with John G, it looks like the nova_cell0 database is the one missing the trusted_certs column in its instance_extra table.
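
This can be confirmed by checking the cell0 schema directly (a sketch; the 'mariadb' container name and 'nova_cell0' database name are kolla defaults and may differ per deployment):

docker exec -it mariadb mysql -e "SHOW COLUMNS FROM instance_extra LIKE 'trusted_certs';" nova_cell0

An empty result means the cell0 schema migration never added the column.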

I was looking through the nova-manage code and found an error message that made me wonder if we had the ordering of DB syncs wrong in kolla. I switched the order of the 'nova-manage api_db sync' and 'nova-manage db sync' commands, and this did indeed fix the problem.
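
For reference, the order that works is:

nova-manage api_db sync
nova-manage db sync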

I then restored my DB back to queens and investigated. Running the api_db sync followed by the db sync gives no error. However, if we run the db sync first (as we do in kolla), then we see this error:

()[nova@72f9e8e99a94 /]$ nova-manage db sync
ERROR: Could not access cell0.
Has the nova_api database been created?
Has the nova_cell0 database been created?
Has "nova-manage api_db sync" been run?
Has "nova-manage cell_v2 map_cell0" been run?
Is [api_database]/connection set in nova.conf?
Is the cell0 database connection URL correct?
Error: (pymysql.err.InternalError) (1054, u"Unknown column 'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT cell_mappings.created_at AS cell_mappings_created_at, cell_mappings.updated_at AS cell_mappings_updated_at, cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS cell_mappings_uuid, cell_mappings.name AS cell_mappings_name, cell_mappings.transport_url AS cell_mappings_transport_url, cell_mappings.database_connection AS cell_mappings_database_connection, cell_mappings.disabled AS cell_mappings_disabled \nFROM cell_mappings \nWHERE cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters: {u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}] (Background on this error at: http://sqlalche.me/e/2j85)

However, the command exits zero, so we do not notice this error message.

()[nova@72f9e8e99a94 /]$ echo $?
0

As a general comment, these tools (e.g. nova-manage and nova-status) are typically run by automation rather than by humans, so text output is of little use unless accompanied by a non-zero exit code.

So I think we need two fixes.

Kolla: run api_db sync before db sync.
Nova: exit non-zero on the above error, and make the ordering requirement clearer in the upgrade docs.
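
Until the nova fix is in place, deployment tooling can defend against the silent failure by checking the command output as well as the exit code (a minimal bash sketch, matching the error text above):

out=$(nova-manage db sync 2>&1); rc=$?
echo "$out"
if [ "$rc" -ne 0 ] || echo "$out" | grep -q "ERROR: Could not access cell0"; then
    exit 1
fi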

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/665626

Changed in nova:
assignee: nobody → Mark Goddard (mgoddard)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/665627

Mark Goddard (mgoddard)
Changed in kolla-ansible:
milestone: none → 8.0.0
affects: kolla-ansible → kolla
Changed in kolla:
milestone: 8.0.0 → none
no longer affects: kolla-ansible/rocky
Changed in kolla:
importance: Undecided → High
Revision history for this message
Mark Goddard (mgoddard) wrote :

I checked tripleo and OSA and they do appear to have these commands in the correct order.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla (master)

Fix proposed to branch: master
Review: https://review.opendev.org/665634

Changed in kolla:
assignee: nobody → Mark Goddard (mgoddard)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla (master)

Reviewed: https://review.opendev.org/665634
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=84a07102e2db92333ce416a520151f1a1856f162
Submitter: Zuul
Branch: master

commit 84a07102e2db92333ce416a520151f1a1856f162
Author: Mark Goddard <email address hidden>
Date: Mon Jun 17 10:49:21 2019 +0100

    Run nova api_db sync before db sync

    Backport: Stein, Rocky

    Since cells v2 was introduced, nova operators must run two commands to
    migrate the database schemas of nova's databases - nova-manage api_db
    sync and nova-manage db sync. It is necessary to run them in this order,
    since the db sync may depend on schema changes made to the api
    database in the api_db sync. Executing the db sync first may fail,
    for example with the following seen in a Queens to Rocky upgrade:

        nova-manage db sync
        ERROR: Could not access cell0.
        Has the nova_api database been created?
        Has the nova_cell0 database been created?
        Has "nova-manage api_db sync" been run?
        Has "nova-manage cell_v2 map_cell0" been run?
        Is [api_database]/connection set in nova.conf?
        Is the cell0 database connection URL correct?
        Error: (pymysql.err.InternalError) (1054, u"Unknown column
                'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT
        cell_mappings.created_at AS cell_mappings_created_at,
        cell_mappings.updated_at AS cell_mappings_updated_at,
        cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS
        cell_mappings_uuid, cell_mappings.name AS cell_mappings_name,
        cell_mappings.transport_url AS cell_mappings_transport_url,
        cell_mappings.database_connection AS
        cell_mappings_database_connection, cell_mappings.disabled AS
        cell_mappings_disabled \nFROM cell_mappings \nWHERE
        cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters:
        {u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}]
        (Background on this error at: http://sqlalche.me/e/2j85)

    Despite this error, the command actually exits zero, so kolla does not notice
    the failure.

    This change reorders these two commands to run the api_db sync before the db
    sync.

    Change-Id: I0514c02aaee1a413a7979c3a426e0484df839861
    Closes-Bug: #1832860

Changed in kolla:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/667938

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/667939

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla (stable/stein)

Reviewed: https://review.opendev.org/667938
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=09226a59866e5ace7d573bac2e0772109a2554a0
Submitter: Zuul
Branch: stable/stein

commit 09226a59866e5ace7d573bac2e0772109a2554a0
Author: Mark Goddard <email address hidden>
Date: Mon Jun 17 10:49:21 2019 +0100

    Run nova api_db sync before db sync

    Backport: Stein, Rocky

    Since cells v2 was introduced, nova operators must run two commands to
    migrate the database schemas of nova's databases - nova-manage api_db
    sync and nova-manage db sync. It is necessary to run them in this order,
    since the db sync may depend on schema changes made to the api
    database in the api_db sync. Executing the db sync first may fail,
    for example with the following seen in a Queens to Rocky upgrade:

        nova-manage db sync
        ERROR: Could not access cell0.
        Has the nova_api database been created?
        Has the nova_cell0 database been created?
        Has "nova-manage api_db sync" been run?
        Has "nova-manage cell_v2 map_cell0" been run?
        Is [api_database]/connection set in nova.conf?
        Is the cell0 database connection URL correct?
        Error: (pymysql.err.InternalError) (1054, u"Unknown column
                'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT
        cell_mappings.created_at AS cell_mappings_created_at,
        cell_mappings.updated_at AS cell_mappings_updated_at,
        cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS
        cell_mappings_uuid, cell_mappings.name AS cell_mappings_name,
        cell_mappings.transport_url AS cell_mappings_transport_url,
        cell_mappings.database_connection AS
        cell_mappings_database_connection, cell_mappings.disabled AS
        cell_mappings_disabled \nFROM cell_mappings \nWHERE
        cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters:
        {u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}]
        (Background on this error at: http://sqlalche.me/e/2j85)

    Despite this error, the command actually exits zero, so kolla does not notice
    the failure.

    This change reorders these two commands to run the api_db sync before the db
    sync.

    Change-Id: I0514c02aaee1a413a7979c3a426e0484df839861
    Closes-Bug: #1832860
    (cherry picked from commit 84a07102e2db92333ce416a520151f1a1856f162)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla (stable/rocky)

Reviewed: https://review.opendev.org/667939
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=b68632ff7e1e782712d94af6005c6348f6b1c640
Submitter: Zuul
Branch: stable/rocky

commit b68632ff7e1e782712d94af6005c6348f6b1c640
Author: Mark Goddard <email address hidden>
Date: Mon Jun 17 10:49:21 2019 +0100

    Run nova api_db sync before db sync

    Backport: Stein, Rocky

    Since cells v2 was introduced, nova operators must run two commands to
    migrate the database schemas of nova's databases - nova-manage api_db
    sync and nova-manage db sync. It is necessary to run them in this order,
    since the db sync may depend on schema changes made to the api
    database in the api_db sync. Executing the db sync first may fail,
    for example with the following seen in a Queens to Rocky upgrade:

        nova-manage db sync
        ERROR: Could not access cell0.
        Has the nova_api database been created?
        Has the nova_cell0 database been created?
        Has "nova-manage api_db sync" been run?
        Has "nova-manage cell_v2 map_cell0" been run?
        Is [api_database]/connection set in nova.conf?
        Is the cell0 database connection URL correct?
        Error: (pymysql.err.InternalError) (1054, u"Unknown column
                'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT
        cell_mappings.created_at AS cell_mappings_created_at,
        cell_mappings.updated_at AS cell_mappings_updated_at,
        cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS
        cell_mappings_uuid, cell_mappings.name AS cell_mappings_name,
        cell_mappings.transport_url AS cell_mappings_transport_url,
        cell_mappings.database_connection AS
        cell_mappings_database_connection, cell_mappings.disabled AS
        cell_mappings_disabled \nFROM cell_mappings \nWHERE
        cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters:
        {u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}]
        (Background on this error at: http://sqlalche.me/e/2j85)

    Despite this error, the command actually exits zero, so kolla does not notice
    the failure.

    This change reorders these two commands to run the api_db sync before the db
    sync.

    Change-Id: I0514c02aaee1a413a7979c3a426e0484df839861
    Closes-Bug: #1832860
    (cherry picked from commit 84a07102e2db92333ce416a520151f1a1856f162)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Mark Goddard (<email address hidden>) on branch: master
Review: https://review.opendev.org/665627
Reason: Squashed these changes into https://review.opendev.org/#/c/665626.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla 8.0.0.0rc2

This issue was fixed in the openstack/kolla 8.0.0.0rc2 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/665626
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e99937c9a95f7049a53bcae7beff9a2a00c5889c
Submitter: Zuul
Branch: master

commit e99937c9a95f7049a53bcae7beff9a2a00c5889c
Author: Mark Goddard <email address hidden>
Date: Mon Jun 17 09:56:15 2019 +0100

    Exit 1 when db sync runs before api_db sync

    Since cells v2 was introduced, nova operators must run two commands to
    migrate the database schemas of nova's databases - nova-manage api_db
    sync and nova-manage db sync. It is necessary to run them in this order,
    since the db sync may depend on schema changes made to the api database
    in the api_db sync. Executing the db sync first may fail, for example
    with the following seen in a Queens to Rocky upgrade:

    nova-manage db sync
    ERROR: Could not access cell0.
    Has the nova_api database been created?
    Has the nova_cell0 database been created?
    Has "nova-manage api_db sync" been run?
    Has "nova-manage cell_v2 map_cell0" been run?
    Is [api_database]/connection set in nova.conf?
    Is the cell0 database connection URL correct?
    Error: (pymysql.err.InternalError) (1054, u"Unknown column
            'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT
    cell_mappings.created_at AS cell_mappings_created_at,
    cell_mappings.updated_at AS cell_mappings_updated_at,
    cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS
    cell_mappings_uuid, cell_mappings.name AS cell_mappings_name,
    cell_mappings.transport_url AS cell_mappings_transport_url,
    cell_mappings.database_connection AS
    cell_mappings_database_connection, cell_mappings.disabled AS
    cell_mappings_disabled \nFROM cell_mappings \nWHERE
    cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters:
    {u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}]
    (Background on this error at: http://sqlalche.me/e/2j85)

    Despite this error, the command actually exits zero, so deployment tools
    are likely to continue with the upgrade, leading to issues down the
    line.

    This change modifies the command to exit 1 if the cell0 sync fails.

    This change also clarifies this ordering in the upgrade and nova-manage
    documentation, and adds information on exit codes for the command.

    Change-Id: Iff2a23e09f2c5330b8fc0e9456860b65bd6ac149
    Closes-Bug: #1832860

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla 7.0.4

This issue was fixed in the openstack/kolla 7.0.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla 9.0.0.0rc1

This issue was fixed in the openstack/kolla 9.0.0.0rc1 release candidate.
