Comment 6 for bug 1829933

Revision history for this message
Bart Wensley (bartwensley) wrote :

When controller-1 is rebooted at 15:01:52, the nova-api on controller-0 loses access to mariadb, which prevents any of the nova related CLIs (e.g. server list) from running. For example:
{"log":"2019-05-21 15:02:18.491 1 ERROR oslo_db.sqlalchemy.exc_filters [req-2c25a5f7-22c7-4736-8c06-5d8c10e5c159 53347a9c6a7340f5be77da8cd6f2b3d8 efe5dd8fd94c428fa1eac07d31dc4184 - default default] DBAPIError exception wrapped from (pymysql.err.InternalError) (1047, u'WSREP has not yet prepared node for application use') [SQL: u'SELECT instance_mappings.created_at AS instance_mappings_created_at, instance_mappings.updated_at AS instance_mappings_updated_at, instance_mappings.id AS instance_mappings_id, instance_mappings.instance_uuid AS instance_mappings_instance_uuid, instance_mappings.cell_id AS instance_mappings_cell_id, instance_mappings.project_id AS instance_mappings_project_id, instance_mappings.user_id AS instance_mappings_user_id, instance_mappings.queued_for_delete AS instance_mappings_queued_for_delete, cell_mappings_1.created_at AS cell_mappings_1_created_at, cell_mappings_1.updated_at AS cell_mappings_1_updated_at, cell_mappings_1.id AS cell_mappings_1_id, cell_mappings_1.uuid AS cell_mappings_1_uuid, cell_mappings_1.name AS cell_mappings_1_name, cell_mappings_1.transport_url AS cell_mappings_1_transport_url, cell_mappings_1.database_connection AS cell_mappings_1_database_connection, cell_mappings_1.disabled AS cell_mappings_1_disabled \\nFROM instance_mappings LEFT OUTER JOIN cell_mappings AS cell_mappings_1 ON instance_mappings.cell_id = cell_mappings_1.id \\nWHERE instance_mappings.instance_uuid = %(instance_uuid_1)s \\n LIMIT %(param_1)s'] [parameters: {u'param_1': 1, u'instance_uuid_1': u'6e8a1aea-b316-4c5d-af48-5d32ffe0a443'}] (Background on this error at: http://sqlalche.me/e/2j85): InternalError: (1047, u'WSREP has not yet prepared node for application use')\n","stream":"stdout","time":"2019-05-21T15:02:18.492940446Z"}

The mariadb is running with a single node and is non-Primary, so is not functional:
{"log":"2019-05-21 15:01:58,543 - OpenStack-Helm Mariadb - INFO - 2019-05-21 15:01:58 139633481164544 [Note] WSREP: New cluster view: global state: 09649517-7ba1-11e9-8a1c-6e7cd8f18acb:42492, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3\n","stream":"stderr","time":"2019-05-21T15:01:58.543232866Z"}

There are other issues after that, but the trigger was the mariadb problem. I can see this log constantly coming out on both controllers:
2019-05-21T15:00:13.530 controller-0 OCF_dbmon(dbmon)[2903462]: info INFO: Unable to get mariadb password. Exiting.

This is due to a bug in the dbmon ocf script:
https://bugs.launchpad.net/starlingx/+bug/1826891

I am marking this LP as a duplicate of the dbmon issue (LP1826891). Once that issue is fixed, please re-test.