nova-scheduler can fail to start if keystone setup takes too long
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Gnocchi Charm | New | Undecided | Unassigned | |
| MySQL InnoDB Cluster Charm | Invalid | High | Unassigned | |
| OpenStack Nova Cloud Controller Charm | Confirmed | High | Unassigned | |
Bug Description
Keystone fails with a traceback when accessing the database.
The original issue is in nova-cloud-controller.
The n-c-c log shows plenty of keystone 500 errors, and keystone_error.log shows plenty of tracebacks:
oslo_db.
[SQL: SELECT project.id AS project_id, project.name AS project_name, project.domain_id AS project_domain_id, project.description AS project_
This is the first time we encountered it in our tests.
Artifacts from the test run
https:/
I am guessing that this is a cloud with significant resource contention, as the problem comes down to the nova-scheduler systemd service giving up waiting for keystone to be ready. This can be seen by looking at https://oil-jenkins.canonical.com/artifacts/842595bd-b750-405e-b7a7-2fee838ebd90/generated/generated/openstack/juju-crashdump-openstack-2021-06-08-03.31.59.tar.gz
The nova-cc/2 unit shows the nova-scheduler failing to start and reporting that keystone is returning 500s:
nova-cloud-controller_2/var/log/nova/nova-scheduler.log
2021-06-08 00:20:18.011 109392 ERROR nova keystoneauth1.exceptions.http.InternalServerError: Internal Server Error (HTTP 500)
The timestamp seems to line up with keystone being unable to access MySQL due to an ACL issue:
keystone_2/var/log/apache2/keystone_error.log
2021-06-08 00:21:06.464096 sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1045, "Access denied for user 'keystone'@'192.168.33.92' (using password: YES)")
It seems the grant was made shortly after this:
mysql-innodb-cluster_0/var/log/juju/machine-0-lxd-7.log
2021-06-08 00:21:06 DEBUG juju-log db-router:153: Grant does NOT exist for host '192.168.33.92' on db 'keystone'
2021-06-08 00:21:06 DEBUG juju-log db-router:153: Grant exists for host '192.168.33.92' on db 'keystone'
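The "Access denied" error followed by the grant log lines suggests a window in which keystone's MySQL user/grant did not yet exist. As a rough illustration only (the user name, host, and database are taken from the logs above; the exact statements the mysql-innodb-cluster charm runs are an assumption), the grant step amounts to something like:

```sql
-- Hypothetical sketch of the grant the charm eventually applies;
-- the charm's actual SQL may differ.
CREATE USER IF NOT EXISTS 'keystone'@'192.168.33.92' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON `keystone`.* TO 'keystone'@'192.168.33.92';
FLUSH PRIVILEGES;
```

Until those statements run, every keystone connection from 192.168.33.92 fails with error 1045, which is exactly what keystone_error.log shows.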
The journal file on nova-cc/2 shows the scheduler gave up ~50s before keystone was granted db access:

$ journalctl --utc --file nova-cloud-controller_2/var/log/journal/50fc85977028476087ad98d996a6f007/system.journal | grep 'Failed to start OpenStack Compute Scheduler'
Jun 08 00:20:18 juju-695a4b-4-lxd-8 systemd[1]: Failed to start OpenStack Compute Scheduler.
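One possible mitigation for this give-up window would be a systemd drop-in that makes the unit keep retrying until keystone becomes reachable. A sketch, assuming a unit named nova-scheduler.service (the unit name, timings, and limits here are assumptions, not the charm's current settings):

```ini
# /etc/systemd/system/nova-scheduler.service.d/override.conf
# Hypothetical drop-in: allow up to 30 restart attempts over 10 minutes
# instead of giving up after the default start limit.
[Unit]
StartLimitIntervalSec=600
StartLimitBurst=30

[Service]
Restart=on-failure
RestartSec=20
```

After writing the drop-in, `systemctl daemon-reload` would be needed for it to take effect.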
(I notice that the error included in the bug description is a red herring: it shows that keystone was trying to access the local sqlite db. This is quite normal when standing up a new cloud, as the relation with MySQL has not yet formed and the mysql config is not in the service conf file, so the service falls back to its default of using a local sqlite db.)
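To illustrate the fallback described above: until the MySQL relation data is rendered into keystone.conf, the [database] connection option is unset and oslo.db uses its sqlite default. A sketch of the before/after state (the connection URL format is standard oslo.db; the host and password here are placeholders, not values from this cloud):

```ini
# Before the MySQL relation completes: no connection configured,
# so the service falls back to a local sqlite database.
[database]
# connection = <unset>

# After the relation completes, the charm renders something like:
[database]
connection = mysql+pymysql://keystone:<password>@<mysql-host>/keystone
```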