Comment 1 for bug 2043500

Alex Kavanagh (ajkavanagh) wrote :

At the same time as the error occurred in the nova-conductor log, the keystone charm was saying:

2023-11-09 21:01:25 DEBUG unit.keystone/0.juju-log server.go:325 ha:287: cluster peers are in the following states: dict_values(['NOTREADY', 'NOTREADY'])
2023-11-09 21:01:25 DEBUG unit.keystone/0.juju-log server.go:325 ha:287: Some units are not ready
2023-11-09 21:01:25 INFO unit.keystone/0.juju-log server.go:325 ha:287: Keystone charm unit not ready - deferring identity-relation updates

keystone/0 shut down shortly afterwards:

[Thu Nov 09 21:01:26.068983 2023] [mpm_event:notice] [pid 188079:tid 140702016874368] AH00492: caught SIGWINCH, shutting down gracefully

keystone_2 came online at 21:02
keystone_0 came online at 20:46
keystone_1 came online at 20:56

haproxy logs for the keystone units around 21:01 indicate:

keystone_0 was up.
keystone_1 was down.
keystone_2 was up.

---

It seems that nova-conductor just isn't resilient enough while keystone is coming up. It may be that keystone should not be handing out credentials at that point, but this is probably a non-TLS -> TLS bootstrap issue when vault is unsealed.

As this only tends to happen at deployment time, and can be worked around by restarting the service, this is not a high-priority bug.

In terms of a fix, the nova-cc charm probably needs to attempt a restart of nova-conductor if it sees that it is 'down' during an update-status hook. Alternatively, the package could keep trying to restart it.
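
A rough sketch of what the update-status check could look like. This is not taken from the nova-cc charm: the function name restart_if_down and the injectable run parameter are hypothetical, and it assumes 'down' means the systemd unit is not active (rather than the state shown by nova's own service list).

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit code. It is injectable
# so the logic can be exercised without touching a real systemd.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.call(list(cmd))


def restart_if_down(service: str = "nova-conductor", run: Runner = _run) -> bool:
    """Restart `service` if systemd reports it as not active.

    Hypothetical helper, intended to be called from an update-status
    hook. Returns True if a restart was attempted, False if the
    service was already active.
    """
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    if run(["systemctl", "is-active", "--quiet", service]) == 0:
        return False
    run(["systemctl", "restart", service])
    return True
```

For the second option (having the package keep retrying), systemd's Restart= directive in the unit file is the usual mechanism, though whether the packaged nova-conductor unit already sets it would need checking.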