keystone can be "active" even when "Service not bootstrapped", which leads to "Timed out while waiting for units openstack-hypervisor"

Bug #2067016 reported by Nobuto Murata
This bug affects 1 person
Affects          Status        Importance  Assigned to       Milestone
OpenStack Snap   Fix Released  High        Guillaume Boutry
Sunbeam Charms   Fix Released  High        Unassigned

Bug Description

Following the MAAS tutorial:
https://microstack.run/docs/multi-node-maas

keystone (and possibly other charms) can report "active" even though the service is not bootstrapped yet ("Service not bootstrapped").

That can mislead Sunbeam into proceeding with later steps, and those steps can then hit their timeout unnecessarily: the time they wait includes both the time keystone (and possibly others) still needs to become ready *after* its workload status turns "active" and the time the step itself needs.
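Why a single "active" reading is not a reliable readiness signal can be shown with a small simulation. This is a minimal sketch, not Sunbeam's actual wait logic: `first_ready_index` and its `stable_polls` parameter are hypothetical, and the poll sequence is made up to mimic the keystone/0 flapping in this report. A naive check returns on the first "active"; a debounced check requires several consecutive "active" polls.

```python
from typing import Iterable, Optional


def first_ready_index(statuses: Iterable[str], stable_polls: int = 1) -> Optional[int]:
    """Return the poll index at which the unit is considered ready.

    Hypothetical helper: a unit counts as ready once its workload status
    has been "active" for `stable_polls` consecutive polls. With
    stable_polls=1 this is the naive "first active wins" check.
    Returns None if the unit never qualifies.
    """
    streak = 0
    for i, status in enumerate(statuses):
        # Reset the streak whenever the unit drops out of "active".
        streak = streak + 1 if status == "active" else 0
        if streak >= stable_polls:
            return i
    return None


# Made-up poll results: "active" keeps being replaced by the bootstrap
# maintenance status before the unit finally settles.
polls = ["maintenance", "active", "maintenance", "active", "maintenance",
         "active", "active", "active"]

print(first_ready_index(polls))                  # 1 (naive check)
print(first_ready_index(polls, stable_polls=3))  # 7 (debounced check)
```

Debouncing on the observer side only narrows the window: a unit that happens to stay "active" for a few polls while still bootstrapping would still pass, so it cannot replace the charm reporting its status accurately.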

$ sunbeam cluster deploy --manifest manifest.yaml
Timed out while waiting for units openstack-hypervisor/0,openstack-hypervisor/1,openstack-hypervisor/2 to be ready
Error: Timed out while waiting for units openstack-hypervisor/0,openstack-hypervisor/1,openstack-hypervisor/2 to be ready

$ juju show-status-log -m openstack keystone/0 --days 1 | tail
24 May 2024 03:57:24Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:01:24Z workload active
24 May 2024 04:01:24Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:07:12Z workload active
24 May 2024 04:07:12Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:13:06Z workload active
24 May 2024 04:13:06Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:18:55Z workload active
24 May 2024 04:18:55Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:18:55Z workload active
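The flapping in this excerpt can be counted mechanically. A minimal sketch, assuming the line layout shown above (day, month, year, time, type, status, then an optional message); the parsing rules are inferred from this output rather than from a documented Juju format, and `parse_status_log` / `count_flaps` are hypothetical helpers.

```python
from datetime import datetime
from typing import List, NamedTuple


class StatusEntry(NamedTuple):
    when: datetime
    kind: str     # e.g. "workload" or "juju-unit"
    status: str   # e.g. "active", "maintenance"
    message: str  # free-text message, may be empty


def parse_status_log(text: str) -> List[StatusEntry]:
    """Parse lines such as '24 May 2024 04:01:24Z workload active'."""
    entries = []
    for line in text.strip().splitlines():
        # Fields: day, month, year, time, type, status, [message...]
        parts = line.split(maxsplit=6)
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        when = datetime.strptime(" ".join(parts[:4]), "%d %b %Y %H:%M:%SZ")
        message = parts[6] if len(parts) == 7 else ""
        entries.append(StatusEntry(when, parts[4], parts[5], message))
    return entries


def count_flaps(entries: List[StatusEntry]) -> int:
    """Count workload transitions from 'active' back to a non-active status."""
    workload = [e for e in entries if e.kind == "workload"]
    return sum(
        1
        for prev, cur in zip(workload, workload[1:])
        if prev.status == "active" and cur.status != "active"
    )


# The keystone/0 excerpt from this report.
LOG = """\
24 May 2024 03:57:24Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:01:24Z workload active
24 May 2024 04:01:24Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:07:12Z workload active
24 May 2024 04:07:12Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:13:06Z workload active
24 May 2024 04:13:06Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:18:55Z workload active
24 May 2024 04:18:55Z workload maintenance (bootstrap) Service not bootstrapped
24 May 2024 04:18:55Z workload active
"""

entries = parse_status_log(LOG)
print(count_flaps(entries))  # 4
```

Each of those four flaps is a moment where an observer polling the workload status could have seen "active" for a unit that was not bootstrapped.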

$ juju show-status-log -m openstack-machines openstack-hypervisor/2 --days 1 | tail
24 May 2024 03:19:41Z juju-unit executing running identity-credentials-relation-joined hook for keystone/1
24 May 2024 03:20:04Z juju-unit executing running identity-credentials-relation-changed hook for keystone/1
24 May 2024 03:21:40Z juju-unit executing running identity-credentials-relation-joined hook for keystone/2
24 May 2024 03:22:03Z juju-unit executing running identity-credentials-relation-changed hook for keystone/2
24 May 2024 03:23:39Z juju-unit executing running nova-service-relation-joined hook for nova/1
24 May 2024 03:24:03Z juju-unit executing running nova-service-relation-changed hook for nova/1
24 May 2024 03:25:38Z juju-unit executing running nova-service-relation-joined hook for nova/2
24 May 2024 03:26:00Z juju-unit executing running nova-service-relation-changed hook for nova/2
24 May 2024 03:27:34Z juju-unit idle
24 May 2024 04:20:59Z workload active

^^^ There is a large gap between 03:27 and 04:20 during which openstack-hypervisor/2 sits idle, and during that period keystone keeps reporting "active" while it is still "Service not bootstrapped".
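The size of that gap can be read straight off the timestamps. A minimal sketch, assuming the same line layout as the output above; `timestamp` and `largest_gap` are hypothetical helpers, not part of Juju.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def timestamp(line: str) -> datetime:
    # The first four fields are the timestamp, e.g. "24 May 2024 03:27:34Z".
    return datetime.strptime(" ".join(line.split()[:4]), "%d %b %Y %H:%M:%SZ")


def largest_gap(lines: List[str]) -> Tuple[timedelta, str, str]:
    """Return (gap, line_before, line_after) for the widest gap between
    consecutive log lines. Requires at least two lines."""
    return max(
        ((timestamp(b) - timestamp(a), a, b) for a, b in zip(lines, lines[1:])),
        key=lambda t: t[0],
    )


# The openstack-hypervisor/2 excerpt above.
HYPERVISOR_LOG = [
    "24 May 2024 03:19:41Z juju-unit executing running identity-credentials-relation-joined hook for keystone/1",
    "24 May 2024 03:20:04Z juju-unit executing running identity-credentials-relation-changed hook for keystone/1",
    "24 May 2024 03:21:40Z juju-unit executing running identity-credentials-relation-joined hook for keystone/2",
    "24 May 2024 03:22:03Z juju-unit executing running identity-credentials-relation-changed hook for keystone/2",
    "24 May 2024 03:23:39Z juju-unit executing running nova-service-relation-joined hook for nova/1",
    "24 May 2024 03:24:03Z juju-unit executing running nova-service-relation-changed hook for nova/1",
    "24 May 2024 03:25:38Z juju-unit executing running nova-service-relation-joined hook for nova/2",
    "24 May 2024 03:26:00Z juju-unit executing running nova-service-relation-changed hook for nova/2",
    "24 May 2024 03:27:34Z juju-unit idle",
    "24 May 2024 04:20:59Z workload active",
]

gap, before, after = largest_gap(HYPERVISOR_LOG)
print(gap)  # 0:53:25
```

Every other interval in the excerpt is under two minutes, so roughly 53 minutes of the deployment time is spent waiting rather than running hooks.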

Nobuto Murata (nobuto) wrote :
Changed in snap-openstack:
status: New → Confirmed
status: Confirmed → Triaged
importance: Undecided → High
tags: added: open-2188
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sunbeam-charms (main)
Changed in snap-openstack:
assignee: nobody → Guillaume Boutry (gboutry)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to sunbeam-charms (main)

Reviewed: https://review.opendev.org/c/openstack/sunbeam-charms/+/921401
Committed: https://opendev.org/openstack/sunbeam-charms/commit/52377e84cfaab576c0fcc95b38e5d700d68df054
Submitter: "Zuul (22348)"
Branch: main

commit 52377e84cfaab576c0fcc95b38e5d700d68df054
Author: Guillaume Boutry <email address hidden>
Date: Wed Jun 5 19:14:44 2024 +0200

    [all] only publish active status at the end of the hook

    The status_compound by design sets the unit's status many times during
    hook execution. The unit's status is directly published to the
    controller. This leads to outside observers seeing active status (and
    a lot of chatter around statuses) when the unit is in fact not ready.

    With this change, units will only publish an active status at the end of
    the hook execution. All other levels are still directly published to the
    controller.
    Units will no longer publish the WaitingStatus("no status yet"). This
    creates a lot of chatter and holds little value.

    Re-organize keystone __init__ so that it does not publish a false
    `Service not bootstrapped` status.

    Closes-Bug: #2067016
    Change-Id: Ie73b95972a44833ba4509f8fd2c2f52ed476004d
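The publishing rule described in the commit message can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the sunbeam-charms implementation: `DeferredStatusPublisher`, its methods, and the `published` list standing in for the Juju controller are all hypothetical. Non-active statuses are published as soon as they are set; an "active" status is only remembered, and reaches the controller at the end of the hook if it is still the current status.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (level, message), e.g. ("waiting", "(ingress-public) integration incomplete")
Status = Tuple[str, str]


@dataclass
class DeferredStatusPublisher:
    """Hypothetical model: publish non-active statuses at once, defer active."""

    published: List[Status] = field(default_factory=list)  # stands in for the controller
    _current: Optional[Status] = None

    def set_status(self, level: str, message: str = "") -> None:
        self._current = (level, message)
        if level != "active":
            # blocked / waiting / maintenance are still published immediately.
            self.published.append(self._current)

    def end_of_hook(self) -> None:
        # Non-active statuses were already published when set; "active"
        # reaches the controller only if it is still current at this point.
        if self._current is not None and self._current[0] == "active":
            self.published.append(self._current)


# A hook in which the charm briefly computes "active" before a later
# check (here: bootstrap) downgrades it. Observers never see "active".
pub = DeferredStatusPublisher()
pub.set_status("active")
pub.set_status("maintenance", "(bootstrap) Service not bootstrapped")
pub.end_of_hook()
print(pub.published)  # [('maintenance', '(bootstrap) Service not bootstrapped')]

# A later hook in which the unit really is ready.
pub.set_status("active")
pub.end_of_hook()
print(pub.published[-1])  # ('active', '')
```

Because the "active" decision is taken once per hook, an outside observer such as a deployment wait loop only sees "active" after every status check in that hook has passed.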

tags: added: in-main
Guillaume Boutry (gboutry) wrote :

This fix has been released to 2024.1/edge.

Keystone without the change:
juju show-status-log -n 500 keystone/0 --type workload
Time Type Status Message
05 Jun 2024 20:41:57+02:00 workload waiting installing agent
05 Jun 2024 20:42:36+02:00 workload waiting agent initialising
05 Jun 2024 20:42:44+02:00 workload maintenance installing charm software
05 Jun 2024 20:42:46+02:00 workload waiting no status set yet
05 Jun 2024 20:42:46+02:00 workload blocked (ingress-public) integration missing
05 Jun 2024 20:42:47+02:00 workload waiting no status set yet
05 Jun 2024 20:42:47+02:00 workload active
05 Jun 2024 20:42:47+02:00 workload blocked (ingress-public) integration missing
05 Jun 2024 20:42:47+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:47+02:00 workload blocked (database) integration missing
05 Jun 2024 20:42:47+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:48+02:00 workload waiting no status set yet
05 Jun 2024 20:42:48+02:00 workload active
05 Jun 2024 20:42:48+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:49+02:00 workload waiting no status set yet
05 Jun 2024 20:42:49+02:00 workload active
05 Jun 2024 20:42:49+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:50+02:00 workload waiting no status set yet
05 Jun 2024 20:42:50+02:00 workload active
05 Jun 2024 20:42:50+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:51+02:00 workload waiting no status set yet
05 Jun 2024 20:42:51+02:00 workload active
05 Jun 2024 20:42:51+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:52+02:00 workload waiting no status set yet
05 Jun 2024 20:42:52+02:00 workload active
05 Jun 2024 20:42:53+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:53+02:00 workload waiting no status set yet
05 Jun 2024 20:42:53+02:00 workload active
05 Jun 2024 20:42:54+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:54+02:00 workload waiting no status set yet
05 Jun 2024 20:42:55+02:00 workload active
05 Jun 2024 20:42:55+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:56+02:00 workload waiting no status set yet
05 Jun 2024 20:42:56+02:00 workload active
05 Jun 2024 20:42:56+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:57+02:00 workload waiting no status set yet
05 Jun 2024 20:42:57+02:00 workload active
05 Jun 2024 20:42:57+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:58+02:00 workload waiting no status set yet
05 Jun 2024 20:42:58+02:00 workload active
05 Jun 2024 20:42:58+02:00 workload waiting (ingress-public) integration incomplete
05 Jun 2024 20:42:59+02:00 workload waiting ...


Changed in sunbeam-charms:
status: New → Fix Released
importance: Undecided → High
Changed in snap-openstack:
status: In Progress → Fix Released
Nobuto Murata (nobuto) wrote :