Application status is not synced with all unit's status

Bug #2074195 reported by Judit Novak


This bug affects 1 person
Affects         Status      Importance  Assigned to  Milestone
Canonical Juju  Incomplete  Undecided   Unassigned

Bug Description

When all units of an application are in the same state, the application status is not automatically updated accordingly.

Example:

Model  Controller           Cloud/Region         Version  SLA          Timestamp
test   localhost-localhost  localhost/localhost  3.4.3    unsupported  10:03:24Z

App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         active      2  opensearch                2/edge         120  no
opensearch-dashboards              active      3  opensearch-dashboards                      0  no
self-signed-certificates           active      1  self-signed-certificates  latest/stable  155  no

Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch-dashboards/0      blocked   idle   0        10.115.14.104   5601/tcp  Opensearch connection is missing
opensearch-dashboards/1*     blocked   idle   1        10.115.14.165   5601/tcp  Opensearch connection is missing
opensearch-dashboards/2      blocked   idle   2        10.115.14.132   5601/tcp  Opensearch connection is missing
opensearch/0*                active    idle   3        10.115.14.133   9200/tcp
opensearch/1                 active    idle   4        10.115.14.196   9200/tcp
self-signed-certificates/0*  active    idle   5        10.115.14.115

The application is reported to be 'active' while in reality all nodes are in a blocked state.

See more details (Juju/LXD/etc versions, waiting time in the 'blocked' status, etc.) on the corresponding demonstrative POC pipeline.

Note the specific printouts highlighting that the application state was NOT manually overridden after all nodes entered the 'blocked' state.

https://github.com/canonical/opensearch-dashboards-operator/actions/runs/10108907878/job/27956272382#step:22:591

Application status information for application opensearch-dashboards:
 Status (application-status): active
    since: 2024-07-26 09:53:05
    status message: None

Full status information for unit opensearch-dashboards/0:
 Status (workload-status): blocked
    since: 2024-07-26 09:53:04
    status message: Opensearch connection is missing
 Status (agent-status): idle
    since: 2024-07-26 09:53:06
    status message: None

Full status information for unit opensearch-dashboards/1:
 Status (workload-status): blocked
    since: 2024-07-26 09:52:23
    status message: Opensearch connection is missing
 Status (agent-status): idle
    since: 2024-07-26 09:53:06
    status message: None

Full status information for unit opensearch-dashboards/2:
 Status (workload-status): blocked
    since: 2024-07-26 09:51:43
    status message: Opensearch connection is missing
 Status (agent-status): idle
    since: 2024-07-26 09:53:06
    status message: None

(In case you may want to verify where these outputs come from, feel free to take a look at the corresponding test code: https://github.com/canonical/opensearch-dashboards-operator/blob/bbb39812a248617ee07f377eda986e99458fe895/tests/integration/test_charm.py)

This bug is a problem, since:
 1) The application status is incorrect
 2) There is no (elegant) way to provide a security measure or workaround for this in the charm code.
    (Note that the 'ops' library does NOT provide a way for units --even the leader-- to get information about other units' status. Thus the leader can't align the 'app' status with the status of all units, since the library can't retrieve this information. https://ops.readthedocs.io/en/latest/#ops.Application.status)

Thank you very much for looking into the issue.

Ian Booth (wallyworld) wrote :

Quick comment without digging too deeply. Juju expects the leader unit to set the overall application status. So if the app status needs to be set to blocked because the units are blocked, the leader unit needs to do that. The leader unit is the only entity which has the knowledge to collect all of the individual unit status and understand what that means in the context of the overall app status.
Note also that if the leader unit never sets the app status, then juju will attempt to guess the status by looking at the worst case values. But any update made by the leader unit will turn off this logic. The fact that the app status shows Active rather than Unknown implies the leader unit did set the app status at some stage. So therefore it is expected to correctly tell juju the status from that point forward; juju will not update it.

I'll mark as incomplete given that it seems like it could be a charm issue. If the leader unit is setting the app status and juju is not reflecting that, then feel free to reopen.
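
To make the rule described in this comment concrete, here is a minimal Python sketch of the displayed application status: an explicit value set by the leader always wins, and only when the leader never set one is the worst unit status used. The function name, the `leader_set` parameter, and in particular the severity ranking are assumptions made for illustration; they are not taken from Juju's source code.

```python
from typing import Optional, Sequence

# Assumed severity ranking, worst first. Illustrative only: the real
# ordering used by Juju may differ.
SEVERITY = ["error", "blocked", "maintenance", "waiting", "active", "unknown"]


def displayed_app_status(leader_set: Optional[str],
                         unit_statuses: Sequence[str]) -> str:
    """Return the application status as the comment above describes it.

    leader_set    -- status explicitly set by the leader unit, or None if
                     the leader never set one (hypothetical parameter).
    unit_statuses -- workload statuses of all units of the application.
    """
    # Once the leader has set a status, no derivation takes place.
    if leader_set is not None:
        return leader_set
    # No units and no explicit status: nothing to derive from.
    if not unit_statuses:
        return "unknown"

    # Otherwise pick the worst status; unrecognised values rank last.
    def rank(status: str) -> int:
        return SEVERITY.index(status) if status in SEVERITY else len(SEVERITY)

    return min(unit_statuses, key=rank)


# Situation from the report: the leader set 'active' at some point and
# all three units later went to 'blocked'.
print(displayed_app_status("active", ["blocked", "blocked", "blocked"]))
# Same units, but the leader never set an application status.
print(displayed_app_status(None, ["blocked", "blocked", "blocked"]))
```

Under these assumptions, the first call mirrors the output in the bug description (the stale leader-set value is shown), while the second shows what the fallback would report had the charm never set an application status.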

Changed in juju:
status: New → Incomplete