Canonical Juju

pylibjuju `model.wait_for_idle` does not seem to align with `juju status`

Bug #2034562 reported by Mehdi B. on 2023-09-06

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Canonical Juju	Triaged	Undecided	Caner Derici

Bug Description

Hello,

Given the following juju deployment:

status:
    app: blocked
    units:
        0: active / idle (since +120sec)
        1: active / idle (since +120sec)
        2: active / idle (since +120sec)

I would expect the following to succeed:
    await ops_test.model.wait_for_idle(
        apps=[app], status="active", timeout=1000, wait_for_exact_units=3, idle_period=120
    )

But it does not. Instead, it **consistently** hits a TimeoutError (after 1000 sec) while waiting for the model, even though it continuously reports:
INFO juju.model:model.py:2618 Waiting for model:
  opensearch/1 [idle] active:
  opensearch/2 [idle] active:
  opensearch/3 [idle] active:

-----
The same issue happens **intermittently** even when both the app and its units are fully active and have been for longer than idle_period.

-----
juju: 2.9.44

Thank you

Mehdi B. (medib) on 2023-09-06

description:

updated

Vitaly Antonenko (anvial) on 2023-09-07

Changed in juju:
assignee:	nobody → Caner Derici (cderici)
status:	New → Triaged

Caner Derici (cderici) wrote on 2023-09-07:

Hi Mehdi, thanks for reporting this!

I'm not too surprised by the first example, where the application status is "blocked" while the units are "active / idle". Unfortunately, wait_for_idle makes no distinction between the application and its units when waiting for a particular status. That is, if you're waiting for a particular status, then both the application and the units (if you're waiting for a particular number of units) need to be in that status. We're currently working on a new set of "wait_for" methods to provide more granular control over this.
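
To illustrate the difference between a combined check and a units-only check, here is a minimal sketch. It is not pylibjuju's implementation: `units_active_and_idle` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the deployment in the report, assuming units expose `workload_status` and `agent_status` attributes.

```python
from types import SimpleNamespace


def units_active_and_idle(units, expected_units):
    """Return True when exactly `expected_units` units are active and idle.

    Hypothetical helper: only unit-level fields are inspected, so an
    application-level status such as "blocked" does not affect the result.
    """
    units = list(units)
    if len(units) != expected_units:
        return False
    return all(
        u.workload_status == "active" and u.agent_status == "idle"
        for u in units
    )


# Stand-in objects mirroring the reported deployment (not real model objects).
app = SimpleNamespace(
    status="blocked",
    units=[
        SimpleNamespace(workload_status="active", agent_status="idle")
        for _ in range(3)
    ],
)

# Combined check: the application AND its units must match the requested status.
combined_ok = app.status == "active" and units_active_and_idle(app.units, 3)

# Units-only check: the application status is ignored.
units_only_ok = units_active_and_idle(app.units, 3)

print(combined_ok, units_only_ok)
```

With the blocked application above, the combined check fails while the units-only check passes. Against a live model, a predicate like this could probably be passed to `model.block_until`, but that usage is an assumption and has not been verified here.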

Your second example, where you're getting a timeout even when everything is active, is more concerning. I'd like to see more details about that, such as how to reproduce it.

One thing to note is that you set idle_period to 120 seconds. That means everything has to stay idle (with status=active) for 120 seconds before wait_for_idle decides that it's done waiting. Unless you have a good reason for this, it might be a good idea to reduce that number (the default is 15 seconds, I think), because during those 120 seconds the units might occasionally change state for maintenance, updates, etc., which would reset the 120-second timer. That might be why your 1000-second timeout is getting triggered.
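
The timer-reset effect can be sketched with a simplified model. This is an illustration only, not how pylibjuju tracks idleness: `time_idle_reached` is a hypothetical function, and the event timeline (a unit leaving idle for 10 seconds every 100 seconds) is invented for the example.

```python
def time_idle_reached(events, idle_period, timeout):
    """Return when a unit has been continuously idle for `idle_period` seconds.

    `events` is a time-sorted list of (timestamp_seconds, agent_status)
    transitions for one unit. Any non-idle status resets the idle timer.
    Returns None if `timeout` is reached first.
    """
    idle_since = None
    for i, (ts, status) in enumerate(events):
        if status == "idle":
            if idle_since is None:
                idle_since = ts  # start (or keep) the idle timer
        else:
            idle_since = None  # any other state resets the timer
        # The current state lasts until the next event or the timeout.
        next_ts = events[i + 1][0] if i + 1 < len(events) else timeout
        if idle_since is not None and idle_since + idle_period <= min(next_ts, timeout):
            return idle_since + idle_period
    return None


# Invented timeline: idle for 90 seconds, then busy for 10 seconds, repeated.
events = []
for start in range(0, 1000, 100):
    events.append((start, "idle"))
    events.append((start + 90, "executing"))

long_wait = time_idle_reached(events, idle_period=120, timeout=1000)
short_wait = time_idle_reached(events, idle_period=15, timeout=1000)
print(long_wait, short_wait)
```

In this made-up timeline no idle window ever lasts 120 seconds, so the longer idle_period never completes before the 1000-second timeout, while a 15-second idle_period is satisfied within the first window.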
