juju-check-wait fails when a machine is down

Bug #1906530 reported by Sheila Miguez on 2020-12-02
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mojo: Continuous Delivery for Juju

Bug Description

This happened when I was running a manifest.

2020-12-02 14:40:02 [INFO] Retrieve the spec's manifest
2020-12-02 14:40:02 [INFO] Running 'mojo run -m manifests/manifest-verify'
2020-12-02 14:40:02 [INFO] Checking Juju status
2020-12-02 14:40:02 [INFO] Waiting up to 1800 seconds for environment to become ready (not blocked or in maintenance)
2020-12-02 14:40:03 [ERROR] Unknown error
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/mojo/cli.py", line 684, in run_with_args
  File "/usr/lib/python3/dist-packages/mojo/utils.py", line 380, in wrapped
    return method(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/mojo/cli.py", line 352, in run_from_manifest
    manifest.run(project, workspace, args.stage, args.interactive)
  File "/usr/lib/python3/dist-packages/mojo/manifest.py", line 126, in run
    phase_name.run(project, workspace, stage)
  File "/usr/lib/python3/dist-packages/mojo/phase.py", line 1094, in run
    timeout=status_timeout, wait_for_steady=True, max_wait=max_wait, wait_for_workload=wait_for_workload
  File "/usr/lib/python3/dist-packages/mojo/juju/status.py", line 496, in check_and_wait
    if self.ready(): # self.ready() will raise exceptions on error states
  File "/usr/lib/python3/dist-packages/mojo/juju/status.py", line 555, in ready
    machines_ready = self.machines_ready()
  File "/usr/lib/python3/dist-packages/mojo/juju/status.py", line 540, in machines_ready
    if not self._check_ready(machine["machine-status"]["current"], "machine {} machine-status".format(num)):
KeyError: 'current'

I ran juju status manually, and it shows one of my machines as down. nova list shows it in a Shutdown state, so you may be able to reproduce this in a deployed environment by shutting down a machine.

Sheila Miguez (codersquid) wrote :

Addendum. I've restarted the machine and I'm still getting this even after juju status shows that it is back up.

summary: - checking status fails when a machine is down
+ juju-check-wait fails when a machine is down

Tom Haddon (mthaddon) wrote :

Confirmed it was working again once the machine was back up, but the original bug still stands.

Sheila Miguez (codersquid) wrote :

It is now working again. I did not manage to run juju status --format yaml while it wasn't working to check whether 'current' was missing from the YAML. It's there now.

Sheila Miguez (codersquid) wrote :

This is machine 3.
$ juju status --format yaml 3 |grep -A3 machine-status
      message: SHUTOFF
      since: 02 Dec 2020 16:37:51Z

I was able to reproduce this by stopping the machine with nova. The snippet of yaml from juju status indeed shows that 'current' is missing.

I do not know if this is expected behavior from juju or not. If it is, then obvs the check will need to account for that. It seems like bad behavior from juju.
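For illustration only, here is a minimal sketch of how a readiness check could tolerate a machine-status entry with no 'current' key instead of raising KeyError. This is not mojo's actual code: the function names, the "running" ready state, and the sample data for machine 0 are assumptions, and only the machine 3 entry mirrors the yaml snippet above.

```python
def machine_status_current(machine):
    """Return the machine's 'current' status, or None when it is absent."""
    # The stopped machine reported only 'message' and 'since', so neither
    # lookup may assume the key exists.
    return (machine.get("machine-status") or {}).get("current")


def machines_not_ready(machines, ready_states=("running",)):
    """Return (machine number, problem) pairs for machines that are not ready.

    ready_states is a hypothetical parameter; "running" is an assumed value.
    """
    problems = []
    for num, machine in sorted(machines.items()):
        current = machine_status_current(machine)
        if current is None:
            status = machine.get("machine-status") or {}
            message = status.get("message", "unknown")
            problems.append(
                (num, "machine-status has no 'current' (message: {})".format(message))
            )
        elif current not in ready_states:
            problems.append((num, "machine-status is {}".format(current)))
    return problems


# Sample data: machine 0 is invented, machine 3 follows the snippet above.
machines = {
    "0": {"machine-status": {"current": "running", "message": "ACTIVE"}},
    "3": {"machine-status": {"message": "SHUTOFF", "since": "02 Dec 2020 16:37:51Z"}},
}
print(machines_not_ready(machines))
```

With this shape, a missing 'current' is reported as a not-ready machine with its message, so the caller can decide whether to keep waiting or fail with a readable error.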

Sheila Miguez (codersquid) wrote :

Bringing it back up, it looks like juju status takes a little bit of time to catch up.
