juju status sometimes fails with KeyError: 'current'

Bug #1847117 reported by Jonathan Hartley
This bug affects 2 people
Affects                              Status         Importance   Assigned to   Milestone
Canonical Juju                       Invalid        Undecided    Unassigned
Mojo: Continuous Delivery for Juju   Fix Released   Undecided    Unassigned

Bug Description

After deploying a snapstore service such as snapfind, running the 'productionize' post-deploy command:

    mojo run --manifest-file golive/manifest

will intermittently fail (about 1 in 8 times), with the output shown below.

The error is harmless, and retrying the 'mojo run' command generally then succeeds. However, I think it's important not to train people to ignore errors during deploys.

I'm not sure this is the right place to report the error. Maybe something in the snapstore manifest/etc is awry?

I'm not 100% certain whether the following output represents more than one error. Ideally, under normal operation, I would like to not see any error messages during a deploy.

Any insight appreciated. Thank you!

--

2019-10-07 15:26:55 [INFO] Checking Juju status
WARNING closing api sessions failed closing statusAPI: codec.handleResponse rpc.Header{RequestId:0x2, Request:rpc.Request{Type:"", Version:0, Id:"", Action:""}, Error:"", ErrorCode:"", Version:1} error: error handling response: json: cannot unmarshal object into Go struct field ApplicationStatus.err of type error
2019-10-07 15:27:05 [ERROR] Unknown error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/mojo/cli.py", line 627, in run_with_args
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/mojo/utils.py", line 305, in wrapped
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mojo/cli.py", line 343, in run_from_manifest
    manifest.run(project, workspace, args.stage, args.interactive)
  File "/usr/lib/python2.7/dist-packages/mojo/manifest.py", line 126, in run
    phase_name.run(project, workspace, stage)
  File "/usr/lib/python2.7/dist-packages/mojo/phase.py", line 1099, in run
    juju_status.check_and_wait(timeout=status_timeout, wait_for_steady=True, max_wait=max_wait)
  File "/usr/lib/python2.7/dist-packages/mojo/juju/status.py", line 464, in check_and_wait
    if self.ready() and not wait_for_steady:
  File "/usr/lib/python2.7/dist-packages/mojo/juju/status.py", line 509, in ready
    applications_ready = self.applications_ready()
  File "/usr/lib/python2.7/dist-packages/mojo/juju/status.py", line 444, in applications_ready
    if not self._check_ready(application['application-status']['current'], name):
KeyError: 'current'


Revision history for this message
Tom Haddon (mthaddon) wrote :

This seems like it's a failure message from juju itself (basically from `juju status --format=yaml`). We could potentially deal with this a bit more gracefully in terms of the message we're displaying, but short of retrying we'd still need to consider this an error.

Jonathan Hartley (tartley) wrote :

Tom, Thanks for the clarity. I'm still getting up to speed on how all this works.

I'll close this for now then, and attempt to reproduce with a raw juju call as you describe, and re-file this as a Juju bug.

Changed in mojo:
status: New → Invalid
Richard Harding (rharding) wrote :

Isn't this a KeyError caused by assuming a key exists when it might not yet?

    if not self._check_ready(application['application-status']['current'], name):

It'd be good to know what the response actually was, so we can see whether status failed to return, returned something unexpected, etc.

Changed in juju:
status: New → Triaged
status: Triaged → Incomplete
Jonathan Hartley (tartley) wrote :

During a snapfind deploy, I polled `juju status --format=yaml` every second. This revealed transient instances of:

    application-status: {}

That would provoke the traceback seen above.
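To make the failure concrete, here is a minimal sketch in plain Python. The `application` dict is a hypothetical, trimmed-down stand-in for one application entry from the parsed status; only the `application-status` and `current` keys come from this report. The lookup mirrors the line at the bottom of the traceback.

```python
# Hypothetical stand-in for one application entry from parsed
# `juju status --format=yaml` output, captured during the transient window.
application = {"application-status": {}}

try:
    # Same chained lookup as the failing line in mojo/juju/status.py.
    current = application["application-status"]["current"]
    error = None
except KeyError as exc:
    error = exc

print("KeyError:", error)  # KeyError: 'current'
```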

I'm hazy about what the source at the bottom of that traceback is (i.e. "dist-packages/mojo/juju/status.py"). Is that directory Mojo's vendored version of Juju? Or Mojo's internal package for interfacing with Juju? (I ask because that seems to determine who I should pester about this.)

Richard Harding (rharding) wrote :

That's in here:
https://bazaar.launchpad.net/~mojo-maintainers/mojo/trunk/view/head:/mojo/juju/status.py#L444

So I would suggest that the code in there should be resilient to the object not being populated during some windows.
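One way such resilience could look is sketched below. This is not Mojo's actual code: `application_current_status`, `applications_ready` and the `is_ready` callback are hypothetical names chosen for illustration. The idea is to treat a missing or empty `application-status` as "not ready yet" rather than letting the lookup raise.

```python
def application_current_status(application):
    """Return the application's current status, or None if not populated yet."""
    # `or {}` also covers the case where the key is present but null.
    status = application.get("application-status") or {}
    return status.get("current")


def applications_ready(applications, is_ready):
    """Return True only if every application has a status that is_ready accepts.

    `applications` maps application name -> parsed status dict.
    `is_ready(current, name)` is a hypothetical readiness check.
    """
    for name, application in applications.items():
        current = application_current_status(application)
        if current is None:
            # Status not populated during this window: keep waiting.
            return False
        if not is_ready(current, name):
            return False
    return True


def is_active(current, name):
    # Illustrative readiness rule; "active" is an assumed status value.
    return current == "active"


print(applications_ready({"snapfind": {"application-status": {}}}, is_active))
print(applications_ready({"snapfind": {"application-status": {"current": "active"}}}, is_active))
```

With this shape, a caller that already polls until ready or until a timeout would simply poll again on the empty status instead of aborting the run.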

Changed in juju:
status: Incomplete → Invalid
Changed in mojo:
status: Invalid → New
Tom Haddon (mthaddon) wrote :

Thanks for catching that, @rharding. I've added a fix and it's been released in 0.5.1-7. Marking this as fix released.

Changed in mojo:
status: New → Fix Released