Juju status fails due to timeout getting MAAS API version

Bug #1908102 reported by Michael Skalka
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
Canonical Juju   Fix Released  High        Ian Booth
2.9              Fix Released  High        Ian Booth

Bug Description

As seen during this test run: https://solutions.qa.canonical.com/testruns/testRun/25c338a2-43e6-447e-9e1a-c3c31fd0a682
Full artifacts here: https://oil-jenkins.canonical.com/artifacts/25c338a2-43e6-447e-9e1a-c3c31fd0a682/index.html
Controller crashdump: https://oil-jenkins.canonical.com/artifacts/25c338a2-43e6-447e-9e1a-c3c31fd0a682/generated/generated/juju_maas_controller/juju-crashdump-controller-2020-12-13-17.48.03.tar.gz

A call to `juju status` during execution of juju-wait (a tool that polls status, checking for certain conditions) resulted in this error:

...
ERROR:root:ERROR Get "http://10.X.X.X:80/MAAS/api/2.0/version/": proxyconnect tcp: dial tcp 91.X.X.X:3128: i/o timeout

ERROR:root:juju status --format=json failed: 1
...

This status call had been made continually (every 5-10 seconds) for the previous 30-40 minutes, which indicates that Juju was able to talk to the MAAS server as recently as a few seconds before the failed call. Juju should fall back to cached information for these status calls whenever possible and only surface an error when it truly cannot report the status of the model. The information MAAS provides about a deployed model is unlikely to change over time.

Tags: status
Pen Gale (pengale) wrote:

We probably should be using cached information here, rather than hitting the provider API for each status call.

Changed in juju:
status: New → Confirmed
importance: Undecided → High
milestone: none → 2.8.8
status: Confirmed → Triaged
Pen Gale (pengale) wrote:

Note that it looks like you are routing to MAAS through a proxy here. You might be able to work around this by talking directly to MAAS, bypassing whatever proxy host Juju is currently going through.

John A Meinel (jameinel) wrote:

As a general design issue, 'juju status' shouldn't be connecting to the underlying provider to return the status results.

Changed in juju:
milestone: 2.8.8 → 2.8.9
Changed in juju:
milestone: 2.8.9 → 2.8.10
Changed in juju:
milestone: 2.8.10 → 2.8.11
John A Meinel (jameinel) wrote:

We also got a field issue on an external cloud:

lihuiguo
21:41

Hi folks, I hit this issue on a customer cloud: juju status works, but juju status --format always returns 'timed out'. Any thoughts on how to fix it? Thanks

$ juju status --format=yaml
ERROR health ping timed out after 30s
{}
ERROR Get "http://172.18.247.21/MAAS/api/2.0/version/": EOF

$ juju export-bundle --filename bundle-$(date +%F).yaml
ERROR getting provider registry: Get "http://172.18.247.21/MAAS/api/2.0/version/": EOF

tags: added: status
John A Meinel (jameinel) wrote:

Both of those are signs of poor communication with the underlying provider, but that shouldn't impact status.

Ian Booth (wallyworld) wrote:

The server-side facade was eagerly creating a storage registry and pool manager, which hit the MAAS API, even though status only reads data from MongoDB and never uses those APIs. We now create the storage pool manager and related objects lazily.
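The lazy-creation fix described above can be illustrated with `sync.Once`: the expensive, provider-contacting object is built on first use, so a call that only reads stored state never triggers it. This is a hypothetical sketch; `statusFacade`, `poolManager`, and `newPoolManager` are illustrative names, not the code from the linked pull request.

```go
package main

import (
	"fmt"
	"sync"
)

// poolManager stands in for an object whose construction contacts the
// provider (here, the MAAS API). Hypothetical type.
type poolManager struct{ name string }

// statusFacade is a hypothetical simplification of a server-side facade.
type statusFacade struct {
	// newPoolManager is a hypothetical constructor that hits the provider API.
	newPoolManager func() (*poolManager, error)

	once  sync.Once
	pm    *poolManager
	pmErr error
}

// PoolManager builds the pool manager on first use and reuses it afterwards.
func (f *statusFacade) PoolManager() (*poolManager, error) {
	f.once.Do(func() {
		f.pm, f.pmErr = f.newPoolManager()
	})
	return f.pm, f.pmErr
}

// Status only reads stored state (a stand-in for a database query) and
// never calls PoolManager, so an unreachable provider cannot make it fail.
func (f *statusFacade) Status() string {
	return "status from stored state"
}

func main() {
	calls := 0
	f := &statusFacade{newPoolManager: func() (*poolManager, error) {
		calls++
		return &poolManager{name: "default"}, nil
	}}
	fmt.Println(f.Status(), calls) // prints "status from stored state 0"
	f.PoolManager()
	f.PoolManager()
	fmt.Println(calls) // prints "1"
}
```

One design caveat: `sync.Once` also remembers a failed construction, so a real implementation may prefer to retry on error rather than cache the first failure.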

https://github.com/juju/juju/pull/13027

Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released