Config command failed because of bogus upgrade in progress

Bug #1669834 reported by Curtis Hovey
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Medium
Unassigned

Bug Description

As seen at
   http://reports.vapour.ws/releases/issue/58b99401749a562867f5003f

We know the juju under test is the latest juju, there is nothing to upgrade to. the call to set a charm just deployed failed because of a bogus upgrade in progress. Juju allowed the deploy and status does not indicate what juju thinks it us upgrading too.

08:22:27 ERROR cmd supercommand.go:458 upgrade in progress (upgrade in progress)
2017-03-01 08:22:27 ERROR Command '('juju', '--show-log', 'config', '-m', 'passuser_controller:super-permissions', 'dummy-source', 'token=one')' returned non-zero exit status 1
Traceback (most recent call last):

Changed in juju:
milestone: 2.2-alpha1 → 2.2-beta1
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta4
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Revision history for this message
Tim Penhey (thumper) wrote :

The error message isn't great.

Effectively what we are trying to do is to block user commands until the agent has checked that it is in a good state. We do that right now by having an upgrade in progress flag which is cleared when the upgrade has finished.

Another potential solution is to that user API connections just enter a select loop with the apiserver stopping, and a channel that the API server closes when the agent is in a good state.
This would stop the bogus error being returned.

tags: added: intermittent-failure
Changed in juju:
importance: High → Medium
milestone: 2.2-rc1 → none
tags: added: usability
tags: added: apiserver
Revision history for this message
Anastasia (anastasia-macmood) wrote :

We have actually started working on the solution in December 2016. Essentially, the plan was to have model have a state which could be something more descriptive like "updating", "coming up, "migrating", etc... The api calls and user commands will be blocked until the model reports active state but the error message that user will see during "busy" stage will be indicative of why the model is actually un-usable yet.

This work has been de-prioritized and stalled since.

Revision history for this message
Casey Marshall (cmars) wrote :
Download full text (7.7 KiB)

Today I received a nagios alert which seems to have been caused by nrpe getting restarted while it got polled (that's the best explanation I can rationalize it with).

The nrpe unit agent log showed:

2017-10-23 12:05:46 ERROR juju.api monitor.go:59 health ping timed out after 30s
2017-10-23 12:05:46 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
2017-10-23 12:06:01 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:06:15 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:06:47 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:07:13 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: unknown model: "49b0ef05-5460-4ede-8210-824563552d39" (model not found)
2017-10-23 12:07:25 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: login for "unit-nrpe-103" blocked because upgrade in progress
2017-10-23 12:07:30 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: login for "unit-nrpe-103" blocked because upgrade in progress
2017-10-23 12:07:36 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: login for "unit-nrpe-103" blocked because upgrade in progress
2017-10-23 12:07:41 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: login for "unit-nrpe-103" blocked because upgrade in progress
2017-10-23 12:07:53 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:08:02 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:08:11 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:08:20 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:08:29 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:08:37 ERROR juju.worker.dependency engine.go:546 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
2017-10-23 12:09:05 DEBUG config-changed Hit:1 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial InRelease
2017-10-23 12:09:05 DEBUG config-changed Get:2 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
2017-10-23 12:09:05 DEBUG config-changed Get:3 http://security.ubuntu.com/ubuntu xe...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1669834] Re: Config command failed because of bogus upgrade in progress
Download full text (8.7 KiB)

Until juju has finished initialization and determined that there isn't an
upgrade, we reject connections with "upgrade in progress". So that isn't an
actual upgrade vs just taking a while to initialize. Looking at the
timestamps, I'm a little surprised for it to take a full minute for it to
be available, but otherwise it isn't too surprising.

John
=:->

On Tue, Oct 24, 2017 at 2:07 AM, Casey Marshall <
<email address hidden>> wrote:

> Today I received a nagios alert which seems to have been caused by nrpe
> getting restarted while it got polled (that's the best explanation I can
> rationalize it with).
>
> The nrpe unit agent log showed:
>
> 2017-10-23 12:05:46 ERROR juju.api monitor.go:59 health ping timed out
> after 30s
> 2017-10-23 12:05:46 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: api connection
> broken unexpectedly
> 2017-10-23 12:06:01 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> try again (try again)
> 2017-10-23 12:06:15 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> try again (try again)
> 2017-10-23 12:06:47 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> try again (try again)
> 2017-10-23 12:07:13 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> unknown model: "49b0ef05-5460-4ede-8210-824563552d39" (model not found)
> 2017-10-23 12:07:25 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> login for "unit-nrpe-103" blocked because upgrade in progress
> 2017-10-23 12:07:30 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> login for "unit-nrpe-103" blocked because upgrade in progress
> 2017-10-23 12:07:36 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> login for "unit-nrpe-103" blocked because upgrade in progress
> 2017-10-23 12:07:41 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> login for "unit-nrpe-103" blocked because upgrade in progress
> 2017-10-23 12:07:53 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> try again (try again)
> 2017-10-23 12:08:02 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> try again (try again)
> 2017-10-23 12:08:11 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> try again (try again)
> 2017-10-23 12:08:20 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error: cannot open api:
> try again (try again)
> 2017-10-23 12:08:29 ERROR juju.worker.dependency engine.go:546
> "api-caller" manifold worker returned unexpected error...

Read more...

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Based on previous comments, I am closing this report.

It looks like the behavior of Juju nowdays is correct - it will try to figure out if it needs to or can upgrade on bootstrap and, in this particular instances, it may have taken longer than usual.
Also, since the report has been filed, we have improved how Juju behaves when it determines if the upgrade is needed, what commands and API calls can be made during the 'upgrade' itself, how we are reporting what we are doing back to the user via Juju status. Whilst it is difficult to pinpoint the exact change that took place to improve this experience, we have not heard about any similar inconveniences for years.

Considering how old this report is and it is unclear how we can further improve this particular experience, it's not helpful to keep this open. If there are still some concerns, please file a report with logs, reproduction steps and desired outcomes.

Changed in juju:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.