charm hook failures after controller model upgrade from 2.1.2 to 2.3.4
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Incomplete | High | Unassigned |
Bug Description
# Quick summary
After upgrading a controller from 2.1 => 2.2 or newer, it is necessary to upgrade the model agents as well, or you may get errors about "unexpected end of JSON input".
# Issue
I upgraded our juju controllers (3x HA, MAAS cloud) from 2.1.2 to 2.3.4 with `juju upgrade-juju -m controller --agent-version 2.3.4`.
This took about 15 min and the upgrade seemed to be successful on the controller model.
Directly after this, 2 of the charms deployed in another model started reporting hook failures for all their units.
The charms in question are `nova-cloud-
Please note that the juju environment and charms were working fine before the upgrade.
I have tested this exact upgrade scenario (2.1.2 to 2.3.4, 3x HA, same cloud, same base deployment of openstack) multiple times in staging previously without running into this issue.
Also, when looking at the logs for controller and the failing juju units, there are some seemingly related connection error messages which weren't there before upgrading.
So I think it is safe to say that this is not a charm issue, but a juju issue.
Any help would be appreciated.
# Logs
N.B. All the different units show the same error messages, i.e. nova-cloud-
Same goes for the controllers and nova-compute charm, therefore I only added one excerpt from each.
## Controllers
`juju status -m controller --format yaml`: https:/
`juju ssh -m controller 0 'sudo less /var/log/
All the controllers show some variation of these error messages. The IPs seem to correspond to connections from the juju controller to the nova-cloud-
## nova-cloud-
`juju status -m openstack nova-cloud-
`juju ssh nova-cloud-
## nova-compute
`juju status -m openstack nova-compute --format yaml`: https:/
`juju ssh -m openstack nova-compute/0 'sudo less /var/log/
# Troubleshooting steps
* Tried to restart the juju controllers one by one
* Tried to restart the nova-cloud-
* Running the relation-get command manually for nova-cloud-
$ juju run --unit nova-cloud-
{"admin_
$ juju run --unit nova-cloud-
{"admin_
$ juju run --unit nova-cloud-
{"admin_
* Running the relation-get command manually for nova-compute:
$ juju run --unit nova-compute/0 'relation-get --format=json -r cloud-compute:38 network_manager nova-cloud-
ERROR timed out waiting for result from: unit nova-compute/0
...
$ juju run --unit nova-compute/19 'relation-get --format=json -r cloud-compute:38 network_manager nova-cloud-
ERROR timed out waiting for result from: unit nova-compute/1
As noted above, running the command which is causing these hook failures manually works fine for the nova-cloud-
I do not understand why the former is failing. The latter, however, is most likely failing because the relation in question is related to the nova-cloud-
# Versions
Juju 2.1.2, 2.3.4
MAAS 2.1.2
description: | updated |
If you have a very large configuration, I think upgrading the agents in the model might fix some of the errors, such as "unexpected end of JSON input".
IIRC, the issue is that in 2.3 (2.2?) we introduced a json/websocket library that can transmit responses in frames, which older clients turned out not to support.
And looking at the status output, it does look like all the agents in the "openstack" model are running 2.1.2 still, while the controller is running 2.3.4.
That should be as simple as:
juju upgrade-juju -m openstack
(if necessary, you might pass --agent-version=2.3.4)
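
To check whether a model's agents lag behind the controller before running the upgrade, something along these lines might help. It is only a sketch: the helper names `agents_lag_behind` and `suggest_upgrade` are made up for illustration, and it assumes that `juju model-config -m <model> agent-version` prints the bare version string for that model.

```shell
# Hypothetical helper: succeeds (exit 0) when the model's agent version
# differs from the controller's agent version.
agents_lag_behind() {
    # $1 = controller agent version, $2 = model agent version
    [ "$1" != "$2" ]
}

# Hypothetical helper: prints the upgrade command for a model whose agents
# do not match the controller, or a short note when they already match.
suggest_upgrade() {
    # $1 = model name, $2 = controller agent version, $3 = model agent version
    if agents_lag_behind "$2" "$3"; then
        echo "juju upgrade-juju -m $1 --agent-version $2"
    else
        echo "model $1 already at $2"
    fi
}

# Example against a live controller (assumption: model-config prints only
# the version string for the agent-version key):
#   suggest_upgrade openstack \
#       "$(juju model-config -m controller agent-version)" \
#       "$(juju model-config -m openstack agent-version)"
```

With the versions from this report (controller at 2.3.4, "openstack" model at 2.1.2), `suggest_upgrade openstack 2.3.4 2.1.2` prints the command to run. It only compares the strings for equality and does not check which version is newer.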