# Issue
I upgraded our juju controllers (3x HA, MAAS cloud) from 2.1.2 to 2.3.4 with `juju upgrade-juju -m controller --agent-version 2.3.4`.
This took about 15 minutes and the upgrade seemed to be successful on the controller model.
Directly after this, two of the charms deployed in another model started reporting hook failures for all of their units.
The charms in question are `nova-cloud-controller` and `nova-compute`, deployed on a model called `openstack`.
Please note that the juju environment and charms were working fine before the upgrade.
I have tested this exact upgrade scenario (2.1.2 to 2.3.4, 3x HA, same cloud, same base deployment of openstack) multiple times in staging previously without running into this issue.
Also, the logs for the controllers and the failing juju units contain some seemingly related connection error messages which weren't there before the upgrade.
So I think it is safe to say that this is not a charm issue, but a juju issue.
Any help would be appreciated.
# Logs
N.B. All units of a given application show the same error messages, i.e. nova-cloud-controller/0, nova-cloud-controller/1 and nova-cloud-controller/2 log the exact same traceback.
The same goes for the controllers and the nova-compute units, so I have only included one excerpt from each.
controller
`juju status -m controller --format yaml`: https://pastebin.com/kaxpuL8M
`juju ssh -m controller 0 'sudo less /var/log/juju/machine-0.log'`: https://pastebin.com/L3hFuZvb
All the controllers show some variation of these error messages. The IPs seem to correspond to connections from the juju controller to the nova-cloud-controller units.
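For what it's worth, this is roughly how I matched the IPs to units: extract the addresses from the log excerpt and intersect them with the unit addresses from `juju status`. The excerpt and addresses below are hypothetical stand-ins for the pastebin contents (real addresses are redacted there):

```python
import re

# Hypothetical stand-in for lines from /var/log/juju/machine-0.log
# (the real excerpt is in the pastebin above; addresses are made up).
log_excerpt = """
ERROR juju.apiserver error receiving message: read tcp 10.0.2.141:49152: connection reset by peer
ERROR juju.apiserver error receiving message: read tcp 10.0.2.142:49153: connection reset by peer
"""

# Unit addresses as reported by `juju status --format yaml` (hypothetical values).
unit_addresses = {
    "nova-cloud-controller/0": "10.0.2.141",
    "nova-cloud-controller/1": "10.0.2.142",
    "nova-cloud-controller/2": "10.0.2.143",
}

# Pull every IPv4 address out of the log and see which units they belong to.
ips_in_log = set(re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", log_excerpt))
affected = {unit for unit, addr in unit_addresses.items() if addr in ips_in_log}
print(sorted(affected))
```

With the stand-in data, only the units whose addresses appear in the error lines are reported.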
nova-cloud-controller
`juju status -m openstack nova-cloud-controller --format yaml`: https://pastebin.com/uiS2BXBX
`juju ssh nova-cloud-controller/0 'sudo less /var/log/juju/unit-nova-cloud-controller-0.log'`: https://pastebin.com/GYKBASUu
nova-compute
`juju status -m openstack nova-compute --format yaml`: https://pastebin.com/6XJcLWNL
`juju ssh -m openstack nova-compute/0 'sudo less /var/log/juju/unit-nova-compute-0.log'`: https://pastebin.com/Pp4PymTa
# Troubleshooting steps
* Tried to restart the juju controllers
* Tried to restart the nova-cloud-controller and nova-compute unit agents (jujud-unit-nova-cloud-controller-0.service, jujud-unit-nova-compute-0.service, etc.)
* Tried to manually run the relation-get commands that the charms are failing on:
* The ones for nova-cloud-controller actually work when run manually with `juju run`:
$ juju run --unit nova-cloud-controller/0 'relation-get --format=json -r identity-service:28 - keystone/0'
{"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas","auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130","service_host":"keystone.maas","service_password":"redacted","service_port":"5000","service_protocol":"http","service_tenant":"services","service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}
$ juju run --unit nova-cloud-controller/1 'relation-get --format=json -r identity-service:28 - keystone/0'
{"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas","auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130","service_host":"keystone.maas","service_password":"redacted","service_port":"5000","service_protocol":"http","service_tenant":"services","service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}
$ juju run --unit nova-cloud-controller/2 'relation-get --format=json -r identity-service:28 - keystone/0'
{"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas","auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130","service_host":"keystone.maas","service_password":"redacted","service_port":"5000","service_protocol":"http","service_tenant":"services","service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}
* However the ones for nova-compute time out:
$ juju run --unit nova-compute/0 'relation-get --format=json -r cloud-compute:38 network_manager nova-cloud-controller/0'
ERROR timed out waiting for result from: unit nova-compute/0
...
$ juju run --unit nova-compute/19 'relation-get --format=json -r cloud-compute:38 network_manager nova-cloud-controller/0'
ERROR timed out waiting for result from: unit nova-compute/1
As noted above, manually running the command that causes these hook failures works fine for the nova-cloud-controller units, but not for the nova-compute units.
I do not understand why the former's hooks fail even though the command succeeds when run manually; the latter most likely fails because the cloud-compute relation goes to nova-cloud-controller, which is itself failing.
Also, why is another, unrelated production model getting hit by these issues when only the controller model was upgraded?
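To rule out bad relation data on the charm side, the JSON returned by the successful relation-get calls can be parsed and checked for the keys the charm consumes. A minimal sketch, using the output pasted above; the `required` key set is my own guess for illustration, not taken from the charm source:

```python
import json

# Output of the successful relation-get call above (secrets redacted).
relation_json = '{"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas","auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130","service_host":"keystone.maas","service_password":"redacted","service_port":"5000","service_protocol":"http","service_tenant":"services","service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}'

data = json.loads(relation_json)

# Keys I assume nova-cloud-controller needs from the identity-service relation;
# if any were missing, a charm-side hook failure would be plausible.
required = {"auth_host", "auth_port", "service_host", "service_port", "service_username"}
missing = required - data.keys()
print("missing keys:", sorted(missing))
```

Here `missing` comes out empty, which supports the conclusion that the relation data itself is intact and the failures are on the juju side.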
# Versions
Juju 2.1.2, 2.3.4
MAAS 2.1.2