Comment 0 for bug 1755155

Sandor Zeestraten (szeestraten) wrote :

# Issue
I upgraded our juju controllers (3x HA, MAAS cloud) from 2.1.2 to 2.3.4 with `juju upgrade-juju -m controller --agent-version 2.3.4`.
This took about 15 minutes and the upgrade appeared to be successful on the controller model.
Immediately afterwards, 2 of the charms deployed in another model started reporting hook failures on all of their units.
The charms in question are `nova-cloud-controller` and `nova-compute`, deployed in a model called `openstack`.

Please note that the juju environment and charms were working fine before the upgrade.
I have tested this exact upgrade scenario (2.1.2 to 2.3.4, 3x HA, same cloud, same base deployment of openstack) multiple times in staging previously without running into this issue.
Also, when looking at the logs for the controller and the failing juju units, there are some seemingly related connection error messages which were not there before the upgrade.
So I think it is safe to say that this is not a charm issue, but a juju issue.

Any help would be appreciated.

# Logs
N.B. All the units of a given application show the same error messages, i.e. nova-cloud-controller/0, nova-cloud-controller/1 and nova-cloud-controller/2 show the same messages and the exact same traceback in the logs.
The same goes for the controllers and the nova-compute charm, so I only added one excerpt from each.

controller
`juju status -m controller --format yaml`: https://pastebin.com/kaxpuL8M
`juju ssh -m controller 0 'sudo less /var/log/juju/machine-0.log'`: https://pastebin.com/L3hFuZvb
All the controllers show some variation of these error messages. The IPs seem to correspond to connections from the juju controller to the nova-cloud-controller units.

nova-cloud-controller
`juju status -m openstack nova-cloud-controller --format yaml`: https://pastebin.com/uiS2BXBX
`juju ssh nova-cloud-controller/0 'sudo less /var/log/juju/unit-nova-cloud-controller-0.log'`: https://pastebin.com/GYKBASUu

nova-compute
`juju status -m openstack nova-compute --format yaml`: https://pastebin.com/6XJcLWNL
`juju ssh -m openstack nova-compute/0 'sudo less /var/log/juju/unit-nova-compute-0.log'`: https://pastebin.com/Pp4PymTa

# Troubleshooting steps
* Tried to restart the juju controllers
* Tried to restart the nova-cloud-controller and nova-compute juju unit services (i.e. jujud-unit-nova-cloud-controller-0.service, jujud-unit-nova-compute-0.service, etc.)
* Tried to manually run the `relation-get` commands that the charms are failing on:
    * The ones for nova-cloud-controller actually work when run manually with `juju run`:
        $ juju run --unit nova-cloud-controller/0 'relation-get --format=json -r identity-service:28 - keystone/0'
        {"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas","auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130","service_host":"keystone.maas","service_password":"redacted","service_port":"5000","service_protocol":"http","service_tenant":"services","service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}
        $ juju run --unit nova-cloud-controller/1 'relation-get --format=json -r identity-service:28 - keystone/0'
        {"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas","auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130","service_host":"keystone.maas","service_password":"redacted","service_port":"5000","service_protocol":"http","service_tenant":"services","service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}
        $ juju run --unit nova-cloud-controller/2 'relation-get --format=json -r identity-service:28 - keystone/0'
        {"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas","auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130","service_host":"keystone.maas","service_password":"redacted","service_port":"5000","service_protocol":"http","service_tenant":"services","service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}
    * However the ones for nova-compute time out:
        $ juju run --unit nova-compute/0 'relation-get --format=json -r cloud-compute:38 network_manager nova-cloud-controller/0'
        ERROR timed out waiting for result from: unit nova-compute/0
        ...
        $ juju run --unit nova-compute/19 'relation-get --format=json -r cloud-compute:38 network_manager nova-cloud-controller/0'
        ERROR timed out waiting for result from: unit nova-compute/1
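For reference, the working `relation-get` output above can also be sanity-checked programmatically instead of by eye. This is a minimal sketch (in Python, using the JSON pasted from the nova-cloud-controller/0 run above, secrets redacted); the list of expected keys is my assumption about what the charm needs from the identity-service relation, not something confirmed by the charm source:

```python
import json

# Output pasted from:
#   juju run --unit nova-cloud-controller/0 \
#     'relation-get --format=json -r identity-service:28 - keystone/0'
# (secrets redacted, as in the excerpt above)
raw = '''{"admin_token":"redacted","api_version":"2","auth_host":"keystone.maas",
"auth_port":"35357","auth_protocol":"http","private-address":"aa.bb.2.130",
"service_host":"keystone.maas","service_password":"redacted","service_port":"5000",
"service_protocol":"http","service_tenant":"services",
"service_tenant_id":"4baaf52f802a47fa8309b56c10b95e6c","service_username":"nova"}'''

data = json.loads(raw)

# Assumed set of keys the charm reads from this relation; if any were
# missing or empty, the hook failures would at least be explainable.
expected = ["auth_host", "auth_port", "auth_protocol",
            "service_host", "service_port", "service_protocol"]
missing = [k for k in expected if not data.get(k)]
print("missing or empty keys:", missing or "none")
```

In this case the check comes back clean, which is consistent with the manual runs succeeding and suggests the problem is in juju's agent/controller communication rather than in the relation data itself.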

As noted above, the command causing these hook failures works fine when run manually on the nova-cloud-controller units, but not on the nova-compute units.
I do not understand why the former's hooks are failing when the same command succeeds manually; the latter is most likely failing because the relation in question involves nova-cloud-controller, which is itself failing.
Also, why is an otherwise unrelated production model hit by these issues when only the controller model was upgraded?

# Versions
Juju 2.1.2, 2.3.4
MAAS 2.1.2