charm install hook fired on upgrade to juju 2.8.2

Bug #1890465 reported by Liam Young
This bug affects 3 people
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Heather Lanigan

Bug Description

A bug was recently reported against the nova-cloud-controller charm: nova services went offline after a Juju upgrade *1. The unit logs show Juju running the install hook after the agent was upgraded. I don't think the install hook should ever be called other than at install time. Below is the relevant part of the unit log (the full log is attached to Bug #1890399).

ERROR must restart: an agent upgrade is available
2020-08-04 17:38:52 INFO juju.cmd supercommand.go:54 running jujud [2.8.2 0 1d5703e8a6cef8b08c24132548644f43bb468b93 gc go1.14.6]
2020-08-04 17:38:52 DEBUG juju.cmd supercommand.go:55 args: []string{"/var/lib/juju/tools/unit-nova-cloud-controller-3/jujud", "unit", "--data-dir", "/var/lib/juju", "--unit-name", "nova-cloud-controller/3", "--debug"}
2020-08-04 17:38:52 DEBUG juju.agent agent.go:583 read agent config, format "2.0"
2020-08-04 17:38:52 INFO juju.cmd.jujud agent.go:138 setting logging config to "<root>=WARNING;unit=DEBUG"
2020-08-04 17:39:05 DEBUG juju-log 0 section(s) found
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/nova/nova.conf
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/nova/api-paste.ini
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/nova/vendor_data.json
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/haproxy/haproxy.cfg
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/apache2/sites-available/openstack_https_frontend.conf
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/memcached.conf
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/apache2/sites-enabled/wsgi-api-os-compute.conf
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/apache2/sites-enabled/wsgi-placement-api.conf
2020-08-04 17:39:05 INFO juju-log Registered config file: /etc/apache2/sites-enabled/wsgi-openstack-metadata.conf
2020-08-04 17:39:05 DEBUG juju-log Hardening function 'install'
2020-08-04 17:39:05 DEBUG juju-log No hardening applied to 'install'
2020-08-04 17:39:05 INFO juju-log DEPRECATION WARNING: Function configure_installation_source is being removed on/around 2017-07 : use charmhelpers.fetch.add_source() instead.
2020-08-04 17:39:06 INFO juju-log Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
2020-08-04 17:39:06 DEBUG install Reading package lists...

*1 https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1890399
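To check other unit logs for the same pattern, something like the following could help. It is only a rough sketch: it assumes the "date time LEVEL module message" line layout seen in the excerpt above, treats a juju.cmd "running jujud" line as the agent (re)start, and assumes that lines whose module field is "install" are output from the install hook. The function name is made up for illustration.

```python
import re

# Layout assumed from the excerpt: "YYYY-MM-DD HH:MM:SS LEVEL module message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\S+) (.*)$")


def find_install_after_restart(lines):
    """Return (restart_time, install_time) pairs found in a unit log.

    A pair is recorded for the first install-hook output line that
    follows each "running jujud" line. Hypothetical helper, not part
    of Juju.
    """
    hits = []
    restart_ts = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Lines without a timestamp (e.g. "ERROR must restart: ...") are skipped.
            continue
        ts, _level, module, msg = match.groups()
        if module == "juju.cmd" and "running jujud" in msg:
            restart_ts = ts
        elif restart_ts is not None and module == "install":
            hits.append((restart_ts, ts))
            restart_ts = None  # report only the first hit per restart
    return hits
```

Feeding it the lines of a unit log (for example `find_install_after_restart(open(path))`, where `path` points at the unit's log file) returns an empty list when no install hook output follows an agent start.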

Tags: upgrade-juju
Pen Gale (pengale) wrote :

This has come up in conversation, but we haven't commented yet. Apologies for the delay.

This is a valid issue. It is caused by a race during the upgrade process in which the uniter temporarily has access to out-of-date information about the status of the install hook.

Changed in juju:
status: New → Triaged
milestone: none → 2.8.2
importance: Undecided → Critical
importance: Critical → High
milestone: 2.8.2 → 2.8.3
importance: High → Critical
Ryan Beisner (1chb1n) wrote :

The impact of this is potentially service-affecting for production clouds. For example, it could trigger an unplanned Ceph upgrade or an unintentional Neutron upgrade immediately following an upgrade to Juju 2.8.

The existing contract with charm authors is that the install hook fires exactly once: during the initial deployment.

While we should expect the hooks to be idempotent, it is not safe to assume that the charm's payload can be upgraded at an arbitrary time.
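As an illustration of the idempotency point only, a charm could guard its install logic with a marker so that a repeated install hook becomes a no-op. This is a minimal sketch under stated assumptions: `install_once`, the marker file, and the `do_install` callback are hypothetical names, not something nova-cloud-controller or charmhelpers provides, and a guard like this does not address the underlying Juju bug.

```python
import json
import os
import tempfile


def install_once(marker_path, do_install):
    """Run do_install() only if no marker file exists yet.

    Returns True if the install ran, False if it was skipped.
    Hypothetical helper for illustration.
    """
    if os.path.exists(marker_path):
        # A previous install completed; treat a repeated install hook as a no-op.
        return False
    do_install()
    # Write the marker to a temp file and rename it, so a crash mid-write
    # cannot leave a partial marker behind.
    directory = os.path.dirname(marker_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"installed": True}, handle)
    os.replace(tmp_path, marker_path)
    return True
```

The marker is written only after `do_install()` returns, so a failed install is retried the next time the hook runs.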

Pen Gale (pengale) wrote :

Agreed that this is a critical, potentially service-impacting bug. I marked it for 2.8.3, but we may manage to sneak this in for 2.8.2 instead ...

tags: added: upgrade-juju
Heather Lanigan (hmlanigan) wrote :

It's quite possible that this has the same root cause as bug 1890828:
https://bugs.launchpad.net/juju/+bug/1890828/comments/5

Heather Lanigan (hmlanigan) wrote :

@1chb1n, has this been seen outside of the nova-cloud-controller bug? Additional logs would be helpful, especially for the machine running this unit.

Heather Lanigan (hmlanigan) wrote :

Attempting an outside reproduction with:
https://github.com/openstack/charm-designate/blob/master/src/tests/bundles/bionic-train.yaml

So far I have not been able to reproduce the issue in 6 runs or so.

Heather Lanigan (hmlanigan) wrote :

A potential fix for 1890828 has landed in the 2.8 branch. It'd be great if someone could try to reproduce with the 2.8/edge snap.

Changed in juju:
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.8.3 → 2.8.2
assignee: nobody → Heather Lanigan (hmlanigan)
Ian Booth (wallyworld) wrote :

Marking as Fix Committed since a potential fix has landed. As per comment #8, we can reopen if feedback indicates there's still an issue.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released