unit(s) not upgrading packages after changing openstack-origin

Bug #2039604 reported by Gabriel Cocenza
This bug affects 4 people
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| Ceph Monitor Charm | New | Undecided | Unassigned | |
| Ceph RADOS Gateway Charm | Confirmed | Undecided | Unassigned | |
| OpenStack AODH Charm | Confirmed | Undecided | Unassigned | |
| OpenStack Barbican Charm | Confirmed | Undecided | Unassigned | |
| OpenStack Ceilometer Charm | Confirmed | Undecided | Unassigned | |
| OpenStack Designate Charm | Confirmed | Undecided | Unassigned | |
| OpenStack Octavia Charm | Confirmed | Undecided | Unassigned | |
| OpenStack Placement Charm | Confirmed | Undecided | Unassigned | |
| charms.openstack | Confirmed | Undecided | Unassigned | |

Bug Description

I'm in a cloud upgrading from victoria to wallaby and I noticed that one unit didn't get upgraded after changing openstack-origin to cloud:focal-wallaby (action-managed-upgrade was set to False).

The unit that didn't upgrade has been active/idle for some hours and the package versions didn't change:

https://pastebin.canonical.com/p/jf293y3dYV/

If I run `juju run-action barbican/2 openstack-upgrade` then after the unit is stable, the package is upgraded.

The same pattern happened on charm-aodh, so maybe it's a problem with the reactive OpenStack charms.

Tags: soleng-561
Gabriel Cocenza (gabrielcocenza) wrote :

The workaround for ceph-radosgw was to manually change /etc/apt/sources.list.d/cloud-archive.list to wallaby on the leader unit, since there is no openstack-upgrade action and pausing and resuming the unit was not enough to get the packages upgraded.

summary: - leader unit not upgrading packages after openstack-upgrade
+ single unit not upgrading packages after changing openstack-origin
description: updated
Gabriel Cocenza (gabrielcocenza) wrote : Re: single unit not upgrading packages after changing openstack-origin

None of the units on charm-octavia got upgraded.

summary: - single unit not upgrading packages after changing openstack-origin
+ unit(s) not upgrading packages after changing openstack-origin
Alex Kavanagh (ajkavanagh) wrote :

Hi Gabriel

Did you observe this behaviour on all the charms that you've added to this bug?

* Please could you add charm logs and juju machine logs for all the units of the applications/charms affected, and the rough time *when* the "juju config <app> openstack-origin=cloud:focal-wallaby" was run. This is so we can see when the config-changed hook ran for the units in question.
* Please could you provide the version of Juju being used and/or a juju status output.

Thanks

Changed in charm-ceph-radosgw:
status: New → Incomplete
Changed in charm-aodh:
status: New → Incomplete
Changed in charm-barbican:
status: New → Incomplete
Changed in charm-designate:
status: New → Incomplete
Changed in charm-octavia:
status: New → Incomplete
Changed in charm-placement:
status: New → Incomplete
Gabriel Cocenza (gabrielcocenza) wrote :

Hi Alex.

Yes, I've observed this behavior on all charms that I've added.

Aren't the juju status and the sos-report for barbican attached to the bug description enough?

I didn't collect the logs for other charms because the behavior was very similar.

Alex Kavanagh (ajkavanagh) wrote :

Hi Gabriel

No. To understand what's going on, the units' debug logs and the units' juju machine logs *are* needed, along with the time when the config command was run. Anyone debugging this needs to be able to match the time the config command was issued to the machine log, which indicates when the config-changed hook ran, and then to the debug log for that config-changed hook. Then it would be possible to determine whether config-changed didn't run, whether the value changed from the perspective of the charm, or whether the charm took no action on the changed value.
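
As a rough illustration of that matching-up, the sketch below scans unit log lines for hooks that completed at or after a given command time. The sample lines are made up for the example (only their general shape follows the juju log excerpts in this bug), and `hooks_after` is a hypothetical helper, not part of Juju.

```python
import re
from datetime import datetime

# Matches lines shaped like:
#   2024-01-01 10:05:40 INFO juju.worker.uniter.operation ran "config-changed" hook ...
HOOK_RAN = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*\bran "(?P<hook>[\w-]+)" hook'
)


def hooks_after(log_lines, command_time):
    """Return (timestamp, hook) pairs for hooks that completed at or after command_time."""
    found = []
    for line in log_lines:
        match = HOOK_RAN.match(line)
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        if ts >= command_time:
            found.append((ts, match["hook"]))
    return found


# Illustrative input only; these lines are not taken from any attached log.
sample = [
    '2024-01-01 10:00:00 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)',
    '2024-01-01 10:05:10 INFO juju.worker.uniter.operation ran "upgrade-charm" hook (via explicit, bespoke hook script)',
    '2024-01-01 10:05:40 INFO juju.worker.uniter.operation ran "config-changed" hook (via explicit, bespoke hook script)',
]
config_command_time = datetime(2024, 1, 1, 10, 5, 0)  # assumed time of "juju config"
hooks = [hook for _, hook in hooks_after(sample, config_command_time)]
config_changed_ran = "config-changed" in hooks
```

If `config_changed_ran` were false for a unit, that would point towards the hook not running at all; if it is true, the charm's debug log for that hook is the next place to look.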

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Placement Charm because there has been no activity for 60 days.]

Changed in charm-placement:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Octavia Charm because there has been no activity for 60 days.]

Changed in charm-octavia:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for Ceph RADOS Gateway Charm because there has been no activity for 60 days.]

Changed in charm-ceph-radosgw:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Designate Charm because there has been no activity for 60 days.]

Changed in charm-designate:
status: Incomplete → Expired
Aliaksandr Vasiuk (valexby) wrote :

Hi,

I faced the same issue today during Control Plane upgrades from Victoria to Wallaby on a list of services:
- aodh
- barbican
- ceilometer
- designate
- placement

Environment is:
* Focal-Victoria cloud during upgrade to Focal-Wallaby
* Juju 2.9.45
* All charms were refreshed to their victoria/latest on the 29th of May.
* The juju status --relations output: https://pastebin.ubuntu.com/p/p4SNTnnNpk/
* The sanitized bundle is here: https://pastebin.canonical.com/p/4fJ3JXrkTR/

Steps to reproduce:
1. I don't know if it matters, but the cloud was upgraded from Ussuri to Victoria last week, so you may not be able to reproduce this by just deploying a Victoria cloud and upgrading to Wallaby.
2. Upgrading all at once. action-managed-upgrade for all services is "false". Auto restarts for rabbit and ovn-central are enabled.
3. For every service:
3.1 Refresh charm from 'victoria/stable' to the new channel: 'wallaby/stable'
3.2 Change charm config 'openstack-origin' to 'cloud:focal-wallaby'

Note that from Ussuri to Wallaby I upgraded the same way and all services were upgraded without any issue.

What I faced:

### aodh ###

aodh/0, aodh/1 successfully upgraded.
aodh/2 stayed on Victoria.
Worked around with:
juju run-action aodh/2 openstack-upgrade

### barbican ###

barbican/1, barbican/2 successfully upgraded.
barbican/0 stayed on Victoria.
Worked around with:
juju run-action barbican/0 openstack-upgrade

### ceilometer ###
ceilometer/0, ceilometer/1 successfully upgraded.
ceilometer/2 stayed on Victoria.
The workaround didn't work:
```
$ juju run-action --wait ceilometer/2 openstack-upgrade
unit-ceilometer-2:
  UnitId: ceilometer/2
  id: "6455"
  results:
    outcome: no upgrade available.
  status: completed
  timing:
    completed: 2024-06-03 10:05:37 +0000 UTC
  ...
```

Aliaksandr Vasiuk (valexby) wrote :
Andrea Ieri (aieri) wrote :

Marking this bug back to new for the services mentioned by Alex. The attached logs will hopefully be sufficient for evaluating this bug.

Changed in charm-aodh:
status: Incomplete → New
Changed in charm-barbican:
status: Incomplete → New
Changed in charm-designate:
status: Expired → New
Changed in charm-placement:
status: Expired → New
Alex Kavanagh (ajkavanagh) wrote :

I've set the charms' bug status to confirmed as more than one person has experienced the bug. However, I've not triaged it (yet) as it's not clear whether it's Juju or a strange charms.reactive/charms.openstack upgrade bug. It could be due to:

* charms.openstack at the stable/<version> not having the correct metadata to determine whether a charm needs to upgrade openstack.
* a bug in the logic for openstack upgrades in charms.openstack.
* Juju not running the config-changed hook?
* <something else>

Changed in charm-aodh:
status: New → Confirmed
Changed in charm-barbican:
status: New → Confirmed
Changed in charm-ceilometer:
status: New → Confirmed
Changed in charm-designate:
status: New → Confirmed
Changed in charm-octavia:
status: Expired → Fix Committed
status: Fix Committed → Confirmed
Changed in charm-placement:
status: New → Confirmed
Changed in charms.openstack:
status: New → Incomplete
status: Incomplete → Confirmed
Changed in charm-ceph-radosgw:
status: Expired → Confirmed
Samuel Allan (samuelallan) wrote :

I recently observed similar issues during an upgrade from focal/ussuri to focal/wallaby, with these charms: ceph-mon, aodh, barbican, ceph-radosgw, designate, placement, octavia.

The symptoms varied, but all seem related:

- ceph-mon: no units wrote the updated cloud-archive ppa (cloud-archive.list still contained the line for victoria)

- aodh: all units wrote the updated cloud-archive ppa, but only 2 out of 3 upgraded the aodh packages. aodh/0 still had aodh v11.0.0 installed, and apt was showing v12.0.0 was available (i.e. the ppa was added successfully and apt update was run, but not apt upgrade)

- barbican: all units wrote the updated cloud-archive ppa, but no units upgraded the barbican packages.

- ceph-radosgw: no units wrote the updated cloud-archive ppa

- designate: all units wrote the updated cloud-archive ppa, but only 2 out of 3 upgraded the designate packages. designate/0 was not upgraded.

- placement: all units wrote the updated cloud-archive ppa, but only 2 out of 3 upgraded the placement packages. placement/0 was not upgraded.

- octavia: all units wrote the updated cloud-archive ppa, but no units upgraded the octavia packages.
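
The symptoms above fall into three states per unit, which can be told apart with a small check like the one sketched here. This is only an illustration: the shape of the cloud-archive.list entry is an assumption, and `archive_release` and `classify_unit` are hypothetical helpers rather than anything shipped with the charms.

```python
import re

# Assumed shape of a cloud-archive.list entry, e.g.:
#   deb http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby main
POCKET = re.compile(r"^deb\s+\S+\s+\w+-updates/(?P<release>\w+)\s+main", re.MULTILINE)


def archive_release(sources_text):
    """Return the OpenStack release named in the sources text, or None."""
    match = POCKET.search(sources_text)
    return match["release"] if match else None


def classify_unit(sources_text, installed_version, candidate_version, target_release):
    """Place a unit into one of the three states seen in the list above."""
    if archive_release(sources_text) != target_release:
        return "archive not updated"
    if installed_version != candidate_version:
        return "archive updated, packages not upgraded"
    return "upgraded"


victoria = "deb http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/victoria main\n"
wallaby = "deb http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby main\n"

# Versions follow the aodh/0 example: v11.0.0 installed, v12.0.0 available.
like_ceph_mon = classify_unit(victoria, "11.0.0", "12.0.0", "wallaby")
like_aodh_0 = classify_unit(wallaby, "11.0.0", "12.0.0", "wallaby")
like_aodh_1 = classify_unit(wallaby, "12.0.0", "12.0.0", "wallaby")
```

On a real unit the inputs would come from /etc/apt/sources.list.d/cloud-archive.list and from apt's installed and candidate versions for the charm's main package.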

Samuel Allan (samuelallan) wrote :

I think I managed to reproduce this on ceph-mon (single unit). Logs after changing openstack-origin from cloud:focal-victoria to cloud:focal-wallaby:

```
unit-ceph-mon-0: 12:13:39 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)
unit-ceph-mon-0: 12:18:12 INFO unit.ceph-mon/0.juju-log Updating status.
unit-ceph-mon-0: 12:18:13 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)
unit-ceph-mon-0: 12:23:08 INFO unit.ceph-mon/0.juju-log Updating status.
unit-ceph-mon-0: 12:23:09 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)
unit-ceph-mon-0: 12:25:30 INFO juju.worker.uniter found queued "upgrade-charm" hook
unit-ceph-mon-0: 12:25:31 INFO unit.ceph-mon/0.juju-log Making dir /var/lib/charm/ceph-mon ceph:ceph 555
unit-ceph-mon-0: 12:25:35 INFO unit.ceph-mon/0.juju-log Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
unit-ceph-mon-0: 12:25:36 WARNING unit.ceph-mon/0.upgrade-charm Unit /etc/systemd/system/ceph-create-keys.service is masked, ignoring.
unit-ceph-mon-0: 12:25:38 WARNING unit.ceph-mon/0.upgrade-charm Error ENOENT: key 'autotune' doesn't exist
unit-ceph-mon-0: 12:25:39 WARNING unit.ceph-mon/0.upgrade-charm set autotune
unit-ceph-mon-0: 12:25:39 WARNING unit.ceph-mon/0.upgrade-charm 2024-07-17T02:55:39.880+0000 7f42733ca700 -1 auth: unable to find a keyring on /var/lib/ceph/mon/ceph-/keyring: (2) No such file or directory
unit-ceph-mon-0: 12:25:39 WARNING unit.ceph-mon/0.upgrade-charm 2024-07-17T02:55:39.880+0000 7f42733ca700 -1 AuthRegistry(0x7f426c059290) no keyring found at /var/lib/ceph/mon/ceph-/keyring, disabling cephx
unit-ceph-mon-0: 12:25:40 WARNING unit.ceph-mon/0.upgrade-charm exported keyring for client.glance
unit-ceph-mon-0: 12:25:40 WARNING unit.ceph-mon/0.upgrade-charm updated caps for client.glance
unit-ceph-mon-0: 12:25:41 WARNING unit.ceph-mon/0.upgrade-charm 2024-07-17T02:55:41.144+0000 7fdaff818700 -1 auth: unable to find a keyring on /var/lib/ceph/mon/ceph-/keyring: (2) No such file or directory
unit-ceph-mon-0: 12:25:41 WARNING unit.ceph-mon/0.upgrade-charm 2024-07-17T02:55:41.144+0000 7fdaff818700 -1 AuthRegistry(0x7fdaf8059290) no keyring found at /var/lib/ceph/mon/ceph-/keyring, disabling cephx
unit-ceph-mon-0: 12:25:41 WARNING unit.ceph-mon/0.upgrade-charm exported keyring for client.cinder-ceph
unit-ceph-mon-0: 12:25:41 WARNING unit.ceph-mon/0.upgrade-charm updated caps for client.cinder-ceph
unit-ceph-mon-0: 12:25:42 WARNING unit.ceph-mon/0.upgrade-charm 2024-07-17T02:55:42.392+0000 7fdce6db4700 -1 auth: unable to find a keyring on /var/lib/ceph/mon/ceph-/keyring: (2) No such file or directory
unit-ceph-mon-0: 12:25:42 WARNING unit.ceph-mon/0.upgrade-charm 2024-07-17T02:55:42.392+0000 7fdce6db4700 -1 AuthRegistry(0x7fdce0059290) no keyring found at /var/lib/ceph/mon/ceph-/keyring, disabling cephx
unit-ceph-mon-0: 12:25:42 WARNING unit.ceph-mon/0.upgrade-charm exported keyring for client.nova-compute
unit-ceph-mon-0: 12:25:43 WARNING unit.ceph-mon/0.upgrade-charm updated caps for client.nova-compute
unit-ceph-mon-0: 12:25:44 INFO juju.worker.uniter.opera...
```

Samuel Allan (samuelallan) wrote :

I started reading through the logs from Aliaksandr above (from https://bugs.launchpad.net/charm-barbican/+bug/2039604/comments/11 ). Starting with aodh:

> aodh/0, aodh/1 successfully upgraded.
> aodh/2 stayed on Victoria.
> Worked around with:
> juju run-action aodh/2 openstack-upgrade

aodh/0 succeeded, and I see this:

```
WARNING unit.aodh/0.juju-log server.go:316 Package openstack-release has no installation candidate.
WARNING unit.aodh/0.juju-log server.go:316 Package openstack-release has no installation candidate.
INFO unit.aodh/0.juju-log server.go:316 Performing OpenStack upgrade to wallaby.
INFO unit.aodh/0.juju-log server.go:316 Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
INFO unit.aodh/0.juju-log server.go:316 Upgrading with options: ['--option', 'Dpkg::Options::=--force-confnew', '--option', 'Dpkg::Options::=--force-confdef']
INFO unit.aodh/0.juju-log server.go:316 Installing ['aodh-api', 'aodh-evaluator', 'aodh-expirer', 'aodh-notifier', 'aodh-listener', 'python3-aodh', 'libapache2-mod-wsgi-py3', 'python3-apt', 'memcached', 'python3-memcache', 'haproxy', 'apache2'] with options: ['--option', 'Dpkg::Options::=--force-confnew', '--option', 'Dpkg::Options::=--force-confdef']
WARNING unit.aodh/0.juju-log server.go:316 Package python-aodh has no installation candidate.
WARNING unit.aodh/0.juju-log server.go:316 Package python-memcache has no installation candidate.
WARNING unit.aodh/0.juju-log server.go:316 DEPRECATION: should not use port_map parameter in APIConfigurationAdapter.__init__()
WARNING unit.aodh/0.juju-log server.go:316 DEPRECATION: should not use service_name parameter in APIConfigurationAdapter.__init__()
WARNING unit.aodh/0.juju-log server.go:316 Not adding haproxy listen stanza for aodh-api_int port is already in use
WARNING unit.aodh/0.juju-log server.go:316 Not adding haproxy listen stanza for aodh-api_public port is already in use
INFO unit.aodh/0.juju-log server.go:316 Deferring DB sync to leader
```

Note that this is _before_ the config-changed hook (the upgrade should not happen until the openstack-origin config is updated). There are some other differences; see the attached diff image in the next comment.

aodh/2 failed to upgrade, and the above was not present.
It was present about 25 minutes later, though, which I guess was Aliaksandr manually running the openstack-upgrade action?

Revision history for this message
Samuel Allan (samuelallan) wrote :

Here is a visual diff for the aodh upgrade, cleaned up as much as possible: timestamps and names were updated to make the diff cleaner. aodh/0 (succeeded) is on the left, aodh/2 (failed) on the right.

Samuel Allan (samuelallan) wrote :

Similar pattern for barbican and placement (again from Aliaksandr's logs):

Logs for barbican/1, which succeeded:

```
2024-06-03 09:39:30 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/focal/barbican-156 from API server
2024-06-03 09:39:31 INFO juju.worker.uniter resolver.go:159 found queued "upgrade-charm" hook
2024-06-03 09:40:44 INFO unit.barbican/1.juju-log server.go:316 Reactive main running for hook upgrade-charm
2024-06-03 09:40:45 INFO unit.barbican/1.juju-log server.go:316 Initializing Leadership Layer (is leader)
2024-06-03 09:40:45 INFO unit.barbican/1.juju-log server.go:316 Invoking reactive handler: reactive/layer_openstack.py:46:default_upgrade_charm
2024-06-03 09:40:45 INFO unit.barbican/1.juju-log server.go:316 Invoking reactive handler: reactive/barbican_handlers.py:45:render_stuff
2024-06-03 09:40:45 INFO unit.barbican/1.juju-log server.go:316 about to call the render_configs with (<relations.rabbitmq.requires.RabbitMQRequires object at 0x7fd1047ba670>, <relations.keystone.requires.KeystoneRequires object at 0x7fd1047ba250>, <relations.mysql-shared.requires.MySQLSharedRequires object at 0x7fd1047ba5e0>)
2024-06-03 09:40:45 WARNING unit.barbican/1.juju-log server.go:316 Not adding haproxy listen stanza for barbican-worker_public port is already in use
2024-06-03 09:40:46 WARNING unit.barbican/1.juju-log server.go:316 configure_ssl method is DEPRECATED, please use configure_tls instead.
2024-06-03 09:40:46 INFO unit.barbican/1.juju-log server.go:316 Making dir /etc/apache2/ssl/barbican root:root 555
2024-06-03 09:40:46 INFO unit.barbican/1.juju-log server.go:316 Making dir /etc/apache2/ssl/barbican root:root 555
2024-06-03 09:40:46 INFO unit.barbican/1.juju-log server.go:316 Making dir /etc/apache2/ssl/barbican root:root 555
2024-06-03 09:40:47 WARNING unit.barbican/1.juju-log server.go:316 Package openstack-release has no installation candidate.
2024-06-03 09:40:47 WARNING unit.barbican/1.juju-log server.go:316 Package openstack-release has no installation candidate.
2024-06-03 09:40:48 INFO unit.barbican/1.juju-log server.go:316 Performing OpenStack upgrade to wallaby.
2024-06-03 09:40:48 INFO unit.barbican/1.juju-log server.go:316 Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
2024-06-03 09:40:50 INFO unit.barbican/1.juju-log server.go:316 Upgrading with options: ['--option', 'Dpkg::Options::=--force-confnew', '--option', 'Dpkg::Options::=--force-confdef']
2024-06-03 09:41:21 INFO unit.barbican/1.juju-log server.go:316 Installing ['barbican-common', 'barbican-api', 'barbican-worker', 'python3-barbican', 'libapache2-mod-wsgi-py3', 'python3-apt', 'memcached', 'python3-memcache', 'haproxy', 'apache2'] with options: ['--option', 'Dpkg::Options::=--force-confnew', '--option', 'Dpkg::Options::=--force-confdef']
2024-06-03 09:41:22 WARNING unit.barbican/1.juju-log server.go:316 Package python-barbican has no installation candidate.
2024-06-03 09:41:22 WARNING unit.barbican/1.juju-log server.go:316 Package python-mysqldb has no installation candidate.
2024-06-03 09:41:23 WARNING unit.barbican/1.juju-log server.go:316 Not adding haproxy listen stanza for barbican-worker_public port is alr...
```

Eric Chen (eric-chen)
tags: added: soleng-561
Samuel Allan (samuelallan) wrote :

Note: the workaround that seemed safest was to toggle the openstack-origin or source config back to the previous value, and then forward to the target value again:

```
$ juju config ceph-radosgw source=cloud:focal-victoria

# wait for units to reach active/idle again

$ juju config ceph-radosgw source=cloud:focal-wallaby
```

This always seemed to trigger the hook again and perform the upgrade correctly.
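
For several applications the same toggle could be scripted. The sketch below only shows the ordering; `toggle_config` is a hypothetical helper, and both the command runner and the "wait for active/idle" step are supplied by the caller because how to wait is left open here.

```python
def toggle_config(app, key, previous, target, run, wait_until_idle):
    """Set key back to previous, wait, then set it to target again.

    run and wait_until_idle are caller-supplied callables (hypothetical here):
    run receives an argv list, wait_until_idle receives the application name.
    """
    for value in (previous, target):
        run(["juju", "config", app, f"{key}={value}"])
        wait_until_idle(app)


# Dry run: record the commands instead of executing them.
issued = []
toggle_config(
    "ceph-radosgw",
    "source",
    "cloud:focal-victoria",
    "cloud:focal-wallaby",
    run=issued.append,
    wait_until_idle=lambda app: None,
)
```

In real use `run` could wrap `subprocess.run(argv, check=True)`, and `wait_until_idle` would need to block until `juju status` shows the units as active/idle.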

Samuel Allan (samuelallan) wrote :

I've done some more digging, and my working theory is that it's a race condition triggered when the charm refresh and config change are done by cou (charmed-openstack-upgrader) in quick succession (without waiting in between).

This is what the flow of events seems to look like for ceph-mon:

1. cou upgrades the charm channel, this is registered in juju
2. cou changes the source config from cloud:focal-victoria -> cloud:focal-wallaby, this is registered immediately in juju
3. the charm runs the upgrade-charm hook
4. the charm loads config from juju (the new config after being updated in step 2)
5. charmhelpers saves the config to .juju-persistent-config (with the new value)
6. upgrade-charm hook completes, config-changed hook begins
7. upgrade procedure in the config-changed hook compares config.previous('source') with the live config, but these are the same, so no upgrade happens
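
To make the suspected race concrete, here is a toy model of steps 3 to 7. It is not charmhelpers code: `ToyPersistentConfig` is a hypothetical stand-in that only imitates the "compare the live value with the last saved value" behaviour described above.

```python
class ToyPersistentConfig:
    """Toy stand-in for a charm config object that remembers the last saved values."""

    def __init__(self, saved):
        self._saved = dict(saved)  # plays the role of .juju-persistent-config
        self._live = dict(saved)

    def load(self, juju_config):
        """Read the config as Juju reports it at the start of a hook."""
        self._live = dict(juju_config)

    def changed(self, key):
        """True if the live value differs from the previously saved one."""
        return self._saved.get(key) != self._live.get(key)

    def save(self):
        """Persist the live values, as happens when a hook finishes."""
        self._saved = dict(self._live)


VICTORIA = {"source": "cloud:focal-victoria"}
WALLABY = {"source": "cloud:focal-wallaby"}


def config_changed_sees_change(seen_by_upgrade_charm, seen_by_config_changed):
    cfg = ToyPersistentConfig(saved=VICTORIA)
    # upgrade-charm hook: load whatever Juju holds right now, persist on exit.
    cfg.load(seen_by_upgrade_charm)
    cfg.save()
    # config-changed hook: compare the live config with the persisted copy.
    cfg.load(seen_by_config_changed)
    return cfg.changed("source")


# Config changed only after upgrade-charm finished: the change is detected.
waited_between_steps = config_changed_sees_change(VICTORIA, WALLABY)
# Config changed before upgrade-charm ran: both hooks see wallaby, no change detected.
back_to_back = config_changed_sees_change(WALLABY, WALLABY)
```

In this model `waited_between_steps` is true and `back_to_back` is false, which matches the theory that the upgrade-charm hook persists the new value before config-changed gets a chance to compare it.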

I'm finding it very difficult to trace the flow of logic for aodh (and the same probably goes for the other openstack charms here), but I'd guess a similar thing is happening.

Consider:
- these bugs affect many charms, but they have only been observed when upgrading with cou
- cou behaves differently from a human operator: it fires the charm refresh and config change very quickly without waiting in between. A human is likely to wait longer between commands (e.g. waiting for the charm upgrade to complete before changing the config).

I can't explain yet why these bugs have only been observed in victoria -> wallaby upgrades.
