Ceph mon upgrade from Octopus to Pacific charms in loop

Bug #2007859 reported by Diko Parvanov
Affects: Ceph Monitor Charm
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

While upgrading from Octopus to Pacific, the ceph-mon charm started the upgrade on ceph-mon/0, which was **not** the leader. What came next was a constant loop of:

ceph-mon/0 maintenance executing 3/lxd/3 10.11.2.35 (config-changed) Finishing upgrade
ceph-mon/1* waiting executing 4/lxd/3 10.11.2.32 (config-changed) Waiting on juju-4c2163-3-lxd-3 to finish upgrading
ceph-mon/2 waiting executing 5/lxd/3 10.11.2.170 (config-changed) Waiting on juju-4c2163-4-lxd-3 to finish upgrading

unit-ceph-mon-1: 14:09:10 INFO unit.ceph-mon/1.juju-log waiting for 15 seconds
unit-ceph-mon-1: 14:09:26 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:09:26 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:09:26 INFO unit.ceph-mon/1.juju-log waiting for 22 seconds
unit-ceph-mon-1: 14:09:48 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:09:49 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:09:49 INFO unit.ceph-mon/1.juju-log waiting for 13 seconds
unit-ceph-mon-1: 14:10:02 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:02 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:02 INFO unit.ceph-mon/1.juju-log waiting for 24 seconds
unit-ceph-mon-1: 14:10:27 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:27 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:27 INFO unit.ceph-mon/1.juju-log waiting for 6 seconds
unit-ceph-mon-1: 14:10:33 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:34 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:34 INFO unit.ceph-mon/1.juju-log waiting for 22 seconds
unit-ceph-mon-1: 14:10:56 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:57 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:57 INFO unit.ceph-mon/1.juju-log waiting for 14 seconds
unit-ceph-mon-1: 14:11:11 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:11:11 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:11:11 INFO unit.ceph-mon/1.juju-log waiting for 27 seconds
unit-ceph-mon-1: 14:11:39 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:11:39 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:11:39 INFO unit.ceph-mon/1.juju-log waiting for 27 seconds

This was fixed by stopping the jujud service on the other two units so that leadership moved to ceph-mon/0, which then continued the upgrade and finished successfully.

Tags: bseng-1079
Luciano Lo Giudice (lmlogiudice) wrote :

Indeed, this seems to be happening because the upgrade path isn't checking for leadership at some points where it should.

Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → Medium
status: Triaged → Confirmed
Andrea Ieri (aieri)
tags: added: bseng-1079
Tianqi Xiao (txiao)
Changed in charm-ceph-mon:
assignee: nobody → Tianqi Xiao (txiao)
Tianqi Xiao (txiao) wrote :

I was not able to reproduce the described issue. Following the procedure below resulted in a successful ceph-mon upgrade from Octopus to Pacific, even when unit 0 was not the leader:

```
$ juju config ceph-mon source=cloud:focal-victoria
$ juju upgrade-charm ceph-mon --channel pacific/stable
```

Log: https://pastebin.canonical.com/p/D2Dt9B57pq/

Marking the bug as Invalid for now. Feel free to re-open it if needed.

Changed in charm-ceph-mon:
status: Confirmed → Invalid
Billy Olsen (billy-olsen) wrote :

For this bug, we'll actually need to get log data from the units in order to determine what's going on (sosreports should be good). I want to call out very specifically that I cannot see charm leadership having anything to do with this bug whatsoever. Nothing in this code path is using leader storage which would cause such issues, nor does the leader come into play when upgrading the monitor cluster. Given the information that is currently present, the stopping of the additional units only circumstantially affects the cluster - and I suspect it likely doesn't at all.

I actually strongly suspect this is due to a slow restart of the ceph-mon service, which may be due to on-disk format changes for the mon's storage.

First, it's important to understand how the ceph-mon upgrade works. When the 'source' config value is changed, a config-changed hook executes. Because the source config option indicates that the repository has changed, a rolling upgrade of the monitor cluster begins; this generally occurs across all units at the same time. The process is as follows:

1. Get a list of all monitors in the cluster (from the monmap), sort them by name (for consistency) and save the list for later.
2. Check the index of the current node in the ordered list of monitors
  > if the index is 0, begin the upgrade immediately (step 4)
  > if the index is not 0, wait for the previous unit to complete (step 3)
3. Wait for the previous unit to finish by checking for the mon_$hostname_$version_done key in the monitor key-value store. Between checks the unit waits a random amount of time between 5 and 30 seconds, and it times out after 30 minutes. (A sketch of this wait loop is included after these steps.)
4. Update the source repository configuration and refresh the apt package information
5. Upgrade the packages on the box to the newest Ceph version. This upgrades the software on disk, but does not restart the running monitor services, so as not to impact the cluster's availability.
6. Stop the ceph-mon service
7. Ensure the mon directory is writable by the ceph user (legacy)
8. Restart the ceph-mon service
9. Signal that this unit is done by setting the mon_$hostname_$version_done key in the ceph-mon key-value store

At each step along the way, once a unit has started the upgrade (steps 4-8), its mon_$hostname_$version_alive key is updated with a timestamp so that the other units will not time out and know that the mon unit is still upgrading. There is very little else going on during this time. The log messages that have been provided simply indicate that the unit is waiting for the lock and for this process to play out.
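To make steps 1-3 easier to follow, here is a minimal Python sketch of the ordering and wait logic described above. It is illustrative only, not the charm's actual code: the key_exists and key_get callables are hypothetical stand-ins for the charm's monitor key-value helpers, and the timeout bookkeeping is simplified.

```python
# Illustrative sketch only -- not the charm's actual code.
import random
import time


def upgrade_position(monmap_hostnames, my_hostname):
    """Steps 1-2: sort the monitor hostnames and find this unit's slot.

    Position 0 upgrades first; every other unit waits on the host
    immediately before it in the sorted list.
    """
    ordered = sorted(monmap_hostnames)
    position = ordered.index(my_hostname)
    previous = ordered[position - 1] if position > 0 else None
    return position, previous


def wait_on_previous_mon(previous_host, version, key_exists, key_get,
                         timeout=30 * 60):
    """Step 3: block until the previous monitor publishes its *_done key.

    key_exists(key) -> bool  (is the key present in the mon KV store?)
    key_get(key)    -> str   (the *_alive key holds a timestamp that the
                              upgrading unit keeps refreshing)
    """
    done_key = 'mon_{}_{}_done'.format(previous_host, version)
    alive_key = 'mon_{}_{}_alive'.format(previous_host, version)
    deadline = time.time() + timeout

    while time.time() < deadline:
        if key_exists(done_key):
            return  # previous unit finished; this unit can now upgrade
        # In the real charm, the failed lookup of the done key and the read
        # of the alive key are what produce the "Error ENOENT: key ..._done
        # doesn't exist" / "obtained ..._alive" pairs seen in the logs above.
        print("obtained {!r}: {}".format(alive_key, key_get(alive_key)))
        delay = random.randint(5, 30)  # random back-off between checks
        print("waiting for {} seconds".format(delay))
        time.sleep(delay)

    raise TimeoutError("timed out waiting for {}".format(done_key))
```

If the unit ahead in the sorted list never publishes its *_done key, its peers simply stay in this loop, which matches the "Waiting on ... to finish upgrading" status shown in the description.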

Billy Olsen (billy-olsen) wrote :

Moving from invalid -> incomplete in case data can be provided that shows this is actually an issue. From what I can tell in the logs that have been provided, it is working as intended and potentially just slower than expected.

Changed in charm-ceph-mon:
status: Invalid → Incomplete
Andrea Ieri (aieri) wrote :

We seem to have hit this bug in a test cloud, or at least a variation of it (I don't see the looping Diko reported).
I have collected sosreports as follows:

`sos collect --cluster-type juju -c "juju.models=openstack" -c "juju.apps=ceph-mon" --nopasswd-sudo --case 2007859 --no-local`

The tarball is being uploaded to https://private-fileshare.canonical.com/~aieri/sos-collector-2007859-2023-10-17-iscic.tar.xz

The units have been in maintenance/executing for the last ~12h.

Changed in charm-ceph-mon:
assignee: Tianqi Xiao (txiao) → nobody
Andrea Ieri (aieri)
Changed in charm-ceph-mon:
status: Incomplete → New
Koo Zhong Zheng (kzz333) wrote :

I was able to reproduce this issue in my lab by luck, and my workaround is as follows:

1) For example, if the message is "Waiting on juju-4c2163-3-lxd-3 to finish upgrading", reboot that particular LXD container, i.e. the corresponding ceph-mon/<node-number>:

# in my case, this particular node is not the leader now, but it was the leader before the upgrade
$ juju ssh ceph-mon/<node-number>
$ reboot

2) After the reboot, juju status will show a failure on that node, but I found from the juju logs that it had recovered by itself, so resolve the ceph-mon unit:

$ juju resolve ceph-mon/<node-number>

Then all ceph-mon nodes are ready and clustered.

Zhanglei Mao (zhanglei-mao) wrote (last edit):

I hit this too. The workaround in comment #6 worked in my case as well, after a reboot and resolving ceph-mon/0. The Ceph cluster is still working and operational.

zlmao@p14s:~/gzgz/upgrade$ juju status ceph-mon
Model Controller Cloud/Region Version SLA Timestamp
openstack maas maas/default 2.9.45 unsupported 16:21:09+08:00

App Version Status Scale Charm Channel Rev Exposed Message
ceph-mon 15.2.17 waiting 3 ceph-mon pacific/stable 189 no Waiting on juju-e5b537-0-lxd-0 to finish upgrading

Unit Workload Agent Machine Public address Ports Message
ceph-mon/0 maintenance executing 0/lxd/0 192.168.222.69 (config-changed) Finishing upgrade
ceph-mon/1* waiting executing 1/lxd/0 192.168.222.65 (config-changed) Waiting on juju-e5b537-0-lxd-0 to finish upgrading
ceph-mon/2 waiting executing 2/lxd/0 192.168.222.96 (config-changed) Waiting on juju-e5b537-1-lxd-0 to finish upgrading

root@juju-e5b537-0-lxd-0:~# tail -n10 /var/log/juju/unit-ceph-mon-0.log
2024-01-04 07:39:21 INFO unit.ceph-mon/0.juju-log server.go:316 Upgrading to: pacific
2024-01-04 07:39:21 WARNING unit.ceph-mon/0.config-changed logger.go:60 set mon_juju-e5b537-0-lxd-0_pacific_alive
2024-01-04 07:39:21 INFO unit.ceph-mon/0.juju-log server.go:316 Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
2024-01-04 07:39:24 INFO unit.ceph-mon/0.juju-log server.go:316 Installing ['ceph', 'gdisk', 'radosgw', 'xfsprogs', 'lvm2', 'parted', 'smartmontools', 'btrfs-progs'] with options: ['--option=Dpkg::Options::=--force-confold']
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 restarting ceph-mgr.target maybe: True
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 Making dir /var/lib/ceph/mon/ceph-juju-e5b537-0-lxd-0 ceph:ceph 755
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 starting ceph-mgr.target maybe: True
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 Done
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 monitor_key_set mon_juju-e5b537-0-lxd-0_pacific_done 1704353965.4807012
2024-01-04 07:39:25 WARNING unit.ceph-mon/0.config-changed logger.go:60 set mon_juju-e5b537-0-lxd-0_pacific_done
root@juju-e5b537-0-lxd-0:~#

ubuntu@juju-e5b537-1-lxd-0:~$ sudo tail -n10 /var/log/juju/unit-ceph-mon-1.log
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 roll_monitor_cluster called with pacific
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 monitor_list: ['juju-e5b537-1-lxd-0', 'juju-e5b537-2-lxd-0', 'juju-e5b537-0-lxd-0']
2024-01-04 07:38:29 WARNING unit.ceph-mon/1.juju-log server.go:316 DEPRECATION WARNING: Function one_shot_log is being removed : Support for use of upstream ``apt_pkg`` module in conjunctionwith charm-helpers is deprecated since 2019-06-25
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 Current Ceph version is 15.2
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 Upgrading to: pacific
2024-01-04 07:38:30 INFO unit.ceph-mon/1.juju-log server.go:316 Installing ['ubuntu-cloud-keyring'] with options: ['--option=Dpkg::Optio...

