Ceph mon upgrade from Octopus to Pacific charms in loop

Bug #2007859 reported by Diko Parvanov
Affects: Ceph Monitor Charm
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

While upgrading from Octopus to Pacific, the ceph-mon charm started the upgrade on ceph-mon/0, which was **not** the leader. What came next was a constant loop of:

ceph-mon/0 maintenance executing 3/lxd/3 10.11.2.35 (config-changed) Finishing upgrade
ceph-mon/1* waiting executing 4/lxd/3 10.11.2.32 (config-changed) Waiting on juju-4c2163-3-lxd-3 to finish upgrading
ceph-mon/2 waiting executing 5/lxd/3 10.11.2.170 (config-changed) Waiting on juju-4c2163-4-lxd-3 to finish upgrading

unit-ceph-mon-1: 14:09:10 INFO unit.ceph-mon/1.juju-log waiting for 15 seconds
unit-ceph-mon-1: 14:09:26 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:09:26 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:09:26 INFO unit.ceph-mon/1.juju-log waiting for 22 seconds
unit-ceph-mon-1: 14:09:48 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:09:49 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:09:49 INFO unit.ceph-mon/1.juju-log waiting for 13 seconds
unit-ceph-mon-1: 14:10:02 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:02 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:02 INFO unit.ceph-mon/1.juju-log waiting for 24 seconds
unit-ceph-mon-1: 14:10:27 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:27 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:27 INFO unit.ceph-mon/1.juju-log waiting for 6 seconds
unit-ceph-mon-1: 14:10:33 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:34 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:34 INFO unit.ceph-mon/1.juju-log waiting for 22 seconds
unit-ceph-mon-1: 14:10:56 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:10:57 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:10:57 INFO unit.ceph-mon/1.juju-log waiting for 14 seconds
unit-ceph-mon-1: 14:11:11 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:11:11 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:11:11 INFO unit.ceph-mon/1.juju-log waiting for 27 seconds
unit-ceph-mon-1: 14:11:39 WARNING unit.ceph-mon/1.config-changed Error ENOENT: key 'mon_juju-4c2163-3-lxd-3_pacific_done' doesn't exist
unit-ceph-mon-1: 14:11:39 WARNING unit.ceph-mon/1.config-changed obtained 'mon_juju-4c2163-3-lxd-3_pacific_alive'
unit-ceph-mon-1: 14:11:39 INFO unit.ceph-mon/1.juju-log waiting for 27 seconds

This was fixed by stopping the jujud service on the other two units so that leadership moved to ceph-mon/0, which then continued the upgrade and finished successfully.

Tags: bseng-1079
Luciano Lo Giudice (lmlogiudice) wrote :

Indeed, this seems to be happening because the upgrade path isn't checking for leadership at some points where it should.

Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → Medium
status: Triaged → Confirmed
Andrea Ieri (aieri)
tags: added: bseng-1079
Tianqi Xiao (txiao)
Changed in charm-ceph-mon:
assignee: nobody → Tianqi Xiao (txiao)
Tianqi Xiao (txiao) wrote :

I was not able to reproduce the described issue. Following the procedure below resulted in a successful ceph-mon upgrade from Octopus to Pacific, even when unit 0 was not the leader:

```
$ juju config ceph-mon source=cloud:focal-victoria
$ juju upgrade-charm ceph-mon --channel pacific/stable
```

Log: https://pastebin.canonical.com/p/D2Dt9B57pq/

Marking the bug as Invalid for now. Feel free to re-open it if needed.

Changed in charm-ceph-mon:
status: Confirmed → Invalid
Billy Olsen (billy-olsen) wrote :

For this bug, we'll actually need to get log data from the units in order to determine what's going on (sosreports should be good). I want to call out very specifically that I cannot see charm leadership having anything to do with this bug whatsoever. Nothing in this code path is using leader storage which would cause such issues, nor does the leader come into play when upgrading the monitor cluster. Given the information that is currently present, the stopping of the additional units only circumstantially affects the cluster - and I suspect it likely doesn't at all.

I actually strongly suspect this is due to a slow restart of the ceph-mon service, which may be due to on-disk format changes for the mon's storage.

First, it's important to understand how the ceph-mon upgrade works. When the 'source' config value is changed, a config-changed hook executes. Because the source config option indicates that the repository has changed, a rolling upgrade of the monitor cluster begins; this generally occurs across all units at the same time. The process is as follows:

1. Get a list of all monitors in the cluster (from the monmap), sort them by name (for consistency) and save the list for later.
2. Check the index of the current node in the ordered list of monitors
  > if the index is 0, begin the upgrade immediately (step 4)
  > if the index is not 0, wait for the previous unit to complete (step 3)
3. Wait for the previous unit to finish by checking for the mon_$hostname_$version_done key in the monitor key-value store. Between checks the unit waits a random amount of time between 5 and 30 seconds, and it times out after 30 minutes. (A sketch of this wait loop is included after these steps.)
4. Update the source repository configuration and refresh the apt package information
5. Upgrade the packages on the box to the newest Ceph version. This upgrades the software on disk, but does not restart the running monitor services, so as not to impact the cluster's availability.
6. Stop the ceph-mon service
7. Ensure the mon directory is writable by the ceph user (legacy)
8. Restart the ceph-mon service
9. Signal that this unit is done by setting the mon_$hostname_$version_done key in the ceph-mon key-value store

At each step along the way, once a unit has started the upgrade (steps 4-8), its mon_$hostname_$version_alive key is updated with a timestamp so that the other units will not time out and know that the mon unit is still upgrading. There is very little else going on during this time. The log messages that have been provided simply indicate that the unit is waiting for the lock and for this process to play out.
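To make steps 1-3 easier to follow, here is a minimal Python sketch of the ordering and wait logic described above. It is illustrative only, not the charm's actual code: the key_exists and key_get callables are hypothetical stand-ins for the charm's monitor key-value helpers, and the timeout bookkeeping is simplified.

```python
# Illustrative sketch only -- not the charm's actual code.
import random
import time


def upgrade_position(monmap_hostnames, my_hostname):
    """Steps 1-2: sort the monitor hostnames and find this unit's slot.

    Position 0 upgrades first; every other unit waits on the host
    immediately before it in the sorted list.
    """
    ordered = sorted(monmap_hostnames)
    position = ordered.index(my_hostname)
    previous = ordered[position - 1] if position > 0 else None
    return position, previous


def wait_on_previous_mon(previous_host, version, key_exists, key_get,
                         timeout=30 * 60):
    """Step 3: block until the previous monitor publishes its *_done key.

    key_exists(key) -> bool  (is the key present in the mon KV store?)
    key_get(key)    -> str   (the *_alive key holds a timestamp that the
                              upgrading unit keeps refreshing)
    """
    done_key = 'mon_{}_{}_done'.format(previous_host, version)
    alive_key = 'mon_{}_{}_alive'.format(previous_host, version)
    deadline = time.time() + timeout

    while time.time() < deadline:
        if key_exists(done_key):
            return  # previous unit finished; this unit can now upgrade
        # In the real charm, the failed lookup of the done key and the read
        # of the alive key are what produce the "Error ENOENT: key ..._done
        # doesn't exist" / "obtained ..._alive" pairs seen in the logs above.
        print("obtained {!r}: {}".format(alive_key, key_get(alive_key)))
        delay = random.randint(5, 30)  # random back-off between checks
        print("waiting for {} seconds".format(delay))
        time.sleep(delay)

    raise TimeoutError("timed out waiting for {}".format(done_key))
```

If the unit ahead in the sorted list never publishes its *_done key, its peers simply stay in this loop, which matches the "Waiting on ... to finish upgrading" status shown in the description.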

Billy Olsen (billy-olsen) wrote :

Moving from invalid -> incomplete in case data can be provided that shows this is actually an issue. From what I can tell in the logs that have been provided, it is working as intended and potentially just slower than expected.

Changed in charm-ceph-mon:
status: Invalid → Incomplete
Andrea Ieri (aieri) wrote :

We seem to have hit this bug in a test cloud, or at least a variation of it (I don't see the looping Diko reported).
I have collected sosreports as follows:

`sos collect --cluster-type juju -c "juju.models=openstack" -c "juju.apps=ceph-mon" --nopasswd-sudo --case 2007859 --no-local`

The tarball is being uploaded to https://private-fileshare.canonical.com/~aieri/sos-collector-2007859-2023-10-17-iscic.tar.xz

The units have been in maintenance/executing for the last ~12h.

Changed in charm-ceph-mon:
assignee: Tianqi Xiao (txiao) → nobody
Andrea Ieri (aieri)
Changed in charm-ceph-mon:
status: Incomplete → New
Koo Zhong Zheng (kzz333) wrote :

I was able to reproduce this issue in my lab by luck, and my workaround is as follows:

1) For example, if the message is "Waiting on juju-4c2163-3-lxd-3 to finish upgrading", reboot that particular LXD container, i.e. the corresponding ceph-mon/<node-number>:

# in my case, this particular node is not the leader now, but it was the leader before the upgrade
$ juju ssh ceph-mon/<node-number>
$ reboot

2) After the reboot, juju status will show a failure on that node, but I found from the juju logs that it had recovered by itself, so resolve the ceph-mon unit:

$ juju resolve ceph-mon/<node-number>

Then all ceph-mon nodes are ready and clustered.

Zhanglei Mao (zhanglei-mao) wrote (last edit):

I hit this too. The workaround in comment #6 worked in my case as well, after a reboot and resolving ceph-mon/0. The Ceph cluster is still working and operational.

zlmao@p14s:~/gzgz/upgrade$ juju status ceph-mon
Model Controller Cloud/Region Version SLA Timestamp
openstack maas maas/default 2.9.45 unsupported 16:21:09+08:00

App Version Status Scale Charm Channel Rev Exposed Message
ceph-mon 15.2.17 waiting 3 ceph-mon pacific/stable 189 no Waiting on juju-e5b537-0-lxd-0 to finish upgrading

Unit Workload Agent Machine Public address Ports Message
ceph-mon/0 maintenance executing 0/lxd/0 192.168.222.69 (config-changed) Finishing upgrade
ceph-mon/1* waiting executing 1/lxd/0 192.168.222.65 (config-changed) Waiting on juju-e5b537-0-lxd-0 to finish upgrading
ceph-mon/2 waiting executing 2/lxd/0 192.168.222.96 (config-changed) Waiting on juju-e5b537-1-lxd-0 to finish upgrading

root@juju-e5b537-0-lxd-0:~# tail -n10 /var/log/juju/unit-ceph-mon-0.log
2024-01-04 07:39:21 INFO unit.ceph-mon/0.juju-log server.go:316 Upgrading to: pacific
2024-01-04 07:39:21 WARNING unit.ceph-mon/0.config-changed logger.go:60 set mon_juju-e5b537-0-lxd-0_pacific_alive
2024-01-04 07:39:21 INFO unit.ceph-mon/0.juju-log server.go:316 Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
2024-01-04 07:39:24 INFO unit.ceph-mon/0.juju-log server.go:316 Installing ['ceph', 'gdisk', 'radosgw', 'xfsprogs', 'lvm2', 'parted', 'smartmontools', 'btrfs-progs'] with options: ['--option=Dpkg::Options::=--force-confold']
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 restarting ceph-mgr.target maybe: True
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 Making dir /var/lib/ceph/mon/ceph-juju-e5b537-0-lxd-0 ceph:ceph 755
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 starting ceph-mgr.target maybe: True
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 Done
2024-01-04 07:39:25 INFO unit.ceph-mon/0.juju-log server.go:316 monitor_key_set mon_juju-e5b537-0-lxd-0_pacific_done 1704353965.4807012
2024-01-04 07:39:25 WARNING unit.ceph-mon/0.config-changed logger.go:60 set mon_juju-e5b537-0-lxd-0_pacific_done
root@juju-e5b537-0-lxd-0:~#

ubuntu@juju-e5b537-1-lxd-0:~$ sudo tail -n10 /var/log/juju/unit-ceph-mon-1.log
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 roll_monitor_cluster called with pacific
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 monitor_list: ['juju-e5b537-1-lxd-0', 'juju-e5b537-2-lxd-0', 'juju-e5b537-0-lxd-0']
2024-01-04 07:38:29 WARNING unit.ceph-mon/1.juju-log server.go:316 DEPRECATION WARNING: Function one_shot_log is being removed : Support for use of upstream ``apt_pkg`` module in conjunctionwith charm-helpers is deprecated since 2019-06-25
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 Current Ceph version is 15.2
2024-01-04 07:38:29 INFO unit.ceph-mon/1.juju-log server.go:316 Upgrading to: pacific
2024-01-04 07:38:30 INFO unit.ceph-mon/1.juju-log server.go:316 Installing ['ubuntu-cloud-keyring'] with options: ['--option=Dpkg::Optio...

