Comment 3 for bug 2007859

Billy Olsen (billy-olsen) wrote:

For this bug, we'll actually need log data from the units in order to determine what's going on (sosreports should be good). I want to call out very specifically that I cannot see charm leadership having anything to do with this bug whatsoever. Nothing in this code path uses leader storage in a way that would cause such issues, nor does the leader come into play when upgrading the monitor cluster. Based on the information currently available, the stopping of the additional units is only circumstantially related to the cluster's behaviour, and I suspect it has no effect at all.

I strongly suspect this is instead due to a slow restart of the ceph-mon service, which may be caused by on-disk format changes to the mon's storage.

First, it's important to understand how the ceph-mon upgrade works. When the 'source' config value is changed, a config-changed hook executes on each unit. If the new source indicates that the repository has changed, a rolling upgrade of the monitor cluster begins; the hook fires on all units at roughly the same time, and the units coordinate amongst themselves. The process is as follows (a simplified sketch follows the list):

1. Get a list of all monitors in the cluster (from the monmap), sort them by name (for consistency), and save the result for later.
2. Check the index of the current node in the ordered list of monitors
  > if the index is 0, proceed directly to the upgrade (step 4)
  > if the index is not 0, wait for the previous unit to complete (step 3)
3. Wait for the previous unit to finish by checking for the mon_$hostname_$version_done key in the key-value store. The unit polls for the existence of the done key, sleeping a random interval of between 5 and 30 seconds between checks, and times out after 30 minutes of waiting.
4. Update the source repository configuration and refresh the apt package information
5. Upgrade the packages on the box to the new ceph version. This updates the software on disk, but does not restart the running monitor services, so as not to impact the cluster's availability.
6. Stop the ceph-mon service
7. Ensure the mon directory is writable by the ceph user (legacy)
8. Restart the ceph-mon service
9. Signal that this unit is done by setting the mon_$hostname_$version_done key in the ceph-mon key-value store
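
To make the ordering concrete, here is a simplified sketch of steps 1-3 and 9 in Python. To be clear, this is illustrative only: the helper names, the done-key format, and the ceph CLI plumbing are my assumptions for the example, not the charm's exact code.

# Illustrative sketch of steps 1-3 and 9 above; helper names and key
# names follow the description in this comment, not the charm's code.
import json
import random
import socket
import subprocess
import time


def sorted_mon_names():
    """Step 1: monitor names from the monmap, sorted for a stable order."""
    out = subprocess.check_output(['ceph', 'mon', 'dump', '--format=json'])
    return sorted(m['name'] for m in json.loads(out)['mons'])


def monitor_key_exists(key):
    """True if ``key`` is present in the cluster's config-key store."""
    return subprocess.call(['ceph', 'config-key', 'exists', key],
                           stdout=subprocess.DEVNULL) == 0


def monitor_key_set(key, value):
    subprocess.check_call(['ceph', 'config-key', 'put', key, value])


def wait_for_previous_mon(previous, version, timeout=30 * 60):
    """Step 3: poll for the previous unit's done key, sleeping 5-30s
    between checks and giving up after ``timeout`` seconds."""
    key = 'mon_{}_{}_done'.format(previous, version)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if monitor_key_exists(key):
            return
        time.sleep(random.randint(5, 30))
    raise RuntimeError('timed out waiting for {}'.format(key))


def upgrade_this_mon(version):
    """Steps 4-8: update apt sources, upgrade packages, then stop,
    chown, and restart ceph-mon. Elided here."""
    raise NotImplementedError


def roll_monitor(version):
    mons = sorted_mon_names()
    me = socket.gethostname()
    position = mons.index(me)                # step 2
    if position > 0:
        wait_for_previous_mon(mons[position - 1], version)
    upgrade_this_mon(version)                # steps 4-8
    monitor_key_set(                         # step 9: unblock the next unit
        'mon_{}_{}_done'.format(me, version), str(time.time()))

The important properties are that every unit computes the same sorted order from the same monmap, and that each unit blocks on the done key of exactly one predecessor, so the restarts ripple through the cluster one mon at a time.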

At each step along the way once a unit starts the upgrade (steps 4-8), the unit's key in the ceph-mon key-value store is updated with a fresh timestamp so that other units know the mon is still upgrading and do not time out. Very little else is going on during this time. The log messages that have been provided simply indicate that the units are waiting for the lock and for this process to play out.
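
For illustration, that keep-alive could look something like the sketch below; the 'alive' key name and the grace period are assumptions I'm making for the example, not the charm's actual values.

# Hypothetical sketch of the keep-alive described above.
import subprocess
import time


def monitor_key_set(key, value):
    subprocess.check_call(['ceph', 'config-key', 'put', key, value])


def touch_alive_key(hostname, version):
    """Called by the upgrading unit between steps 4-8: refresh a
    timestamp so waiting units can see the upgrade is still running."""
    monitor_key_set('mon_{}_{}_alive'.format(hostname, version),
                    str(time.time()))


def previous_mon_still_alive(previous, version, grace=10 * 60):
    """Called by a waiting unit: has the previous mon refreshed its
    timestamp within the last ``grace`` seconds?"""
    try:
        out = subprocess.check_output(
            ['ceph', 'config-key', 'get',
             'mon_{}_{}_alive'.format(previous, version)])
    except subprocess.CalledProcessError:
        return False  # key never written yet
    return time.time() - float(out.decode().strip()) < grace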