Q -> R: ceph mgr down after upgrade due to start-limit-hit

Bug #2038518 reported by Peter Sabaini
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ceph Monitor Charm | New | Undecided | Unassigned | |

Bug Description

**Description:**
While upgrading `ceph-mon` from Quincy to Reef, I found that `ceph-mgr` is restarted too quickly, so its `systemd` unit hits the start limit and is not started again.

This does not appear to be consistent, though: on two consecutive runs, I first saw all 3 of 3 mgrs down, and on the next run only 1 of 3.

**Reproduction Steps:**
1. Deploy a Quincy cloud.
2. Run `juju config ceph-mon source=cloud:jammy-bobcat`.
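After step 2, the number of affected mgrs (which varied between runs, as noted above) can be counted with a small helper like the one below. This is a sketch only: the function name `count_failed_mgrs` and the unit names `ceph-mon/0` to `ceph-mon/2` are hypothetical, and it assumes `juju ssh` access to each unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ceph-mon units whose ceph-mgr systemd unit is
# in the "failed" state. ASSUMPTION: units are reachable via `juju ssh`.
count_failed_mgrs() {
  local failed=0 unit out
  for unit in "$@"; do
    # --no-legend drops header/footer lines, so any output means a failed unit.
    # The remote command is one quoted string so the pattern reaches systemctl intact.
    out=$(juju ssh "$unit" -- "sudo systemctl list-units --failed --no-legend 'ceph-mgr@*'")
    if [ -n "$out" ]; then
      failed=$((failed + 1))
      echo "$unit: ceph-mgr failed" >&2
    fi
  done
  echo "$failed"
}

# Usage (unit names are hypothetical; adjust to your model):
#   count_failed_mgrs ceph-mon/0 ceph-mon/1 ceph-mon/2
```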

**Error Message:**
Checking the unit with `sudo systemctl status <email address hidden>` showed this error:

```shell
ubuntu@juju-bc9f56-zaza-5ec88f2270ac-7:~$ sudo systemctl status <email address hidden>
× <email address hidden> - Ceph cluster manager daemon
...
Oct 05 09:02:11 juju-bc9f56-zaza-5ec88f2270ac-7 systemd[1]: <email address hidden>: Start request repeated too quickly.
Oct 05 09:02:11 juju-bc9f56-zaza-5ec88f2270ac-7 systemd[1]: <email address hidden>: Failed with result 'start-limit-hit'.
Oct 05 09:02:11 juju-bc9f56-zaza-5ec88f2270ac-7 systemd[1]: Failed to start Ceph cluster manager daemon.
```

**Workaround:**
Reloading the `systemd` manager configuration appears to fix this: after running `sudo systemctl daemon-reload`, the service starts correctly.
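The workaround can be applied only where it is needed by first checking the unit's recorded result, which `systemctl show -p Result --value` reports as `start-limit-hit` in this failure mode. This is a minimal sketch: the helper names `hit_start_limit` and `recover_mgr` are hypothetical, and because the real unit name is hidden in this report, the `ceph-mgr@$(hostname -s).service` form in the usage comment is an assumption.

```shell
#!/usr/bin/env bash
# Succeeds if systemd's last recorded result for the unit is start-limit-hit.
hit_start_limit() {
  [ "$(systemctl show -p Result --value "$1")" = "start-limit-hit" ]
}

# Applies the reported workaround (daemon-reload, then start) to a stuck unit.
recover_mgr() {
  local unit="$1"
  if hit_start_limit "$unit"; then
    sudo systemctl daemon-reload
    sudo systemctl start "$unit"
  else
    echo "$unit: not in start-limit-hit state, nothing to do" >&2
  fi
}

# Usage on each affected ceph-mon unit (unit name is an ASSUMPTION):
#   recover_mgr "ceph-mgr@$(hostname -s).service"
```

`systemctl reset-failed <unit>` is the documented way to flush a unit's start rate counter and may work as an alternative to `daemon-reload`, but it was not tried in this report.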

**Additional Information**
The charm version was `latest/edge` at git revision 55beb25.
