systemd issues with bionic-rocky causing nagios alert and can't restart daemon
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ceph RADOS Gateway Charm | Fix Released | High | Pen Gale | |
| ceph (Ubuntu) | Invalid | Undecided | Unassigned | |
Bug Description
During deployment of a bionic-rocky cloud with the 19.04 charms, we are seeing an issue on the ceph-radosgw units related to the systemd service definition for radosgw.service.
If you look through this pastebin, you'll notice that a radosgw daemon is running and the local haproxy unit considers all radosgw backend services available (via the nagios check), but systemd cannot control radosgw properly. Before a restart through systemd, systemd showed the unit as "loaded inactive"; after the restart it shows "active exited", but the radosgw service was not actually restarted.
https:/
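The mismatch above (a live radosgw process while systemd reports the unit as inactive or "active exited") can be spotted by comparing systemd's view of the unit with the process table. A minimal Python sketch, assuming the unit is named `radosgw.service` as in this report and that the daemon's process name is `radosgw`; the helper names (`parse_systemctl_show`, `diagnose`, `check_unit`) are hypothetical and not part of the charm or its nagios checks. `ActiveState` and `SubState` are standard properties printed by `systemctl show`.

```python
import subprocess


def parse_systemctl_show(output: str) -> dict:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def diagnose(show_output: str, process_running: bool) -> str:
    """Compare systemd's unit state with whether the daemon process exists."""
    props = parse_systemctl_show(show_output)
    active = props.get("ActiveState", "unknown")
    sub = props.get("SubState", "unknown")
    if (active, sub) == ("active", "running") and process_running:
        return "consistent: systemd tracks a running daemon"
    if active in ("inactive", "failed") and not process_running:
        return "consistent: daemon is stopped"
    if process_running:
        # The situation in this report: radosgw runs outside systemd's control.
        return f"MISMATCH: daemon running but systemd reports {active} ({sub})"
    return f"MISMATCH: systemd reports {active} ({sub}) but no daemon process found"


def check_unit(unit: str = "radosgw.service", process: str = "radosgw") -> str:
    """Query systemd and the process table on the local Linux host."""
    show = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,SubState"],
        capture_output=True, text=True, check=True,
    ).stdout
    # `pgrep -x` exits 0 only if a process with exactly this name exists.
    running = subprocess.run(["pgrep", "-x", process], capture_output=True).returncode == 0
    return diagnose(show, running)
```

Run `check_unit()` inside one of the ceph-radosgw containers; with the state captured in the pastebin after the restart attempt, it would be expected to report a mismatch with `active (exited)`.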
```
charm: cs:ceph-radosgw-266
cloud:bionic-rocky
*** 13.2.4+
500 http://
ceph-radosgw/0 active idle 18/lxd/2 10.20.175.60 80/tcp Unit is ready
hacluster-
ceph-radosgw/1 active idle 19/lxd/2 10.20.175.48 80/tcp Unit is ready
hacluster-
ceph-radosgw/2* active idle 20/lxd/2 10.20.175.25 80/tcp Unit is ready
hacluster-
```
Changed in charm-ceph-radosgw:

| Field | Change |
|---|---|
| importance | Undecided → High |
| assignee | nobody → Pete Vander Giessen (petevg) |
| tags | added: tracking |

Changed in charm-ceph-radosgw:

| Field | Change |
|---|---|
| milestone | none → 19.07 |

Changed in charm-ceph-radosgw:

| Field | Change |
|---|---|
| status | Fix Committed → Fix Released |
Subscribed field-high as this is an operational concern for go-live.
The workaround for managing the service is to reboot the hosting LXD container, which resets the state to that shown in the first 43 lines of the pastebin.
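One way to apply this workaround from the LXD host (for example machine 18 for unit 18/lxd/2) is the LXD CLI's `lxc restart <container>`. A minimal Python sketch, assuming you run it on the LXD host and already know the container name; the name used below and the helpers `restart_container_cmd` and `restart_container` are hypothetical, since the actual container names do not appear in this report.

```python
import subprocess
from typing import Callable, List


def restart_container_cmd(container: str) -> List[str]:
    """Build the LXD CLI command that restarts one container."""
    return ["lxc", "restart", container]


def restart_container(container: str, run: Callable = subprocess.run) -> int:
    """Restart the LXD container hosting a ceph-radosgw unit.

    Must be executed on the LXD host machine. `run` is injectable so the
    command can be inspected or dry-run without touching any container.
    """
    result = run(restart_container_cmd(container), check=False)
    return result.returncode
```

For example, `restart_container("radosgw-container")` (hypothetical name) returns the exit status of `lxc restart`; 0 means LXD reported a successful restart.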