ceph-radosgw service crashes on fresh deployment

Bug #1940457 reported by Marco Savoca
Affects: Ceph RADOS Gateway Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

ceph-radosgw stays in a blocked state with the message "Services not running that should be: <email address hidden>".

The units are deployed on a fresh and healthy Ceph cluster on MAAS/ubuntu-focal with:

juju deploy -n 3 --to lxd:0,lxd:1,lxd:2 --config rgw.yaml --config vip=10.10.0.151 ceph-radosgw
juju deploy --config cluster_count=3 hacluster radosgw-hacluster
juju add-relation radosgw-hacluster:ha ceph-radosgw:ha
juju add-relation ceph-radosgw:mon ceph-mon:radosgw

rgw.yaml
ceph-radosgw:
  source: cloud:focal-wallaby
  pool-type: erasure-coded
  bluestore-compression-mode: aggressive
  bluestore-compression-algorithm: lz4
  ec-profile-device-class: hdd
  ec-profile-k: 4
  ec-profile-m: 2
  ec-profile-name: erasure-4-2
  ec-profile-plugin: isa
  ec-profile-technique: reed_sol_van
  rgw-buckets-pool-weight: 80
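
For reference, once the mon relation is established, the erasure profile and pools created from this config can be checked from a monitor unit. This is just a sketch; the unit name is an example:

juju ssh ceph-mon/0 sudo ceph osd erasure-code-profile get erasure-4-2
juju ssh ceph-mon/0 sudo ceph osd pool ls detail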

Output of "systemctl status <email address hidden>"

<email address hidden> - Ceph rados gateway
     Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-08-18 16:37:54 UTC; 11min ago
    Process: 61895 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name client.rgw.juju-c42313-0-lxd-1 --setuser ceph --setgroup ceph (code=exited, sta>
   Main PID: 61895 (code=exited, status=1/FAILURE)

Aug 18 16:37:54 juju-c42313-0-lxd-1 systemd[1]: <email address hidden>: Scheduled restart job, restart counter is at 5.
Aug 18 16:37:54 juju-c42313-0-lxd-1 systemd[1]: Stopped Ceph rados gateway.
Aug 18 16:37:54 juju-c42313-0-lxd-1 systemd[1]: <email address hidden>: Start request repeated too quickly.
Aug 18 16:37:54 juju-c42313-0-lxd-1 systemd[1]: <email address hidden>: Failed with result 'exit-code'.
Aug 18 16:37:54 juju-c42313-0-lxd-1 systemd[1]: Failed
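
The complete log of the failing service (the source of the monitor errors quoted in the comment below) can be pulled with journalctl on the affected unit. A sketch; the unit name is an example and the glob avoids spelling out the exact service instance:

juju ssh ceph-radosgw/0 "sudo journalctl -u 'ceph-radosgw@*' --no-pager -n 100"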

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

From the log, it looks like ceph-radosgw cannot talk to the ceph-mons. Are the radosgw units bound to the correct space for ceph-access? It looks like a network issue, given these repeating log lines:

Aug 18 16:37:53 juju-c42313-0-lxd-1 radosgw[61880]: unable to get monitor info from DNS SRV with service name: ceph-mon
Aug 18 16:37:53 juju-c42313-0-lxd-1 radosgw[61880]: 2021-08-18T16:37:53.730+0000 7f79327e9cc0 -1 failed for service _ceph-mon._tcp
Aug 18 16:37:53 juju-c42313-0-lxd-1 radosgw[61880]: 2021-08-18T16:37:53.730+0000 7f79327e9cc0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Aug 18 16:37:53 juju-c42313-0-lxd-1 radosgw[61880]: failed to fetch mon config (--no-mon-config to skip)
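
A quick check for whether the unit ever received any monitor addresses is to look at the ceph.conf the charm rendered on the radosgw unit (a sketch; the unit name is an example and the grep simply matches whatever mon entries are present):

juju ssh ceph-radosgw/0 "grep -i mon /etc/ceph/ceph.conf"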

Revision history for this message
Marco Savoca (quaternionma) wrote :

I have 3 spaces in the model:
Name         Space ID  Subnets
alpha        0
maascluster  2         10.10.176.0/27
maaspublic   1         10.10.32.0/24

maaspublic is the model's default space. The Ceph OSD, MON and RGW units all get an IP address from the maaspublic network. Is there a need to explicitly bind the ceph-rgw unit to a space?
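
The bindings Juju actually recorded for the application can be inspected directly; without --bind at deploy time they all fall back to the model's default space. A sketch, assuming a reasonably recent Juju:

juju show-application ceph-radosgw --format yaml

The endpoint-bindings section in the output should show which space each endpoint (e.g. mon, public, ha) ended up bound to.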

summary: - ceph-radosgw service crashes on freh deployment
+ ceph-radosgw service crashes on fresh deployment
Revision history for this message
Marco Savoca (quaternionma) wrote :

I finally got it working.

It was necessary to manually add the containerized machines with the constraints set to the MAAS public space.

For instance: juju add-machine lxd:0 --constraints spaces=maaspublic

Afterwards, the ceph-radosgw application was deployed with an explicit binding to the space:

juju deploy ... ceph-radosgw --bind "public=maaspublic"

As I stated before, maaspublic was already set as the default space in the model, so the meaning or purpose of the "default space" model setting isn't really clear to me.
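
For anyone hitting the same thing, the working sequence was roughly the following. Machine numbers, the space name and the config file are taken from the description above; the container IDs Juju assigns will differ, so treat them as placeholders:

juju add-machine lxd:0 --constraints spaces=maaspublic
juju add-machine lxd:1 --constraints spaces=maaspublic
juju add-machine lxd:2 --constraints spaces=maaspublic
juju deploy -n 3 --to 0/lxd/2,1/lxd/2,2/lxd/2 --config rgw.yaml --config vip=10.10.0.151 --bind "public=maaspublic" ceph-radosgw

The add-relation commands then stay the same as in the original description.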
