Intermittent deploy failure

Bug #1868387 reported by Frode Nordahl
This bug affects 4 people
Affects: Ceph RADOS Gateway Charm
Status: Fix Released
Importance: High
Assigned to: Frode Nordahl

Bug Description

$ juju status ceph-radosgw --relations
Model Controller Cloud/Region Version SLA Timestamp
zaza-fc6306dea031 fnordahl-serverstack serverstack/serverstack 2.7.4 unsupported 14:41:04Z

App Version Status Scale Charm Store Rev OS Notes
ceph-radosgw 15.1.0 blocked 1 ceph-radosgw jujucharms 356 ubuntu

Unit Workload Agent Machine Public address Ports Message
ceph-radosgw/0* blocked idle 6 10.5.0.3 80/tcp Services not running that should be: <email address hidden>

Machine State DNS Inst id Series AZ Message
6 started 10.5.0.3 4bb9dfd8-17ac-49a3-a322-8f790444ecd2 bionic nova ACTIVE

Relation provider Requirer Interface Type Message
ceph-mon:radosgw ceph-radosgw:mon ceph-radosgw regular
ceph-radosgw:cluster ceph-radosgw:cluster swift-ha peer
keystone:identity-service ceph-radosgw:identity-service keystone regular

2020-03-21 11:26:53 INFO juju-log identity-service:37: Registered config file: /etc/haproxy/haproxy.cfg
2020-03-21 11:26:53 INFO juju-log identity-service:37: Registered config file: /etc/ceph/ceph.conf
2020-03-21 11:26:55 DEBUG juju-log identity-service:37: Ensuring haproxy enabled in /etc/default/haproxy.
2020-03-21 11:26:55 INFO juju-log identity-service:37: HAProxy context is incomplete, this unit has no peers.
2020-03-21 11:26:57 DEBUG juju-log identity-service:37: Generating template context for identity-service
2020-03-21 11:27:00 DEBUG juju-log identity-service:37: Ensuring haproxy enabled in /etc/default/haproxy.
2020-03-21 11:27:00 INFO juju-log identity-service:37: HAProxy context is incomplete, this unit has no peers.
2020-03-21 11:27:01 INFO juju-log identity-service:37: Loaded template from /var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/charmhelpers/contrib/openstack/templates/haproxy.cfg
2020-03-21 11:27:01 INFO juju-log identity-service:37: Rendering from template: /etc/haproxy/haproxy.cfg
2020-03-21 11:27:01 INFO juju-log identity-service:37: Wrote template /etc/haproxy/haproxy.cfg.
2020-03-21 11:27:01 DEBUG juju-log identity-service:37: Generating template context for identity-service
2020-03-21 11:27:03 INFO juju-log identity-service:37: Loaded template from templates/ceph.conf
2020-03-21 11:27:03 INFO juju-log identity-service:37: Rendering from template: /etc/ceph/ceph.conf
2020-03-21 11:27:03 INFO juju-log identity-service:37: Wrote template /etc/ceph/ceph.conf.
2020-03-21 11:27:03 DEBUG juju-log identity-service:37: Ensuring haproxy enabled in /etc/default/haproxy.
2020-03-21 11:27:04 INFO juju-log identity-service:37: HAProxy context is incomplete, this unit has no peers.
2020-03-21 11:27:04 INFO juju-log identity-service:37: Loaded template from /var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/charmhelpers/contrib/openstack/templates/haproxy.cfg
2020-03-21 11:27:04 INFO juju-log identity-service:37: Rendering from template: /etc/haproxy/haproxy.cfg
2020-03-21 11:27:04 INFO juju-log identity-service:37: Wrote template /etc/haproxy/haproxy.cfg.
2020-03-21 11:27:04 DEBUG juju-log identity-service:37: Generating template context for identity-service
2020-03-21 11:27:06 INFO juju-log identity-service:37: Loaded template from templates/ceph.conf
2020-03-21 11:27:06 INFO juju-log identity-service:37: Rendering from template: /etc/ceph/ceph.conf
2020-03-21 11:27:06 INFO juju-log identity-service:37: Wrote template /etc/ceph/ceph.conf.
2020-03-21 11:27:06 DEBUG identity-service-relation-changed ERROR: Site openstack_https_frontend does not exist!
2020-03-21 11:27:06 DEBUG identity-service-relation-changed apache2.service is not active, cannot reload.
2020-03-21 11:27:06 DEBUG identity-service-relation-changed Job for apache2.service failed because the control process exited with error code.
2020-03-21 11:27:06 DEBUG identity-service-relation-changed See "systemctl status apache2.service" and "journalctl -xe" for details.
2020-03-21 11:27:07 DEBUG identity-service-relation-changed active
2020-03-21 11:27:07 DEBUG identity-service-relation-changed active
2020-03-21 11:27:07 INFO juju-log identity-service:37: Unit is ready

# systemctl status apache2
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/apache2.service.d
           └─apache2-systemd.conf
   Active: failed (Result: exit-code) since Sat 2020-03-21 11:27:06 UTC; 3h 15min ago
  Process: 6895 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=0/SUCCESS)
  Process: 14108 ExecStart=/usr/sbin/apachectl start (code=exited, status=1/FAILURE)
 Main PID: 5564 (code=exited, status=1/FAILURE)

Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: Starting The Apache HTTP Server...
Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 apachectl[14108]: no listening sockets available, shutting down
Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 apachectl[14108]: AH00015: Unable to open logs
Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 apachectl[14108]: Action 'start' failed.
Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 apachectl[14108]: The Apache error log may have more information.
Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: apache2.service: Control process exited, code=exited status=1
Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: apache2.service: Failed with result 'exit-code'.
Mar 21 11:27:06 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: Failed to start The Apache HTTP Server.

# netstat -nepa |grep LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 0 38964 7263/haproxy
tcp 0 0 252.0.3.1:53 0.0.0.0:* LISTEN 0 23121 2374/dnsmasq
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 101 15544 611/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 18826 910/sshd
tcp 0 0 127.0.0.1:8888 0.0.0.0:* LISTEN 0 38962 7263/haproxy
tcp6 0 0 :::80 :::* LISTEN 0 38965 7263/haproxy
tcp6 0 0 :::22 :::* LISTEN 0 18837 910/sshd

# systemctl status <email address hidden>
● <email address hidden> - Ceph rados gateway
   Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2020-03-21 11:27:07 UTC; 3h 17min ago
  Process: 14228 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name client.rgw.juju-ddb957-zaza-fc6306dea031-6 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
 Main PID: 14228 (code=exited, status=1/FAILURE)

Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Main process exited, code=exited, status=1/FAILURE
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Failed with result 'exit-code'.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Service hold-off time over, scheduling restart.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Scheduled restart job, restart counter is at 5.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: Stopped Ceph rados gateway.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Start request repeated too quickly.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Failed with result 'exit-code'.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: Failed to start Ceph rados gateway.

# journalctl -b |grep radosgw
[ ... ]
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 radosgw[14228]: 2020-03-21T11:27:07.610+0000 7f2016e2c980 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-rgw.juju-ddb957-zaza-fc6306dea031-6/keyring: (2) No such file or directory
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 radosgw[14228]: 2020-03-21T11:27:07.610+0000 7f2016e2c980 -1 AuthRegistry(0x5600fa991198) no keyring found at /var/lib/ceph/radosgw/ceph-rgw.juju-ddb957-zaza-fc6306dea031-6/keyring, disabling cephx
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 radosgw[14228]: 2020-03-21T11:27:07.618+0000 7f2016e2c980 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-rgw.juju-ddb957-zaza-fc6306dea031-6/keyring: (2) No such file or directory
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 radosgw[14228]: 2020-03-21T11:27:07.618+0000 7f2016e2c980 -1 AuthRegistry(0x7fffa48ac2d0) no keyring found at /var/lib/ceph/radosgw/ceph-rgw.juju-ddb957-zaza-fc6306dea031-6/keyring, disabling cephx
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 radosgw[14228]: failed to fetch mon config (--no-mon-config to skip)
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Main process exited, code=exited, status=1/FAILURE
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Failed with result 'exit-code'.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Service hold-off time over, scheduling restart.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Scheduled restart job, restart counter is at 5.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Start request repeated too quickly.
Mar 21 11:27:07 juju-ddb957-zaza-fc6306dea031-6 systemd[1]: <email address hidden>: Failed with result 'exit-code'.

The ceph-radosgw charm appears to never pick up the broker request response from ceph-mon:
2020-03-21 11:25:14 DEBUG juju-log mon:36: Request already sent but not complete, not sending new request
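
For context, the charm issues its pool-creation requests through the Ceph broker protocol in charm-helpers, which only re-sends when no equivalent request is already outstanding. A minimal sketch of that send-if-needed pattern (modelled loosely on send_request_if_needed/is_request_complete from charmhelpers.contrib.storage.linux.ceph; the relation-data layout and the exit-code-0 success criterion are simplifying assumptions, not the charm's exact code):

import json

def is_request_complete(mon_units, rsp_key='broker-rsp-ceph-radosgw-0'):
    """A request counts as complete once some mon unit has published a
    broker response for this unit with exit-code 0."""
    for unit_data in mon_units:
        raw = unit_data.get(rsp_key)
        if raw and json.loads(raw).get('exit-code') == 0:
            return True
    return False

def send_request_if_needed(request, sent_request, mon_units):
    """Return the request to publish on the mon relation, or None to wait."""
    if sent_request == request and not is_request_complete(mon_units):
        print('Request already sent but not complete, '
              'not sending new request')
        return None
    return request

# Relation data as dumped below: only ceph-mon/1 carries a response,
# and it has exit-code 1, so the charm keeps waiting and never brings
# up the gateway.
mon_units = [
    {'auth': 'cephx'},                                    # ceph-mon/0
    {'broker-rsp-ceph-radosgw-0':
        json.dumps({'exit-code': 1, 'stderr': '...'})},   # ceph-mon/1
    {'auth': 'cephx'},                                    # ceph-mon/2
]
send_request_if_needed({'ops': ['create-pool']},
                       {'ops': ['create-pool']}, mon_units)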

The response is only present on one of the unit-to-unit relations, which may or may not be expected:
ubuntu@test:~$ juju run --unit ceph-radosgw/0 'relation-get -r mon:36 - ceph-mon/0'
auth: cephx
ceph-public-address: 10.5.0.38
egress-subnets: 10.5.0.38/32
fsid: f82f86bc-6b65-11ea-bf83-fa163e6453d2
ingress-address: 10.5.0.38
private-address: 10.5.0.38
rgw.juju-ddb957-zaza-fc6306dea031-6_key: AQAf+XVeUildAxAARIUZUmzoh4/zIfwBTQ5m1g==
ubuntu@test:~$ juju run --unit ceph-radosgw/0 'relation-get -r mon:36 - ceph-mon/1'
auth: cephx
broker-rsp-ceph-radosgw-0: '{"exit-code": 1, "stderr": "Unexpected error occurred
  while processing requests: {''api-version'': 1, ''ops'': [{''op'': ''create-pool'',
  ''name'': ''default.rgw.buckets.data'', ''replicas'': 3, ''pg_num'': None, ''weight'':
  20, ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.control'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'':
  ''create-pool'', ''name'': ''default.rgw.data.root'', ''replicas'': 3, ''pg_num'':
  None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'':
  ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'': ''create-pool'',
  ''name'': ''default.rgw.gc'', ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1,
  ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.log'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'':
  ''create-pool'', ''name'': ''default.rgw.intent-log'', ''replicas'': 3, ''pg_num'':
  None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'':
  ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'': ''create-pool'',
  ''name'': ''default.rgw.meta'', ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1,
  ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.usage'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'':
  ''create-pool'', ''name'': ''default.rgw.users.keys'', ''replicas'': 3, ''pg_num'':
  None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'':
  ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'': ''create-pool'',
  ''name'': ''default.rgw.users.email'', ''replicas'': 3, ''pg_num'': None, ''weight'':
  0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.users.swift'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'':
  ''create-pool'', ''name'': ''default.rgw.users.uid'', ''replicas'': 3, ''pg_num'':
  None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'':
  ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'': ''create-pool'',
  ''name'': ''default.rgw.buckets.extra'', ''replicas'': 3, ''pg_num'': None, ''weight'':
  1.0, ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.buckets.index'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 3.0, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'':
  ''create-pool'', ''name'': ''.rgw.root'', ''replicas'': 3, ''pg_num'': None, ''weight'':
  0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}], ''request-id'': ''f41b0e16-6b65-11ea-a7e5-fa163e452a2c''}"}'
ceph-public-address: 10.5.0.18
egress-subnets: 10.5.0.18/32
fsid: f82f86bc-6b65-11ea-bf83-fa163e6453d2
ingress-address: 10.5.0.18
private-address: 10.5.0.18
rgw.juju-ddb957-zaza-fc6306dea031-6_key: AQAf+XVeUildAxAARIUZUmzoh4/zIfwBTQ5m1g==
ubuntu@test:~$ juju run --unit ceph-radosgw/0 'relation-get -r mon:36 - ceph-mon/2'
auth: cephx
ceph-public-address: 10.5.0.4
egress-subnets: 10.5.0.4/32
fsid: f82f86bc-6b65-11ea-bf83-fa163e6453d2
ingress-address: 10.5.0.4
private-address: 10.5.0.4
rgw.juju-ddb957-zaza-fc6306dea031-6_key: AQAf+XVeUildAxAARIUZUmzoh4/zIfwBTQ5m1g==

Note that the broker-rsp-ceph-radosgw-0 response ('{"exit-code": 1, "stderr": "Unexpected error occurred while processing requests: ..."}') was caused by a bug in the Ceph Octopus PG autoscaling code; see bug 1868587.

Frode Nordahl (fnordahl)
summary: - Intermittent deploy failure
+ [Ussuri] Intermittent deploy failure
Frode Nordahl (fnordahl)
description: updated
Frode Nordahl (fnordahl)
summary: - [Ussuri] Intermittent deploy failure
+ Intermittent deploy failure
description: updated
Changed in charm-ceph-radosgw:
status: New → Triaged
importance: Undecided → High
Frode Nordahl (fnordahl)
summary: - Intermittent deploy failure
+ Intermittent deploy failure with certificates relation
Revision history for this message
Frode Nordahl (fnordahl) wrote : Re: Intermittent deploy failure with certificates relation
Changed in charm-ceph-radosgw:
assignee: nobody → Frode Nordahl (fnordahl)
milestone: none → 20.05
Frode Nordahl (fnordahl)
description: updated
Frode Nordahl (fnordahl)
summary: - Intermittent deploy failure with certificates relation
+ Intermittent deploy failure
Frode Nordahl (fnordahl)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-radosgw (master)

Fix proposed to branch: master
Review: https://review.opendev.org/714434

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-radosgw (master)

Change abandoned by Frode Nordahl (<email address hidden>) on branch: master
Review: https://review.opendev.org/714434
Reason: Root cause was https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1868587

Frode Nordahl (fnordahl)
description: updated
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Some extra information from the Discourse topic on the same issue:

https://discourse.juju.is/t/unable-to-install-openstack-radoswg-stuck-in-blocked-state/2911/13

ubuntu@juju-ac0843-0-lxd-0:~$ journalctl -b |grep radosgw
May 03 18:45:17 juju-ac0843-0-lxd-0 systemd[1]: radosgw.service: Failed to reset devices.list: Operation not permitted
May 03 18:45:17 juju-ac0843-0-lxd-0 systemd[1]: Starting LSB: radosgw RESTful rados gateway...
May 03 18:45:17 juju-ac0843-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 03 18:45:20 juju-ac0843-0-lxd-0 radosgw[230]: parse error setting 'debug_rgw' to '/5' (value must take the form N or N/M, where N and M are integers)
May 03 18:45:20 juju-ac0843-0-lxd-0 radosgw[230]: parse error setting 'err_to_syslog' to '' (Expected option value to be integer, got '')
May 03 18:45:20 juju-ac0843-0-lxd-0 radosgw[230]: parse error setting 'log_to_syslog' to '' (Expected option value to be integer, got '')

and:

> Looking at /etc/ceph/ceph.conf, those variables were indeed empty. It seems to be something related to that.
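
To illustrate the failure mode quoted above: if the charm's template context is incomplete, Jinja2 renders undefined variables as empty strings, producing exactly the invalid values radosgw rejects ('debug rgw = /5', 'err to syslog = '). The template fragment below is an assumption modelled on the rendered ceph.conf shown later in this bug, not the charm's actual template:

from jinja2 import Template

TEMPLATE = """\
[global]
err to syslog = {{ use_syslog }}
debug rgw = {{ loglevel }}/5
"""

def render_checked(ctx, required=('use_syslog', 'loglevel')):
    """Refuse to render until every required context key has a value."""
    missing = [k for k in required if not ctx.get(k)]
    if missing:
        raise ValueError('context incomplete, missing: %s' % missing)
    return Template(TEMPLATE).render(**ctx)

print(Template(TEMPLATE).render())   # undefined vars -> the broken output above
print(render_checked({'use_syslog': 'false', 'loglevel': 1}))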

David Ames (thedac)
Changed in charm-ceph-radosgw:
milestone: 20.05 → 20.08
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-radosgw (master)

Reviewed: https://review.opendev.org/714400
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-radosgw/commit/?id=d544c70912bbf54d04e437f5b0a0446099e83a65
Submitter: Zuul
Branch: master

commit d544c70912bbf54d04e437f5b0a0446099e83a65
Author: Frode Nordahl <email address hidden>
Date: Mon Mar 23 08:26:19 2020 +0100

    Determine default port based on presence of TLS configuration

    Fix intermittent deployment failure with TLS.

    Default to TLS in the functional test.

    The call to ``configure_https`` in identity_changed remains
    from the time when Keystone provided certificates; remove it.

    Hold service down until keys are rendered.

    Change-Id: Ia16e6200520972c503102d80cda35e36daea82a2
    Closes-Bug: #1868387

Changed in charm-ceph-radosgw:
status: In Progress → Fix Committed
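
The gist of the fix, per the commit message: pick the default listen port from whether TLS is configured, and hold the radosgw service down until its key material is rendered. A hedged Python sketch of the idea (function and certificate-path names are illustrative assumptions, not the charm's actual code):

import os

def tls_configured(cert_dir='/etc/apache2/ssl/radosgw'):  # assumed path
    """Treat TLS as configured once a cert/key pair exists on disk."""
    return (os.path.exists(os.path.join(cert_dir, 'cert_host')) and
            os.path.exists(os.path.join(cert_dir, 'key_host')))

def default_listen_port():
    # With TLS, Apache terminates HTTPS on the front-end port; without
    # it, the gateway/haproxy can take port 80 directly, avoiding the
    # kind of port-80 bind conflict seen in the apache2 failure above.
    return 443 if tls_configured() else 80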
Revision history for this message
David Coronel (davecore) wrote :

I think I'm hitting this bug. I tried with cs:ceph-radosgw-286 and cs:~openstack-charmers-next/ceph-radosgw-363 but I end up with the same situation.

Unit Workload Agent Machine Public address Ports Message
ceph-radosgw/5* blocked idle 3/lxd/5 10.0.1.28 80/tcp Services not running that should be: <email address hidden>

root@juju-10aa60-3-lxd-5:/var/log# grep ERROR juju/unit-ceph-radosgw-5.log
2020-06-01 17:23:52 DEBUG config-changed ERROR: Site openstack_https_frontend does not exist!

root@juju-10aa60-3-lxd-5:/var/log# systemctl list-units | grep failed
● sys-kernel-config.mount loaded failed failed Kernel Configuration File System
● apache2.service loaded failed failed The Apache HTTP Server
● <email address hidden> loaded failed failed Ceph rados gateway
● systemd-modules-load.service loaded failed failed Load Kernel Modules

root@juju-10aa60-3-lxd-5:/var/log# journalctl -b |grep radosgw
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: <email address hidden>: Failed to reset devices.list: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 systemd[1]: Failed to set devices.allow on /system.slice/system-ceph\<email address hidden>: Operation not permitted
Jun 01 17:27:59 juju-10aa60-3-lxd-5 radosgw[20840]: 2020-06-01 17:27:59.315 7f50c23054c0 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-rgw.juju-10aa60-3-lxd-5/keyring: (2) No such file or directory
Jun 01 17:27...


Revision history for this message
David Coronel (davecore) wrote :

And I also see the Unexpected error in broker-rsp-ceph-radosgw-5:

ubuntu@hevelius:~$ juju debug-log -i ceph-radosgw/5 --replay | grep -i "Request already sent"
unit-ceph-radosgw-5: 17:26:56 DEBUG unit.ceph-radosgw/5.juju-log mon:65: Request already sent but not complete, not sending new request

ubuntu@hevelius:~$ juju run --unit ceph-radosgw/5 'relation-get -r mon:65 - ceph-mon/0'
auth: cephx
ceph-public-address: 10.0.3.24
egress-subnets: 192.168.210.135/32
fsid: 4e0ece9e-a1d5-11ea-8976-00163e2267d7
ingress-address: 192.168.210.135
private-address: 192.168.210.135
radosgw_key: AQB6GNVelHpAJRAAGFQJyxi1hpOnaeuMdstXgQ==
rgw.juju-10aa60-3-lxd-5_key: AQAEOtVe1YbsDBAAqQHMZNpZUUZ+QJFDgaEyJQ==

ubuntu@hevelius:~$ juju run --unit ceph-radosgw/5 'relation-get -r mon:65 - ceph-mon/1'
auth: cephx
ceph-public-address: 10.0.3.25
egress-subnets: 192.168.210.137/32
fsid: 4e0ece9e-a1d5-11ea-8976-00163e2267d7
ingress-address: 192.168.210.137
private-address: 192.168.210.137
radosgw_key: AQB6GNVelHpAJRAAGFQJyxi1hpOnaeuMdstXgQ==
rgw.juju-10aa60-3-lxd-5_key: AQAEOtVe1YbsDBAAqQHMZNpZUUZ+QJFDgaEyJQ==

ubuntu@hevelius:~$ juju run --unit ceph-radosgw/5 'relation-get -r mon:65 - ceph-mon/2'
auth: cephx
broker-rsp-ceph-radosgw-5: '{"exit-code": 1, "stderr": "Unexpected error occurred
  while processing requests: {''api-version'': 1, ''ops'': [{''op'': ''create-pool'',
  ''name'': ''default.rgw.buckets.data'', ''replicas'': 3, ''pg_num'': None, ''weight'':
  20, ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.control'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'':
  ''create-pool'', ''name'': ''default.rgw.data.root'', ''replicas'': 3, ''pg_num'':
  None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'':
  ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'': ''create-pool'',
  ''name'': ''default.rgw.gc'', ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1,
  ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.log'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'':
  ''create-pool'', ''name'': ''default.rgw.intent-log'', ''replicas'': 3, ''pg_num'':
  None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'': None, ''app-name'':
  ''rgw'', ''max-bytes'': None, ''max-objects'': None}, {''op'': ''create-pool'',
  ''name'': ''default.rgw.meta'', ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1,
  ''group'': ''objects'', ''group-namespace'': None, ''app-name'': ''rgw'', ''max-bytes'':
  None, ''max-objects'': None}, {''op'': ''create-pool'', ''name'': ''default.rgw.usage'',
  ''replicas'': 3, ''pg_num'': None, ''weight'': 0.1, ''group'': ''objects'', ''group-namespace'':
  None, ''app-name'': ''rgw'', ''max-bytes'...


Revision history for this message
David Coronel (davecore) wrote :

I just noticed I can't even do ceph status on ceph-mon/0:

ubuntu@hevelius:~/cpe-deployments$ juju ssh ceph-mon/0

ubuntu@juju-0a0531-0-lxd-0:~$ sudo ceph status

2020-06-02 16:23:33.869 7f4ce9280700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2020-06-02 16:23:33.869 7f4ce9280700 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
[errno 2] error connecting to the cluster

ubuntu@juju-0a0531-0-lxd-0:~$ sudo -i

root@juju-0a0531-0-lxd-0:~# ls -l /etc/ceph/
total 4
lrwxrwxrwx 1 root root 27 Jun 1 21:38 ceph.conf -> /etc/alternatives/ceph.conf
-rw-r--r-- 1 root root 92 Apr 7 07:55 rbdmap

root@juju-0a0531-0-lxd-0:~# cat /etc/ceph/ceph.conf
[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx

mon host = 10.0.3.20 10.0.3.21 10.0.3.22
fsid = ff57c1e6-a44c-11ea-854b-00163ea779fd

log to syslog = false
err to syslog = false
clog to syslog = false
mon cluster log to syslog = false
debug mon = 1/5
debug osd = 1/5

# NOTE(jamespage):
# Disable object skew warnings as these only use
# the number of objects and not their size in the
# skew calculation.
mon pg warn max object skew = -1

public network =
cluster network =
public addr = 10.0.3.21
cluster addr = 10.0.4.21

[mon]
keyring = /var/lib/ceph/mon/$cluster-$id/keyring

[mds]
keyring = /var/lib/ceph/mds/$cluster-$id/keyring

Revision history for this message
Alexander Litvinov (alitvinov) wrote :

The bug still exists with ceph-radosgw rev 288.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-radosgw (stable/20.05)

Fix proposed to branch: stable/20.05
Review: https://review.opendev.org/735886

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-radosgw (stable/20.05)

Reviewed: https://review.opendev.org/735886
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-radosgw/commit/?id=ec5ece070e8540670aab3173913fc739eed75a37
Submitter: Zuul
Branch: stable/20.05

commit ec5ece070e8540670aab3173913fc739eed75a37
Author: Frode Nordahl <email address hidden>
Date: Mon Mar 23 08:26:19 2020 +0100

    Determine default port based on presence of TLS configuration

    Fix intermittent deployment failure with TLS.

    Default to TLS in the functional test.

    The call to ``configure_https`` in identity_changed remains
    from the time when Keystone provided certificates; remove it.

    Hold service down until keys are rendered.

    Change-Id: Ia16e6200520972c503102d80cda35e36daea82a2
    Closes-Bug: #1868387
    (cherry picked from commit d544c70912bbf54d04e437f5b0a0446099e83a65)

Changed in charm-ceph-radosgw:
status: Fix Committed → Fix Released