Activity log for bug #2062405

Date Who What changed Old value New value Message
2024-04-18 18:52:25 Ionut-Madalin Balutoiu bug added bug
2024-04-18 18:55:46 Ionut-Madalin Balutoiu description

Old value:

If multisite replication feature is enabled, the "ceph-radosgw" Juju application (primary or secondary) cannot be scaled out.

See this doc for the multi-site replication feature details: https://ubuntu.com/ceph/docs/setting-up-multi-site

After the multi-site replication is established via the following Juju relation:

```
juju relate primary-ceph-radosgw:primary secondary-ceph-radosgw:secondary
```

any scale-out operation via the following commands will fail:

```
juju add-unit primary-ceph-radosgw
```

or

```
juju add-unit secondary-ceph-radosgw
```

This is the error from "juju debug-log UNIT_ID" of any new Juju unit:

```
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/leader-settings-changed", line 1278, in <module>
    assess_status(CONFIGS)
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/utils.py", line 335, in assess_status
    assess_status_func(configs)()
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1828, in _assess_status_func
    state, message = _determine_os_workload_status(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1042, in _determine_os_workload_status
    state, message = _ows_check_charm_func(
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1206, in _ows_check_charm_func
    charm_state, charm_message = charm_func_with_configs()
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1043, in <lambda>
    state, message, lambda: charm_func(configs))
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/utils.py", line 248, in check_optional_config_and_relations
    if not multisite.is_multisite_configured(config('zone'),
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/multisite.py", line 761, in is_multisite_configured
    local_zones = list_zones()
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/charmhelpers/core/decorators.py", line 40, in _retry_on_exception_inner_2
    return f(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/multisite.py", line 150, in list_zones
    _zones = _list('zone')
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/multisite.py", line 97, in _list
    result = json.loads(_check_output(cmd))
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/charmhelpers/core/decorators.py", line 40, in _retry_on_exception_inner_2
    return f(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-radosgw-3/charm/hooks/multisite.py", line 58, in _check_output
    return subprocess.check_output(cmd).decode('UTF-8')
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['radosgw-admin', '--id=rgw.juju-7ceee7-14', 'zone', 'list']' returned non-zero exit status 1.
```

This is happening because the multi-site functions are part of `check_optional_config_and_relations`, which is called by `assess_status` after every successful hook in the main hook entrypoint:

```
if __name__ == '__main__':
    try:
        hooks.execute(sys.argv)
    except UnregisteredHookError as e:
        log('Unknown hook {} - skipping.'.format(e))
    except ValueError as e:
        # Handle any invalid configuration values
        status_set(WORKLOAD_STATES.BLOCKED, str(e))
    else:
        assess_status(CONFIGS)
```

It seems that the method `check_optional_config_and_relations` doesn't return early if the unit is not ready for service (ceph conf and keyring files are not created yet).

New value:

If multisite replication feature is enabled, the "ceph-radosgw" Juju application (primary or secondary) cannot be scaled out.

See this doc for the multi-site replication feature details: https://ubuntu.com/ceph/docs/setting-up-multi-site

After the multi-site replication is established via the following Juju relation:

```
juju relate primary-ceph-radosgw:primary secondary-ceph-radosgw:secondary
```

any scale-out operation via the following commands will fail:

```
juju add-unit primary-ceph-radosgw
```

or

```
juju add-unit secondary-ceph-radosgw
```

This is the error from "juju debug-log UNIT_ID" of any new Juju unit:

```
2024-04-18T18:55:00.016+0000 7f5d891b1080 -1 Errors while parsing config file!
2024-04-18T18:55:00.016+0000 7f5d891b1080 -1 can't open ceph.conf: (2) No such file or directory
unable to get monitor info from DNS SRV with service name: ceph-mon
2024-04-18T18:55:00.072+0000 7f5d891b1080 -1 failed for service _ceph-mon._tcp
2024-04-18T18:55:00.072+0000 7f5d891b1080 -1 monclient: get_monmap_and_config cannot identify monitors to contact
failed to fetch mon config (--no-mon-config to skip)
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/leader-settings-changed", line 1210, in <module>
    assess_status(CONFIGS)
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/utils.py", line 334, in assess_status
    assess_status_func(configs)()
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1828, in _assess_status_func
    state, message = _determine_os_workload_status(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1042, in _determine_os_workload_status
    state, message = _ows_check_charm_func(
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1206, in _ows_check_charm_func
    charm_state, charm_message = charm_func_with_configs()
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1043, in <lambda>
    state, message, lambda: charm_func(configs))
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/utils.py", line 249, in check_optional_config_and_relations
    if not multisite.is_multisite_configured(config('zone'),
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/multisite.py", line 701, in is_multisite_configured
    local_zones = list_zones()
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/core/decorators.py", line 40, in _retry_on_exception_inner_2
    return f(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/multisite.py", line 125, in list_zones
    _zones = _list('zone')
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/multisite.py", line 72, in _list
    result = json.loads(_check_output(cmd))
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/core/decorators.py", line 40, in _retry_on_exception_inner_2
    return f(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/multisite.py", line 33, in _check_output
    return subprocess.check_output(cmd).decode('UTF-8')
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['radosgw-admin', '--id=rgw.juju-bafcdf-11', 'zone', 'list']' returned non-zero exit status 1.
```

This is happening because the multi-site functions are part of `check_optional_config_and_relations`, which is called by `assess_status` after every successful hook in the main hook entrypoint:

```
if __name__ == '__main__':
    try:
        hooks.execute(sys.argv)
    except UnregisteredHookError as e:
        log('Unknown hook {} - skipping.'.format(e))
    except ValueError as e:
        # Handle any invalid configuration values
        status_set(WORKLOAD_STATES.BLOCKED, str(e))
    else:
        assess_status(CONFIGS)
```

It seems that the method `check_optional_config_and_relations` doesn't return early if the unit is not ready for service (ceph conf and keyring files are not created yet).
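The missing early return described in the bug can be illustrated with a readiness guard that runs before any multi-site check. This is a minimal, hypothetical sketch, not the committed fix: `unit_ready_for_service`, `check_multisite_status`, the `WAITING` tuple, and passing the Ceph config and keyring paths in as arguments are all assumptions made so the example is self-contained. The multi-site check is injected as a callable instead of shelling out to `radosgw-admin`, and the final `('unknown', '')` return assumes the common charm convention of reporting the lowest-priority status when nothing is wrong.

```python
import os

# Hypothetical status tuple; the real charm builds its statuses with
# charmhelpers (e.g. WORKLOAD_STATES) and its own messages.
WAITING = ('waiting', 'Waiting for ceph.conf and keyring to be rendered')


def unit_ready_for_service(conf_path, keyring_path):
    """Return True once both the Ceph config file and the keyring exist."""
    return os.path.isfile(conf_path) and os.path.isfile(keyring_path)


def check_multisite_status(conf_path, keyring_path, is_multisite_configured):
    """Hypothetical, simplified status check with an early return.

    ``is_multisite_configured`` is a zero-argument callable standing in
    for the charm's multi-site check, which in the traceback above runs
    ``radosgw-admin zone list``.
    """
    if not unit_ready_for_service(conf_path, keyring_path):
        # Bail out before any radosgw-admin call: without ceph.conf the
        # command exits non-zero, subprocess.check_output raises
        # CalledProcessError, and the hook fails.
        return WAITING
    if not is_multisite_configured():
        return ('blocked', 'Multi-site configuration is incomplete')
    # Lowest-priority status so other checks are not overridden.
    return ('unknown', '')
```

On a freshly added unit, where neither file has been rendered yet, `check_multisite_status` returns `WAITING` without ever calling the injected check, so no `radosgw-admin` command is executed and `assess_status` no longer raises after the hook.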
2024-04-18 19:09:35 OpenStack Infra charm-ceph-radosgw: status New In Progress
2024-04-18 19:10:21 Ionut-Madalin Balutoiu charm-ceph-radosgw: assignee Ionut-Madalin Balutoiu (ionutbalutoiu)
2024-04-18 20:02:35 Chris Valean bug added subscriber Chris Valean
2024-04-18 20:03:50 Chris Valean charm-ceph-radosgw: status In Progress Confirmed
2024-04-29 08:38:51 OpenStack Infra charm-ceph-radosgw: status Confirmed Fix Committed