While the apply is happening, ceph-mgr becomes unresponsive:

Controller-0:
2019-07-21 03:19:37,274 103010 WARNING mgr-restful-plugin REST API ping failed: reason=HTTPSConnectionPool(host='controller-0', port=5001): Read timed out. (read timeout=15)
2019-07-21 03:19:37,275 103010 INFO mgr-restful-plugin REST API ping failure count=0
2019-07-21 03:19:45,713 102997 INFO mgr-restful-plugin Restful plugin does not respond but failure count is within acceptable limits: ceph_mgr=0 < 3, ping=0 < 5. Report status OK
2019-07-21 03:19:45,713 102997 WARNING mgr-restful-plugin Failed to send response back. request=status, response=OK, reason=[Errno 32] Broken pipe
2019-07-21 03:19:45,714 103010 INFO mgr-restful-plugin Run command: /usr/bin/ceph fsid
2019-07-21 03:20:00,737 102997 INFO mgr-restful-plugin Restful plugin does not respond but failure count is within acceptable limits: ceph_mgr=0 < 3, ping=1 < 5. Report status OK
2019-07-21 03:20:00,738 102997 WARNING mgr-restful-plugin Failed to send response back. request=status, response=OK, reason=[Errno 32] Broken pipe
2019-07-21 03:20:00,738 102997 INFO mgr-restful-plugin Stop monitor with SIGTERM to process group 103010
2019-07-21 03:20:04,009 103010 WARNING mgr-restful-plugin REST API ping failed: reason=HTTPSConnectionPool(host='controller-0', port=5001): Read timed out. (read timeout=15)
2019-07-21 03:20:04,009 103010 INFO mgr-restful-plugin REST API ping failure count=1
2019-07-21 03:20:05,743 102997 INFO mgr-restful-plugin Stop monitor with SIGKILL to process group 103010
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Monitor stopped: pid=103010
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Remove service pid file: path=/var/run/ceph/mgr-restful-plugin.pid
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Close service socket and remove file: path=/var/run/ceph/mgr/mgr-restful-plugin.socket
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Release service lock: path=/var/run/ceph/mgr/mgr-restful-plugin.lock
2019-07-21 03:20:06,256 1360341 WARNING mgr-restful-plugin Disable urllib3 certifcates check
2019-07-21 03:20:06,256 1360341 INFO mgr-restful-plugin Take service lock: path=/var/run/ceph/mgr/mgr-restful-plugin.lock
2019-07-21 03:20:06,340 1360341 INFO mgr-restful-plugin Create service socket
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Remove existing socket files
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Bind service socket: path=/var/run/ceph/mgr/mgr-restful-plugin.socket
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Update service pid file: path=/var/run/ceph/mgr-restful-plugin.pid
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Start monitor loop
2019-07-21 03:20:06,343 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph fsid
2019-07-21 03:20:06,625 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph auth get mgr.controller-0 -o /var/run/ceph/mgr/ceph-controller-0/keyring
2019-07-21 03:20:06,901 1360416 INFO mgr-restful-plugin Stop unmanaged running ceph-mgr processes
2019-07-21 03:20:07,004 1360416 INFO mgr-restful-plugin Start ceph-mgr daemon
2019-07-21 03:20:22,021 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get config/mgr/mgr/restful/server_port
2019-07-21 03:20:22,302 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph mgr module ls --format json
2019-07-21 03:20:25,577 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/controller-0/crt
2019-07-21 03:20:25,831 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/keys/admin
2019-07-21 03:20:26,099 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph mgr services --format json
2019-07-21 03:20:26,363 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/controller-0/crt

Controller-1:
2019-07-21 03:19:00,752 92323 WARNING mgr-restful-plugin Disable urllib3 certifcates check
2019-07-21 03:19:00,752 92323 INFO mgr-restful-plugin Take service lock: path=/var/run/ceph/mgr/mgr-restful-plugin.lock
2019-07-21 03:19:00,801 92323 INFO mgr-restful-plugin Create service socket
2019-07-21 03:19:00,801 92323 INFO mgr-restful-plugin Remove existing socket files
2019-07-21 03:19:00,801 92323 INFO mgr-restful-plugin Bind service socket: path=/var/run/ceph/mgr/mgr-restful-plugin.socket
2019-07-21 03:19:00,801 92323 INFO mgr-restful-plugin Update service pid file: path=/var/run/ceph/mgr-restful-plugin.pid
2019-07-21 03:19:00,801 92323 INFO mgr-restful-plugin Start monitor loop
2019-07-21 03:19:00,803 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph fsid
2019-07-21 03:19:01,075 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph auth get mgr.controller-1 -o /var/run/ceph/mgr/ceph-controller-1/keyring
2019-07-21 03:19:01,345 92325 INFO mgr-restful-plugin Stop unmanaged running ceph-mgr processes
2019-07-21 03:19:01,411 92325 INFO mgr-restful-plugin Start ceph-mgr daemon
2019-07-21 03:19:16,429 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get config/mgr/mgr/restful/server_port
2019-07-21 03:19:16,698 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph mgr module ls --format json
2019-07-21 03:19:19,972 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/controller-1/crt
2019-07-21 03:19:20,252 92325 INFO mgr-restful-plugin Create restful plugin self signed certificate
2019-07-21 03:19:20,253 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph restful create-self-signed-cert
2019-07-21 03:19:50,258 92325 WARNING mgr-restful-plugin Command timeout: command=['/usr/bin/timeout', '30', '/usr/bin/ceph', 'restful', 'create-self-signed-cert'], timeout=30
2019-07-21 03:19:50,258 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph fsid
2019-07-21 03:19:50,524 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph auth get mgr.controller-1 -o /var/run/ceph/mgr/ceph-controller-1/keyring
2019-07-21 03:19:50,787 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get config/mgr/mgr/restful/server_port
2019-07-21 03:19:51,044 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph mgr module ls --format json
2019-07-21 03:19:54,296 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/controller-1/crt
2019-07-21 03:19:54,562 92325 INFO mgr-restful-plugin Create restful plugin self signed certificate
2019-07-21 03:19:54,562 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph restful create-self-signed-cert
2019-07-21 03:20:10,282 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/keys/admin
2019-07-21 03:20:10,552 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph mgr services --format json
2019-07-21 03:20:10,827 92325 WARNING mgr-restful-plugin Failed to start restful plugin: reason=missing expected key: 'restful' in ouput={}
2019-07-21 03:20:25,843 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph fsid
2019-07-21 03:20:26,106 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph auth get mgr.controller-1 -o /var/run/ceph/mgr/ceph-controller-1/keyring
2019-07-21 03:20:26,373 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get config/mgr/mgr/restful/server_port
2019-07-21 03:20:26,650 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph mgr module ls --format json
2019-07-21 03:20:29,926 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/controller-1/crt
2019-07-21 03:20:30,202 92325 INFO mgr-restful-plugin Create restful plugin self signed certificate
2019-07-21 03:20:30,202 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph restful create-self-signed-cert
2019-07-21 03:20:30,743 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/keys/admin
2019-07-21 03:20:31,018 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph mgr services --format json
2019-07-21 03:20:31,286 92325 INFO mgr-restful-plugin Run command: /usr/bin/ceph config-key get mgr/restful/controller-0/crt

Sm fails to audit and goes on to restart the restful plugin, sysinv, mtc, etc:

2019-07-21T03:19:45.000 controller-0 sm: debug time[74022.194] log<3020> INFO: sm[100232]: sm_service_audit.c(175): Action (audit-enabled) timeout with result (failed), state (unknown), status (unknown), and condition (unknown) for service (mgr-restful-plugin), reason_text=, exit_code=-65534.
2019-07-21T03:19:45.000 controller-0 sm: debug time[74022.194] log<3021> INFO: sm[100232]: sm_service_action.c(345): Aborting service (mgr-restful-plugin) with kill signal, pid=1350443.
2019-07-21T03:19:45.000 controller-0 sm: debug time[74022.194] log<3022> INFO: sm[100232]: sm_service_audit.c(75): Max retires not met for action (audit-enabled) of service (mgr-restful-plugin), attempts=1.
2019-07-21T03:19:45.000 controller-0 sm: debug time[74022.196] log<3023> ERROR: sm[100232]: sm_service_audit.c(228): Failed to query service based on pid (1350443), error=NOT_FOUND.
2019-07-21T03:20:00.000 controller-0 sm: debug time[74037.200] log<3024> INFO: sm[100232]: sm_service_audit.c(175): Action (audit-enabled) timeout with result (failed), state (unknown), status (unknown), and condition (unknown) for service (mgr-restful-plugin), reason_text=, exit_code=-65534.
2019-07-21T03:20:00.000 controller-0 sm: debug time[74037.200] log<3025> INFO: sm[100232]: sm_service_action.c(345): Aborting service (mgr-restful-plugin) with kill signal, pid=1354071.
2019-07-21T03:20:00.000 controller-0 sm: debug time[74037.200] log<3026> INFO: sm[100232]: sm_service_audit.c(70): Max retires met for action (audit-enabled) of service (mgr-restful-plugin), attempts=2.
2019-07-21T03:20:00.000 controller-0 sm: debug time[74037.200] log<3027> INFO: sm[100232]: sm_service_action.c(98): Plugin (/etc/init.d/mgr-restful-plugin) has been changed, was=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, now=d0c81ce9be8fa1c0748d3fb6f9550249172f2a6628abbd66f11e841e8953338bad4421ae8c54b10c101c0b23d4f930bb5baff79205c3c3353fec557fdb32137f.
2019-07-21T03:20:00.000 controller-0 sm: debug time[74037.201] log<3028> INFO: sm[100232]: sm_service_disable.c(380): Started disable action (1358501) for service (mgr-restful-plugin), flag (0).
2019-07-21T03:20:00.000 controller-0 sm: debug time[74037.201] log<3029> INFO: sm[100232]: sm_service_fsm.c(1032): Service (mgr-restful-plugin) received event (audit-failed) was in the enabled-active state and is now in the disabling state.
2019-07-21T03:20:00.000 controller-0 sm: debug time[74037.203] log<3030> ERROR: sm[100232]: sm_service_audit.c(228): Failed to query service based on pid (1354071), error=NOT_FOUND.
2019-07-21T03:20:01.000 controller-0 sm: debug time[74037.673] log<3031> INFO: sm[100232]: sm_service_disable.c(380): Started disable action (1358639) for service (ceph-manager), flag (0).
2019-07-21T03:20:01.000 controller-0 sm: debug time[74037.673] log<3032> INFO: sm[100232]: sm_service_fsm.c(1032): Service (ceph-manager) received event (disable) was in the enabled-active state and is now in the disabling state.
2019-07-21T03:20:01.000 controller-0 sm: debug time[74037.674] log<3033> INFO: sm[100232]: sm_service_disable.c(380): Started disable action (1358640) for service (sysinv-conductor), flag (0).

When sysinv-conductor is brought down, it kills the monitoring thread for the armada apply:

2019-07-21 03:20:01.084 2015130 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app stx-openstack

When sysinv-conductor comes back up, it does housekeeping and sets the apply to failed.

The restful plugin comes back online OK:

2019-07-21T03:20:06.000 controller-0 sm: debug time[74043.239] log<3096> INFO: sm[100232]: sm_service_fsm.c(1032): Service (ceph-manager) received event (enable-success) was in the enabling state and is now in the enabled-active state
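
For triage, it can help to interleave the mgr-restful-plugin, sm and sysinv excerpts quoted in this report into a single timeline. The following is only a minimal sketch, not part of any StarlingX tooling: the function names `parse_timestamp` and `merge_timeline` are hypothetical, and the timestamp pattern is inferred solely from the three log formats shown in this report. Note that the sm timestamps carry whole-second resolution here (`.000`), so ordering within a second is approximate.

```python
import re
from datetime import datetime

# Matches the leading timestamp of the three formats seen in this report:
#   "2019-07-21 03:19:37,274 ..."   (mgr-restful-plugin)
#   "2019-07-21T03:19:45.000 ..."   (sm)
#   "2019-07-21 03:20:01.084 ..."   (sysinv)
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})[.,](\d{3})\s")


def parse_timestamp(line):
    """Return a datetime for a log line, or None if it has no leading timestamp."""
    match = TS_RE.match(line)
    if match is None:
        return None
    date_part, time_part, millis = match.groups()
    base = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(millis) * 1000)


def merge_timeline(*logs):
    """Merge several lists of log lines into one list sorted by timestamp.

    Lines without a recognizable timestamp are skipped. Ties are broken by
    the position of the source list in the argument order.
    """
    stamped = []
    for source_index, lines in enumerate(logs):
        for line in lines:
            ts = parse_timestamp(line)
            if ts is not None:
                stamped.append((ts, source_index, line))
    stamped.sort(key=lambda item: (item[0], item[1]))
    return [line for _, _, line in stamped]
```

Usage would be along the lines of `merge_timeline(plugin_lines, sm_lines, sysinv_lines)`, where each argument is a list of lines read from the corresponding log file.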