stx-openstack in apply-failed after lock/unlock standby controller

Bug #1837581 reported by Yang Liu
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Daniel Badea

Bug Description

Brief Description
-----------------
After locking and unlocking the standby controller (controller-1), stx-openstack ends up in apply-failed status.

Severity
--------
Major

Steps to Reproduce
------------------
- stx-openstack is applied
- system host-lock controller-1
- system host-unlock controller-1
- wait for controller-1 to become enabled/available in system host-list
- watch the output of system application-list (a scripted sketch of this sequence follows the list)
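
A minimal scripted sketch of the sequence above, assuming admin credentials are loaded from the standard /etc/platform/openrc and the system CLI is on the path:

source /etc/platform/openrc
system host-lock controller-1
system host-unlock controller-1

# poll until controller-1 reports unlocked/enabled/available
until system host-show controller-1 | grep -q 'availability.*available'; do sleep 30; done

# watch the reapply that the unlock triggers
watch -n 10 system application-list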

Expected Behavior
------------------
- stx-openstack is automatically reapplied after the unlock and reaches applied status

Actual Behavior
----------------
- stx-openstack transitions to apply-failed status a few minutes after the reapply starts

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Dedicated storage
Lab-name: wcp113-121

Branch/Pull Time/Commit
-----------------------
stx master as of 20190720T013000Z

Last Pass
---------
Same system, same load; the issue appears to be intermittent.

Timestamp/Logs
--------------
# Unlock requested
[2019-07-21 03:12:57,041] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1'

# controller-1 available and ready
[2019-07-21 03:19:01,711] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | compute-2 | worker | unlocked | enabled | available |
| 5 | compute-3 | worker | unlocked | enabled | available |
| 6 | compute-4 | worker | unlocked | enabled | available |
| 7 | controller-1 | controller | unlocked | enabled | available |
| 8 | storage-0 | storage | unlocked | enabled | available |
| 9 | storage-1 | storage | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

# apply-failed
[2019-07-21 03:19:29,459] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
+---------------------+--------------------------------+-------------------------------+--------------------+----------+---------------------------------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+----------+---------------------------------------------------------------------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | applying | processing chart: osh-kube-system-ingress, overall completion: 4.0% |
+---------------------+--------------------------------+-------------------------------+--------------------+----------+---------------------------------------------------------------------+

# mariadb pod is in CrashLoopBackOff for a few minutes; unclear whether this is related to the apply failure
[2019-07-21 03:21:34,134] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pods --all-namespaces -o wide | grep -v -e Running -e Completed'
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
openstack mariadb-server-1 0/1 CrashLoopBackOff 2 3m29s 172.16.166.141 controller-1 <none> <none>
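
To see why the container keeps restarting, the logs of the previous (crashed) container instance can be pulled with a standard kubectl call (pod name taken from the listing above):

kubectl logs -n openstack mariadb-server-1 --previous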

[2019-07-21 03:21:34,501] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pods --all-namespaces -o wide | grep -v -e Running -e Completed -e NAMESPACE | awk '{system("kubectl describe pods -n "$1" "$2)}''
....
Tolerations: node.kubernetes.io/not-ready:NoExecute for 30s
                 node.kubernetes.io/unreachable:NoExecute for 30s
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling 3m25s (x4 over 3m28s) default-scheduler 0/7 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taints that the pod didn't tolerate, 5 node(s) didn't match node selector.
  Normal Scheduled 2m35s default-scheduler Successfully assigned openstack/mariadb-server-1 to controller-1
  Normal SuccessfulAttachVolume 2m35s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-bb00e842-aaf7-11e9-a776-3cfdfeac4dd8"
  Normal Pulled 86s kubelet, controller-1 Container image "registry.local:9001/quay.io/stackanetes/kubernetes-entrypoint:v0.3.1" already present on machine
  Normal Created 86s kubelet, controller-1 Created container
  Normal Started 85s kubelet, controller-1 Started container
  Normal Pulled 79s kubelet, controller-1 Container image "registry.local:9001/docker.io/openstackhelm/mariadb:10.2.18" already present on machine
  Normal Created 79s kubelet, controller-1 Created container
  Normal Started 79s kubelet, controller-1 Started container
  Normal Pulled 24s (x3 over 70s) kubelet, controller-1 Container image "registry.local:9001/docker.io/openstackhelm/mariadb:10.2.18" already present on machine
  Normal Created 23s (x3 over 62s) kubelet, controller-1 Created container
  Normal Started 22s (x3 over 62s) kubelet, controller-1 Started container
  Warning BackOff 5s (x3 over 39s) kubelet, controller-1 Back-off restarting failed container
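
The FailedScheduling warning above is driven by pod anti-affinity; the rules actually in effect on the pod can be dumped with a standard kubectl jsonpath query (pod name from the listing above):

kubectl get pod -n openstack mariadb-server-1 -o jsonpath='{.spec.affinity.podAntiAffinity}'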

Test Activity
-------------
Regression Testing

Revision history for this message
Yang Liu (yliu12) wrote :

Logs are split into two parts; use the cat command to combine them, e.g.:
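
cat ALL_NODES.tgz.part1 ALL_NODES.tgz.part2 > ALL_NODES.tgz   # hypothetical file names; match them to the attachments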

Revision history for this message
Tyler Smith (tyler.smith) wrote :

While the apply is happening, ceph-mgr becomes unresponsive:

Controller-0:

2019-07-21 03:19:37,274 103010 WARNING mgr-restful-plugin REST API ping failed: reason=HTTPSConnectionPool(host='controller-0', port=5001): Read timed out. (read timeout=15)
2019-07-21 03:19:37,275 103010 INFO mgr-restful-plugin REST API ping failure count=0
2019-07-21 03:19:45,713 102997 INFO mgr-restful-plugin Restful plugin does not respond but failure count is within acceptable limits: ceph_mgr=0 < 3, ping=0 < 5. Report status OK
2019-07-21 03:19:45,713 102997 WARNING mgr-restful-plugin Failed to send response back. request=status, response=OK, reason=[Errno 32] Broken pipe
2019-07-21 03:19:45,714 103010 INFO mgr-restful-plugin Run command: /usr/bin/ceph fsid
2019-07-21 03:20:00,737 102997 INFO mgr-restful-plugin Restful plugin does not respond but failure count is within acceptable limits: ceph_mgr=0 < 3, ping=1 < 5. Report status OK
2019-07-21 03:20:00,738 102997 WARNING mgr-restful-plugin Failed to send response back. request=status, response=OK, reason=[Errno 32] Broken pipe
2019-07-21 03:20:00,738 102997 INFO mgr-restful-plugin Stop monitor with SIGTERM to process group 103010
2019-07-21 03:20:04,009 103010 WARNING mgr-restful-plugin REST API ping failed: reason=HTTPSConnectionPool(host='controller-0', port=5001): Read timed out. (read timeout=15)
2019-07-21 03:20:04,009 103010 INFO mgr-restful-plugin REST API ping failure count=1
2019-07-21 03:20:05,743 102997 INFO mgr-restful-plugin Stop monitor with SIGKILL to process group 103010
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Monitor stopped: pid=103010
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Remove service pid file: path=/var/run/ceph/mgr-restful-plugin.pid
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Close service socket and remove file: path=/var/run/ceph/mgr/mgr-restful-plugin.socket
2019-07-21 03:20:05,746 102997 INFO mgr-restful-plugin Release service lock: path=/var/run/ceph/mgr/mgr-restful-plugin.lock
2019-07-21 03:20:06,256 1360341 WARNING mgr-restful-plugin Disable urllib3 certifcates check
2019-07-21 03:20:06,256 1360341 INFO mgr-restful-plugin Take service lock: path=/var/run/ceph/mgr/mgr-restful-plugin.lock
2019-07-21 03:20:06,340 1360341 INFO mgr-restful-plugin Create service socket
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Remove existing socket files
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Bind service socket: path=/var/run/ceph/mgr/mgr-restful-plugin.socket
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Update service pid file: path=/var/run/ceph/mgr-restful-plugin.pid
2019-07-21 03:20:06,341 1360341 INFO mgr-restful-plugin Start monitor loop
2019-07-21 03:20:06,343 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph fsid
2019-07-21 03:20:06,625 1360416 INFO mgr-restful-plugin Run command: /usr/bin/ceph auth get mgr.controller-0 -o /var/run/ceph/mgr/ceph-controller-0/keyring
2019-07-21 03:20:06,901 1360416 INFO mgr-restful-plugin Stop unmanaged running ceph-mgr processes
2019-07-21 03:20:07,004 1360416 INFO mgr-restful-plugin Start ceph-mgr daemon
2019-07-21 03:20:22,021 1360416 INFO mgr-restful-plugin Run ...
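
The active mgr instance and the endpoint that mgr-restful-plugin is pinging can be cross-checked with standard Ceph commands (host and port below match the log above):

ceph -s                               # shows which mgr instance is currently active
ceph mgr services                     # shows the URL the restful plugin is registered on
curl -k https://controller-0:5001/    # probe the REST API endpoint directly (-k: self-signed cert)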

tags: added: stx.regression
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 / medium priority - intermittent issue affecting openstack application apply stability.

tags: added: stx.2.0 stx.storage
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Daniel Badea (daniel.badea)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/673817

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/673817
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=d409d78cc1e8d2cb3a9d3b15ef641656487fd078
Submitter: Zuul
Branch: master

commit d409d78cc1e8d2cb3a9d3b15ef641656487fd078
Author: Daniel Badea <email address hidden>
Date: Wed Jul 31 12:47:33 2019 +0000

    ceph: mgr-restful-plugin restarts on controller unlock

    When standby controller is unlocked its mgr-restful-plugin
    service starts and generates node specific self-signed
    certificates to be used by the restful plugin. This operation
    triggers a restart of the "active" mgr restful plugin
    which in turn causes Ceph REST API requests to fail.

    This failure is handled on the active controller by
    restarting the service. This happens while stx-openstack
    is reapplied and is the reason why mariadb pod fails to start.

    Change ceph-mgr and restful plugin config and startup
    procedure so a secondary ceph-mgr service doesn't disrupt
    the active one.

    Closes-Bug: 1837581
    Change-Id: Id8e5e56d48669498202ed319a9aad68365b51f23
    Signed-off-by: Daniel Badea <email address hidden>
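
For context, the upstream Ceph restful module supports per-instance certificates stored under mgr-name-scoped config keys, which is the mechanism that lets a standby controller bring up its own certificate without disturbing the active instance. A sketch of that mechanism (key names per the Ceph restful module documentation; not the literal content of the fix above):

ceph config-key set mgr/restful/controller-1/crt -i controller-1.crt   # cert scoped to mgr "controller-1" only
ceph config-key set mgr/restful/controller-1/key -i controller-1.key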

Changed in starlingx:
status: In Progress → Fix Released