The following host-lock attempt (pid 106416) did not progress past the Ceph get_monitors_status() check:
2019-09-06 20:38:50.965 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_start_2019-09-06-20-38-50 patch
2019-09-06 20:38:50.965 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 1. delta_handle ['action']
2019-09-06 20:38:50.965 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost check_lock
2019-09-06 20:38:50.966 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost check_lock_controller
2019-09-06 20:38:51.183 106416 INFO sysinv.common.ceph [-] Active ceph monitors in inventory = [u'controller-0', u'controller-1', u'compute-0'] << This is the last log for this pid, emitted from get_monitors_status().
The pid appears stuck inside get_monitors_status(): after it calls self._osd_quorum_names(), the expected follow-up log "Active ceph monitors in ceph cluster" for pid 106416 never appears.
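One mitigation to evaluate is bounding the quorum query with a hard timeout so a wedged Ceph API cannot stall the host-lock semantic check indefinitely. The sketch below is hypothetical: `osd_quorum_names()` is a stand-in for the sysinv call `self._osd_quorum_names()`, not the actual implementation.

```python
import concurrent.futures

def osd_quorum_names():
    """Stand-in for the Ceph API call that appears to block.

    In sysinv this corresponds to self._osd_quorum_names(), which asks the
    cluster which monitors are currently in quorum.
    """
    return ['controller-0', 'controller-1', 'compute-0']

def get_monitors_status_with_timeout(timeout=30):
    """Run the quorum query on a worker thread with a hard timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(osd_quorum_names)
    try:
        quorum = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Fail the check (and log it) instead of hanging the lock attempt.
        quorum = None
    finally:
        # wait=False so a still-blocked worker cannot stall the shutdown.
        pool.shutdown(wait=False)
    return quorum
```

With this guard, a blocked Ceph API would surface as a failed host-lock semantic check after `timeout` seconds rather than as a silently stuck pid.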
Meanwhile, ceph status on controller-1 at the time of the host-lock attempt indicates the following:
[sysadmin@controller-1 sysinv(keystone_admin)]$ ceph -s
  cluster:
    id:     10100cb2-2e80-4dd5-a759-68de2ac873fc
    health: HEALTH_WARN
            Reduced data availability: 32 pgs stale

  services:
    mon: 3 daemons, quorum controller-0,controller-1,compute-0
    mgr: controller-0(active), standbys: controller-1
    osd: 2 osds: 2 up, 2 in
  data:
    pools:   1 pools, 64 pgs
    objects: 0 objects, 0 B
    usage:   217 MiB used, 892 GiB / 892 GiB avail
    pgs:     32 active+clean
             32 stale+active+clean
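The quorum membership shown above can also be read programmatically, which may help automate the comparison against the inventory list in the sysinv log. A minimal sketch, assuming the status document from `ceph -s -f json` carries a top-level "quorum_names" list (as it does on Ceph releases of this era); the sample document below is trimmed from the capture above, not a live query:

```python
import json

def quorum_names_from_status(status_json):
    """Extract monitor quorum membership from `ceph -s -f json` output.

    Assumes a top-level "quorum_names" key; returns [] if it is absent.
    """
    status = json.loads(status_json)
    return status.get('quorum_names', [])

# Sample trimmed from the `ceph -s` capture above.
sample = json.dumps({
    "fsid": "10100cb2-2e80-4dd5-a759-68de2ac873fc",
    "quorum_names": ["controller-0", "controller-1", "compute-0"],
})
```

In practice the JSON would come from running the `ceph` CLI (e.g. via subprocess) with its own timeout, for the same reason the semantic check itself should not block.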
Investigation is required into the Ceph API (and its reachability) to determine whether it could be blocking or failing.
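As a first reachability probe, a short-timeout TCP connection attempt distinguishes "endpoint unreachable" from "endpoint accepts connections but the API call blocks". The host and port below are placeholders, not the actual deployment values:

```python
import socket

def api_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within
    `timeout` seconds; an unreachable endpoint returns False quickly
    instead of blocking the caller."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder endpoint; substitute the cluster's actual Ceph API address.
# api_reachable('controller-0', 5001)
```

If the probe succeeds but the sysinv call still hangs, the problem is more likely in the API's request handling (e.g. waiting on stale PGs or a mon) than in basic network reachability.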