Comment 5 for bug 1843082

John Kung (john-kung) wrote : Re: IPv6: lock host stuck during patch orchestration

The following host-lock attempt (pid 106416) did not progress past the Ceph get_monitors_status() check:

2019-09-06 20:38:50.965 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_start_2019-09-06-20-38-50 patch
2019-09-06 20:38:50.965 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 1. delta_handle ['action']
2019-09-06 20:38:50.965 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost check_lock
2019-09-06 20:38:50.966 106416 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost check_lock_controller
2019-09-06 20:38:51.183 106416 INFO sysinv.common.ceph [-] Active ceph monitors in inventory = [u'controller-0', u'controller-1', u'compute-0'] << last log line for this pid, emitted from get_monitors_status()

The pid appears to be stuck within get_monitors_status(). After it calls self._osd_quorum_names(), the next expected log line for pid 106416, "Active ceph monitors in ceph cluster", never appears.
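For reference, a simplified sketch of the code path suggested by the log lines above. This is a reconstruction, not the verified sysinv implementation: the _ceph_api wrapper, quorum_status() call, and db_api.ceph_mon_get_list() are assumptions about how the inventory and the Ceph REST API are queried.

import logging

LOG = logging.getLogger(__name__)


class CephApiOperator(object):
    def __init__(self, ceph_api):
        self._ceph_api = ceph_api  # REST client for the Ceph cluster (assumed)

    def _osd_quorum_names(self):
        # If the underlying HTTP request carries no timeout, this call can
        # block indefinitely when the Ceph REST endpoint stops responding.
        response, body = self._ceph_api.quorum_status(body='json')
        return body['output']['quorum_names']

    def get_monitors_status(self, db_api):
        # Assumed inventory lookup for configured Ceph monitors.
        inventory_monitors = [m.hostname for m in db_api.ceph_mon_get_list()]
        LOG.info("Active ceph monitors in inventory = %s" % inventory_monitors)
        # ^ last line logged by pid 106416
        quorum_names = self._osd_quorum_names()  # suspected blocking point
        LOG.info("Active ceph monitors in ceph cluster = %s" % quorum_names)
        # ^ this line never appears for pid 106416
        return inventory_monitors, quorum_names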

Meanwhile, ceph status at the time of the host-lock attempt, run from controller-1, shows:
[sysadmin@controller-1 sysinv(keystone_admin)]$ ceph -s
  cluster:
    id: 10100cb2-2e80-4dd5-a759-68de2ac873fc
    health: HEALTH_WARN
            Reduced data availability: 32 pgs stale

  services:
    mon: 3 daemons, quorum controller-0,controller-1,compute-0
    mgr: controller-0(active), standbys: controller-1
    osd: 2 osds: 2 up, 2 in

  data:
    pools: 1 pools, 64 pgs
    objects: 0 objects, 0 B
    usage: 217 MiB used, 892 GiB / 892 GiB avail
    pgs:     32 active+clean
             32 stale+active+clean

Investigation is required into the Ceph REST API (and sysinv's reachability to it) to determine whether the call could be blocking or failing.
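One way to check whether the REST call is blocking (rather than failing fast) is to probe the Ceph REST endpoint with an explicit timeout. A minimal sketch follows; the URL, port, and endpoint path are placeholders, not the values the sysinv ceph client is actually configured with.

import requests

# Placeholder endpoint; substitute the address/port the sysinv ceph client uses.
CEPH_API_URL = "http://controller:5001/api/v0.1/quorum_status"


def probe_ceph_api(url=CEPH_API_URL, timeout=10):
    """Probe the Ceph REST endpoint and report timeout vs. hard failure."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        print("Ceph API reachable, HTTP %s" % response.status_code)
    except requests.exceptions.Timeout:
        print("Ceph API did not respond within %ss (call would block)" % timeout)
    except requests.exceptions.RequestException as exc:
        print("Ceph API request failed: %s" % exc)


if __name__ == "__main__":
    probe_ceph_api()

If the probe times out rather than erroring out, that would be consistent with pid 106416 hanging inside _osd_quorum_names() instead of raising an exception.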