[MultiOS][Yocto] ceph sometimes fails with: CephMgrMissingRestfulService: Missing restful service

Bug #1905376 reported by Jackie Huang
This bug affects 1 person
Affects      Status          Importance   Assigned to    Milestone
StarlingX    Fix Committed   Low          Jackie Huang

Bug Description

Brief Description
-----------------
Ceph sometimes fails with: CephMgrMissingRestfulService: Missing restful service

Severity
--------
Major

Steps to Reproduce
------------------
1. Build the image according to https://opendev.org/starlingx/meta-starlingx/src/branch/master/README.md
2. Install controller-0 of an AIO duplex system with the built image
3. Run the Ansible bootstrap playbook
4. Configure controller-0 after bootstrap and unlock it
5. Check the Ceph logs after controller-0 reboots

Expected Behavior
------------------
Ceph works without unexpected errors.

Actual Behavior
----------------

In /var/log/ceph-manager.log:

2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor [-] Error running periodic monitoring of ceph status, will retry in 60s: CephMgrMissingRestfulService: Missing restful service. Available services: {}
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor Traceback (most recent call last):
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor File "/usr/lib/python2.7/site-packages/ceph_manager/monitor.py", line 185, in run
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor self.ceph_poll_status()
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor File "/usr/lib/python2.7/site-packages/ceph_manager/monitor.py", line 233, in ceph_poll_status
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor self._report_alarm_osds_health()
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor File "/usr/lib/python2.7/site-packages/ceph_manager/monitor.py", line 585, in _report_alarm_osds_health
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor response, osd_tree = self.service.ceph_api.osd_tree(body='json')
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor File "/usr/lib/python2.7/site-packages/cephclient/client.py", line 2550, in osd_tree
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor return self._request('osd tree', **kwargs)
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor File "/usr/lib/python2.7/site-packages/cephclient/client.py", line 251, in _request
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor self._get_service_url()
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor File "/usr/lib/python2.7/site-packages/cephclient/client.py", line 117, in _get_service_url
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor status.get('services', ''))
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor CephMgrMissingRestfulService: Missing restful service. Available services: {}
2020-11-18 08:41:11.218 96060 ERROR ceph_manager.monitor
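The traceback ends in cephclient's `_get_service_url`, which reads the `services` map from the mgr status and raises because that map has no `restful` entry (here it is empty). The following is a minimal Python sketch of that check, not the cephclient code itself: `restful_url_from_mgr_status` and `MissingRestfulService` are hypothetical names, the input is assumed to be shaped like `ceph mgr dump` JSON, and the URL is a placeholder.

```python
class MissingRestfulService(Exception):
    """Hypothetical stand-in for cephclient's CephMgrMissingRestfulService."""


def restful_url_from_mgr_status(mgr_status):
    """Return the restful module URL from a mgr status dict.

    Assumption: `mgr_status` is shaped like `ceph mgr dump` JSON, whose
    "services" map lists endpoints published by enabled mgr modules.
    """
    services = mgr_status.get("services") or {}
    url = services.get("restful")
    if not url:
        # Same situation as the log above: no restful entry in the map.
        raise MissingRestfulService(
            "Missing restful service. Available services: %s" % services)
    return url


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    healthy = {"services": {"restful": "https://controller-0:1234/"}}
    print(restful_url_from_mgr_status(healthy))
    try:
        restful_url_from_mgr_status({"services": {}})
    except MissingRestfulService as exc:
        print(exc)
```

With an empty map the sketch produces the same message text as the ceph-manager log, which is consistent with the mgr not yet (or no longer) publishing the restful module endpoint when ceph-manager polled it.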

In /var/log/ceph/ceph-mgr.controller-0.log:

2020-11-13 04:05:36.263 7fbfc77f6700 1 mgr[restful] Unknown request '140461628590608:0'
2020-11-13 04:06:08.203 7fbfd3fff700 -1 received signal: Terminated from /usr/bin/python /etc/init.d/mgr-restful-plugin start (PID: 355859) UID: 0
2020-11-13 04:06:08.203 7fbfd3fff700 -1 mgr handle_signal *** Got signal Terminated ***
2020-11-13 04:06:19.654 7f4858b05340 0 ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process ceph-mgr, pid 506030
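The ceph-mgr excerpt shows the daemon being terminated by the mgr-restful-plugin init script and a new ceph-mgr process starting about 11 seconds later. When checking the logs in step 5, a small scanner like the one below can list those terminate/start events with timestamps. This is a sketch under stated assumptions: `mgr_restart_events` is a hypothetical helper, and its regular expressions are derived only from the lines quoted above, so other Ceph versions may log differently.

```python
import re

# Patterns derived from the ceph-mgr log lines quoted in this report (assumption).
TIMESTAMP = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
TERMINATED = re.compile(
    r"received signal: Terminated from (?P<sender>.+?) \(PID: (?P<pid>\d+)\)")
STARTED = re.compile(r"process ceph-mgr, pid (?P<pid>\d+)")


def mgr_restart_events(lines):
    """Yield (timestamp, event, detail) for ceph-mgr terminations and starts."""
    for line in lines:
        ts_match = TIMESTAMP.match(line)
        if not ts_match:
            continue  # skip lines without a leading timestamp
        ts = ts_match.group("ts")
        term = TERMINATED.search(line)
        if term:
            # detail = the command line that sent SIGTERM
            yield ts, "terminated", term.group("sender")
            continue
        start = STARTED.search(line)
        if start:
            # detail = PID of the newly started ceph-mgr
            yield ts, "started", start.group("pid")


if __name__ == "__main__":
    import sys
    # Usage: python scan_mgr_log.py /var/log/ceph/ceph-mgr.controller-0.log
    with open(sys.argv[1]) as log:
        for event in mgr_restart_events(log):
            print(*event)
```

Pairing each "terminated" event with the next "started" event gives the window during which the restarted mgr may not yet have republished its services.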

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two-node system, All-in-one duplex

Branch/Pull Time/Commit
-----------------------
Branch: master
Time: Nov 2 2020
Commit: 587e1dbcf2633770a9d6717fe5f84ab6b08de2bd

Last Pass
---------

Timestamp/Logs
--------------

Test Activity
-------------

Workaround
----------

Changed in starlingx:
assignee: nobody → Jackie Huang (jackie-huang)
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

Low / doesn't gate the next release, as this is prep work for multi-OS support, which is not committed for stx.5.0.

Changed in starlingx:
importance: Undecided → Critical
importance: Critical → High
importance: High → Low
Jackie Huang (jackie-huang) wrote :
Changed in starlingx:
status: In Progress → Fix Committed