Ceph driver doesn't check connection on set up

Bug #1640169 reported by Jan Provaznik
This bug affects 1 person
Affects: OpenStack Shared File Systems Service (Manila)
Status: Fix Released
Importance: Undecided
Assigned to: Jan Provaznik
Milestone: (not set)

Bug Description

If the Ceph driver is misconfigured or the Ceph backend is not accessible when the manila-share service starts, the service keeps crashing and respawning with the following error:

2016-11-01 12:13:09.507 1507 INFO manila.share.manager [req-a6967728-58e5-452f-b72d-ee994fa3c50c - - - - -] Updating share status
2016-11-01 12:13:09.508 1507 INFO manila.share.drivers.cephfs.cephfs_native [req-a6967728-58e5-452f-b72d-ee994fa3c50c - - - - -] [CEPHFS1}] Ceph client found, connecting...
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service [req-a6967728-58e5-452f-b72d-ee994fa3c50c - - - - -] Error starting thread.
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service Traceback (most recent call last):
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 708, in run_service
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service service.start()
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/service.py", line 118, in start
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service self.manager.init_host()
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/manager.py", line 163, in wrapped
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service return f(self, *args, **kwargs)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/manager.py", line 348, in init_host
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service self.publish_service_capabilities(ctxt)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/manager.py", line 163, in wrapped
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service return f(self, *args, **kwargs)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/utils.py", line 617, in wrapper
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service return func(self, *args, **kwargs)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/manager.py", line 2632, in publish_service_capabilities
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service self._report_driver_status(context)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/utils.py", line 617, in wrapper
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service return func(self, *args, **kwargs)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/manager.py", line 2596, in _report_driver_status
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service share_stats = self.driver.get_share_stats(refresh=True)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/driver.py", line 673, in get_share_stats
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service self._update_share_stats()
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/drivers/cephfs/cephfs_native.py", line 86, in _update_share_stats
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service stats = self.volume_client.rados.get_cluster_stats()
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/manila/share/drivers/cephfs/cephfs_native.py", line 151, in volume_client
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service self._volume_client.connect(premount_evict=premount_evict)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/ceph_volume_client.py", line 302, in connect
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service self.rados.connect()
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service File "rados.pyx", line 785, in rados.Rados.connect (rados.c:8969)
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service ObjectNotFound: error connecting to the cluster
2016-11-01 12:13:09.619 1507 ERROR oslo_service.service

The reason is that the Ceph driver does not try to connect to the Ceph backend while the driver is being set up in manila/share/manager.py (do_setup and check_for_setup_error). The real connection is made later, in the init_host phase (publish_service_capabilities), where the resulting exception is not handled.

It would be better to check that the connection works when setting up the driver. If the check fails, host initialization is aborted, the same as for other drivers.
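
The proposal can be illustrated with a minimal, self-contained sketch. It is not the actual Manila or ceph_volume_client code: FakeVolumeClient, SketchDriver, BackendUnreachable and this simplified init_host are hypothetical stand-ins, and the sketch assumes that the manager catches setup errors and aborts initialization, as described above.

```python
class BackendUnreachable(Exception):
    """Hypothetical error standing in for a failed cluster connection."""


class FakeVolumeClient:
    """Hypothetical stand-in for a Ceph volume client (not the real API)."""

    def __init__(self, reachable):
        self.reachable = reachable
        self.connected = False

    def connect(self):
        if not self.reachable:
            raise BackendUnreachable("error connecting to the cluster")
        self.connected = True


class SketchDriver:
    """Hypothetical driver that connects during setup instead of lazily."""

    def __init__(self, client):
        self._client = client

    def do_setup(self):
        # Connect eagerly so that a bad configuration or an unreachable
        # backend is reported here, where the caller handles failures.
        self._client.connect()

    def get_share_stats(self):
        # By this point the connection is known to work.
        return {"connected": self._client.connected}


def init_host(driver):
    """Simplified, hypothetical host initialization.

    Returns True if the driver was initialized, False if setup was aborted.
    """
    try:
        driver.do_setup()
    except Exception as exc:
        # Setup errors are caught and reported; the service is not crashed.
        print("driver setup failed, aborting initialization: %s" % exc)
        return False
    driver.get_share_stats()
    return True
```

With a reachable client, init_host(SketchDriver(FakeVolumeClient(True))) returns True. With an unreachable one it returns False rather than letting the exception escape from the later statistics call.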

Changed in manila:
assignee: nobody → Jan Provaznik (jan-provaznik)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (master)

Fix proposed to branch: master
Review: https://review.openstack.org/394961

Changed in manila:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (master)

Reviewed: https://review.openstack.org/394961
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=af79b9f5b78be5413bc363cbf12ae8351b9e4f02
Submitter: Jenkins
Branch: master

commit af79b9f5b78be5413bc363cbf12ae8351b9e4f02
Author: Jan Provaznik <email address hidden>
Date: Tue Nov 8 14:37:06 2016 +0100

    Check ceph backend connection on driver setup

    Check that ceph connection really works when setting up
    the driver instead of doing real connect later in init_host
    phase. This mitigates the risk that the service crashes/respawns
    in an infinite loop because of a connection error.

    Change-Id: Ia71b55dab1535ce351310108aaf53304b15ab757
    Closes-Bug: 1640169

Changed in manila:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/397744

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (stable/newton)

Reviewed: https://review.openstack.org/397744
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=fd631e9f74eb1c9df2eaee27731757e3441f3b84
Submitter: Jenkins
Branch: stable/newton

commit fd631e9f74eb1c9df2eaee27731757e3441f3b84
Author: Jan Provaznik <email address hidden>
Date: Tue Nov 8 14:37:06 2016 +0100

    Check ceph backend connection on driver setup

    Check that ceph connection really works when setting up
    the driver instead of doing real connect later in init_host
    phase. This mitigates the risk that the service crashes/respawns
    in an infinite loop because of a connection error.

    Change-Id: Ia71b55dab1535ce351310108aaf53304b15ab757
    Closes-Bug: 1640169
    (cherry picked from commit af79b9f5b78be5413bc363cbf12ae8351b9e4f02)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/manila 4.0.0.0b1

This issue was fixed in the openstack/manila 4.0.0.0b1 development milestone.
