Race during share server termination (auto cleanup)

Bug #1587835 reported by Marc Koderer
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Shared File Systems Service (Manila)
In Progress
Low
Carlos Eduardo

Bug Description

2016-05-30 17:03:52.126 ERROR oslo_messaging.rpc.dispatcher [req-447cb847-516f-445b-9cd1-b8ed0c5fa083 9696432bd501426aac060ef23187948b 3d7166a2ef3648e198807ae779d0bafe] Exception during me
ssage handling: Share server 4eef99e7-fce2-45cc-944d-1dfc8fda490d could not be found.
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher incoming.message))
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/share/manager.py", line 146, in wrapped
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return f(self, *args, **kwargs)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/utils.py", line 616, in wrapper
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return func(self, *args, **kwargs)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/share/manager.py", line 1861, in delete_share_instance
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher self.delete_share_server(context, share_server)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/share/manager.py", line 146, in wrapped
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return f(self, *args, **kwargs)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/utils.py", line 616, in wrapper
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return func(self, *args, **kwargs)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/share/manager.py", line 2499, in delete_share_server
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher _teardown_server()
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/share/manager.py", line 2477, in _teardown_server
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher {'status': constants.STATUS_DELETING})
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/db/api.py", line 776, in share_server_update
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return IMPL.share_server_update(context, id, values)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/db/sqlalchemy/api.py", line 164, in wrapper
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/db/sqlalchemy/api.py", line 2822, in share_server_update
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher server_ref = share_server_get(context, id, session=session)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/db/sqlalchemy/api.py", line 164, in wrapper
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/manila/manila/db/sqlalchemy/api.py", line 2833, in share_server_get
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher raise exception.ShareServerNotFound(share_server_id=server_id)
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher ShareServerNotFound: Share server 4eef99e7-fce2-45cc-944d-1dfc8fda490d could not be found.
2016-05-30 17:03:52.126 TRACE oslo_messaging.rpc.dispatcher

Revision history for this message
Marc Koderer (m-koderer) wrote :

Issue orccours when:

delete_share_server_with_last_share = True
and
automatic_share_server_cleanup = True

I was able to reproduce with the following patch:
@@ -1878,11 +1879,15 @@ class ShareManager(manager.SchedulerDependentManager):
                 LOG.debug("Scheduled deletion of share-server "
                           "with id '%s' automatically by "
                           "deletion of last share.", share_server['id'])
+ print "Sleep some minutes....."
+ time.sleep(120)
+ import pdb; pdb.set_trace()
                 self.delete_share_server(context, share_server)

- @periodic_task.periodic_task(spacing=600)
+ @periodic_task.periodic_task(spacing=1)
     @utils.require_driver_initialized
     def delete_free_share_servers(self, ctxt):
         if not (self.driver.driver_handles_share_servers and
                 self.configuration.automatic_share_server_cleanup):

Changed in manila:
assignee: nobody → Marc Koderer (m-koderer)
summary: - Race during share server termination
+ Race during share server termination (auto cleanup)
Revision history for this message
Valeriy Ponomaryov (vponomaryov) wrote :

Looks like we should combine these two options as they are inversely mutually exclusive.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (master)

Fix proposed to branch: master
Review: https://review.openstack.org/324300

Changed in manila:
status: New → In Progress
Changed in manila:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on manila (master)

Change abandoned by Tom Barron (<email address hidden>) on branch: master
Review: https://review.openstack.org/324300
Reason: No activity since June 2016; feel free to restore and rebase.

Tom Barron (tpb)
tags: added: races
Changed in manila:
status: In Progress → New
Revision history for this message
Jason Grosso (jgrosso) wrote :

Goutham are you working on this one ?

Revision history for this message
Goutham Pacha Ravi (gouthamr) wrote :

Hi Jason, no.. seems like a low-hanging-fruit and a good cleanup for someone who wants to contribute this fix.

Marc's initial patch [1] made the share manager fail to start, and reviewers have commented that you can just not run the periodic share server cleanup (and perhaps raise a warning log message about the incompatible combination of options used). I like the latter approach since it is upgrade friendly.

I'm un-assigning Marc, but hopefully we can have a new assignee pick this one up.

[1] https://review.opendev.org/#/c/324300/

Changed in manila:
importance: Medium → Low
assignee: Marc Koderer (m-koderer) → nobody
tags: added: low-hanging-fruit
Revision history for this message
Carlos Eduardo (silvacarlose) wrote :

Documentation helped us to avoid running into this issue. A solution for this would be to make a LHF out of this and surround https://github.com/openstack/manila/blob/2b57d15c641ff2a222c7d0af63853976b7e50f1f/manila/share/manager.py#L3494-L3507 with a try/catch block.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to manila (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/manila/+/840547

Vida Haririan (vhariria)
Changed in manila:
status: New → In Progress
assignee: nobody → Carlos Eduardo (silvacarlose)
Revision history for this message
kizdorothy (kizdorothy) wrote :

Hello, can I possibly work on this issue, it's my first time here I would love to contribute to this project.

Revision history for this message
Vida Haririan (vhariria) wrote :

Hello, Thanks you. A fix was proposed on 2022-05-02 and is under review.
Please have a look at the list of open bugs and feel free to chose another bug of interest.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.