iSCSI disconnect_volume fails with force=True and ignore_errors=True

Bug #2012251 reported by Rajat Dhasmana
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
os-brick | Fix Released | Undecided | Rajat Dhasmana |

Bug Description

The purpose of passing force=True and ignore_errors=True is to tell os-brick that we want to disconnect the volume even if the flush fails (force) and not raise any exceptions at the end (ignore_errors).
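
The intended contract of the two flags can be illustrated with a small sketch. It is a simplified stand-in for os-brick's ExceptionChainer, not its actual API: `ErrorCollector`, `guard()` and `disconnect()` are hypothetical names, and each "step" is just a callable. With force=True a failing step is recorded and the next step still runs; at the end the recorded failures are raised only when ignore_errors is False.

```python
import contextlib


class ErrorCollector:
    """Hypothetical, simplified stand-in for os-brick's ExceptionChainer."""

    def __init__(self):
        self.errors = []

    @contextlib.contextmanager
    def guard(self, catch):
        # With catch=True a failure is recorded and suppressed, so the
        # caller moves on to its next cleanup step; otherwise it propagates.
        try:
            yield
        except Exception as exc:
            if not catch:
                raise
            self.errors.append(exc)


def disconnect(steps, force, ignore_errors):
    """Run each cleanup step with the intended force/ignore_errors contract."""
    collector = ErrorCollector()
    for step in steps:
        # force=True: keep going after a failed step (e.g. a failed flush).
        with collector.guard(force):
            step()
    if collector.errors and not ignore_errors:
        # ignore_errors=False: report the failures once all steps have run.
        raise RuntimeError(
            f"{len(collector.errors)} cleanup step(s) failed"
        ) from collector.errors[0]
    # ignore_errors=True: return the failures instead of raising them.
    return collector.errors
```

For example, `disconnect([failing_flush, logout], force=True, ignore_errors=True)` runs `logout` even though `failing_flush` raised, and returns the flush error instead of raising it. The bug is that the real disconnect path did not honour this contract for every step.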

Currently, even with both parameters set to True, operations that disconnect the volume with force=True and ignore_errors=True, such as backup creation [2], still fail:

2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server [req-f6619913-6f96-4226-8d75-2da3fca722f1 23de1b92e7674cf59486f07ac75b886b a7585b47d1f143e9839c49b4e3bbe1b4 - - -] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: multipath -f 3624a93705842cfae35d7483200015ec6
Exit code: 1
Stdout: ''
Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath device\n'
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in wrapper
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 410, in create_backup
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server volume_utils.update_backup_error(backup, str(err))
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server self.force_reraise()
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise self.value
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 399, in create_backup
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server updates = self._run_backup(context, backup, volume)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 493, in _run_backup
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server ignore_errors=True)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1066, in _detach_device
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server force=force, ignore_errors=ignore_errors)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 141, in trace_logging_wrapper
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 880, in disconnect_volume
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server is_disconnect_call=True)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 942, in _cleanup_connection
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server self._linuxscsi.flush_multipath_device(multipath_name)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 382, in flush_multipath_device
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server root_helper=self._root_helper)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = self.__execute(*args, **kwargs)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 172, in execute
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return execute_root(*cmd, **kwargs)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return self.channel.remote_call(name, args, kwargs)
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise exc_type(*result[2])
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Command: multipath -f 3624a93705842cfae35d7483200015ec6
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Exit code: 1
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stdout: ''
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath device\n'
2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server

[1] https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L903-L907
[2] https://github.com/openstack/cinder/blob/b75c29c7d8e0e6ac212b59f9ad8d140874e55251/cinder/backup/manager.py#L509-L512

Changed in os-brick:
assignee: nobody → Rajat Dhasmana (whoami-rajat)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/os-brick/+/878045

Changed in os-brick:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.opendev.org/c/openstack/os-brick/+/878045
Committed: https://opendev.org/openstack/os-brick/commit/8070ac3bd903a443fbfd02abf8b0554d5d05cac1
Submitter: "Zuul (22348)"
Branch: master

commit 8070ac3bd903a443fbfd02abf8b0554d5d05cac1
Author: Rajat Dhasmana <email address hidden>
Date: Tue Mar 21 01:30:53 2023 +0000

    Fix iSCSI disconnect_volume when flush fails

    The purpose of providing force=True and ignore_errors=True
    is to tell os-brick that we want to disconnect the volume
    even if flush fails (force) and not raise any exceptions
    (ignore_errors). Currently, in an iSCSI multipath environment,
    disconnect_volume can fail when both parameters are True.

    The current flow when disconnecting an iSCSI volume is
    that if flushing a multipath device fails, we manually
    remove the device, logout from the target portals,
    and try the flush again.

    There are two problems here:

    1) code problem: The second flush is not wrapped by
    ExceptionChainer. This causes it to raise the exception
    immediately after flush fails irrespective of the value
    of the ignore_errors flag.

    2) conceptual problem: In this situation, there is no point
    in making the second flush attempt. Instead, we should just
    remove the multipath map from multipathd monitoring since
    we have already removed the paths manually.

    This patch fixes the conceptual problem, as we don't make a second
    flush call and ignore any errors on the execution of ``multipathd del map``,
    thereby also fixing the code problem.

    Closes-Bug: #2012251
    Change-Id: I828911495a2de550ea997e6f51cc039a7b7fa8cd
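
The flow described in the commit message can be modelled with stubs to show why the old code raised despite ignore_errors=True and what the fix changes. This is a sketch, not os-brick's implementation: `FakeHost`, `CommandError`, `collect`, `_cleanup`, `old_retry`, `new_retry`, `disconnect_old` and `disconnect_new` are all hypothetical names. The simulated host assumes, as in the log above, that every flush fails with "is not a multipath device", and that `multipathd del map` may fail too; per the commit message, the fix drops the second flush and ignores errors from removing the map.

```python
import contextlib


class CommandError(Exception):
    """Hypothetical stand-in for oslo's ProcessExecutionError."""


class FakeHost:
    """Hypothetical host where the map is gone, so every flush fails."""

    def __init__(self):
        self.calls = []

    def flush(self, name):  # stands in for `multipath -f <name>`
        self.calls.append("flush")
        raise CommandError(f"{name} is not a multipath device")

    def remove_paths(self, name):
        self.calls.append("remove_paths")

    def logout(self):
        self.calls.append("logout")

    def del_map(self, name):  # stands in for `multipathd del map <name>`
        self.calls.append("del_map")
        raise CommandError(f"failed to remove map {name}")


@contextlib.contextmanager
def collect(errors, catch):
    # Record and suppress the failure when catch is True; else re-raise.
    try:
        yield
    except CommandError as exc:
        if not catch:
            raise
        errors.append(exc)


def _cleanup(host, name, force, ignore_errors, retry):
    errors = []
    flushed = False
    with collect(errors, force):
        host.flush(name)
        flushed = True
    with collect(errors, force):
        host.remove_paths(name)
    with collect(errors, force):
        host.logout()
    if not flushed:  # only reachable when force=True
        retry(host, name)
    if errors and not ignore_errors:
        raise errors[0]
    return errors


def old_retry(host, name):
    # Before the fix: a second, unguarded flush; its error always escapes.
    host.flush(name)


def new_retry(host, name):
    # After the fix: no second flush; drop the map, ignoring any error.
    with contextlib.suppress(CommandError):
        host.del_map(name)


def disconnect_old(host, name, force, ignore_errors):
    return _cleanup(host, name, force, ignore_errors, old_retry)


def disconnect_new(host, name, force, ignore_errors):
    return _cleanup(host, name, force, ignore_errors, new_retry)
```

With force=True and ignore_errors=True, `disconnect_old(FakeHost(), "mpatha", True, True)` raises `CommandError` from the unguarded retry, mirroring the traceback in the bug description, while `disconnect_new` returns the recorded first-flush failure without raising.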

Changed in os-brick:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 6.3.0

This issue was fixed in the openstack/os-brick 6.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/os-brick/+/883284

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.opendev.org/c/openstack/os-brick/+/883284
Committed: https://opendev.org/openstack/os-brick/commit/931a95fc3302985c780b32c6ed66d386ba75aec0
Submitter: "Zuul (22348)"
Branch: master

commit 931a95fc3302985c780b32c6ed66d386ba75aec0
Author: Rajat Dhasmana <email address hidden>
Date: Tue May 16 16:15:10 2023 +0000

    Fix iSCSI disconnect_volume when flush fails

    The purpose of providing force=True and ignore_errors=True
    is to tell os-brick that we want to disconnect the volume
    even if flush fails (force) and not raise any exceptions
    (ignore_errors). Currently, in an iSCSI multipath environment,
    disconnect_volume can fail when both parameters are True.

    The current flow when disconnecting an iSCSI volume is
    that if flushing a multipath device fails, we manually
    remove the device, logout from the target portals,
    and try the flush again.

    There are two problems here:

    1) code problem: The second flush is not wrapped by
    ExceptionChainer. This causes it to raise the exception
    immediately after flush fails irrespective of the value
    of the ignore_errors flag.

    2) conceptual problem: In this situation, there is no point
    in making the second flush attempt. Instead, we should just
    remove the multipath map from multipathd monitoring since
    we have already removed the paths manually.

    This patch fixes the conceptual problem, as we don't make a second
    flush call and ignore any errors on the execution of ``multipathd del map``,
    thereby also fixing the code problem.

    Closes-Bug: #2012251
    Change-Id: Iad545dc8f3651bc1c2d2dabd87e79bba26cba3d9

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 6.9.0

This issue was fixed in the openstack/os-brick 6.9.0 release.
