Concurrency issues on rtstool (LIO)

Bug #1460692 reported by Gorka Eguileor on 2015-06-01
Affects: Cinder, Importance: Undecided, Assigned to: Gorka Eguileor
Affects: Cinder (Kilo), Importance: Undecided, Assigned to: Unassigned

Bug Description

Since rtslib has no synchronized access to configfs [1], rtstool or rtslib may try to access something that is no longer there because another Cinder request has removed the target.

These errors can happen when creating, attaching, detaching, and deleting volumes, and the resulting log messages are misleading because the exception refers to a different volume-id from the one we were working on.

The exception refers to another volume because the library loops through all elements, and when it tries to access one of them to get its info (for example, to do a dump) the info no longer exists; that is unexpected and raises an exception.
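The race can be reproduced with a self-contained sketch that uses plain files in place of configfs (the file names are illustrative; this is not rtslib itself). rtslib hits the same window between enumerating `rtsroot.targets` and calling `.dump()` on each one:

```python
# Hypothetical reproduction of the time-of-check/time-of-use race:
# the code enumerates entries first, then reads each one, but another
# request may delete an entry between the listing and the read.
import os
import tempfile

tmpdir = tempfile.mkdtemp()
for name in ("volume-a", "volume-b"):
    with open(os.path.join(tmpdir, name), "w") as f:
        f.write("param")

entries = os.listdir(tmpdir)                  # step 1: enumerate (like rtsroot.targets)
os.unlink(os.path.join(tmpdir, "volume-b"))   # another request deletes a target

dumped, errors = [], []
for name in entries:                          # step 2: dump each entry
    try:
        with open(os.path.join(tmpdir, name)) as f:
            dumped.append(f.read())
    except IOError:                           # ENOENT: the entry vanished mid-loop
        errors.append(name)

print(dumped, errors)   # ['param'] ['volume-b']
```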

I have located several different errors that originate from this same issue:

1)
TRACE oslo.messaging.rpc.dispatcher Stderr: u'Traceback (most recent call last):
  File "/bin/cinder-rtstool", line 280, in <module>
    sys.exit(main())
  File "/bin/cinder-rtstool", line 251, in main
    get_targets()
  File "/bin/cinder-rtstool", line 162, in get_targets
    for t, dump in _get_targets(rtsroot):
  File "/bin/cinder-rtstool", line 51, in _get_targets
    return [(target, target.dump()) for target in rtsroot.targets]
  File "/usr/lib/python2.7/site-packages/rtslib/target.py", line 136, in dump
    d[\'tpgs\'] = [tpg.dump() for tpg in self.tpgs]
  File "/usr/lib/python2.7/site-packages/rtslib/target.py", line 430, in dump
    d[\'luns\'] = [lun.dump() for lun in self.luns]
  File "/usr/lib/python2.7/site-packages/rtslib/target.py", line 640, in dump
    (self.storage_object.plugin, self.storage_object.name)
  File "/usr/lib/python2.7/site-packages/rtslib/target.py", line 557, in _get_storage_object
    return tcm.StorageObject.so_from_path(alias_path)
  File "/usr/lib/python2.7/site-packages/rtslib/tcm.py", line 113, in so_from_path
    return so_mapping[so_type](so_name)
KeyError: \'lun\'

2)
Stderr: u'Traceback (most recent call last):
  File "/bin/cinder-rtstool", line 258, in <module>
    sys.exit(main())
  File "/bin/cinder-rtstool", line 229, in main
    get_targets()
  File "/bin/cinder-rtstool", line 141, in get_targets
    print(x.dump()[\'wwn\'])
  File "/usr/lib/python2.7/site-packages/rtslib/target.py", line 136, in dump
    d[\'tpgs\'] = [tpg.dump() for tpg in self.tpgs]
  File "/usr/lib/python2.7/site-packages/rtslib/target.py", line 427, in dump
    d = super(TPG, self).dump()
  File "/usr/lib/python2.7/site-packages/rtslib/node.py", line 218, in dump
    params[item] = self.get_parameter(item)
  File "/usr/lib/python2.7/site-packages/rtslib/node.py", line 190, in get_parameter
    return fread(path)
  File "/usr/lib/python2.7/site-packages/rtslib/utils.py", line 89, in fread
    with open(path, \'r\') as file_fd:
IOError: [Errno 2] No such file or directory: \'/sys/kernel/config/target/iscsi/iqn.2010-10.org.openstack:volume-1468da95-22a3-470c-8ac9-4b27cac8acb3/tpgt_1/param/IFMarkInt\'\n'

3)
2015-05-17 20:51:49.829 62162 ERROR cinder.brick.iscsi.iscsi [req-6981f2b5-2259-49d8-a8ce-6d41de344944 50a742bdfe4e4a8ca0ddc150fc648904 648ffe15b9fe4a1bba71e4643d4dbbf2 - - -] Unexpected error while running command. Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf cinder-rtstool delete iqn.2010-10.org.openstack:volume-ad12c1e0-6746-4bce-9b94-5df1cae4ee2a Exit code: 1 Stdout: '' Stderr: 'Traceback (most recent call last):
  File "/bin/cinder-rtstool", line 238, in <module>
    sys.exit(main())
  File "/bin/cinder-rtstool", line 224, in main
    delete(iqn)
  File "/bin/cinder-rtstool", line 151, in delete
    for x in rtsroot.storage_objects:
  File "/usr/lib/python2.7/site-packages/rtslib/root.py", line 86, in _list_storage_objects
    for so in StorageObject.all():
  File "/usr/lib/python2.7/site-packages/rtslib/tcm.py", line 102, in all
    yield cls.so_from_path(so_dir)
  File "/usr/lib/python2.7/site-packages/rtslib/tcm.py", line 117, in so_from_path
    return mapping[so_type](so_name)
  File "/usr/lib/python2.7/site-packages/rtslib/tcm.py", line 678, in __init__
    super(BlockStorageObject, self).__init__(name, \'lookup\')
  File "/usr/lib/python2.7/site-packages/rtslib/tcm.py", line 48, in __init__
    self._backstore = _Backstore(name, type(self), mode)
  File "/usr/lib/python2.7/site-packages/rtslib/tcm.py", line 785, in __init__
    (self._plugin, name))
rtslib.utils.RTSLibError: Storage object block/iqn.2010-10.org.openstack:volume-fb12df40-d1e3-4a4e-b47d-0162ce9573cc not found

The cleanest solution is synchronizing access in Cinder.
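The fix wraps the rtstool operations in Cinder's utils.synchronized decorator, which serializes calls holding the same lock name. A standalone sketch of that pattern, using a process-local threading.Lock in place of the real oslo-backed lock (the function names here are illustrative, not Cinder's actual code):

```python
# Minimal sketch of serializing configfs access with a named lock.
# Cinder's actual fix uses its utils.synchronized decorator; this
# standalone version only illustrates the idea with threading.Lock.
import functools
import threading

_locks = {}
_locks_guard = threading.Lock()

def synchronized(name):
    """Serialize all calls decorated with the same lock name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with _locks_guard:             # lazily create the named lock
                lock = _locks.setdefault(name, threading.Lock())
            with lock:                     # only one caller at a time
                return func(*args, **kwargs)
        return wrapper
    return decorator

@synchronized('lioadm')
def create_target(iqn):
    return 'created %s' % iqn

@synchronized('lioadm')
def delete_target(iqn):
    return 'deleted %s' % iqn

print(create_target('iqn.2010-10.org.openstack:volume-x'))
```

Because both functions share the lock name 'lioadm', a delete can no longer run in the middle of a create's loop over configfs entries.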

[1]: https://github.com/agrover/rtslib-fb/issues/36

Gorka Eguileor (gorka) on 2015-06-01
Changed in cinder:
assignee: nobody → Gorka Eguileor (gorka)
Gorka Eguileor (gorka) on 2015-06-01
description: updated

Fix proposed to branch: master
Review: https://review.openstack.org/187260

Changed in cinder:
status: New → In Progress

Reviewed: https://review.openstack.org/187260
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=45c282afb442aec80cf24d226bef5ebb46963026
Submitter: Jenkins
Branch: master

commit 45c282afb442aec80cf24d226bef5ebb46963026
Author: Gorka Eguileor <email address hidden>
Date: Mon Jun 1 18:30:01 2015 +0200

    Fix cinder concurrency issues on rtstool

    Since there's no synchronized access to configfs in rtslib it can happen
    that rtstool or rtslib access an element that no longer exists because
    it has been removed just in the middle of a loop by another Cinder
    request.

    This results in quite a different number of exceptions:
    - .dump()
    - KeyError
    - IOError
    - RTSLibError on storage_object

    This patch synchronizes access to all rtstool calls that access or
    modify configfs using utils.synchronized decorator.

    Change-Id: I341a10da54ab01be68a0cae843f35e5c841c6d81
    Closes-Bug: #1460692

Changed in cinder:
status: In Progress → Fix Committed
Gorka Eguileor (gorka) wrote :

Another effect of this patch is the disappearance of the sporadic failures to attach volumes that showed up in kernel messages like the following (I can no longer reproduce these errors):

[48972.425720] Unable to locate Target IQN: iqn.2010-10.org.openstack:volume-bdb5a6a3-840d-4960-8a64-af014388fbfa in Storage Node
[48972.436717] iSCSI Login negotiation failed.
[48999.366835] Rounding down aligned max_sectors from 4294967295 to 4294967288
[48999.397357] strict_strtoul() failed for port_str: -22

Reviewed: https://review.openstack.org/187667
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=223b258be62eb448728647eb3c662df331557b0d
Submitter: Jenkins
Branch: stable/kilo

commit 223b258be62eb448728647eb3c662df331557b0d
Author: Gorka Eguileor <email address hidden>
Date: Mon Jun 1 18:30:01 2015 +0200

    Fix cinder concurrency issues on rtstool

    Since there's no synchronized access to configfs in rtslib it can happen
    that rtstool or rtslib access an element that no longer exists because
    it has been removed just in the middle of a loop by another Cinder
    request.

    This results in quite a different number of exceptions:
    - .dump()
    - KeyError
    - IOError
    - RTSLibError on storage_object

    This patch synchronizes access to all rtstool calls that access or
    modify configfs using utils.synchronized decorator.

    Change-Id: I341a10da54ab01be68a0cae843f35e5c841c6d81
    Closes-Bug: #1460692
    (cherry picked from commit 45c282afb442aec80cf24d226bef5ebb46963026)
    Conflicts:
     cinder/tests/targets/test_lio_driver.py

tags: added: in-stable-kilo
Thierry Carrez (ttx) on 2015-06-23
Changed in cinder:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-10-15
Changed in cinder:
milestone: liberty-1 → 7.0.0