os-brick

Multipath disconnect fails if path just went down

Bug #1794829 reported by Gorka Eguileor on 2018-09-27

Affects   Status        Importance  Assigned to     Milestone
os-brick  Fix Released  Undecided   Gorka Eguileor

Bug Description

If the iSCSI connection to a device goes down right after we flush it, or if one of the paths of a multipath device goes down right before we start disconnecting, the detach will fail even though it should succeed.

An extract of the error we'll see in the logs is:

  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server return r.call(f, *args, **kwargs)
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server raise attempt.get()
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server six.reraise(self.value[0], self.value[1], self.value[2])
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py", line 89, in wait_for_volumes_removal
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server raise exception.VolumePathNotRemoved(volume_path=exist)
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server VolumePathNotRemoved: Volume path [u'sdd'] was not removed in time.
  2018-09-12 10:30:52.013 1 ERROR oslo_messaging.rpc.server

This happens because, under those circumstances, it may take up to 30 seconds for the SCSI device to be removed from /dev, but we expect it to disappear within 6 seconds (the first check happens immediately, then another 2 seconds later, and another 4 seconds after that).

If we wait a little bit more, the device will be properly removed.
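
The arithmetic behind the 6-second window can be sketched as follows. This is only an illustration: it assumes the wait between checks doubles starting at 2 seconds, as the description above implies, and `check_times` and its parameters are hypothetical names, not os-brick code.

```python
def check_times(attempts=3, base=2):
    """Return the elapsed time (in seconds) at which each check runs.

    Assumption: the wait after the n-th check is base ** n seconds,
    which gives waits of 2 and 4 seconds between three checks.
    """
    times = []
    elapsed = 0
    for attempt in range(attempts):
        times.append(elapsed)
        elapsed += base ** (attempt + 1)
    return times


if __name__ == "__main__":
    print(check_times())  # [0, 2, 6]
```

The last check runs 6 seconds in, well short of the 30 seconds the device may need to disappear.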

Gorka Eguileor (gorka) on 2018-09-27

Changed in os-brick:
assignee:	nobody → Gorka Eguileor (gorka)

OpenStack Infra (hudson-openstack) wrote on 2018-09-27: Fix proposed to os-brick (master)

Fix proposed to branch: master
Review: https://review.openstack.org/605802

Changed in os-brick:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2018-10-01: Fix merged to os-brick (master)

Reviewed: https://review.openstack.org/605802
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=b9c7bc2b597d944cbc404d6bf5fedc35d095a897
Submitter: Zuul
Branch: master

commit b9c7bc2b597d944cbc404d6bf5fedc35d095a897
Author: Gorka Eguileor <email address hidden>
Date: Thu Sep 27 17:55:00 2018 +0200

    Succeed on iSCSI detach when path just went down

    If the iSCSI connection to a device goes down right after we flush it,
    or if one of the paths of a multipath device goes down right before we
    start disconnecting, the detach will fail even though it should succeed.

    We'll see a VolumePathNotRemoved exception listing volumes that had not
    disappeared.

    This happens because, under those circumstances, it may take up to 30
    seconds for the SCSI device to be removed from /dev, but expect it to
    disappear in 6 seconds (first check happens, immediately, then another
    in 2 seconds, and another in 4 seconds).

    Since the device will be removed if we wait a bit more, this patch makes
    it so that we wait for up to 30 seconds for the removal.

    To ensure we wait as little time as possible, we change the way we wait
    for the devices to be removed. Instead of checking, sleeping for 2 and
    then for 4 seconds, and then checking again, we just sleep 500ms between
    checks, and we do the DEBUG log every 5 seconds.

    Change-Id: If801dfc2462c0d3f986eebd4108087139934610d
    Closes-Bug: #1794829
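
The waiting strategy described in the commit message above (poll every 500 ms, emit a DEBUG log every 5 seconds, give up after 30 seconds) might look roughly like the sketch below. It is not the merged patch: `wait_for_removal`, `PathNotRemoved`, `_dev_exists` and all parameter names are hypothetical, and the assumption that a device is present while `/dev/<name>` exists is made only for illustration.

```python
import logging
import os
import time

LOG = logging.getLogger(__name__)


class PathNotRemoved(Exception):
    """Hypothetical stand-in for os-brick's VolumePathNotRemoved."""


def _dev_exists(name):
    # Assumption: a device counts as present while /dev/<name> exists.
    return os.path.exists(os.path.join("/dev", name))


def wait_for_removal(names, exists=_dev_exists, timeout=30.0, interval=0.5,
                     log_every=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until no device in `names` exists; return the seconds waited."""
    start = clock()
    next_log = start
    remaining = [n for n in names if exists(n)]
    while remaining:
        now = clock()
        if now - start >= timeout:
            raise PathNotRemoved(
                "Volume path %s was not removed in time." % remaining)
        if now >= next_log:
            # Log at most once per `log_every` seconds, not on every poll.
            LOG.debug("Waiting for removal of %s", remaining)
            next_log = now + log_every
        sleep(interval)
        remaining = [n for n in remaining if exists(n)]
    return clock() - start
```

With a short fixed interval, a device that disappears after 10 seconds is noticed about 10 seconds in, and a device that disappears immediately costs no wait at all. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.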

Changed in os-brick:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2018-10-01: Fix proposed to os-brick (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/607041

OpenStack Infra (hudson-openstack) wrote on 2018-10-03: Fix merged to os-brick (stable/rocky)

Reviewed: https://review.openstack.org/607041
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=b75411de2b2aadd1eafd2f8f8b1579df357bf09f
Submitter: Zuul
Branch: stable/rocky

commit b75411de2b2aadd1eafd2f8f8b1579df357bf09f
Author: Gorka Eguileor <email address hidden>
Date: Thu Sep 27 17:55:00 2018 +0200

    Succeed on iSCSI detach when path just went down

    If the iSCSI connection to a device goes down right after we flush it,
    or if one of the paths of a multipath device goes down right before we
    start disconnecting, the detach will fail even though it should succeed.

    We'll see a VolumePathNotRemoved exception listing volumes that had not
    disappeared.

    This happens because, under those circumstances, it may take up to 30
    seconds for the SCSI device to be removed from /dev, but expect it to
    disappear in 6 seconds (first check happens, immediately, then another
    in 2 seconds, and another in 4 seconds).

    Since the device will be removed if we wait a bit more, this patch makes
    it so that we wait for up to 30 seconds for the removal.

    Change-Id: If801dfc2462c0d3f986eebd4108087139934610d
    Closes-Bug: #1794829
    (cherry-picked from b9c7bc2b597d944cbc404d6bf5fedc35d095a897)

tags:

added: in-stable-rocky

OpenStack Infra (hudson-openstack) wrote on 2018-10-03: Fix proposed to os-brick (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/607632

OpenStack Infra (hudson-openstack) wrote on 2018-10-08: Fix included in openstack/os-brick 2.6.1

This issue was fixed in the openstack/os-brick 2.6.1 release.

OpenStack Infra (hudson-openstack) wrote on 2018-10-08: Fix included in openstack/os-brick 2.5.4

This issue was fixed in the openstack/os-brick 2.5.4 release.

OpenStack Infra (hudson-openstack) wrote on 2018-10-10: Fix merged to os-brick (stable/queens)

Reviewed: https://review.openstack.org/607632
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=9722aa7db81b1d67b9e4c7804034b680aeb10b17
Submitter: Zuul
Branch: stable/queens

commit 9722aa7db81b1d67b9e4c7804034b680aeb10b17
Author: Gorka Eguileor <email address hidden>
Date: Thu Sep 27 17:55:00 2018 +0200

    Succeed on iSCSI detach when path just went down

    If the iSCSI connection to a device goes down right after we flush it,
    or if one of the paths of a multipath device goes down right before we
    start disconnecting, the detach will fail even though it should succeed.

    We'll see a VolumePathNotRemoved exception listing volumes that had not
    disappeared.

    This happens because, under those circumstances, it may take up to 30
    seconds for the SCSI device to be removed from /dev, but expect it to
    disappear in 6 seconds (first check happens, immediately, then another
    in 2 seconds, and another in 4 seconds).

    Since the device will be removed if we wait a bit more, this patch makes
    it so that we wait for up to 30 seconds for the removal.

    Change-Id: If801dfc2462c0d3f986eebd4108087139934610d
    Closes-Bug: #1794829
    (cherry-picked from commit b9c7bc2b597d944cbc404d6bf5fedc35d095a897)
    (cherry picked from commit b75411de2b2aadd1eafd2f8f8b1579df357bf09f)

tags:

added: in-stable-queens

OpenStack Infra (hudson-openstack) wrote on 2019-02-07: Fix included in openstack/os-brick 2.3.5

This issue was fixed in the openstack/os-brick 2.3.5 release.

OpenStack Infra (hudson-openstack) wrote on 2019-03-26: Fix proposed to os-brick (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/647777

OpenStack Infra (hudson-openstack) wrote on 2019-04-11: Fix merged to os-brick (stable/pike)

Reviewed: https://review.openstack.org/647777
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=97c1da1230261ac79c6045b6aa8690db597fcc8c
Submitter: Zuul
Branch: stable/pike

commit 97c1da1230261ac79c6045b6aa8690db597fcc8c
Author: Gorka Eguileor <email address hidden>
Date: Thu Sep 27 17:55:00 2018 +0200

    Succeed on iSCSI detach when path just went down

    If the iSCSI connection to a device goes down right after we flush it,
    or if one of the paths of a multipath device goes down right before we
    start disconnecting, the detach will fail even though it should succeed.

    We'll see a VolumePathNotRemoved exception listing volumes that had not
    disappeared.

    This happens because, under those circumstances, it may take up to 30
    seconds for the SCSI device to be removed from /dev, but expect it to
    disappear in 6 seconds (first check happens, immediately, then another
    in 2 seconds, and another in 4 seconds).

    Since the device will be removed if we wait a bit more, this patch makes
    it so that we wait for up to 30 seconds for the removal.

tags:

added: in-stable-pike

OpenStack Infra (hudson-openstack) wrote on 2019-04-12: Fix included in openstack/os-brick 1.15.9

This issue was fixed in the openstack/os-brick 1.15.9 release.
