Comment 8 for bug 2004555

Gorka Eguileor (gorka) wrote: Re: [ussuri] Wrong volume attachment - volumes overlapping when connected through iscsi on host

Hi,

I think I know what happened, but some details don't match unless somebody has
manually changed things on the host (like cleaning up multipaths).

Bit of context:

- SCSI volumes (iSCSI and FC) on Linux are NEVER removed automatically by the
  kernel and must always be removed explicitly. This means that they will
  remain in the system even if the remote connection is severed, unless
  something in OpenStack removes them.

- The os-brick library has a strong policy of not removing devices from the
  system if flushing fails during detach, to prevent data loss.

  The `disconnect_volume` method in the os-brick library [1] has an additional
  parameter called `force` that allows callers to ignore flushing errors and
  ensure that the devices are removed. This is useful when, after a failed
  detach, the volume is going to be deleted or moved into error status (a
  minimal sketch of such a call follows this list).
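
For illustration, here is a minimal sketch of such a call. The
connection_properties and device_info values are made up for the example; in
real code they come from Cinder's connection info and the result of the
earlier connect_volume call, and root_helper would normally be a
rootwrap/privsep command rather than plain sudo:

    from os_brick.initiator import connector

    # Build an iSCSI connector; 'sudo' as root_helper is a simplification.
    conn = connector.InitiatorConnector.factory(
        'iscsi', root_helper='sudo', use_multipath=True)

    # Made-up values for the sketch.
    connection_properties = {'target_portal': '192.168.0.10:3260',
                             'target_iqn': 'iqn.2004-04.com.example:lun',
                             'target_lun': 10}
    device_info = {
        'path': '/dev/disk/by-id/dm-uuid-mpath-36e00084100ee7e7ed6ad25d900002f6b'}

    try:
        # Normal detach: refuses to remove the devices if flushing fails.
        conn.disconnect_volume(connection_properties, device_info)
    except Exception:
        # The volume is going to be deleted or put into error status
        # anyway, so ignore flush errors and remove the devices.
        conn.disconnect_volume(connection_properties, device_info,
                               force=True, ignore_errors=True)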

I don't have the logs, but from what you said my guess is that this is what has
happened:

- Volume with SCSI ID 36e00084100ee7e7ed6ad25d900002f6b was attached to that
  host on LUN 10 at some point since the last reboot (sdao, sdap, sdan, sdaq).

- When the volume was detached from the host using os-brick the operation
  failed and the devices weren't removed, yet Nova still called Cinder to
  unexport and unmap the volume. At this point LUN 10 is free on the Huawei
  array and the volume is no longer attachable, but /dev/sda[n-q] are still
  present and their SCSI IDs are still known to multipathd.

- Nova asked Cinder to attach the volume again, and the volume is mapped to LUN
  4 (which must have been available as well) and it successfully attaches (sdm,
  sdo, sdl, sdn), appears as a multipath, and is used by the VM.

- Nova asks Cinder to export and map the new 1GB volume, and Huawei maps it to
  LUN 10. At this point iSCSI detects that the remote LUNs are back and
  reconnects to them, which makes the multipathd path checker see that sdao,
  sdap, sdan, and sdaq are alive on the compute host, so they are added to the
  existing multipath device mapper under their known SCSI ID (the diagnostic
  sketch after this list shows how to spot such a mixed device mapper).
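
If you still have access to the host, a rough diagnostic like the following
(it assumes the standard /lib/udev/scsi_id helper and root privileges) prints
the LUN and SCSI ID of every SCSI disk, which makes a device mapper mixing
two LUNs under one SCSI ID easy to spot:

    import glob
    import os
    import subprocess

    # Print LUN and SCSI ID for every SCSI disk; member paths of one
    # multipath map should normally share both values.
    for dev in sorted(glob.glob('/sys/block/sd*')):
        name = os.path.basename(dev)
        # /sys/block/sdX/device resolves to a path ending in H:B:T:L;
        # the last field is the LUN.
        hbtl = os.path.basename(os.path.realpath(dev + '/device'))
        lun = hbtl.rsplit(':', 1)[-1]
        scsi_id = subprocess.check_output(
            ['/lib/udev/scsi_id', '--whitelisted',
             '--device=/dev/' + name]).decode().strip()
        print(name, 'LUN', lun, 'SCSI_ID', scsi_id)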

You should find out why the detach actually failed, but I think I see multiple
issues:

- Nova:

  - Should not call Cinder to unmap a volume if the os-brick call to
    disconnect the volume has failed, as we know this will leave leftover
    devices that can cause issues like this.

  - If it's not doing so already, Nova should call the disconnect_volume
    method from os-brick with force=True when the volume is going to be
    deleted. (Both points are sketched below, after the os-brick items.)

- os-brick:

  - Should try to detect when newly added devices are being added to a
    multipath device mapper that has live paths to other LUNs, and fail if
    that is the case (a rough sketch of such a check follows this list).

  - As an improvement over the previous check, os-brick could forcefully
    remove the devices that are in the wrong device mapper, force a refresh
    of their SCSI IDs, and add them back to multipathd to form a new device
    mapper. Though personally I think this is a non-trivial and potentially
    problematic feature.
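
To make the Nova ordering concrete, here is a sketch of the detach flow being
argued for. This is not Nova's actual code: the function name, the
cinderclient attachment call, and the volume_will_be_deleted flag are all
illustrative:

    # Illustrative detach flow only; not Nova's real code.
    def detach_volume(brick_connector, cinder, attachment_id,
                      connection_properties, device_info,
                      volume_will_be_deleted=False):
        try:
            # Only ignore flush errors when the data is disposable.
            brick_connector.disconnect_volume(
                connection_properties, device_info,
                force=volume_will_be_deleted,
                ignore_errors=volume_will_be_deleted)
        except Exception:
            # Host-side cleanup failed and the devices are still there:
            # do NOT unmap on the array side, or the freed LUN can be
            # reused while stale /dev/sdX nodes keep the old SCSI ID.
            raise
        # Unexport/unmap only after the host devices are really gone.
        cinder.attachments.delete(attachment_id)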
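
And for the first os-brick item, a rough sketch of what such a check could
look like, reading LUNs straight from sysfs. This is not existing os-brick
code; joined_wrong_multipath and its sysfs parsing are just one possible
approach:

    import glob
    import os

    def _lun_of(dev):
        # /sys/block/sdX/device resolves to a path whose last component
        # is the H:B:T:L address; the final field is the LUN.
        target = os.path.realpath('/sys/block/%s/device' % dev)
        return target.rsplit(':', 1)[-1]

    def joined_wrong_multipath(dev):
        """Return True if `dev` (e.g. 'sdm') has been pulled into a
        multipath device mapper whose other member paths point at a
        different LUN, i.e. a stale SCSI ID has been reused."""
        my_lun = _lun_of(dev)
        for holder in glob.glob('/sys/block/%s/holders/dm-*' % dev):
            dm = os.path.basename(holder)
            for slave in os.listdir('/sys/block/%s/slaves' % dm):
                if slave != dev and _lun_of(slave) != my_lun:
                    return True
        return False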

In other words, the source of the problem is probably Nova, but os-brick should
try to prevent these possible data leaks.

Cheers,
Gorka.

[1]: https://github.com/openstack/os-brick/blob/655fcc41b33d3f6afc8f85005868d0111077bdb5/os_brick/initiator/connectors/iscsi.py#L858