OpenStack Compute (nova)

LibvirtISCSIVolumeDriver: device size mismatch when LUN is reused

Bug #1112483 reported by Brano Zarnovican on 2013-02-01

This bug affects 6 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	Medium	Jason Dillaman	OpenStack Compute (nova) 2014.1 "icehouse"

Bug Description

Short problem summary:
====================

When a LUN id is reused by the SCSI provider, the device size seen on the compute node may not match the volume. The host may report to the guest the size of the volume previously mapped to this LUN id rather than the size of the volume mapped there now. This happens with SCSI providers that expose one target with many LUNs (e.g. NetApp).

Detailed problem description:
========================

In disconnect_volume(), the OpenStack iSCSI client calls iscsiadm with --logout only if nothing else is using LUNs from that target. Otherwise it does nothing, and the device stays in place:

# ls -l /dev/disk/by-path/ip-172.30.128.3\:3260-iscsi-iqn.1992-08.com.netapp\:node.netapp02-lun-0
lrwxrwxrwx. 1 root root 9 Feb 1 11:06 /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-0 -> ../../sdg

Later, nova-volume unmaps the LUN from the initiator and this device becomes invalid. Example "sanlun" output:

# sanlun lun show
controller(7mode)/ device host lun
vserver(Cmode) lun-pathname filename adapter protocol size mode
---------------------------------------------------------------------------------------------------------------------------------------------------
1081809-413161-N2 <unknown> /dev/sdg host7 iSCSI 7

At some point, a different volume needs to be made available to the same compute node. The remote SCSI provider may choose to recycle an unused LUN id. From the client's point of view, a different OpenStack volume is then visible under the same target and LUN id as before. After nova-volume has completed the LUN mapping, nova-compute's connect_volume() is called. Note that at this point the iSCSI session to the target is up and the device symlink (/dev/disk/by-path/..) exists. The OpenStack iSCSI driver calls "iscsiadm .. --login" (with no effect). A rescan is not triggered, because the device already exists. Libvirt and the VM then start using the stale device.

Access to the re-mapped device produces the following kernel messages:
Feb 1 11:06:45 prod-cmp10 kernel: sd 7:0:0:1: [sdh] Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
Feb 1 11:06:45 prod-cmp10 kernel: sd 7:0:0:0: [sdg] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 1 11:06:45 prod-cmp10 kernel: sd 7:0:0:0: [sdg] Sense Key : Illegal Request [current]
Feb 1 11:06:45 prod-cmp10 kernel: Info fld=0x0

For some strange reason, the kernel reports the warning on the device that did NOT change ("sdh" rather than "sdg"). Possibly a bug in the Linux iSCSI client?

This issue affects SCSI systems that expose targets with multiple LUNs (e.g. NetApp). The OpenStack LVM/tgtd backend is not affected, because it uses multiple targets with a single LUN each: when the LUN becomes unused, the driver closes the whole session.
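The sequence described above can be illustrated with a small toy model. This is only a sketch and not nova code: the classes Target and Host, the function reproduce() and the delete_device flag are hypothetical names, and sizes are plain integers in GB. It assumes the provider reuses the lowest free LUN id and that the host rescans only when the device node is missing.

```python
"""Toy model of the stale-device problem; not nova code."""


class Target:
    """Storage side: maps LUN id -> size in GB of the volume mapped there."""

    def __init__(self):
        self.luns = {}

    def map_volume(self, size_gb):
        # Assumption: the provider reuses the lowest free LUN id.
        lun = 0
        while lun in self.luns:
            lun += 1
        self.luns[lun] = size_gb
        return lun

    def unmap(self, lun):
        del self.luns[lun]


class Host:
    """Compute node: caches the size seen when a device was last scanned."""

    def __init__(self, target):
        self.target = target
        self.devices = {}  # LUN id -> cached size in GB

    def connect_volume(self, lun):
        # Rescan only if the device node does not exist yet.
        if lun not in self.devices:
            self.devices[lun] = self.target.luns[lun]
        return self.devices[lun]

    def disconnect_volume(self, lun, delete_device):
        # The session stays up while other LUNs are in use; the device
        # node is kept unless it is deleted explicitly.
        if delete_device:
            self.devices.pop(lun, None)


def reproduce(delete_device):
    """Attach 3G and 2G, detach 3G, attach 1G; return the size seen for 1G."""
    target = Target()
    host = Host(target)
    lun_3g = target.map_volume(3)
    host.connect_volume(lun_3g)
    lun_2g = target.map_volume(2)
    host.connect_volume(lun_2g)
    host.disconnect_volume(lun_3g, delete_device)
    target.unmap(lun_3g)
    lun_1g = target.map_volume(1)  # LUN 0 is reused here
    return host.connect_volume(lun_1g)


if __name__ == "__main__":
    print(reproduce(delete_device=False))
    print(reproduce(delete_device=True))
```

In this model, keeping the device node on detach makes the new 1G volume show up with the old size of 3, while deleting it forces a fresh scan and yields 1.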

Steps to reproduce:
================

1) create three volumes with different sizes (1, 2, 3GB)

# euca-describe-volumes vol-00000551 vol-00000552 vol-00000553
VOLUME vol-00000551 1 na.dev-netapp available 2013-02-01T09:32:01.000Z
VOLUME vol-00000552 2 na.dev-netapp available 2013-02-01T09:32:07.000Z
VOLUME vol-00000553 3 na.dev-netapp available 2013-02-01T09:32:13.000Z

2) attach the 3G and 2G volumes to an instance

compute node# virsh domblklist i-000005dc
Target Source
------------------------------------------------
...
vdc /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-0
vdd /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-1

instance# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdc 252:32 0 3G 0 disk
vdd 252:48 0 2G 0 disk

3) detach volume 3G (LUN0 becomes unused)

Device still exists

# ls -l /dev/disk/by-path/
...
lrwxrwxrwx. 1 root root 9 Feb 1 10:46 ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-0 -> ../../sdg
lrwxrwxrwx. 1 root root 9 Feb 1 10:44 ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-1 -> ../../sdh

4) attach volume 1G to the same instance (LUN0 is reused for different volume)

Expected result:
Instance can see new 1G device attached

Actual result:
Instance is reporting the size to be 3G.

Host OS is also reporting 3G. SCSI tools report correct size (1G).

instance# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdc 252:32 0 3G 0 disk
vdd 252:48 0 2G 0 disk

compute# virsh domblklist i-000005dc
Target Source
------------------------------------------------
...
vdc /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-0
vdd /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-1

compute# ls -l /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-?
lrwxrwxrwx. 1 root root 9 Feb 1 10:47 /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-0 -> ../../sdg
lrwxrwxrwx. 1 root root 9 Feb 1 10:47 /dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-1 -> ../../sdh

compute# lsblk
...
sdg 8:96 0 3G 0 disk
sdh 8:112 0 2G 0 disk

compute# sanlun lun show
controller(7mode)/ device host lun
vserver(Cmode) lun-pathname filename adapter protocol size mode
---------------------------------------------------------------------------------------------------------------------------------------------------
1081809-413161-N2 /vol/OpenStack_103a49bb861e485ea05aa78f9b0216bd_1/vol-00000552/vol-00000552 /dev/sdh host7 iSCSI 2g 7
1081809-413161-N2 /vol/OpenStack_103a49bb861e485ea05aa78f9b0216bd_1/vol-00000551/vol-00000551 /dev/sdg host7 iSCSI 1g 7
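A mismatch like the one above could also be detected programmatically by reading the size the kernel has cached for the block device. The following is a minimal sketch, not part of nova: the function names and the sys_block parameter are hypothetical, and it assumes the volume size is an exact number of GiB, which a real LUN may not be.

```python
import os

SECTOR_BYTES = 512  # /sys/block/<dev>/size is expressed in 512-byte units


def kernel_size_bytes(device, sys_block="/sys/block"):
    """Return the size the kernel currently has cached for a block device."""
    with open(os.path.join(sys_block, device, "size")) as f:
        return int(f.read().strip()) * SECTOR_BYTES


def size_matches(device, expected_gb, sys_block="/sys/block"):
    """Compare the cached size with the size the volume is supposed to have."""
    return kernel_size_bytes(device, sys_block) == expected_gb * 1024 ** 3
```

On the compute node from this report, size_matches("sdg", 1) would be expected to return False after step 4, since the host still reports 3G for sdg.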

I'm also attaching more outputs with preserved formatting (outputs.txt).

Regards,

Brano Zarnovican

Brano Zarnovican (zarnovican) wrote on 2013-02-01:

outputs.txt (9.3 KiB, text/plain)

Brano Zarnovican (zarnovican) wrote on 2013-02-01:

I forgot to add the software versions:

ScientificLinux 6.3
kernel-2.6.32-279.11.1.el6.x86_64
iscsi-initiator-utils-6.2.0.872-41.el6.x86_64
OpenStack Essex 2012.1.3 (most likely also affects Folsom and master)
Netapp OnTAP 7.3.6P5

One option to fix this problem:

During disconnect_volume()..
1) if this was the last LUN in a session, close the session (as it is doing now)
2) otherwise delete that single device

echo 1 > /sys/block/sdX/device/delete

This is easier said than done, because the 'rootwrap' module does not natively support "echo 1 > /something" :(
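The two-way decision proposed above can be sketched as follows. This is an illustration only, not the submitted patch: plan_disconnect() and its arguments are hypothetical names, and the function only returns the intended action instead of running iscsiadm or touching sysfs.

```python
import os


def plan_disconnect(target_iqn, lun, device_name, luns_in_use):
    """Decide what disconnect_volume() should do for one detached LUN.

    luns_in_use is an iterable of (target_iqn, lun) pairs that are still
    attached to instances on this host.
    """
    others = [other_lun for (iqn, other_lun) in luns_in_use
              if iqn == target_iqn and other_lun != lun]
    if not others:
        # 1) last LUN on this target: close the whole session
        return ("logout", target_iqn)
    # 2) otherwise remove just this one SCSI device
    return ("delete",
            os.path.join("/sys/block", device_name, "device", "delete"))
```

For the reproduction case in this report, detaching LUN 0 (sdg) while LUN 1 is still attached would yield the "delete" action for /sys/block/sdg/device/delete.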

Chuck Short (zulcss) wrote on 2013-02-04:

Can you attach the nova-compute log files when this happens?

Changed in nova:
status:	New → Incomplete

Brano Zarnovican (zarnovican) wrote on 2013-02-05:

output2.txt (42.8 KiB, text/plain)

As requested, I'm attaching compute.log at DEBUG level. There are three chunks of logs corresponding to the three steps to reproduce the problem.

We are running a slightly patched version of Essex. One patch introduced a log message which you might not be familiar with.. (grep for "Attaching device"). I believe that none of the patches on top of Essex introduced the problem described above. Anyway, it should be easily reproducible in any environment if you have Netapp at hand.

Brano Zarnovican (zarnovican) wrote on 2013-02-05:

0001-BUGFIX-1112483-device-size-mismatch-when-LUN-is-reus.patch (2.2 KiB, text/plain)

There are several ways to fix or work around the problem.

The solution I liked most is to delete the device when OpenStack stops using it (see patch). At the time this code is called in disconnect_volume(), the LUN is still mapped to the compute node but is no longer used by libvirt. Soon after that, nova-volume will unmap the LUN.

When some other volume later reuses the same LUN id, nova-compute will find the device missing and force a rescan. This will create the device anew and do the usual kernel sensing.

Implementation side-note: I had a real struggle to implement "echo 1 > /sys/block/..". Eventually I got it working as "echo 1 | sudo cp /dev/stdin /sys/block/..". I guess rootwrap could offer better support for this kind of use case.
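The "cp /dev/stdin" trick can be expressed in Python without a shell redirect. The following is a hedged sketch rather than the code from the attached patch: build_delete_command() and delete_scsi_device() are hypothetical helpers, and plain sudo stands in for whatever privilege wrapper (such as rootwrap) a deployment uses.

```python
import subprocess


def build_delete_command(device_name, use_sudo=True):
    """Build the argv that copies stdin to the device's delete control file."""
    if "/" in device_name or not device_name:
        raise ValueError("expected a bare device name such as 'sdg'")
    target = "/sys/block/%s/device/delete" % device_name
    argv = ["cp", "/dev/stdin", target]
    return ["sudo"] + argv if use_sudo else argv


def delete_scsi_device(device_name, use_sudo=True):
    """Ask the kernel to drop one SCSI device by feeding "1" on stdin."""
    subprocess.run(build_delete_command(device_name, use_sudo),
                   input=b"1", check=True)
```

Because the value is passed on stdin and the command is a plain argv list, no shell redirection is needed, which is the property that makes this form easier to whitelist in a command filter.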

Gustavo Randich (gustavo-randich) wrote on 2013-02-07:

We use NetApp and the bug occurs at our site too. We applied Brano's patch and it fixed the problem.

Davanum Srinivas (DIMS) (dims-v) on 2013-03-18

Changed in nova:
status:	Incomplete → Confirmed

Brano Zarnovican (zarnovican) wrote on 2013-03-19:

I just found out today (after days of troubleshooting) that OpenStack calls disconnect_volume when terminating a stopped instance. Because a volume retains its 'connection_info' even after it is detached, the code from my patch may delete a SCSI device belonging to another volume. Remember that LUN ids are reused, so several volumes may have identical 'connection_info' details.

I hereby renounce my own patch as "unfit" and will provide an updated one in the near future.

Brano Zarnovican (zarnovican) wrote on 2013-04-03:

0001-BUGFIX-1112483-device-size-mismatch-when-LUN-is-reus-20130403.patch (3.6 KiB, text/plain)

Here is the updated patch..

WARNING: This patch significantly changes how Essex handles volume connect/disconnect.

The cleanest option to address the problem of reused LUN ids was to remove the information about the old LUN id from the database. With this patch, the code wipes the "connection_info" column in the "block_device_mapping" table when a volume is disconnected. I found only two places in Essex where this needs to be done. There are a couple more in the 'master' branch, so this patch should only be applied to Essex and would need a significant update for later releases.

If the DB holds no record of which LUN id was used when this volume was connected, then no future call to 'disconnect_volume' can harm a volume that is using the same LUN id now.
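The effect of wiping connection_info can be shown with a toy model. This sketch does not use nova's schema or code: the dictionaries, the disconnect() and scenario() functions, and the example IQN are all hypothetical, and the host's devices are modelled as a set of (target, LUN) pairs.

```python
"""Toy model of stale connection_info; not nova's schema or code."""


def disconnect(bdm, host_devices, clear_connection_info):
    """Detach one volume record and remove the device it points at."""
    info = bdm.get("connection_info")
    if info is None:
        return  # nothing recorded, so nothing to tear down
    host_devices.discard((info["target_iqn"], info["target_lun"]))
    if clear_connection_info:
        bdm["connection_info"] = None


def scenario(clear_connection_info):
    """Return True if the second volume's device survives a late disconnect."""
    iqn = "iqn.example:target"  # hypothetical IQN
    host_devices = set()

    vol_a = {"connection_info": {"target_iqn": iqn, "target_lun": 0}}
    host_devices.add((iqn, 0))
    disconnect(vol_a, host_devices, clear_connection_info)  # normal detach

    # The provider reuses LUN 0 for another volume, which gets attached.
    host_devices.add((iqn, 0))

    # Later, terminating the stopped instance disconnects vol_a again.
    disconnect(vol_a, host_devices, clear_connection_info)
    return (iqn, 0) in host_devices
```

With clear_connection_info=False the late disconnect removes the device now used by the second volume; with clear_connection_info=True it becomes a no-op.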

You may also look at the related mail thread with the subject "Libvirt iSCSI client: duplicit connection_info data".

Joe T (joe-topjian-v) wrote on 2013-04-09:

I'm running a Folsom environment with a NetApp appliance and this bug affects me as well. So it's definitely not localized to Essex.

Brano Zarnovican (zarnovican) wrote on 2013-04-10:

#10

@Joe, I believe that Folsom is affected too. I briefly reviewed the code. I shared the patch for Essex only, because that's what we are running. The same patch might work in Folsom, but there are a couple more places where the 'connection_info' should be cleared. If you don't hit that part of the code (e.g. live migration), you should be fine with the patch posted above. In a couple of weeks we will be migrating to Folsom, and I will share the updated patch when it is ready.

Joe T (joe-topjian-v) wrote on 2013-04-10:

#11

Thanks, Brano! For the time being, I am going to write an external script triggered via cron to clean up connection_info. While doing this in Nova would be more efficient, I don't want to modify Nova unless I absolutely have to. My volume traffic is low enough that a periodic cron cleanup should work.

Jason Dillaman (jdillaman) on 2013-07-19

Changed in nova:
status:	Confirmed → In Progress
assignee:	nobody → Jason Dillaman (jdillaman)

OpenStack Infra (hudson-openstack) wrote on 2013-07-19: Fix proposed to nova (master)

#12

Fix proposed to branch: master
Review: https://review.openstack.org/37907

ritesh nanda (riteshnanda09) wrote on 2013-11-20:

#13

Hello Jason, has a patch been released for this issue for Grizzly? We have a setup with 10 compute nodes, and because of this problem we are not able to create and attach any volumes. Alternatively, is there a workaround?

OpenStack Infra (hudson-openstack) wrote on 2014-01-09: Fix merged to nova (master)

#14

Reviewed: https://review.openstack.org/37907
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8a6ee4808782820cc0938519eb3db63a5eb9a2ff
Submitter: Jenkins
Branch: master

commit 8a6ee4808782820cc0938519eb3db63a5eb9a2ff
Author: Jason Dillaman <email address hidden>
Date: Wed Jul 17 16:17:08 2013 -0400

Delete iSCSI devices after volume detached

    Previously, after detaching from a volume, the iSCSI device
    remains attached on the compute node until all LUNs for a given
    IQN are detached -- causing issues when LUNs are reused for
    different volumes. This change will delete the device(s)
    associated with the detached volume so LUNs can be reused.

Fixes bug 1112483
Change-Id: Icae3ec4d1ee2036fbba7b9eb5c03a1c86014fcc0

Changed in nova:
status:	In Progress → Fix Committed

Vish Ishaya (vishvananda) on 2014-01-09

Changed in nova:
importance:	Undecided → Medium
tags:	added: havana-backport-potential
tags:	added: grizzly-backport-potential

Russell Bryant (russellb) on 2014-01-13

Changed in nova:
milestone:	none → icehouse-2

Thierry Carrez (ttx) on 2014-01-22

Changed in nova:
status:	Fix Committed → Fix Released

Alan Pevec (apevec) on 2014-03-30

tags:	removed: grizzly-backport-potential

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-2 → 2014.1

OpenStack Infra (hudson-openstack) wrote on 2014-08-06: Fix proposed to nova (stable/havana)

#15

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/112391

OpenStack Infra (hudson-openstack) wrote on 2014-09-22: Change abandoned on nova (stable/havana)

#16

Change abandoned by Alan Pevec (<email address hidden>) on branch: stable/havana
Review: https://review.openstack.org/112391
Reason: Final Havana release 2013.2.4 has been cut and stable/havana is going to be removed in a week.