Summary: Undeletable volumes after live migration (iSCSI)

Bug #1132146 reported by Robert Heinzmann
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Jason Dillaman
Milestone: 2013.2

Bug Description

Summary:
  The iSCSI client session is not removed after live migration, leading to undeletable volumes (error_deleting)

Description:
  When using Cinder with iSCSI as the block storage (EBS) backend and KVM as the hypervisor, live-migrating an instance and then terminating it leads to undeletable volumes
  (volumes being stuck in the "error_deleting" state).

Problem:
  After the live migration, the iSCSI session from the source host to the Cinder storage is not removed. When the instance is terminated, the iSCSI session on the host where it was last running is removed, and when the volume is subsequently deleted it is wiped (dd). However, the iSCSI session on the source host of the migration is never cleaned up, leading to a situation where the target on the Cinder node (tgtd) cannot be removed because it is still "in use".
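
  To see the broken state, compare the initiator sessions on the migration source host with the connections tgtd still tracks on the Cinder node (the same commands are used in the debug log below; X is the target id reported by tgtadm):

     iscsiadm -m session
     tgtadm --lld iscsi --mode target --op show
     tgtadm --lld iscsi --mode conn --op show --tid X

  A session for the volume's IQN that is still present on the source host after the migration is the leftover that keeps the target "in use".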

Solution:
  a) Force a logout on the Cinder node with something like the following before deleting the volume:

     tgtadm --lld iscsi --mode target --op unbind --tid=X -I ALL
     tgtadm --lld iscsi --mode conn --op delete --tid=X --sid Y --cid 0
     tgtadm --op delete --mode logicalunit --tid=X --lun 1
     tgtadm --lld iscsi --mode target --op delete --tid=X
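
     (In order: unbind the ACL for all initiators, force-close the open connection, remove the data LUN, and delete the target itself. The target id X, session id Y and connection id can be read from "tgtadm --lld iscsi --mode conn --op show --tid X", as shown further down in this report.)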

  b) After the migration has completed successfully, log out of the iSCSI target on the source host.

b) is probably the easier and cleaner option!

Doing both a) and b) should be very solid and error-proof.
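
A minimal sketch of option b), assuming the volume's target IQN and portal are known on the source host (the values here are the ones from the debug session below):

     iscsiadm -m node -T iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7 -p 172.16.0.130:3260 --logout

The session-id form used later in this report ("iscsiadm -m session -r SID -u") achieves the same logout.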

How to Reproduce:

1) Configure a two-node cluster with EBS-style block storage (Cinder)
2) Create a VM booted from a SAN volume
3) Live-migrate the machine from HOST1 to HOST2
4) Terminate the instance
5) Delete the volume of the instance
6) The volume is now stuck in an undeletable state ("error_deleting"); a rough CLI equivalent of these steps is sketched below
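
In CLI terms the reproduction is roughly the following (a hedged sketch: option spellings follow the Folsom-era python-novaclient/python-cinderclient, and all IDs are placeholders):

     cinder create --display-name bootvol 5
     nova boot --flavor m1.tiny --block_device_mapping vda=VOLUME_ID:::0 testvm
     nova live-migration INSTANCE_ID HOST2
     nova delete INSTANCE_ID
     cinder delete VOLUME_ID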

Debug Info:
------------

Instance in Question: instance-0000005b
Hypervisor Hosts: ESX1 / ESX2 (running KVM, not ESX - naming fun :))

root@esx1:~# iscsiadm -m session
tcp: [1] 172.16.0.110:3260,2 iqn.1986-03.com.sun:nexentaboot
tcp: [8] 172.16.0.130:3260,1 iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7

root@esx1:~# virsh list
 Id Name State
----------------------------------------------------
 2 quantum running
 8 instance-0000005b running

root@esx1:~# virsh dumpxml instance-0000005b
<domain type='kvm' id='8'>
  <name>instance-0000005b</name>
  <uuid>4bd1426d-4972-4176-8a72-bfccd3c9035b</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-1.2'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-path/ip-172.16.0.130:3260-iscsi-iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7-lun-1'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:32:59:24'/>
      <source bridge='qbr6b2a850f-46'/>
      <target dev='vnet3'/>
      <model type='virtio'/>
      <filterref filter='nova-instance-instance-0000005b-fa163e325924'>
        <parameter name='DHCPSERVER' value='192.168.102.3'/>
        <parameter name='IP' value='192.168.102.4'/>
        <parameter name='PROJMASK' value='255.255.255.0'/>
        <parameter name='PROJNET' value='192.168.102.0'/>
      </filterref>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/instance-0000005b/console.log'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='1'/>
      <alias name='serial1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/instance-0000005b/console.log'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5901' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='apparmor' relabel='yes'>
    <label>libvirt-4bd1426d-4972-4176-8a72-bfccd3c9035b</label>
    <imagelabel>libvirt-4bd1426d-4972-4176-8a72-bfccd3c9035b</imagelabel>
  </seclabel>
</domain>

root@esx2:~# iscsiadm -m session
tcp: [1] 172.16.0.110:3260,2 iqn.1986-03.com.sun:nexentaboot

root@esx2:~# virsh list
 Id Name State
----------------------------------------------------

root@openstack:/etc/init.d# nova list --all-tenants
Please enter password for encrypted keyring:
+--------------------------------------+----------+--------+--------------------+
| ID | Name | Status | Networks |
+--------------------------------------+----------+--------+--------------------+
| 4bd1426d-4972-4176-8a72-bfccd3c9035b | asdasdad | ACTIVE | lan1=192.168.102.4 |
+--------------------------------------+----------+--------+--------------------+

root@openstack:/etc/init.d# nova show 4bd1426d-4972-4176-8a72-bfccd3c9035b
Please enter password for encrypted keyring:
+-------------------------------------+-----------------------------------------------------------+
| Property | Value |
+-------------------------------------+-----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-SRV-ATTR:host | esx1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | esx1.lab.elconas.de |
| OS-EXT-SRV-ATTR:instance_name | instance-0000005b |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2013-02-23T15:15:57Z |
| flavor | m1.tiny (6) |
| hostId | 8a48ee7c074e396d726b6da80eeacb9a7bae4bf41450e5b772cf9ff0 |
| id | 4bd1426d-4972-4176-8a72-bfccd3c9035b |
| image | Ubuntu-Image-12.04 (cf59575f-189f-4275-9e66-1cc39efb47e4) |
| key_name | rheinzmann |
| lan1 network | 192.168.102.4 |
| metadata | {} |
| name | asdasdad |
| progress | 0 |
| security_groups | [{u'name': u'default'}] |
| status | ACTIVE |
| tenant_id | 62515dbc834241d8ab5d58ed7ea50f6b |
| updated | 2013-02-23T15:28:47Z |
| user_id | b193e3443cd94f41bac8938f4da5a9d0 |
+-------------------------------------+-----------------------------------------------------------+

root@openstack:/etc/init.d# nova live-migration 4bd1426d-4972-4176-8a72-bfccd3c9035b esx2
Please enter password for encrypted keyring:
root@openstack:/etc/init.d# nova show 4bd1426d-4972-4176-8a72-bfccd3c9035b
Please enter password for encrypted keyring:
+-------------------------------------+-----------------------------------------------------------+
| Property | Value |
+-------------------------------------+-----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-SRV-ATTR:host | esx2 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | esx2.lab.elconas.de |
| OS-EXT-SRV-ATTR:instance_name | instance-0000005b |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2013-02-23T15:15:57Z |
| flavor | m1.tiny (6) |
| hostId | cd6040c927390b663ce86a21e25f48228cd8dd084d05a2e826f5ac0e |
| id | 4bd1426d-4972-4176-8a72-bfccd3c9035b |
| image | Ubuntu-Image-12.04 (cf59575f-189f-4275-9e66-1cc39efb47e4) |
| key_name | rheinzmann |
| lan1 network | 192.168.102.4 |
| metadata | {} |
| name | asdasdad |
| progress | 0 |
| security_groups | [{u'name': u'default'}] |
| status | ACTIVE |
| tenant_id | 62515dbc834241d8ab5d58ed7ea50f6b |
| updated | 2013-02-23T15:45:21Z |
| user_id | b193e3443cd94f41bac8938f4da5a9d0 |
+-------------------------------------+-----------------------------------------------------------+

root@esx1:~# iscsiadm -m session
tcp: [1] 172.16.0.110:3260,2 iqn.1986-03.com.sun:nexentaboot
tcp: [8] 172.16.0.130:3260,1 iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7
root@esx1:~# virsh list
 Id Name State
----------------------------------------------------
 2 quantum running

root@esx2:~# iscsiadm -m session
tcp: [1] 172.16.0.110:3260,2 iqn.1986-03.com.sun:nexentaboot
tcp: [4] 172.16.0.130:3260,1 iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7
root@esx2:~# virsh list
 Id Name State
----------------------------------------------------
 3 instance-0000005b running

root@esx2:~#

=> Terminate Instance in GUI ......

=> Instance Terminated and Deleted

=> Volume ff447b2f-6910-4e35-a3c0-78bffe6f35d7 becomes Available ...

Try to delete volume ff447b2f-6910-4e35-a3c0-78bffe6f35d7 in the GUI ....

Now the "dd" takes place to wipe the volume.

After the "dd", the deletion ends in "error_deleting":

root@esx1:~# iscsiadm -m session
tcp: [1] 172.16.0.110:3260,2 iqn.1986-03.com.sun:nexentaboot
tcp: [8] 172.16.0.130:3260,1 iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7

root@esx2:~# iscsiadm -m session
tcp: [1] 172.16.0.110:3260,2 iqn.1986-03.com.sun:nexentaboot

root@esx2:~# cinder list --all-tenants
+--------------------------------------+----------------+--------------+------+-------------+-------------+
| ID | Status | Display Name | Size | Volume Type | Attached to |
+--------------------------------------+----------------+--------------+------+-------------+-------------+
| 332e4755-070c-45b4-b4b4-408c3fe6609b | available | mytest | 5 | None | |
| df9416af-0b35-4acf-af1b-3c2593757b67 | error_deleting | | 5 | None | |
| ff447b2f-6910-4e35-a3c0-78bffe6f35d7 | error_deleting | | 5 | None | |
+--------------------------------------+----------------+--------------+------+-------------+-------------+

On the Cinder node:

root@esx2:~# tgtadm --mode target --op show
...
Target 2: iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 6
            Initiator: iqn.2010-09.org.etherboot:openstack164
            Connection: 0
                IP Address: 172.16.0.120
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET 00020000
            SCSI SN: beaf20
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET 00020001
            SCSI SN: beaf21
            Size: 5369 MB, Block size: 512
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/cinder-volumes/volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7
            Backing store flags:
    Account information:
    ACL information:
        ALL

Leftover session:

root@esx2:~# tgtadm --lld iscsi --mode conn --op show --tid 2
Session: 6
    Connection: 0
        Initiator: iqn.2010-09.org.etherboot:openstack164
        IP Address: 172.16.0.120

Manually deleting the session:

root@esx1:~# iscsiadm -m session -r 8 -u
Logging out of session [sid: 8, target: iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7, portal: 172.16.0.130,3260]
Logout of [sid: 8, target: iqn.2010-10.org.openstack:volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7, portal: 172.16.0.130,3260] successful.

root@esx2:~# tgtadm --mode target --op delete --tid 2

root@esx2:~# lvremove /dev/cinder-volumes/volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7
File descriptor 3 (/usr/share/bash-completion/completions) leaked on lvremove invocation. Parent PID 11376: -bash
Do you really want to remove active logical volume volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7? [y/n]: y
  Logical volume "volume-ff447b2f-6910-4e35-a3c0-78bffe6f35d7" successfully removed
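
After this manual cleanup the volume row in the Cinder database is typically still stuck in "error_deleting", and the Folsom-era cinder client has no command to reset it. One option (an assumption: direct DB surgery against the Folsom cinder schema; back up the database first) is:

     mysql cinder -e "UPDATE volumes SET status='deleted', deleted=1 WHERE id='ff447b2f-6910-4e35-a3c0-78bffe6f35d7';"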

Tags: cinder iscsi tgtd
Revision history for this message
John Griffith (john-griffith) wrote :

Wouldn't this be the responsibility of the compute node (nova) to disconnect the session on migration?

Revision history for this message
Robert Heinzmann (reg-x) wrote : Re: [Bug 1132146] Re: Summary: Undeletable volumes after live migration (iSCSI)

I think it should be part of the "live migration" orchestration.
Actually I don't know which component coordinates the live migration -
if it is nova-compute, then yes, it should be part of that component.

On 05.03.2013 22:20, John Griffith wrote:
> Wouldn't this be the responsibility of the compute node (nova) to
> disconnect the session on migration?
>

affects: cinder → nova
Revision history for this message
Chuck Short (zulcss) wrote :

Which version of nova?

Changed in nova:
status: New → Incomplete
Revision history for this message
Robert Heinzmann (reg-x) wrote :

Compute:

root@COMPUTE1:~# dpkg -l | grep -i openstack

ii nova-common 2012.2.3-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2012.2.3-0ubuntu1 all OpenStack Compute - compute node
ii nova-compute-kvm 2012.2.3-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii python-cinderclient 1:1.0.0-0ubuntu1 all python bindings to the OpenStack Volume API
ii python-glance 2012.2.3-0ubuntu1 all OpenStack Image Registry and Delivery Service - Python library
ii python-glanceclient 1:0.5.1-0ubuntu1 all Client library for Openstack glance server.
ii python-keystone 2012.2.3+stable-20130206-82c87e56-0ubuntu1 all OpenStack identity service - Python library
ii python-keystoneclient 1:0.1.3-0ubuntu1.1 all Client libary for Openstack Keystone API
ii python-nova 2012.2.3-0ubuntu1 all OpenStack Compute Python libraries
ii python-novaclient 1:2.9.0-0ubuntu1 all client library for OpenStack Compute API
ii python-quantum 2012.2.3-0ubuntu1 all Quantum is a virutal network service for Openstack. (python library)
ii python-quantumclient 1:2.1-0ubuntu1 all client - Quantum is a virtual network service for Openstack
ii python-swiftclient 1:1.2.0-0ubuntu2 all Client libary for Openstack Swift API.
ii quantum-common 2012.2.3-0ubuntu1 all common - Quantum is a virtual network service for Openstack.
ii quantum-plugin-openvswitch 2012.2.3-0ubuntu1 all Quantum is a virtual network service for Openstack. (openvswitch plugin)
ii quantum-plugin-openvswitch-agent 2012.2.3-0ubuntu1 all Quantum is a virtual network service for Openstack. (openvswitch plugin agent)

Controller (KeyStone etc.)

root@openstack:~# dpkg -l | grep -i Openstack
ii glance 2012.2.3-0ubuntu1 all OpenStack Image Registry and Delivery Service - Daemons
ii glance-api 2012.2.3-0ubuntu1 all OpenStack Image Registry and Delivery Service - API
ii glance-common 2012.2.3-0ubuntu1 all OpenStack Image Registry and Delivery Service - Common
ii glance-registry ...

Revision history for this message
Adrien (adrrob) wrote :

I reported the same bug yesterday (https://bugs.launchpad.net/nova/+bug/1156788); in case it helps:

I run Nova and Cinder Folsom (2012.2.3) on Debian Wheezy x86_64, kernel 3.2.
I installed Nova from the GitHub repositories.

Changed in nova:
status: Incomplete → New
Revision history for this message
Vish Ishaya (vishvananda) wrote :

This needs to be investigated to see if it is still a problem. It looks like we might be missing some cleanup in the live migration completion code path.

Changed in nova:
importance: Undecided → Medium
milestone: none → havana-1
status: New → Triaged
Changed in nova:
milestone: havana-1 → havana-2
Changed in nova:
status: Triaged → In Progress
assignee: nobody → Jason Dillaman (jdillaman)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/34617

Changed in nova:
milestone: havana-2 → havana-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/34617
Committed: http://github.com/openstack/nova/commit/38e6e93a8102751b781d79da37249fe9c55f2575
Submitter: Jenkins
Branch: master

commit 38e6e93a8102751b781d79da37249fe9c55f2575
Author: Jason Dillaman <email address hidden>
Date: Wed Jun 26 16:53:03 2013 -0400

    Disconnect from iSCSI volume sessions after live migration

    The live migration source host will now disconnect from
    iSCSI volume sessions after the VM is successfully migrated
    to the destination host.

    Fixes bug 1132146
    Change-Id: I132869612bdf2baa810756586e643ea68ea8d8f6
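
With the fix applied, the leftover session should no longer appear. A quick verification, mirroring the debug log above (run on the migration source host after the migration completes):

     iscsiadm -m session

The volume's iqn.2010-10.org.openstack:volume-... entry should be gone from the source host while the instance keeps running on the destination.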

Changed in nova:
status: In Progress → Fix Committed
tags: added: grizzly-backport-potential
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/48246

Thierry Carrez (ttx)
Changed in nova:
milestone: havana-3 → 2013.2
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential