iSCSI connection not logging out after volume detach

Bug #1436561 reported by Qiu Yu on 2015-03-25
38
This bug affects 7 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Low
Unassigned
CentOS
New
Undecided
Unassigned

Bug Description

We're running a workload on top of our OpenStack deployment, which heavily depends on the cinder volume attach / detach to do data persistence. From time to time, it is observed that iSCSI connection not being logged out after volume detach, while block device mapping in Nova database and volume in Cinder database have been deleted.

The disconnect_volume in libvirt volume driver was called, but the actual iSCSI logout not triggered because the domain still holding the reference.

2015-03-19 15:28:55,365.365 4209 DEBUG nova.openstack.common.rpc.amqp [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Making synchronous call on conductor ... multicall /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py:553
2015-03-19 15:28:55,365.365 4209 DEBUG nova.openstack.common.rpc.amqp [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] MSG_ID is 4fc3ba5a0aaf42e993987b103fcd776f multicall /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py:556
2015-03-19 15:28:55,366.366 4209 DEBUG nova.openstack.common.rpc.amqp [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] UNIQUE_ID is d085d575d08841efbfbd8b15cc0a44bf. _add_unique_id /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py:341
2015-03-19 15:28:55,386.386 4209 AUDIT nova.compute.manager [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] [instance: 8aff4eca-23b6-4673-b97c-696e21897484] Detach volume 6ef40074-7637-4b8d-82c2-7a28df2dbbc8 from mountpoint /dev/vdc
2015-03-19 15:29:00,005.005 4209 DEBUG nova.openstack.common.lockutils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Got semaphore "connect_volume" lock /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/lockutils.py:166
2015-03-19 15:29:00,006.006 4209 DEBUG nova.openstack.common.lockutils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Got semaphore / lock "disconnect_volume" inner /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/lockutils.py:245
2015-03-19 15:29:00,026.026 4209 DEBUG nova.openstack.common.processutils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf cat /etc/iscsi/initiatorname.iscsi execute /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/com
mon/processutils.py:147
2015-03-19 15:29:00,111.111 4209 DEBUG nova.openstack.common.processutils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf systool -c fc_host -v execute /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/processutils.py:147
2015-03-19 15:29:00,186.186 4209 DEBUG nova.openstack.common.processutils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Result was 96 execute /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/processutils.py:172
2015-03-19 15:29:00,187.187 4209 WARNING nova.virt.libvirt.utils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] systool is not installed
2015-03-19 15:29:00,187.187 4209 DEBUG nova.virt.libvirt.driver [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] [instance: 8aff4eca-23b6-4673-b97c-696e21897484] Could not determine fibre channel world wide node names get_volume_connector /opt/openstack/nova/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:1025
2015-03-19 15:29:00,188.188 4209 DEBUG nova.openstack.common.processutils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf systool -c fc_host -v execute /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/processutils.py:147
2015-03-19 15:29:00,258.258 4209 DEBUG nova.openstack.common.processutils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Result was 96 execute /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/processutils.py:172
2015-03-19 15:29:00,259.259 4209 WARNING nova.virt.libvirt.utils [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] systool is not installed
2015-03-19 15:29:00,259.259 4209 DEBUG nova.virt.libvirt.driver [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] [instance: 8aff4eca-23b6-4673-b97c-696e21897484] Could not determine fibre channel world wide port names get_volume_connector /opt/openstack/nova/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:1032
2015-03-19 15:29:00,260.260 4209 DEBUG nova.volume.cinder [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Cinderclient connection created using URL: http://<os volume endpoint>:8776/v1/586d0319753740e982c00e0638451fd2 cinderclient /opt/openstack/nova/lib/python2.7/site-packages/nova/volume/cinder.py:96
2015-03-19 15:29:00,415.415 4209 DEBUG nova.volume.cinder [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Cinderclient connection created using URL: http://<os volume endpoint>:8776/v1/586d0319753740e982c00e0638451fd2 cinderclient /opt/openstack/nova/lib/python2.7/site-packages/nova/volume/cinder.py:96
2015-03-19 15:29:01,053.053 4209 DEBUG nova.openstack.common.rpc.amqp [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Making synchronous call on conductor ... multicall /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py:553
2015-03-19 15:29:01,053.053 4209 DEBUG nova.openstack.common.rpc.amqp [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] MSG_ID is 116101d167fe41f1aaf7aaad58911590 multicall /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py:556
2015-03-19 15:29:01,054.054 4209 DEBUG nova.openstack.common.rpc.amqp [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] UNIQUE_ID is 345bbd162deb410d85320855b9a97bcc. _add_unique_id /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py:341
2015-03-19 15:29:01,082.082 4209 DEBUG nova.openstack.common.rpc.amqp [req-505a3f6e-dd0e-4a15-aabf-2c5da8ccbcd7 6a08f3e43965436fb028eda1005ec77b 586d0319753740e982c00e0638451fd2] Sending compute.instance.volume.detach on notifications.info notify /opt/openstack/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py:621

Fix proposed to branch: master
Review: https://review.openstack.org/167815

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Low
Huang Zhiteng (zhiteng-huang) wrote :

This bug may cause bigger impact than it appears to be (an idle, stale iSCSI session left on hypervisor). We do see a correlation between increased network latency and having stale iSCSI sessions left on hypervisor. Also this bug is very easy to reproduce if volume service is used often. Before we patched it internally, we had to clean up stale iSCSI session once/twice a week within an AZ.

It'll be nice to see this bug being fixed in L.

Qiu Yu (unicell) on 2015-03-26
tags: added: libvirt
Anish Bhatt (anish7) wrote :

I'm seeing this as well, but in a different scenario. Leftover open-iscsi node configurations like this prevent use of open-iscsi offload transports as the leftover configuration still shows the node using tcp transport.

Duncan Thomas (duncan-thomas) wrote :

This can be caused by things other than the hypervisor (udev, blockid, lvm) accessing the device nodes. Running 'fuser' and 'ls -l /block/<dev>/holders' in the case of removal failure can sometimes catch the culprit(s), though sometimes there are gone by the time you check.

Matt Riedemann (mriedem) on 2015-06-04
tags: added: volumes
Changed in nova:
assignee: Qiu Yu (unicell) → Huang Zhiteng (zhiteng-huang)
Changed in nova:
assignee: Huang Zhiteng (zhiteng-huang) → sahid (sahid-ferdjaoui)
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Huang Zhiteng (zhiteng-huang)
Tyler Bishop (tyler-bishop) wrote :

Confirmed were seeing this same issue with the actual cinder controller when creating volumes.

After a while the node has hundreds of stale volume mappings.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/167815
Reason: This looks abandoned. If you still want to work on this please restore and address comments, especially those related to the os-brick integration.

Changed in nova:
assignee: Huang Zhiteng (zhiteng-huang) → nobody
status: In Progress → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers