Sometimes Cinder volumes fail to attach with error "The device is not writable: Permission denied"

Bug #1495701 reported by Patrick East
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Invalid
Undecided
Unassigned
OpenStack Compute (nova)
Invalid
Undecided
Unassigned
os-brick
Fix Released
Undecided
Patrick East

Bug Description

This is happening on the latest master branch in CI systems. It happens very rarely in the gate:

http://logstash.openstack.org/#eyJzZWFyY2giOiJcImxpYnZpcnRFcnJvcjogb3BlcmF0aW9uIGZhaWxlZDogb3BlbiBkaXNrIGltYWdlIGZpbGUgZmFpbGVkXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0NDIyNjY3MDU1NzZ9

And on some third party CI systems (not included in the logstash results):

http://ec2-52-8-200-217.us-west-1.compute.amazonaws.com/28/216728/5/check/PureFCDriver-tempest-dsvm-volume-multipath/bd3618d/logs/libvirt/libvirtd.txt.gz#_2015-09-14_09_00_44_829

When the error occurs there is a stack trace in the n-cpu log like this:

http://logs.openstack.org/22/222922/2/check/gate-tempest-dsvm-full-lio/550be5e/logs/screen-n-cpu.txt.gz?level=DEBUG#_2015-09-13_17_34_07_787

2015-09-13 17:34:07.787 ERROR nova.virt.libvirt.driver [req-4ac04f97-f468-466a-9fb2-02d1df3a5633 tempest-TestEncryptedCinderVolumes-1564844141 tempest-TestEncryptedCinderVolumes-804461249] [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] Failed to attach volume at mountpoint: /dev/vdb
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] Traceback (most recent call last):
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1115, in attach_volume
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] guest.attach_device(conf, persistent=True, live=live)
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 233, in attach_device
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] self._domain.attachDeviceFlags(conf.to_xml(), flags=flags)
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] result = proxy_call(self._autowrap, f, *args, **kwargs)
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] rv = execute(f, *args, **kwargs)
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] six.reraise(c, e, tb)
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] rv = meth(*args, **kwargs)
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6] libvirtError: operation failed: open disk image file failed
2015-09-13 17:34:07.787 22300 ERROR nova.virt.libvirt.driver [instance: 82f33247-d8be-49c7-9f89-f02602de5ef6]

and a corresponding error in the libvirt log such as this:

http://logs.openstack.org/22/222922/2/check/gate-tempest-dsvm-full-lio/550be5e/logs/libvirt/libvirtd.txt.gz#_2015-09-13_17_34_07_499

2015-09-13 17:34:07.496+0000: 16871: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"human-monitor-command","arguments":{"command-line":"drive_add dummy file=/dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0,if=none,id=drive-virtio-disk1,format=raw,serial=561640e9-081a-430b-a7f8-9cadd63d2d00,cache=none"},"id":"libvirt-16"}' for write with FD -1
2015-09-13 17:34:07.496+0000: 16871: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7f50dc008db0 msg={"execute":"human-monitor-command","arguments":{"command-line":"drive_add dummy file=/dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0,if=none,id=drive-virtio-disk1,format=raw,serial=561640e9-081a-430b-a7f8-9cadd63d2d00,cache=none"},"id":"libvirt-16"}
 fd=-1
2015-09-13 17:34:07.496+0000: 16868: debug : qemuMonitorIOWrite:504 : QEMU_MONITOR_IO_WRITE: mon=0x7f50dc008db0 buf={"execute":"human-monitor-command","arguments":{"command-line":"drive_add dummy file=/dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0,if=none,id=drive-virtio-disk1,format=raw,serial=561640e9-081a-430b-a7f8-9cadd63d2d00,cache=none"},"id":"libvirt-16"}
 len=325 ret=325 errno=22
2015-09-13 17:34:07.499+0000: 16868: debug : qemuMonitorIOProcess:396 : QEMU_MONITOR_IO_PROCESS: mon=0x7f50dc008db0 buf={"return": "could not open disk image /dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0: Could not open '/dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0': Operation not permitted\r\n", "id": "libvirt-16"}
 len=352
2015-09-13 17:34:07.499+0000: 16868: debug : qemuMonitorJSONIOProcessLine:157 : Line [{"return": "could not open disk image /dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0: Could not open '/dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0': Operation not permitted\r\n", "id": "libvirt-16"}]
2015-09-13 17:34:07.499+0000: 16868: debug : qemuMonitorJSONIOProcessLine:177 : QEMU_MONITOR_RECV_REPLY: mon=0x7f50dc008db0 reply={"return": "could not open disk image /dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0: Could not open '/dev/disk/by-path/ip-172.99.112.13:3260-iscsi-iqn.2010-10.org.openstack:volume-561640e9-081a-430b-a7f8-9cadd63d2d00-lun-0': Operation not permitted\r\n", "id": "libvirt-16"}
2015-09-13 17:34:07.499+0000: 16868: debug : qemuMonitorJSONIOProcess:226 : Total used 352 bytes out of 352 available in buffer
2015-09-13 17:34:07.499+0000: 16871: debug : qemuMonitorJSONCommandWithFd:269 : Receive command reply ret=0 rxObject=0x7f511026bfc0
2015-09-13 17:34:07.499+0000: 16871: error : qemuMonitorTextAddDrive:2611 : operation failed: open disk image file failed
2015-09-13 17:34:07.499+0000: 16871: debug : qemuDomainObjExitMonitorInternal:1301 : Exited monitor (mon=0x7f50dc008db0 vm=0x7f50f81661f0 name=instance-00000057)

It is unclear yet what causes this to occur. It has reproduced in several different tempest tests.

Revision history for this message
Patrick East (patrick-east) wrote :

After some further testing it looks like this almost always happens with a FC multipath device in the Pure Storage CI system. With multipath disabled the errors seem to stop happening.

tags: added: fibre-channel libvirt multipath volumes
description: updated
Revision history for this message
Patrick East (patrick-east) wrote :
Revision history for this message
Patrick East (patrick-east) wrote :
description: updated
Revision history for this message
Andrew Kerr (andrew-kerr) wrote :

We are seeing this same issue with NetApp's ESeries FC tests

Revision history for this message
Patrick East (patrick-east) wrote :

Not sure why it didn't get linked automatically, but I've got a fix (well.. maybe closer to work-around) up for review here https://review.openstack.org/#/c/235067/

Changed in os-brick:
assignee: nobody → Patrick East (patrick-east)
Changed in os-brick:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.openstack.org/235067
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=2e514655194b5385526dec7bc743cd3500a4dca9
Submitter: Jenkins
Branch: master

commit 2e514655194b5385526dec7bc743cd3500a4dca9
Author: Patrick East <email address hidden>
Date: Thu Sep 17 11:41:37 2015 -0700

    Wait for FC multipath devices to become writable

    Sometimes the multipath devices would be read-only block devices which
    would end up causing problems when anything tries to write to them. We
    now will wait and rescan a few times to try and get them to become
    read-write before moving on with the attach.

    Change-Id: If3c17348f6a6e766f811fcd446c03c82e5c8a883
    Partial-Bug: #1495701

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/240111

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (stable/liberty)

Reviewed: https://review.openstack.org/240111
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=86e2918c3e843fc355cca2b9f9ae7db569256f18
Submitter: Jenkins
Branch: stable/liberty

commit 86e2918c3e843fc355cca2b9f9ae7db569256f18
Author: Patrick East <email address hidden>
Date: Thu Sep 17 11:41:37 2015 -0700

    Wait for FC multipath devices to become writable

    Sometimes the multipath devices would be read-only block devices which
    would end up causing problems when anything tries to write to them. We
    now will wait and rescan a few times to try and get them to become
    read-write before moving on with the attach.

    Change-Id: If3c17348f6a6e766f811fcd446c03c82e5c8a883
    Partial-Bug: #1495701
    (cherry picked from commit 2e514655194b5385526dec7bc743cd3500a4dca9)

tags: added: in-stable-liberty
Revision history for this message
Magued (magued-sedra) wrote :

I also face this problem with multipath devices on iSCSI. The fix proveded above seems to work for me as well, but any idea so far for the root cause?

Revision history for this message
Scott DAngelo (scott-dangelo) wrote :

markus_z: I discusse with patrickeast and we think this is a "high" priority for users of both FibreChannel and multipath, but that is a less common use case, so perhaps should be set to "medium" priority.

Changed in nova:
status: New → Confirmed
Revision history for this message
Lee Yarwood (lyarwood) wrote :

IMHO this isn't a valid issue for Nova and the delays seen with devices becoming writeable is more than likely with the hosts multipath configuration, storage fabric etc. Are we still seeing failures even with the additional retry logic added in os-brick?

Revision history for this message
John Garbutt (johngarbutt) wrote :

Marking as incomplete for nova, given we don't really have any pointers towards a Nova issue here. Its some external issue that causes Nova to have issues attaching to the volume. At least it looks that way.

Changed in nova:
status: Confirmed → Incomplete
Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Patrick, is this still an issue? I haven't seen this myself.

Changed in nova:
status: Incomplete → Invalid
Changed in os-brick:
status: In Progress → Fix Released
Changed in cinder:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.