GlusterfsException: No base file found

Bug #1311182 reported by Dafna Ron
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Eric Harney
Milestone: 2014.2

Bug Description

Working with the GlusterFS backend for Cinder: after a snapshot delete failed with a timeout, I reset the snapshot status to try to delete it again, at which point the delete immediately fails with the following error:

2014-04-22 18:25:48.499 2233 DEBUG cinder.openstack.common.processutils [req-ebe13b37-9fb4-496b-8869-cff771b9cd60 97e5450b24624fd78ad6fa6d8a14ef3d c3178ebef2c24d1b9a045bd67483a83c] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C LANG=C qemu-img info /var/lib/cinder/mnt/59ed8cb64fc8948968a29181234051a2/volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa execute /usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py:142
2014-04-22 18:25:48.507 2233 DEBUG qpid.messaging.io.raw [-] SENT[3e0f710]: '\x0f\x00\x00:\x00\x00\x00\x00\x00\x00\x00\x00\x02\x01\x01\x00\x00(e20240e0-4cca-44eb-8847-cc486fa240fa:357\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' writeable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:480
2014-04-22 18:25:48.508 2233 DEBUG qpid.messaging.io.raw [-] READ[3e0f710]: '\x0f\x00\x00:\x00\x00\x00\x00\x00\x00\x00\x00\x02\x02\x01\x00\x00(e20240e0-4cca-44eb-8847-cc486fa240fa:357\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' readable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:416
2014-04-22 18:25:48.509 2233 DEBUG qpid.messaging.io.ops [-] RCVD[3e0f710]: SessionAttached(name='e20240e0-4cca-44eb-8847-cc486fa240fa:357') write /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:654
2014-04-22 18:25:48.509 2233 DEBUG qpid.messaging.io.ops [-] RCVD[3e0f710]: SessionCommandPoint(command_id=serial(0), command_offset=0) write /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:654
2014-04-22 18:25:48.741 2233 ERROR cinder.openstack.common.rpc.amqp [req-ebe13b37-9fb4-496b-8869-cff771b9cd60 97e5450b24624fd78ad6fa6d8a14ef3d c3178ebef2c24d1b9a045bd67483a83c] Exception during message handling
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp **args)
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 435, in delete_snapshot
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp {'status': 'error_deleting'})
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp self.gen.next()
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 423, in delete_snapshot
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp self.driver.delete_snapshot(snapshot_ref)
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/openstack/common/lockutils.py", line 247, in inner
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp retval = f(*args, **kwargs)
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 557, in delete_snapshot
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp self._delete_snapshot(snapshot)
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 621, in _delete_snapshot
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp raise exception.GlusterfsException(msg)
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp GlusterfsException: No base file found for /var/lib/cinder/mnt/59ed8cb64fc8948968a29181234051a2/volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa.
2014-04-22 18:25:48.741 2233 TRACE cinder.openstack.common.rpc.amqp
2014-04-22 18:26:03.171 2233 DEBUG qpid.messaging.io.raw [-] READ[3b385a8]: '\x0f\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x01\n\x00\x00' readable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:416
2014-04-22 18:26:03.172 2233 DEBUG qpid.messaging.io.ops [-] RCVD[3b385a8]: ConnectionHeartbeat() write /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:654
2014-04-22 18:26:04.888 2233 DEBUG qpid.messaging.io.raw [-] READ[3e0f710]: '\x0f\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x01\n\x00\x00' readable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:416
2014-04-22 18:26:04.888 2233 DEBUG qpid.messaging.io.ops [-] RCVD[3e0f710]: ConnectionHeartbeat() write /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:654
2014-04-22 18:26:07.577 2233 DEBUG cinder.openstack.common.periodic_task [-] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.6/site-packages/cinder/openstack/common/periodic_task.py:176

This is the ls output showing that the file is visible on the Cinder mount:

[root@hostXX ~(keystone_admin)]# ls -l /var/lib/cinder/mnt/59ed8cb64fc8948968a29181234051a2/volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa
-rw-r--r--. 1 qemu qemu 10739843072 Apr 22 18:21 /var/lib/cinder/mnt/59ed8cb64fc8948968a29181234051a2/volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa

Here is the snapshot file on the storage server:

[root@storage Dafna]# ls -l |grep c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d
-rw-r--r-- 2 qemu qemu 10739843072 Apr 22 18:21 volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa
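
For background, the "No base file found" message in the trace above is raised when the driver asks qemu-img about the snapshot file and finds no backing file recorded for it. Conceptually the check looks something like the sketch below (simplified and hypothetical; the real driver goes through rootwrap and its own image-info helpers):

    import subprocess

    def get_backing_file(path):
        # Illustrative only: ask qemu-img for image metadata and return the
        # value of the 'backing file:' line, or None if the image has none.
        out = subprocess.check_output(['qemu-img', 'info', path]).decode('utf-8')
        for line in out.splitlines():
            if line.startswith('backing file:'):
                return line.split(':', 1)[1].strip()
        return None

    # If this returns None for the snapshot file, the driver gives up with
    # "GlusterfsException: No base file found for <path>."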

Mike Perez (thingee)
tags: added: drivers glusterfs
Revision history for this message
Thang Pham (thang-pham) wrote :

There should be a file under /var/lib/cinder/mnt/59ed8cb64fc8948968a29181234051a2/, called volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.info. What does that file say?

Changed in cinder:
assignee: nobody → Thang Pham (thang-pham)
Revision history for this message
Dafna Ron (dron-3) wrote :

This was a testing environment and I do not have it anymore.
However, I did not delete the Gluster share and the file is there:
[root@orion Dafna]# ls -l |grep c2d7b4c3-d9ef-4676-975b-fb7c83fe3927
-rw-rw-rw- 2 qemu qemu 10737418240 Apr 22 18:08 volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927
-rw-r--r-- 2 root root 10739843072 Apr 22 18:59 volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa
-rw-r--r-- 2 165 165 223 Apr 22 18:09 volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.info

Revision history for this message
Thang Pham (thang-pham) wrote :

Could you issue a "cat /var/lib/cinder/mnt/59ed8cb64fc8948968a29181234051a2/volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.info"?

Revision history for this message
Dafna Ron (dron-3) wrote :

No, because I do not have the OpenStack setup anymore. :)

I can give you the cat from the storage server:

[root@orion Dafna]# cat volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.info
{
 "active": "volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa",
 "bb2fa5f1-4df4-4a62-b077-0a10d862c6fa": "volume-c2d7b4c3-d9ef-4676-975b-fb7c83fe3927.bb2fa5f1-4df4-4a62-b077-0a10d862c6fa"
}[root@orion Dafna]#
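
For context, this .info file is how the driver tracks the volume's qcow2 snapshot chain: each snapshot ID maps to the file holding that snapshot's data, and "active" names the file currently in use. A minimal sketch of reading it (an illustrative helper only, not the actual Cinder code):

    import json
    import os

    def read_info_file(mount_point, volume_name):
        # Illustrative: load volume-<id>.info from the Gluster mount and
        # return the snapshot-id -> snapshot-file mapping plus the file
        # currently marked as active.
        info_path = os.path.join(mount_point, '%s.info' % volume_name)
        with open(info_path) as f:
            snap_info = json.load(f)
        return snap_info, snap_info['active']

    # With the data above, snap_info['bb2fa5f1-4df4-4a62-b077-0a10d862c6fa']
    # and snap_info['active'] both point at the .bb2fa5f1-... file.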

Revision history for this message
Eric Harney (eharney) wrote :

What you are seeing here is the result of:

1. Snapshot delete operation started
2. Cinder times out the snapshot delete operation and fails it because it took longer than expected. (This is not ideal but is how things currently work.)
3. Nova continues processing the snapshot delete operation and commits the changes. This means the snapshot actually is deleted from the storage/qcow2 point of view.
4. Cinder's view is now out of sync because it didn't update the info file -- what should have happened at this point is that the snapshot entry is removed from the .info file and the snapshot is deleted from Cinder.
5. Cinder tries to delete the snapshot on this run, but it no longer exists, so the delete operation fails.

I have some changes in the works to address (2) above, but another approach here would be to have the delete snapshot operation handle this case cleanly. (It probably should.)
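
A rough sketch of what "handle this case cleanly" could look like: if the snapshot's file is already gone from the share, treat the snapshot as deleted and drop its entry from the .info file instead of raising. (Simplified and hypothetical; the actual driver also has to update the "active" pointer and go through its locking and rootwrap helpers.)

    import json
    import os

    def delete_snapshot_entry(info_path, snapshot_id):
        # Hypothetical sketch: tolerate a snapshot whose file Nova already
        # committed/removed, instead of raising
        # GlusterfsException('No base file found ...').
        with open(info_path) as f:
            snap_info = json.load(f)

        snap_file = snap_info.get(snapshot_id)
        base_dir = os.path.dirname(info_path)
        if snap_file is None or not os.path.exists(
                os.path.join(base_dir, snap_file)):
            snap_info.pop(snapshot_id, None)
            with open(info_path, 'w') as f:
                json.dump(snap_info, f, indent=1)
            return  # nothing left to clean up on the backend

        # ...otherwise continue with the normal qemu-img commit path...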

Revision history for this message
Thang Pham (thang-pham) wrote :

Hi Eric,

What were you planning to do to resolve #2 above? Can we add a cinder.conf option for the Nova wait timeout for both create and delete snapshot, e.g. nova_wait_timeout = 7200?

Regards,
Thang
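
For reference, an option along those lines would be registered through oslo.config roughly as below. nova_wait_timeout is purely hypothetical here; as the merged change further down shows, the fix ultimately took a different route.

    from oslo.config import cfg  # packaged as oslo_config in later releases

    # Hypothetical option, not part of the actual fix for this bug.
    nova_snapshot_opts = [
        cfg.IntOpt('nova_wait_timeout',
                   default=7200,
                   help='Maximum time in seconds to wait for Nova to finish '
                        'an online snapshot create/delete before failing.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(nova_snapshot_opts)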

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/96340

Changed in cinder:
status: New → In Progress
Changed in cinder:
assignee: Thang Pham (thang-pham) → Eric Harney (eharney)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/96340
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=3fd0a403b2688a52603accb9cc530a6b0ccad066
Submitter: Jenkins
Branch: master

commit 3fd0a403b2688a52603accb9cc530a6b0ccad066
Author: Thang Pham <email address hidden>
Date: Wed May 28 23:07:49 2014 -0400

    GlusterFS: Handle deletion of snapshot with no backing file

    If a glusterfs volume is in-use, nova is called to delete a
    volume snapshot. It is possible for a timeout to occur, since
    cinder is polling for nova task progress. In such case, a
    GlusterfsException is thrown, but nova continues to process the
    snapshot delete and commits the changes. Cinder is not aware of
    this, and the actual snapshot entry in cinder is prevented from
    being deleted, since no backing file exists for the out-of-sync
    snapshot. This patch allows a snapshot with no backing file to
    be deleted from cinder and updates the volume info file.

    Change-Id: I83c8a7242199edbfd1897297589ac7deb42c5865
    Closes-Bug: #1311182

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-2 → 2014.2