Qcow2 image stuck as locked after host crash

Bug #1819343 reported by Tim Schuster
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
QEMU      Invalid   Undecided    Unassigned

Bug Description

After a host crash, the VM's qcow2 image, stored on a remote NFS share, has become inaccessible. Libvirt/QEMU reports 'failed to get "write" lock\nIs another process using the image [/path/nfs/image.qcow2]?'. No process is accessing the image on either the host or the network share side. qemu-img offers no obvious way to force-unlock the file or repair the image: running qemu-img check with -r all fails with the same lock complaint, and force-share=on cannot be used on anything but read-only images.

I'm currently attempting to fix this by converting the image via 'qemu-img convert -U -f qcow2 -O qcow2 image.qcow2 image_2.qcow2', though this will likely take some time.
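
The conversion workaround above can be wrapped in a small helper that first reads the image metadata without taking the lock. This is only a sketch: the function name rescue_image and the RUN prefix variable are hypothetical, the file names are placeholders, and -U (short for --force-share) is only safe when nothing else is really writing to the image, as the report states.

```shell
#!/bin/sh
# Sketch: copy a lock-stuck qcow2 image without acquiring the image lock.
# rescue_image and RUN are hypothetical names used for illustration only.
rescue_image() {
    src=$1
    dst=$2
    # RUN is an optional command prefix; RUN=echo prints instead of running.
    # First confirm the image header is readable with locking bypassed.
    ${RUN:-} qemu-img info -U "$src" || return 1
    # Then copy the data into a fresh qcow2 file, again bypassing the lock.
    ${RUN:-} qemu-img convert -U -f qcow2 -O qcow2 "$src" "$dst"
}
```

Running 'RUN=echo rescue_image image.qcow2 image_2.qcow2' prints the two qemu-img commands without executing them, which is a cheap way to review them before the long copy starts.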

Using QEMU 3.1.0

John Snow (jnsnow) wrote :

I wonder if your QEMU is using OFD locks or not, which might depend on a few things:
- Are you using a distribution-packaged QEMU or one you've built yourself?
- What glibc was it compiled against?
- What version of Linux are you running under?

I would have thought that the lock would be released once the process that held it died, but perhaps NFS makes it more complicated than that. Perhaps there's a very long timeout involved somewhere?

Tim Schuster (tscs37) wrote :

Hi,

I used both the standard qemu package from the archlinux repositories as well as one I compiled myself with a few patches on top to improve audio performance.

According to my logs, the version I compiled myself was built against glibc 2.28-4; I don't know what archlinux compiles its package against. glibc 2.28-5 is currently deployed on the system.

The system is an up-to-date archlinux.

I've unmounted and remounted the NFS share during my recovery attempts, which should have released any lock that was still being held. Since the host running qemu had crashed and restarted anyway, though, this was more snake oil than an actual attempt to fix things.

fuser and lsof on the NFS host and QEMU host both showed no process holding a lock on the file.
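
fuser and lsof report processes that have the file open. On Linux, /proc/locks additionally lists the file locks the kernel is tracking, identified by device and inode, so it can serve as a further check on either host. The sketch below filters that list by inode; locks_for_inode is a hypothetical helper name, and whether a stale lock held on behalf of a crashed NFS client would show up there is an assumption, not something confirmed in this report.

```shell
#!/bin/sh
# Sketch: list kernel lock entries that refer to a given inode.
# locks_for_inode is a hypothetical helper; the second argument defaults
# to /proc/locks and exists so the filter can be tried on a saved copy.
locks_for_inode() {
    inode=$1
    locks_file=${2:-/proc/locks}
    # Each entry carries a MAJOR:MINOR:INODE field; print lines whose
    # inode part equals the one requested.
    awk -v ino="$inode" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9a-f]+:[0-9a-f]+:[0-9]+$/) {
                n = split($i, id, ":")
                if (id[n] == ino) print
            }
    }' "$locks_file"
}
```

A typical call would be 'locks_for_inode "$(stat -c %i /path/nfs/image.qcow2)"'. Empty output means the kernel on that machine lists no lock for that inode.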

John Snow (jnsnow) wrote :

Hi, a colleague of mine has pointed out that this is a well-worn problem with nfs*v3*:

https://bugzilla.redhat.com/show_bug.cgi?id=1547095#c43

Workarounds seem to involve:
- Use v4, or
- Use the nolock option.

Does this cover your use case?
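
Before applying either workaround it helps to confirm which NFS version the share is actually mounted with. A minimal sketch, assuming a Linux client where the negotiated version appears as a vers= option in /proc/mounts; nfs_vers is a hypothetical helper name, and server:/export and /mnt/vm are placeholders.

```shell
#!/bin/sh
# Sketch: print the NFS protocol version a mount point is using.
# nfs_vers is a hypothetical helper; the second argument defaults to
# /proc/mounts and exists so the parser can be tried on a saved copy.
nfs_vers() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m && $3 ~ /^nfs/ {
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++)
            if (opt[i] ~ /^(nfs)?vers=/) {
                sub(/^.*=/, "", opt[i])   # keep only the version number
                print opt[i]
            }
    }' "$mounts"
}

# The two workarounds, with placeholder server and mount point (run as root):
#   mount -t nfs -o vers=4 server:/export /mnt/vm
#   mount -t nfs -o vers=3,nolock server:/export /mnt/vm
```

Note that with nolock, locks only provide exclusion between processes on the same client, so QEMU's image lock would no longer guard against a second host opening the same image.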

Tim Schuster (tscs37) wrote :

Yes, it would be v3. I'll use v4 then, thanks!

John Snow (jnsnow) wrote :

OK; I will be marking this as invalid to reflect our belief that this is a bug in NFS and not in QEMU. Please re-open if you run into additional troubles!

Changed in qemu:
status: New → Invalid