Snapshot with quiesce option fails and kills VM on ESXi 4.1

Bug #611644 reported by dnmvisser
This bug affects 5 people
Affects: open-vm-tools (Ubuntu)

Bug Description

Binary package hint: open-vm-tools

I recently upgraded from ESX 4.0 Update 2 to ESX 4.1, and since then snapshots of Ubuntu Lucid VMs fail. This surfaced because we use VCB to back up our VMs, and that process includes taking a quiesced snapshot. Creating one manually from the ESX console gives:

~ # time vim-cmd vmsvc/snapshot.create 128 TestSnap Blah 0 1
Create Snapshot:
Create snapshot failed
real 0m 16.29s
user 0m 0.23s
sys 0m 0.00s
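In the command above, the arguments after the VM ID are the snapshot name, a description, whether to include memory (0) and whether to quiesce the guest (1). Below is a minimal sketch of a wrapper that makes the quiesce flag explicit, so the same snapshot can be retried without quiescing to confirm that quiescing is the step that fails. The function name `snapshot_cmd` and the `DRY_RUN` variable are hypothetical, and the argument order is taken from the command used in this report.

```shell
# Hypothetical wrapper around the command used in this report.
# Argument order as used above:
#   vim-cmd vmsvc/snapshot.create <vmid> <name> <description> <includeMemory> <quiesced>
# Set DRY_RUN=1 to print the command instead of running it.
snapshot_cmd() {
    vmid=$1
    name=$2
    desc=$3
    quiesce=${4:-1}   # 1 = quiesce the guest (the failing case), 0 = skip quiescing
    # includeMemory is always 0 here, as in the report
    set -- vim-cmd vmsvc/snapshot.create "$vmid" "$name" "$desc" 0 "$quiesce"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 snapshot_cmd 128 TestSnap Blah` prints `vim-cmd vmsvc/snapshot.create 128 TestSnap Blah 0 1`, the command from this report. Passing `0` as a fourth argument gives the non-quiesced variant.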

Worse, the failure leaves the VM unusable, with many messages like this on the console:

task xxx:123 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.

The VM has to be hard reset at this point.
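Writing 0 to hung_task_timeout_secs only silences this warning; it does not unblock the stuck tasks. To see which tasks were stuck, here is a minimal sketch that tallies the task names from such messages in a log file. It assumes Ubuntu's default /var/log/kern.log, and the function name is hypothetical. If the guest's filesystems were frozen when the hang happened, these lines may never have reached disk, so a captured copy of the console output may be the better input.

```shell
# Hypothetical helper: count hung-task reports per task name.
# LOG defaults to Ubuntu's kernel log; pass another file to scan that instead.
hung_tasks() {
    log=${1:-/var/log/kern.log}
    # Matches the kernel's "task <name>:<pid> blocked for more than <n> seconds." lines
    # and prints only <name>; then counts each name, most frequent first.
    sed -n 's/.*task \([^ ]*\):[0-9]* blocked for more than [0-9]* seconds.*/\1/p' "$log" |
        sort | uniq -c | sort -rn
}
```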

The tools from the VMware-provided tar installer (VMwareTools-8.3.2-257589.tar.gz) work fine. With those installed, I can take a quiesced snapshot without problems:

~ # time vim-cmd vmsvc/snapshot.create 176 TestSnap Vmware_tar_tools 0 1
Create Snapshot:
real 0m 2.91s
user 0m 0.21s
sys 0m 0.00s

As a workaround, I have reconfigured the backup regime to not use quiescing for Ubuntu Lucid VMs (the "-Q 0" option of VCB's vcbmounter.exe).

dnmvisser (dnmvisser) wrote :

FYI, it may be bad practice, but I copied the kernel modules from the VM that has the official tar-installer tools to the VM with Ubuntu's open-vm-tools (essentially replacing /lib/modules/2.6.32-24-server/updates/dkms/*.ko). After rebooting, everything seems to work: I was able to take a couple of quiesced snapshots.

I guess this means the fault is somewhere in the modules?
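To see what actually differs between the two sets of modules, here is a minimal sketch that prints each DKMS-built module with its version string, so the output from both VMs can be compared with diff. It defaults to the running kernel's updates/dkms directory (the path in this report was /lib/modules/2.6.32-24-server/updates/dkms). The function name is hypothetical, and modules without a readable version field show as "unknown".

```shell
# Hypothetical helper: print "<module> <version>" for each DKMS-built module.
# Defaults to the running kernel's updates/dkms directory.
dkms_module_versions() {
    dir=${1:-/lib/modules/$(uname -r)/updates/dkms}
    for ko in "$dir"/*.ko; do
        [ -e "$ko" ] || continue          # the glob matched nothing
        # modinfo -F prints a single field; fall back if it fails or is empty
        ver=$(modinfo -F version "$ko" 2>/dev/null) || ver=unknown
        [ -n "$ver" ] || ver=unknown
        printf '%s %s\n' "$(basename "$ko" .ko)" "$ver"
    done
}
```

Run `dkms_module_versions > modules.txt` on each VM and compare the two files with `diff`.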

dnmvisser (dnmvisser) wrote :

Correction: copying the modules no longer works on the latest kernel.

Clint Byrum (clint-fewbar) wrote :

Given that there are 4 people listed as affected, marking Confirmed.

It sounds to me like upstream may have changed something, in which case those changes may need to be backported into open-vm-tools.

Changed in open-vm-tools (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium