instance snapshot creation failed: libvirtError: block copy still active: domain has active block copy job

Bug #1287047 reported by Yogev Rabl
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: High
Assigned to: Noel Nelson Dsouza
Milestone: (none)

Bug Description

Description of problem:
A snapshot of an instance is created with the status 'deleted'.
The instance was launched from a RHEL 6.5 ISO image with the following flavor configuration:
Flavor Name: m1.small
VCPUs: 1
RAM: 2048MB
Root Disk: 20GB
Ephemeral Disk: 40GB
Swap Disk: 0MB

The system topology is:
1. Cloud controller with the Nova services installed (with nova-network).
2. Stand-alone Glance server.
3. Cinder and Swift installed on the same server.
4. Stand-alone nova-compute node.

Version-Release number of selected component (if applicable):
openstack-nova-conductor-2013.2.2-2.el6ost.noarch
openstack-nova-scheduler-2013.2.2-2.el6ost.noarch
python-django-openstack-auth-1.1.2-2.el6ost.noarch
openstack-dashboard-2013.2.2-1.el6ost.noarch
openstack-selinux-0.1.3-2.el6ost.noarch
openstack-packstack-2013.2.1-0.25.dev987.el6ost.noarch
openstack-keystone-2013.2.2-1.el6ost.noarch
openstack-nova-common-2013.2.2-2.el6ost.noarch
openstack-nova-api-2013.2.2-2.el6ost.noarch
openstack-nova-console-2013.2.2-2.el6ost.noarch
openstack-nova-network-2013.2.2-2.el6ost.noarch
openstack-nova-cert-2013.2.2-2.el6ost.noarch
openstack-dashboard-theme-2013.2.2-1.el6ost.noarch
redhat-access-plugin-openstack-4.0.0-0.el6ost.noarch
openstack-nova-compute-2013.2.2-2.el6ost.noarch
openstack-nova-novncproxy-2013.2.2-2.el6ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Launch an instance from an ISO image (with the same flavor configuration as above).
2. Install the OS from the ISO onto the ephemeral disk.
3. After the installation is done and the OS is up, take a snapshot of the instance.

Actual results:
The snapshot is created in 'deleted' status.

Expected results:
The snapshot should be available.

Logs are attached.

Revision history for this message
Yogev Rabl (yrabl) wrote :
tags: added: libvirt
Revision history for this message
Solly Ross (sross-7) wrote :

Question: does this happen on a fresh install? I know in the past we've had issues with the API showing a deleted X with the same id as a newly created X.

Changed in nova:
status: New → Incomplete
Revision history for this message
Yogev Rabl (yrabl) wrote :

It happens both after a fresh install and on a system with a week of uptime; I don't think that's the relevant factor.

Changed in nova:
status: Incomplete → New
Revision history for this message
Solly Ross (sross-7) wrote :

One more bit of information: can you indicate the commands you used to create the VM? This will be useful when people attempt to reproduce and fix the issue.

Changed in nova:
importance: Undecided → High
status: New → Confirmed
status: Confirmed → Incomplete
Revision history for this message
Yogev Rabl (yrabl) wrote :

I launched the instance from Horizon, so I recommend the following steps, in Horizon:
1. Edit the small flavor to add a 40 GB ephemeral disk.
2. Launch the instance with that flavor.

Changed in nova:
status: Incomplete → Confirmed
Revision history for this message
Thang Pham (thang-pham) wrote :

I believe this is not working because a snapshot only saves the root disk and not the ephemeral disk. There would be no way to determine which disk contains the root filesystem from an OpenStack perspective, or control which disk the user installs the root filesystem on.

The snapshot() method in nova/virt/libvirt/driver.py calls find_disk() in nova/virt/libvirt/utils.py to find the root disk. find_disk() returns the first devices/filesystem/source in the domain XML tree, which is always vda/hda, since it comes first in the matching element tree.
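The lookup described above can be sketched roughly like this (a hypothetical minimal version, not the actual nova code; the XML, paths, and the function name find_first_disk_source are made up for illustration):

```python
import xml.etree.ElementTree as ET

def find_first_disk_source(domain_xml):
    # Return the source path of the first <disk> element in the domain
    # XML -- mirroring the behavior described above, where the first
    # match (vda/hda) always wins and the ephemeral disk is never seen.
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        src = disk.find("source")
        if src is not None:
            return src.get("file")
    return None

DOMAIN_XML = """
<domain>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/instances/uuid/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/instances/uuid/disk.eph0'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

print(find_first_disk_source(DOMAIN_XML))  # the vda disk, never the ephemeral one
```

With this selection rule, a snapshot of an instance installed onto the ephemeral disk captures only the (essentially empty, or ISO-backed) root disk.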

Revision history for this message
Dan Genin (daniel-genin) wrote :

I could not reproduce this bug in Devstack with the latest Juno code.

Steps taken:

1. Upload Debian net-inst ISO image into Glance
2. Modify tiny flavor to add an ephemeral disk (4GB)
3. Boot an instance with the uploaded ISO image
4. Install Debian in test instance
5. Take a snapshot: produces a valid snapshot (of the ISO image)
6. Reboot the instance into new Debian install
7. Take a snapshot: produces a valid snapshot (of the ISO image)

From your original bug description I see that you are using 2013 packages, which are probably Icehouse or earlier. It is possible that the bug has been fixed in Juno, although the fact that the snapshots are of the installation ISO image rather than the root disk is certainly suboptimal.

Revision history for this message
Joel Friedly (joelfriedly) wrote :

Just saw this issue. The code block that's raising the error is this one, which hasn't really changed since grizzly-eol:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1830:L1851

We undefine the domain so that we can do a blockRebase operation, then abort the job when it's done, with a finally clause that redefines the domain. The redefine fails, though, because libvirt thinks there's still an active block copy job. My guess is that we're raising an exception inside the block sometime after the blockRebase gets kicked off, but this could also be a race condition within libvirt.

If I can get a repro, I'll try putting an except Exception around the block and logging any exceptions that get raised.
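To make the failure mode concrete, here is a hedged sketch of the control flow being described, using a FakeDomain stub (not the real libvirt API surface) just to exercise the undefine / blockRebase / redefine sequence. Note how an exception after the copy starts leaves the job active, so the redefine in the finally clause fails with exactly the error in this bug's title:

```python
import logging

logging.basicConfig(level=logging.ERROR)
LOG = logging.getLogger("snapshot-sketch")

class FakeDomain:
    """Stand-in for a libvirt domain (hypothetical stub, only for
    demonstrating the control flow described in the comment above)."""
    def __init__(self):
        self.defined = True
        self.active_block_job = False

    def undefine(self):
        self.defined = False

    def block_rebase(self, fail=False):
        self.active_block_job = True      # copy job is now running
        if fail:
            raise RuntimeError("error during block copy")
        self.active_block_job = False     # job finished and was aborted

    def redefine(self):
        if self.active_block_job:
            # The libvirtError from the bug title, simulated.
            raise RuntimeError("block copy still active")
        self.defined = True

def snapshot(dom, fail=False):
    dom.undefine()                        # domain must be transient for blockRebase
    try:
        dom.block_rebase(fail=fail)       # kick off (and wait out) the copy
    except Exception:
        # The logging suggested above: record anything raised after the
        # copy starts; the job is still active, so the redefine below
        # will also fail and mask the original exception.
        LOG.exception("block copy failed")
        raise
    finally:
        dom.redefine()                    # always try to redefine the domain
```

Running snapshot(FakeDomain(), fail=True) surfaces "block copy still active" rather than the underlying error, which is why logging inside the block is needed to see the real cause.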

Changed in nova:
assignee: nobody → Noel Nelson Dsouza (noelnelson)
Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Noel Nelson Dsouza (noelnelson) wrote :

I am not able to reproduce the above bug on the Juno version.

When I tried to reproduce it, the snapshot image status is 'Active' instead of 'Deleted'.

I followed the same steps as mentioned above to reproduce the bug.

Image file: ubuntu-14.04.1-desktop-amd64.iso

If anyone is still affected by this bug, we will debug further; otherwise this bug has to be marked as Invalid.

Regards
Noel Nelson Dsouza

Changed in nova:
status: In Progress → Invalid
Revision history for this message
haichuan (haichuan0227) wrote :

Seems like you're hitting an old bug[1] where 'blockcopy' (or
'blockcommit') failed to execute a cleanup routine that destroys the
reference to the active block operation, resulting in the error you're
seeing when you attempt to 'abort' the block operation manually.

This bug is fixed in libvirt-1.2.8 and above. I see you're using
libvirt-1.2.7; if you can update libvirt in your environment, that
should fix your issue.
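In libvirt-python, virConnect.getLibVersion() reports the library version as a single integer (major*1,000,000 + minor*1,000 + release), so the check above can be sketched like this (the connection call itself is omitted; only the decoding arithmetic is shown):

```python
def decode_libvirt_version(v):
    """Split libvirt's packed version integer (as returned by
    virConnect.getLibVersion()) into a (major, minor, release) tuple."""
    major, rest = divmod(v, 1_000_000)
    minor, release = divmod(rest, 1_000)
    return (major, minor, release)

# e.g. a host running the affected libvirt-1.2.7 reports 1002007
FIXED_IN = (1, 2, 8)
print(decode_libvirt_version(1002007) < FIXED_IN)  # True: upgrade needed
```

On a live host you would obtain the integer via libvirt.open(...).getLibVersion() and compare against FIXED_IN the same way.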

haichuan (haichuan0227)
tags: added: snapshot