OpenStack Compute (nova)

inconsistent virtual size in qcow base image after block-migration

Bug #1237683 reported by Blair Bethwaite on 2013-10-09

This bug affects 2 people

Affects                     Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)    Fix Released  High        Unassigned

Bug Description

We're running a Grizzly node using KVM (1.0 from cloud-archive) with local ephemeral instance storage.

Since approximately the time we upgraded to Grizzly, we've been receiving complaints from particular users about secondary disk corruption. These users notice the issue because they rely on the secondary drive, and because they use CentOS, which drops to an interactive prompt before completing boot if it cannot mount all filesystems (Ubuntu does not).

We've since discovered that this is specifically linked to block-migration of such disks that were created and formatted automatically by Nova. That is, if we launch a new instance, log in and reformat the drive from inside the guest (even as ext3), we don't encounter corruption after live-migration. If we change the virt_mkfs config option to use mkfs.ext4, we also don't have the problem. Unfortunately, that is not a simple fix for an active production cloud, because all existing backing files must be removed to force their recreation.

While investigating the problem we noticed a behaviour that might be related: after block-migration, the instance's secondary disk has a "generic" backing file, instances/_base/ephemeral, rather than the backing file it was created with on the origin host, e.g., instances/_base/ephemeral_30_default.
These backing files have different virtual sizes(!):
$ qemu-img info _base/ephemeral
image: _base/ephemeral
file format: raw
virtual size: 2.0G (2147483648 bytes)
disk size: 778M
$ qemu-img info _base/ephemeral_30_default
image: _base/ephemeral_30_default
file format: raw
virtual size: 30G (32212254720 bytes)
disk size: 614M
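A minimal Python sketch of the same comparison, assuming only the `qemu-img info` text format shown above: it extracts the byte count from each "virtual size" line and reports how much of the original 30 GiB the destination's base image no longer covers. The helper name `virtual_size_bytes` is hypothetical and not part of nova or QEMU; the sample text is copied from the output above.

```python
import re

# Matches a line such as "virtual size: 30G (32212254720 bytes)" and captures the byte count.
_VSIZE_RE = re.compile(r"^virtual size:.*\((\d+) bytes\)\s*$", re.MULTILINE)


def virtual_size_bytes(info_text: str) -> int:
    """Return the virtual size, in bytes, parsed from `qemu-img info` text output."""
    match = _VSIZE_RE.search(info_text)
    if match is None:
        raise ValueError("no 'virtual size' line found")
    return int(match.group(1))


ORIGIN_BASE = """\
image: _base/ephemeral_30_default
file format: raw
virtual size: 30G (32212254720 bytes)
disk size: 614M
"""

DEST_BASE = """\
image: _base/ephemeral
file format: raw
virtual size: 2.0G (2147483648 bytes)
disk size: 778M
"""

GIB = 1024 ** 3
origin = virtual_size_bytes(ORIGIN_BASE)
dest = virtual_size_bytes(DEST_BASE)

# Address range the overlay could reach on the origin host that lies past the end of the new base.
print(f"missing from destination base: {(origin - dest) // GIB} GiB")  # → missing from destination base: 28 GiB
```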

We're no experts on qcow, but this looks like it could be problematic and may explain the corruption issues we're seeing - I can imagine there would be problems for a migrated guest that attempts to read a previously untouched sector beyond the size of the new backing file.
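The suspicion above can be illustrated with a toy copy-on-write model in Python. This is a deliberately simplified sketch, not QEMU's qcow2 implementation: clusters the guest has written live in the overlay, every other read falls through to the backing file, and (as an assumption of the model) reads past the end of the backing file return zeros. Filesystem metadata that mkfs wrote into the original base, but that the guest never rewrote, is therefore read back as zeros (or, if the base is some unrelated image, as that image's data) once the overlay sits on a different base.

```python
from typing import Dict

CLUSTER = 4  # toy cluster size in bytes; real qcow2 clusters are much larger (64 KiB by default)


def cow_read(overlay: Dict[int, bytes], backing: bytes, offset: int) -> bytes:
    """Read the cluster containing `offset` from a toy copy-on-write image."""
    index = offset // CLUSTER
    if index in overlay:
        # The guest wrote this cluster, so the overlay holds its own copy.
        return overlay[index]
    # Untouched cluster: fall through to the backing file, zero-padding past its end.
    start = index * CLUSTER
    return backing[start:start + CLUSTER].ljust(CLUSTER, b"\0")


# Origin base: mkfs wrote filesystem metadata ("META") into cluster 5 of a 32-byte base.
origin_backing = b"\0" * 20 + b"META" + b"\0" * 8
# Destination base: a smaller, generic base that knows nothing about that metadata.
dest_backing = b"\0" * 8
# The guest only ever wrote cluster 0, so the metadata was never copied into the overlay.
overlay = {0: b"DATA"}

print(cow_read(overlay, origin_backing, 20))  # → b'META'
print(cow_read(overlay, dest_backing, 20))    # → b'\x00\x00\x00\x00'
```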



Sam Morrison (sorrison) wrote on 2013-10-17:

Part of this was fixed by bug 1195877.

It now creates the backing file for the ephemeral disk in the right place.

More seriously, though, instead of creating a new, empty ephemeral backing file, it actually uses the instance's image from Glance as the ephemeral backing file too.

Hope that makes sense!
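A possible way to spot this on a destination host, sketched in Python: read the "backing file" line from `qemu-img info` on the overlay and check that it points at the size-specific ephemeral base rather than at a generic file or a Glance image. The naming scheme `ephemeral_<size>_<os_type>` is only inferred from the `ephemeral_30_default` name in the report, and `expected_ephemeral_base` is a hypothetical helper, not nova's code. Paths containing spaces are not handled.

```python
import os
import re

# Matches e.g. "backing file: instances/_base/ephemeral_30_default"; newer qemu-img versions
# may append " (actual path: ...)", and \S+ stops before that suffix.
_BACKING_RE = re.compile(r"^backing file: (\S+)", re.MULTILINE)


def backing_file(info_text: str) -> str:
    """Return the backing-file path from `qemu-img info` text output."""
    match = _BACKING_RE.search(info_text)
    if match is None:
        raise ValueError("image has no backing file")
    return match.group(1)


def expected_ephemeral_base(size_gb: int, os_type: str = "default") -> str:
    """Hypothetical helper: rebuild a base-file name like 'ephemeral_30_default'."""
    return f"ephemeral_{size_gb}_{os_type}"


def ephemeral_base_ok(info_text: str, size_gb: int) -> bool:
    """True if the overlay is backed by the size-specific ephemeral base file."""
    return os.path.basename(backing_file(info_text)) == expected_ephemeral_base(size_gb)


origin_info = "file format: qcow2\nbacking file: instances/_base/ephemeral_30_default\n"
dest_info = "file format: qcow2\nbacking file: instances/_base/ephemeral\n"

print(ephemeral_base_ok(origin_info, 30))  # → True
print(ephemeral_base_ok(dest_info, 30))    # → False
```

The same check would also flag an overlay whose backing file is a Glance image in `_base`, since that file's name does not match the ephemeral pattern either.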

Mathew Odden (locke105) on 2013-11-14

tags:

added: libvirt

Michael Still (mikal) on 2013-12-30

Changed in nova:
status:	New → Triaged
importance:	Undecided → High


Kashyap Chamarthy (kashyapc) wrote on 2014-10-21:

Ping, bug triaging here.

Blair, do you have any updates? (Perhaps you've done some testing with newer versions of OpenStack in a test environment.)

Also, it would be a lot more helpful if you could provide some high-level commands to reproduce the block-migration, so that others triaging bugs without all the context can try to reproduce it.
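A sketch of what such a reproducer might look like, written as a Python function that only builds the command lines (it does not run them). Several details are assumptions rather than facts from this report: the flavor, image, server and host names are hypothetical; the instance directory assumes nova's default `instances_path` of `/var/lib/nova/instances`; and `disk.local` is assumed to be the libvirt driver's file name for the ephemeral disk. The idea is to compare the "backing file" line, and that file's virtual size, before and after the block-migration.

```python
import shlex
from typing import List


def reproducer_steps(server: str, flavor: str, image: str,
                     dest_host: str, instance_uuid: str) -> List[List[str]]:
    """Return, in order, the commands for a block-migration reproducer (sketch only)."""
    # Assumes the default instances_path and the libvirt driver's disk.local ephemeral disk.
    disk_local = f"/var/lib/nova/instances/{instance_uuid}/disk.local"
    return [
        # Flavor with a 30 GB ephemeral disk: name, id, RAM (MB), root disk (GB), vCPUs.
        ["nova", "flavor-create", "--ephemeral", "30", flavor, "auto", "2048", "10", "1"],
        ["nova", "boot", "--flavor", flavor, "--image", image, server],
        # Run on the source host: note the "backing file" line.
        ["qemu-img", "info", disk_local],
        ["nova", "live-migration", "--block-migrate", server, dest_host],
        # Run on the destination host: compare the "backing file" line and its virtual size.
        ["qemu-img", "info", disk_local],
    ]


for argv in reproducer_steps("ephem-test", "m1.ephem30", "centos-6",
                             "compute-02", "INSTANCE_UUID"):
    print(shlex.join(argv))
```

A CentOS guest is a convenient choice here, since, per the original description, it stops at an interactive prompt when the corrupted secondary filesystem fails to mount.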


Blair Bethwaite (blair-bethwaite) wrote on 2014-10-21:

Hi Kashyap,

ACK on the reproducibility context.

The behaviour has been gone since Havana, and if I recall correctly we backported the fix for https://bugs.launchpad.net/bugs/1195877, which fixed it for Grizzly at the time. 1195877 is tagged as a potential Grizzly backport, but that's moot now, I suppose.

Cheers!


Sean Dague (sdague) wrote on 2015-03-30:

Addressed in Havana

Changed in nova:
status:	Triaged → Fix Released
