libvirt hypervisor version checks are insufficient

Bug #1193146 reported by Rafi Khardalian
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Rafi Khardalian

Bug Description

This impacts master and Grizzly.

In Grizzly, we implemented "live snapshots" in the libvirt driver, which use the blockcopy/blockpull operations that became available in Qemu 1.0.3. The decision to attempt a live or a cold (legacy) snapshot is based on libvirt's response to the getVersion() API call. The fundamental flaw in this approach is that libvirt returns the *currently installed* version of Qemu, even though instances that were already running before the Qemu upgrade are still running the older version. As a result, the version check passes and we try to use the new functionality, which is not actually available to a given instance until it is restarted on the new version of Qemu.

To reproduce the problem:

    1. Install a version of Qemu older than 1.0.3.

    2. Launch an instance ("InstanceA"), and leave this instance running.

    3. Upgrade Qemu to 1.0.3 (available via UCA) and restart libvirt.

    4. Launch another instance ("InstanceB").

    5. Attempt a snapshot of InstanceA. This will fail with an exception.

    6. Attempt a snapshot of InstanceB. It will succeed.

Step 5 fails because InstanceA is still running the version of Qemu that was installed in step 1. Checking getVersion() via libvirt is not sufficient in this case; we need to know which version a given instance is running before attempting to use newer functionality.
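
The reproduction steps can be modelled with a small simulation. This is only a sketch to illustrate the mismatch, not nova or libvirt code: FakeHost, its methods other than getVersion(), and the helper functions are hypothetical names invented here. The only real-world detail assumed is that getVersion() reports the version as a single integer encoded as major * 1,000,000 + minor * 1,000 + release.

```python
def encode_version(major, minor, release):
    # Integer encoding used by libvirt's getVersion():
    # major * 1,000,000 + minor * 1,000 + release.
    return major * 1000000 + minor * 1000 + release


# Minimum Qemu version for live snapshots, as stated in this bug report.
MIN_QEMU_LIVE_SNAPSHOT = encode_version(1, 0, 3)


class FakeHost(object):
    """Hypothetical stand-in for a compute host; not a libvirt class."""

    def __init__(self, installed_version):
        self.installed_version = installed_version
        # Maps instance name -> Qemu version its process was started with.
        self.instances = {}

    def getVersion(self):
        # Mirrors the behaviour described above: reports what is
        # installed now, not what any running instance is using.
        return self.installed_version

    def launch(self, name):
        self.instances[name] = self.installed_version

    def upgrade(self, new_version):
        # Running instances keep their old Qemu process.
        self.installed_version = new_version


def host_check_passes(host):
    # The insufficient check: host-wide version only.
    return host.getVersion() >= MIN_QEMU_LIVE_SNAPSHOT


def instance_can_live_snapshot(host, name):
    # What actually matters: the version the instance is running.
    return host.instances[name] >= MIN_QEMU_LIVE_SNAPSHOT


host = FakeHost(encode_version(1, 0, 0))
host.launch("InstanceA")
host.upgrade(encode_version(1, 0, 3))
host.launch("InstanceB")

for name in ("InstanceA", "InstanceB"):
    print(name, host_check_passes(host), instance_can_live_snapshot(host, name))
```

For InstanceA the host-wide check passes while the per-instance check does not, which is the situation in which step 5 raises an exception.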

Changed in nova:
assignee: nobody → Rafi Khardalian (rkhardalian)
Changed in nova:
status: New → In Progress
Rafi Khardalian (rkhardalian) wrote :

2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 433, in _process_data
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp **args)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 98, in wrapped
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 75, in wrapped
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 215, in decorated_function
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp pass
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 201, in decorated_function
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 243, in decorated_function
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 230, in decorated_function
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 1932, in snapshot_instance
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp self.driver.snapshot(context, instance, image_id, update_task_state)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1248, in snapshot
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp image_format)
2013-06-21 18:28:39.998 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/libvirt/driver.py", l...


OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/33758
Committed: http://github.com/openstack/nova/commit/c6a120417e68c7423ad4898eb5e0567e0f22e0f3
Submitter: Jenkins
Branch: master

commit c6a120417e68c7423ad4898eb5e0567e0f22e0f3
Author: Rafi Khardalian <email address hidden>
Date: Fri Jun 21 23:02:03 2013 +0000

    Perform additional check before live snapshotting

    Bug 1193146

    Move blockJobAbort() out of the _live_snapshot function, such that
    it continues to terminate existing jobs, while also doubling as a
    method for confirming that our version of Qemu/KVM is new enough to
    execute a _live_snapshot.

    Change-Id: Ife5d2fd768a34dabf25a1bfc24e54bd6db762c89
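
The approach in the commit message above, using a block-job call as a capability probe before choosing the snapshot mode, might look roughly like the following. This is a hedged sketch and not the merged change: FakeDomain, FakeLibvirtError, and choose_snapshot_mode are hypothetical names, and the fake blockJobAbort() only imitates a domain whose running Qemu process rejects block-job operations.

```python
class FakeLibvirtError(Exception):
    """Hypothetical stand-in for the error a real domain would raise."""


class FakeDomain(object):
    """Hypothetical stand-in for a libvirt domain object."""

    def __init__(self, supports_block_jobs):
        self.supports_block_jobs = supports_block_jobs

    def blockJobAbort(self, disk_path, flags=0):
        # An instance still running an old Qemu process rejects the call.
        if not self.supports_block_jobs:
            raise FakeLibvirtError("block jobs not supported by this process")
        return 0


def choose_snapshot_mode(domain, disk_path, host_version_ok):
    """Return "live" or "cold" for the given domain.

    The host-wide version check is kept as a first gate; the abort call
    then both clears any existing job and confirms that the running
    instance accepts block-job operations.
    """
    if not host_version_ok:
        return "cold"
    try:
        domain.blockJobAbort(disk_path, 0)
    except FakeLibvirtError:
        # Fall back to the legacy path instead of failing the snapshot.
        return "cold"
    return "live"


old_process = FakeDomain(supports_block_jobs=False)
new_process = FakeDomain(supports_block_jobs=True)
print(choose_snapshot_mode(old_process, "/hypothetical/disk", True))
print(choose_snapshot_mode(new_process, "/hypothetical/disk", True))
```

Probing the running instance avoids having to discover its Qemu version separately; the trade-off is that any failure of the probe call is treated as "not supported".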

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-2 → 2013.2
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/740335

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/740335
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5964d7e11cdd67ae0e1af2382465d80b8531a94b
Submitter: Zuul
Branch: master

commit 5964d7e11cdd67ae0e1af2382465d80b8531a94b
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 9 16:52:19 2020 +0100

    libvirt: Remove workaround for really old QEMU

    QEMU < 1.0.3 did not support live snapshots. Bug #1193146 noted that for
    this to be used, the version of the QEMU that the instance is running is
    important, not the version that it was created with. To test this, it
    used a (duplicated) call to 'abort_job' that verified a newer version
    of QEMU was in use. There can't be any instances still in the wild using
    this version of QEMU, since those users would have had to update their
    OS (the last version of Ubuntu to provide a suitably old QEMU was 12.04
    [1], which is very much EOL), meaning a reboot of the host and possible
    live-migration of instances to another host, both of which would result
    in a newer process.

    [1] https://launchpad.net/ubuntu/precise/+package/qemu-kvm

    Change-Id: Ic55d2ae49a1ae3aefd986bd1f52c76e022fb8ee1
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1193146
