Stop/start a KVM instance with volumes attached produces an error state

Bug #1013782 reported by Ryan Finnie
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Vish Ishaya
  Essex                   Fix Released  High        Dave Walker
nova (Ubuntu)             Fix Released  High        Unassigned
  Precise                 Fix Released  High        Unassigned
  Quantal                 Fix Released  High        Unassigned

Bug Description

When a running instance with an attached volume is stopped and then started, the instance refuses to boot and goes into an error state. This appears to be caused by nova-compute incorrectly building the libvirt.xml file.

2012-06-15 03:54:14 TRACE nova.compute.manager [instance: 972b355b-21cc-4ca8-bbb7-67af1bf2ee7f] libvirtError: internal error Invalid harddisk device name: /dev/vdz

A log of the start and the generated libvirt.xml are attached. In particular:

    <disk type='block'>
        <driver name='qemu' type='raw' cache='none'/>
        <source dev='/dev/disk/by-path/ip-10.55.61.15:3260-iscsi-iqn.2010-10.org.openstack:volume-00000053-lun-1'/>
        <target dev='/dev/vdz' bus='virtio'/>
    </disk>

libvirt expects <target dev='vdz' bus='virtio'/>, and refuses to boot otherwise. Issuing a reboot to the instance does not trigger this (and the XML file is not even updated with the volume, presumably because the process is simply given a hard reboot order).
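
One way to check whether a generated libvirt.xml is affected is to scan it for disk targets that still carry a path prefix. The following is a minimal sketch using only the Python standard library. The helper name `find_bad_disk_targets` is hypothetical (it is not part of nova or libvirt), and the second disk in the sample is made up for contrast.

```python
import xml.etree.ElementTree as ET


def find_bad_disk_targets(domain_xml):
    """Return disk target names that contain a path separator.

    Hypothetical helper for illustration; not part of nova or libvirt.
    """
    root = ET.fromstring(domain_xml)
    bad = []
    # iter() walks the whole tree (including the root itself), so this
    # works whether the root is <domain>, <devices> or a single <disk>.
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is not None and "/" in target.get("dev", ""):
            bad.append(target.get("dev"))
    return bad


# First disk is the one from this report; the second is an assumed
# well-formed entry added only for comparison.
SAMPLE = """
<devices>
  <disk type='block'>
    <driver name='qemu' type='raw' cache='none'/>
    <source dev='/dev/disk/by-path/ip-10.55.61.15:3260-iscsi-iqn.2010-10.org.openstack:volume-00000053-lun-1'/>
    <target dev='/dev/vdz' bus='virtio'/>
  </disk>
  <disk type='block'>
    <target dev='vda' bus='virtio'/>
  </disk>
</devices>
"""

print(find_bad_disk_targets(SAMPLE))
```

With the sample above, only the `/dev/vdz` target is flagged; the bare `vda` name passes.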

Version: 2012.1-0ubuntu2.2 (with SRU patch from 2012.1-0ubuntu2.3 manually applied)

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
status: New → Confirmed
tags: added: volume
James Page (james-page)
Changed in nova (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in nova (Ubuntu Precise):
status: New → Confirmed
importance: Undecided → High
Chuck Short (zulcss) wrote :

So I was able to reproduce this in Essex but not in Folsom. I'm thinking that the following commit fixed this:

https://github.com/openstack/nova/commit/ae878fc8b9761d099a4145617e4a48cbeb390623

However, I don't think it's appropriate to backport this fix, so I'm looking at alternatives.

chuck

Mark McLoughlin (markmc) wrote :

In LibvirtConnection.attach_volume() we do:

        mount_device = mountpoint.rpartition("/")[2]
        xml = self.volume_driver_method('connect_volume',
                                        connection_info,
                                        mount_device)

In _prepare_xml_info() we do:

            mountpoint = vol['mount_device']
            xml = self.volume_driver_method('connect_volume',
                                            connection_info,
                                            mountpoint)

Pretty confident we just need to do:

 - mountpoint = vol['mount_device']
 + mountpoint = vol['mount_device'].rpartition("/")[2]
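
To illustrate why this one-line change should help: `str.rpartition("/")` splits on the last slash, so taking element `[2]` strips any directory prefix and leaves an already-bare device name unchanged. A small sketch follows. `normalize_mount_device` is a hypothetical helper name that does not exist in nova, and the shape of the `vol` dict is assumed for illustration.

```python
def normalize_mount_device(mountpoint):
    """Strip any leading path so '/dev/vdz' becomes 'vdz'.

    Hypothetical helper; mirrors the rpartition("/")[2] idiom
    quoted from attach_volume() above.
    """
    return mountpoint.rpartition("/")[2]


# Assumed shape of a block-device entry, for illustration only.
vol = {"mount_device": "/dev/vdz"}

print(normalize_mount_device(vol["mount_device"]))  # vdz
print(normalize_mount_device("vdz"))                # vdz (no slash: unchanged)
```

Because the call is a no-op on bare names, applying it in both code paths would hand the volume driver the same value regardless of how the mountpoint was stored.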

Mark McLoughlin (markmc) wrote :

Hmm, this doesn't jibe with the idea that it's Essex-specific, since we basically do the same thing in Folsom.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/8823

Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
status: Confirmed → In Progress
tags: added: rls-q-incoming
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/essex)

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/8950

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/8823
Committed: http://github.com/openstack/nova/commit/96c86336c69b9d456e43234e3fe315bd3b101045
Submitter: Jenkins
Branch: master

commit 96c86336c69b9d456e43234e3fe315bd3b101045
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Jun 21 13:25:57 2012 -0700

    Call libvirt_volume_driver with right mountpoint

     * fixes bug 1013782
     * includes failing test
     * fixes tests for live migration

    Change-Id: I8f95c6baa7aad878af19d5d8b8b34531a4a43885

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/essex)

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/8953

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/essex)

Reviewed: https://review.openstack.org/8950
Committed: http://github.com/openstack/nova/commit/33c2575ebf9c8022521d36f51b9b31cd41f7f74f
Submitter: Jenkins
Branch: stable/essex

commit 33c2575ebf9c8022521d36f51b9b31cd41f7f74f
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Jun 21 13:25:57 2012 -0700

    Call libvirt_volume_driver with right mountpoint

     * fixes bug 1013782
     * fixes tests for live migration
     * removed test which doesn't apply

    (cherry picked from commit 96c86336c69b9d456e43234e3fe315bd3b101045)

    Change-Id: I8f95c6baa7aad878af19d5d8b8b34531a4a43885

Thierry Carrez (ttx)
Changed in nova:
milestone: none → folsom-2
status: Fix Committed → Fix Released
Chuck Short (zulcss)
Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Released
Adam Gandelman (gandelman-a) wrote : Verification report.

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests have been invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/8823
Stable review: https://review.openstack.org/8950

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Adam Gandelman (gandelman-a) wrote :

Test coverage log.

tags: added: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
  ...


Changed in nova (Ubuntu Precise):
status: Confirmed → Fix Released
Clint Byrum (clint-fewbar) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-2 → 2012.2