OpenStack Compute (Nova)

Stop/start a KVM instance with volumes attached produces an error state

Reported by Ryan Finnie on 2012-06-15
This bug affects 2 people
Affects                     Importance   Assigned to
OpenStack Compute (nova)    High         Vish Ishaya
  Essex                     High         Dave Walker
nova (Ubuntu)               High         Unassigned
  Precise                   High         Unassigned
  Quantal                   High         Unassigned

Bug Description

When a running instance with an attached volume is stopped and then started, the instance refuses to boot and goes into an error state. This appears to be caused by nova-compute incorrectly building the libvirt.xml file.

2012-06-15 03:54:14 TRACE nova.compute.manager [instance: 972b355b-21cc-4ca8-bbb7-67af1bf2ee7f] libvirtError: internal error Invalid harddisk device name: /dev/vdz

A log of the start and the generated libvirt.xml are attached. In particular:

            <disk type='block'>
                <driver name='qemu' type='raw' cache='none'/>
                <source dev='/dev/disk/by-path/ip-10.55.61.15:3260-iscsi-iqn.2010-10.org.openstack:volume-00000053-lun-1'/>
                <target dev='/dev/vdz' bus='virtio'/>
            </disk>

libvirt expects <target dev='vdz' bus='virtio'/>, and refuses to boot otherwise. Issuing a reboot to the instance does not trigger this (and the XML file is not even updated with the volume, presumably because the process is simply given a hard reboot order).
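To check a generated libvirt.xml for this problem, something like the following sketch could be used. It is not part of nova; the function name find_bad_targets is made up for illustration, and the XML is the disk element quoted above. It simply reports any <target> whose dev attribute contains a path separator, which is the form libvirt rejected here.

```python
import xml.etree.ElementTree as ET

# The <disk> element from this report, as written by nova-compute.
DISK_XML = """
<disk type='block'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/disk/by-path/ip-10.55.61.15:3260-iscsi-iqn.2010-10.org.openstack:volume-00000053-lun-1'/>
  <target dev='/dev/vdz' bus='virtio'/>
</disk>
"""


def find_bad_targets(xml_text):
    """Return the dev values of <target> elements that look like paths.

    Hypothetical helper: per this report, libvirt wants a bare device
    name such as 'vdz' here, so any value containing '/' is flagged.
    """
    root = ET.fromstring(xml_text.strip())
    return [
        target.get("dev")
        for target in root.iter("target")
        if "/" in target.get("dev", "")
    ]


if __name__ == "__main__":
    for dev in find_bad_targets(DISK_XML):
        print("suspicious target dev:", dev)
```

Only the <target> element is inspected; the <source> element legitimately carries a full /dev/disk/by-path/... path.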

Version: 2012.1-0ubuntu2.2 (with SRU patch from 2012.1-0ubuntu2.3 manually applied)

Ryan Finnie (fo0bar) wrote :
Thierry Carrez (ttx) on 2012-06-18
Changed in nova:
importance: Undecided → High
status: New → Confirmed
tags: added: volume
James Page (james-page) on 2012-06-21
Changed in nova (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in nova (Ubuntu Precise):
status: New → Confirmed
importance: Undecided → High
Chuck Short (zulcss) wrote :

So I was able to reproduce this in Essex but not Folsom. I'm thinking that the following commit fixed this:

https://github.com/openstack/nova/commit/ae878fc8b9761d099a4145617e4a48cbeb390623

However, I don't think it's appropriate to backport this fix, so I'm looking at alternatives.

chuck

Mark McLoughlin (markmc) wrote :

In LibvirtConnection.attach_volume() we do:

        mount_device = mountpoint.rpartition("/")[2]
        xml = self.volume_driver_method('connect_volume',
                                        connection_info,
                                        mount_device)

in _prepare_xml_info() we do:

            mountpoint = vol['mount_device']
            xml = self.volume_driver_method('connect_volume',
                                            connection_info,
                                            mountpoint)

Pretty confident we just need to do:

 - mountpoint = vol['mount_device']
 + mountpoint = vol['mount_device'].rpartition("/")[2]
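As a standalone illustration of what that one-line change does (a sketch only, outside of nova; normalize_mount_device and the sample mapping are hypothetical names, not nova code), str.rpartition("/") splits on the last slash, so element [2] is the bare device name whether or not the stored value has a /dev/ prefix:

```python
def normalize_mount_device(mount_device):
    """Strip any leading path so '/dev/vdz' becomes 'vdz'.

    rpartition returns ('', '', value) when there is no '/', so a value
    that is already bare passes through unchanged.
    """
    return mount_device.rpartition("/")[2]


# Hypothetical block device mapping entry, shaped like the 'vol' dict
# in the snippets above (only the key used there is included).
vol = {"mount_device": "/dev/vdz"}

if __name__ == "__main__":
    print(normalize_mount_device(vol["mount_device"]))
    print(normalize_mount_device("vdz"))
```

Because the call leaves an already-bare name untouched, applying it in both code paths would be harmless where the value was correct to begin with.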

Mark McLoughlin (markmc) wrote :

Hmm, this doesn't jibe with the idea that it's Essex-specific, since we basically do the same thing in Folsom.

Fix proposed to branch: master
Review: https://review.openstack.org/8823

Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
status: Confirmed → In Progress
tags: added: rls-q-incoming

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/8950

Changed in nova:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/8823
Committed: http://github.com/openstack/nova/commit/96c86336c69b9d456e43234e3fe315bd3b101045
Submitter: Jenkins
Branch: master

commit 96c86336c69b9d456e43234e3fe315bd3b101045
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Jun 21 13:25:57 2012 -0700

    Call libvirt_volume_driver with right mountpoint

     * fixes bug 1013782
     * includes failing test
     * fixes tests for live migration

    Change-Id: I8f95c6baa7aad878af19d5d8b8b34531a4a43885

Reviewed: https://review.openstack.org/8950
Committed: http://github.com/openstack/nova/commit/33c2575ebf9c8022521d36f51b9b31cd41f7f74f
Submitter: Jenkins
Branch: stable/essex

commit 33c2575ebf9c8022521d36f51b9b31cd41f7f74f
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Jun 21 13:25:57 2012 -0700

    Call libvirt_volume_driver with right mountpoint

     * fixes bug 1013782
     * fixes tests for live migration
     * removed test which doesn't apply

    (cherry picked from commit 96c86336c69b9d456e43234e3fe315bd3b101045)

    Change-Id: I8f95c6baa7aad878af19d5d8b8b34531a4a43885

Thierry Carrez (ttx) on 2012-07-04
Changed in nova:
milestone: none → folsom-2
status: Fix Committed → Fix Released
Chuck Short (zulcss) on 2012-08-21
Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Released

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests have been invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/8823
Stable review: https://review.openstack.org/8950

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Adam Gandelman (gandelman-a) wrote :

Test coverage log.

tags: added: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
  ...


Changed in nova (Ubuntu Precise):
status: Confirmed → Fix Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Thierry Carrez (ttx) on 2012-09-27
Changed in nova:
milestone: folsom-2 → 2012.2