nova kilo->liberty ceph configdrive upgrade fails

Bug #1582684 reported by David Medberry
This bug affects 1 person
Affects                    Status         Importance   Assigned to    Milestone
OpenStack Compute (nova)   Fix Released   High         melanie witt
Liberty                    Fix Released   High         melanie witt
Mitaka                     Fix Released   High         melanie witt

Bug Description

Using CEPH RBD as our ephemeral drive led to an issue when upgrading from Kilo to Liberty. Our environment has "force_config_drive = True".

In Icehouse, Juno, and Kilo, the config drive is an ISO 9660 image created at /var/lib/nova/instances/$UUID/disk.config

However, in Liberty, if using CEPH RBD for ephemeral, there is a switch to putting this in rbd like this:

rbd:instances/${UUID}_disk.config

While this works GREAT for new VMs, it is problematic with existing VMs as not all transition states were considered. In particular, if you do a

nova stop $UUID

followed by a

nova start $UUID

you will find your instance still in the stopped state. There is something in the start code that ASSUMES that the new rbd format will be in place (but it doesn't actually create it.)

There is a workaround if you find instances in that state: simply cold migrate them with

nova migrate $UUID

which redoes the config drive plumbing and creates the rbd:instances/${UUID}_disk.config

Our permanent workaround has been to prepopulate the rbd via a script, though getting this bug fixed would be much better.
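
For anyone wanting to script the same prepopulation, here is a minimal sketch of the idea. It is not the script we actually run; it only builds the `rbd import` command lines and leaves execution to the operator. The paths and the "instances" pool name come from this report; the function names and the `existing_images` argument (for example, the output of `rbd ls instances`) are hypothetical, and any Ceph auth options your cluster needs are left out.

```python
import os

INSTANCES_DIR = "/var/lib/nova/instances"  # local path from this report
RBD_POOL = "instances"                     # pool name from this report


def rbd_import_command(uuid, instances_dir=INSTANCES_DIR, pool=RBD_POOL):
    """Build the `rbd import` argv for one instance's config drive."""
    src = os.path.join(instances_dir, uuid, "disk.config")
    dest = "%s/%s_disk.config" % (pool, uuid)
    return ["rbd", "import", src, dest]


def plan_imports(uuids, existing_images, **kwargs):
    """Return commands only for instances with no config drive image in rbd."""
    return [
        rbd_import_command(uuid, **kwargs)
        for uuid in uuids
        if "%s_disk.config" % uuid not in existing_images
    ]
```

Each returned list can be reviewed and then handed to something like `subprocess.check_call` on the compute host that holds the instance directory.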

Liberty is a stable release and this is a loss-of-service type of bug, so it should get fixed. It is not clear whether this is also an issue in Mitaka/Newton (likely so), as we don't yet have an environment to test it, but presumably long-running VMs with config drives created before Liberty would hit it in Mitaka as well.

Specifics:
Liberty Nova
nova:12.0.2-38-g7bc3355.13.1b76006

CEPH:
0.94.6-1trusty

Host OS:
Ubuntu Trusty

Revision history for this message
David Medberry (med) wrote :
Revision history for this message
David Medberry (med) wrote :

Additionally, we are using ISO 9660 (default config drive format) not VFAT. Drive appears in our libvirt/kvm based implementation as /dev/sr0

Revision history for this message
David Medberry (med) wrote :
Revision history for this message
David Medberry (med) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah, looks like Michael even realized it was going to be a breaking change:

https://github.com/openstack/nova/blob/743d5efccaa99e3b4873831a8f43c216a31c7113/nova/virt/libvirt/driver.py#L2766

tags: added: ceph configdrive libvirt
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Matt Riedemann (mriedem)
summary: - nova kilo liberty ceph configdrive upgrade
+ nova kilo->liberty ceph configdrive upgrade fails
Revision history for this message
Matt Riedemann (mriedem) wrote :

Never mind comment 5, that's not really related to this bug.

Revision history for this message
Matt Riedemann (mriedem) wrote :

David, are there errors in the n-cpu logs when you try to start the instance and the disk.config file isn't found?

Revision history for this message
David Medberry (med) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

So the original breaking change is probably here:

https://github.com/openstack/nova/blob/743d5efccaa99e3b4873831a8f43c216a31c7113/nova/virt/libvirt/driver.py#L3406-L3410

Which is using the new path, and that's used here:

https://github.com/openstack/nova/blob/743d5efccaa99e3b4873831a8f43c216a31c7113/nova/virt/libvirt/driver.py#L3406-L3410

Which is eventually passed down to launch the domain and that fails. Apparently there isn't an error, the instance just doesn't start. And there isn't an error because starting a stopped instance is a cast operation:

https://github.com/openstack/nova/blob/743d5efccaa99e3b4873831a8f43c216a31c7113/nova/compute/rpcapi.py#L788
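
To illustrate why a cast hides the failure from the API caller, here is a toy sketch. It is not nova or oslo.messaging code; `ToyRPC`, `drain`, and `start_instance` are made-up names. A call runs the handler and lets the exception propagate, while a cast only queues the request and returns immediately, so the error surfaces later on the worker side (here, in a log line).

```python
import logging

log = logging.getLogger("toy-rpc")


class ToyRPC:
    """Toy model of call (blocking) versus cast (fire-and-forget)."""

    def __init__(self, handler):
        self._handler = handler
        self._queue = []

    def call(self, *args):
        # Blocking: the result or the exception reaches the caller.
        return self._handler(*args)

    def cast(self, *args):
        # Fire-and-forget: queue the request and return immediately.
        self._queue.append(args)
        return None

    def drain(self):
        # Worker side: failures are logged here, never seen by the caster.
        errors = []
        for args in self._queue:
            try:
                self._handler(*args)
            except Exception as exc:
                log.error("handler failed for %r: %s", args, exc)
                errors.append(exc)
        self._queue = []
        return errors


def start_instance(uuid):
    # Stand-in for the domain launch that fails on the missing rbd image.
    raise RuntimeError("error reading header from %s_disk.config" % uuid)
```

With this model, `ToyRPC(start_instance).cast(uuid)` returns `None` with no error, which matches what the user sees from `nova start`.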

Revision history for this message
David Medberry (med) wrote :

From libvirtd.log:

2016-05-17 18:47:33.653+0000: 118997: error : qemuProcessWaitForMonitor:2113 : internal error: process exited while connecting to monitor: 2016-05-17T18:47:33.551994Z qemu-system-x86_64: -drive file=rbd:instances/12fe8634-8ed3-452d-a78a-c67e2c690975_disk.config:id=volumes:key=AQBG409TCB5tEhAAPKaHlwHa82Vur4FK0WVPzg==:auth_supported=cephx\;none:mon_host=24.161.248.12\:6789\;24.161.248.13\:6789\;24.161.248.14\:6789,if=none,id=drive-ide0-1-1,readonly=on,format=raw,cache=writeback: error reading header from 12fe8634-8ed3-452d-a78a-c67e2c690975_disk.config

melanie witt (melwitt)
Changed in nova:
assignee: nobody → melanie witt (melwitt)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/317785

Changed in nova:
status: Confirmed → In Progress
melanie witt (melwitt)
tags: added: liberty-backport-potential mitaka-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/325415

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/325428

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/317785
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38
Submitter: Jenkins
Branch: master

commit f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38
Author: melanie witt <email address hidden>
Date: Wed May 18 00:09:18 2016 +0000

    Fall back to flat config drive if not found in rbd

    Commit adecf780d3ed4315e4ce305cb1821d493650494b added support for
    storing config drives in rbd. Existing instances however still
    have config drives in the instance directory. If an existing
    instance is stopped, an attempt to start it again fails because the
    guest config is generated assuming a config drive location in rbd.

    This adds a fall back to the instance directory in the case of
    config drive and rbd if the image is not found in rbd.

    Closes-Bug: #1582684

    Change-Id: I21107ea0a148b66bee81e57cdce08e3006a60aee
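
The fallback described in the commit message can be sketched roughly as follows. This is an illustration of the idea only, not the code from the patch: `config_drive_location` and the `rbd_image_exists` callable are hypothetical, and the pool name and instance directory are the ones quoted in the bug description.

```python
import os


def config_drive_location(uuid, rbd_image_exists,
                          pool="instances",
                          instances_dir="/var/lib/nova/instances"):
    """Pick where the guest config should point for the config drive.

    rbd_image_exists is a hypothetical callable that takes an image name
    and reports whether that image is present in the rbd pool.
    """
    image_name = "%s_disk.config" % uuid
    if rbd_image_exists(image_name):
        # Instance created (or migrated) after the switch to rbd.
        return ("rbd", "%s/%s" % (pool, image_name))
    # Pre-existing instance: fall back to the file in the instance directory.
    return ("file", os.path.join(instances_dir, uuid, "disk.config"))
```

The existence check is passed in as a callable so the decision can be exercised without a Ceph cluster.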

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/325415
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cc5b96cc474a7b469316a4e1fc7fdbb92b029202
Submitter: Jenkins
Branch: stable/mitaka

commit cc5b96cc474a7b469316a4e1fc7fdbb92b029202
Author: melanie witt <email address hidden>
Date: Wed May 18 00:09:18 2016 +0000

    Fall back to raw config drive if not found in rbd

    Commit adecf780d3ed4315e4ce305cb1821d493650494b added support for
    storing config drives in rbd. Existing instances however still
    have config drives in the instance directory. If an existing
    instance is stopped, an attempt to start it again fails because the
    guest config is generated assuming a config drive location in rbd.

    This adds a fall back to the instance directory in the case of
    config drive and rbd if the image is not found in rbd.

    Conflicts:
        nova/tests/unit/virt/libvirt/test_driver.py
        nova/virt/libvirt/driver.py

        (Changed image_type from 'flat' to 'raw' and image.exists()
        to image.check_image_exists())

    Closes-Bug: #1582684

    Change-Id: I21107ea0a148b66bee81e57cdce08e3006a60aee
    (cherry picked from commit f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/325428
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e4f550455e71b5132c451171f7bf14ec386dbf9c
Submitter: Jenkins
Branch: stable/liberty

commit e4f550455e71b5132c451171f7bf14ec386dbf9c
Author: melanie witt <email address hidden>
Date: Wed May 18 00:09:18 2016 +0000

    Fall back to raw config drive if not found in rbd

    Commit adecf780d3ed4315e4ce305cb1821d493650494b added support for
    storing config drives in rbd. Existing instances however still
    have config drives in the instance directory. If an existing
    instance is stopped, an attempt to start it again fails because the
    guest config is generated assuming a config drive location in rbd.

    This adds a fall back to the instance directory in the case of
    config drive and rbd if the image is not found in rbd.

    Conflicts:
        nova/tests/unit/virt/libvirt/test_driver.py
        nova/virt/libvirt/driver.py

        (Changed image_type from 'flat' to 'raw' and image.exists()
        to image.check_image_exists(), removed unit tests that
        don't exist on stable/liberty)

    Closes-Bug: #1582684

    Change-Id: I21107ea0a148b66bee81e57cdce08e3006a60aee
    (cherry picked from commit f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38)

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 12.0.4

This issue was fixed in the openstack/nova 12.0.4 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.1.0

This issue was fixed in the openstack/nova 13.1.0 release.

Revision history for this message
David Medberry (med) wrote :

thanks all, cool beans!

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 14.0.0.0b2

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.
