Config drive is lost after resize (libvirt driver)

Bug #1558343 reported by Jerry Zhao
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  High        Matthew Booth
Kilo                      Won't Fix     High        Unassigned
Liberty                   Fix Released  High        Matt Riedemann
Mitaka                    Fix Released  High        John Garbutt

Bug Description

Using the trunk code as of 2016/03/16.
My environment has the metadata agent disabled and forces the use of config drive.

console log before resize: http://paste.openstack.org/show/490825/
console log after resize: http://paste.openstack.org/show/490824/

qemu 18683 1 4 18:40 ? 00:00:32 /usr/bin/qemu-system-x86_64 -name instance-00000002 -S -machine pc-i440fx-2.0,accel=tcg,usb=off -m 128 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 018892c7-8144-49c0-93d2-79ee83efd6a9 -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=13.0.0,serial=16c127e2-6369-4e19-a646-251a416a8dcd,uuid=018892c7-8144-49c0-93d2-79ee83efd6a9,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-00000002/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/stack/data/nova/instances/018892c7-8144-49c0-93d2-79ee83efd6a9/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/opt/stack/data/nova/instances/018892c7-8144-49c0-93d2-79ee83efd6a9/disk.config,if=none,id=drive-ide0-1-1,readonly=on,format=raw,cache=none -device ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev tap,fd=23,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:34:d6:f3,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/opt/stack/data/nova/instances/018892c7-8144-49c0-93d2-79ee83efd6a9/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -vnc 127.0.0.1:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on

$ blkid
/dev/vda1: LABEL="cirros-rootfs" UUID="d42bb4a4-04bb-49b0-8821-5b813116b17b" TYPE="ext3"
$

Another VM, without resize:
$ blkid
/dev/vda1: LABEL="cirros-rootfs" UUID="d42bb4a4-04bb-49b0-8821-5b813116b17b" TYPE="ext3"
/dev/sr0: LABEL="config-2" TYPE="iso9660"
$
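
The comparison above can be automated. The following is a minimal sketch, not part of the original report, that parses `blkid` output and looks for a device labelled `config-2`, as seen on the VM that was not resized. The helper names `parse_blkid` and `has_config_drive` are hypothetical, and the two sample strings are copied from the outputs above.

```python
import re


def parse_blkid(output):
    """Map each device in `blkid` output to its KEY="value" attributes."""
    devices = {}
    for line in output.splitlines():
        device, sep, rest = line.partition(": ")
        if not sep:
            continue  # skip shell prompts and blank lines
        devices[device] = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return devices


def has_config_drive(output):
    """Return True if any device carries the config drive label seen above."""
    return any(
        attrs.get("LABEL") == "config-2"
        for attrs in parse_blkid(output).values()
    )


# Sample outputs copied from this report.
AFTER_RESIZE = (
    '/dev/vda1: LABEL="cirros-rootfs" '
    'UUID="d42bb4a4-04bb-49b0-8821-5b813116b17b" TYPE="ext3"\n'
)
WITHOUT_RESIZE = AFTER_RESIZE + '/dev/sr0: LABEL="config-2" TYPE="iso9660"\n'

print(has_config_drive(AFTER_RESIZE))
print(has_config_drive(WITHOUT_RESIZE))
```

Running this inside a guest would mean feeding it the real `blkid` output instead of the sample strings.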

Tags: libvirt
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

confirmed by mdbooth

Changed in nova:
importance: Undecided → High
status: New → Confirmed
tags: added: libvirt mitaka-rc-potential
Revision history for this message
Matthew Booth (mbooth-9) wrote :

This is a bit messy. I think this regressed when the following change merged a few days ago:

https://review.openstack.org/#/c/288640/

Note that this has also been backported to liberty, so this bug almost certainly also now exists there. The problem is there were 2 bugs. The first, and most serious, is here:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7387-L7388

This converts disks from raw to qcow2, including the config disk. As described in the comment above, though, we can't turn this off for all disks as this would open a severe security bug.

The second bug is here:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3256-L3267

This unconditionally overwrites the config disk.

Until https://review.openstack.org/#/c/288640/ merged, the second bug cancelled out the effect of the first, because it unconditionally overwrote the erroneously converted qcow2 file with a new, raw file. After it merged, however, the order of these 2 bugs was reversed: the old config disk is now overwritten first, and the new one is then converted to qcow2. The result is that we're presenting a qcow2 file as raw, which is obviously corrupt (but not a security bug).
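
The interaction of the two bugs can be illustrated with a toy model. This is only a sketch of the ordering argument above, not nova code: disks are reduced to a name-to-format mapping, and both function names are hypothetical.

```python
def convert_raw_to_qcow2(disks):
    """First bug: convert every raw disk to qcow2, config disk included."""
    return {
        name: "qcow2" if fmt == "raw" else fmt
        for name, fmt in disks.items()
    }


def rewrite_config_disk(disks):
    """Second bug: unconditionally write a fresh, raw config disk."""
    updated = dict(disks)
    updated["disk.config"] = "raw"
    return updated


start = {"disk": "raw", "disk.config": "raw"}

# Old order: convert first, then overwrite. The overwrite hides the bad conversion.
old_order = rewrite_config_disk(convert_raw_to_qcow2(start))

# New order: overwrite first, then convert. The config disk ends up as qcow2.
new_order = convert_raw_to_qcow2(rewrite_config_disk(start))

print(old_order)
print(new_order)
```

In the new order the config disk is left in qcow2 format while the guest still attaches it with `format=raw`, as the qemu command line in the bug description shows.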

The place to fix it is probably here:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7345-L7347

Note that the comment is wrong, and only iterating over 'disk' and 'disk.local' would open the security bug described below if the instance has multiple ephemeral disks. Also note that the code is already broken if resizing an instance with multiple ephemeral disks, but not in a way which opens a security bug. It's probably going to be easiest just to filter out 'disk.config'. I'll look at this in more detail tomorrow.
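
A minimal sketch of the "filter out 'disk.config'" idea, under stated assumptions: disk information is assumed to be a list of dicts with `path` and `type` keys, and `disks_to_convert` and `use_cow_images` are illustrative names rather than the actual driver code. Every raw disk other than the config disk is still selected for conversion, so multiple ephemeral disks remain covered.

```python
import os

CONFIG_DISK_NAME = "disk.config"


def disks_to_convert(disk_info, use_cow_images=True):
    """Return paths of raw disks to convert to qcow2, skipping the config disk."""
    if not use_cow_images:
        return []
    return [
        info["path"]
        for info in disk_info
        if info["type"] == "raw"
        and os.path.basename(info["path"]) != CONFIG_DISK_NAME
    ]


# Hypothetical instance directory and disk list, for illustration only.
example = [
    {"path": "/instances/uuid/disk", "type": "raw"},
    {"path": "/instances/uuid/disk.local", "type": "raw"},
    {"path": "/instances/uuid/disk.eph1", "type": "raw"},
    {"path": "/instances/uuid/disk.config", "type": "raw"},
]

print(disks_to_convert(example))
```

Filtering by exclusion rather than by an allow-list of 'disk' and 'disk.local' is the design point here: an allow-list would silently skip any additional ephemeral disks.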

Prateek Arora (parora)
Changed in nova:
assignee: nobody → Prateek Arora (parora)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/294560

Changed in nova:
assignee: Prateek Arora (parora) → Matthew Booth (mbooth-9)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/294560
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=29fb0aca1623eb0c4a565d728501667a4c5e8136
Submitter: Jenkins
Branch: master

commit 29fb0aca1623eb0c4a565d728501667a4c5e8136
Author: Matthew Booth <email address hidden>
Date: Fri Mar 18 12:12:51 2016 +0000

    Fix conversion of config disks to qcow2 during resize/migration

    finish_migration contains some code which converts raw disks to qcow2
    when moving from a host which uses raw disks to a host which uses
    qcow2 disks. This was erroneously also converting config disks, which
    are hard-coded to be raw even when other disks are qcow2.

    This has always been broken, but it wasn't previously exposed because
    a subsequent bug unconditionally overwrote the qcow2 config disk with
    a new, raw one. Change I03e08fae97416ebe5cdedcf238a06d1b90203c5d
    changed the order of these 2 events, so that the erroneously-converted
    config drive was no longer overwritten, resulting in a qcow2 file
    being presented to the instance as raw.

    This change explicitly filters out config disks from conversion.

    Change-Id: I6bf3cd4f9e0e152bf69732d9a17f93c86dedbd40
    Closes-Bug: #1558343
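
To check whether a particular config disk file was affected by the conversion described in this commit message, one option is to look at the file's first bytes. The sketch below relies on the qcow2 header starting with the magic bytes `QFI\xfb`; the function name `looks_like_qcow` is hypothetical and the check is not taken from the fix itself.

```python
QCOW_MAGIC = b"QFI\xfb"  # first four bytes of a qcow/qcow2 image header


def looks_like_qcow(path):
    """Return True if the file at `path` starts with the qcow magic bytes."""
    with open(path, "rb") as image:
        return image.read(len(QCOW_MAGIC)) == QCOW_MAGIC
```

A config disk that is attached as raw but for which this returns True would match the symptom reported here. A False result only shows that the file is not qcow2; it does not verify the contents of the config drive.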

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/295229

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/295583

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/295229
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=23a202eee87a14b152e25eac1f9b6a0aa14a2bd3
Submitter: Jenkins
Branch: stable/mitaka

commit 23a202eee87a14b152e25eac1f9b6a0aa14a2bd3
Author: Matthew Booth <email address hidden>
Date: Fri Mar 18 12:12:51 2016 +0000

    Fix conversion of config disks to qcow2 during resize/migration

    finish_migration contains some code which converts raw disks to qcow2
    when moving from a host which uses raw disks to a host which uses
    qcow2 disks. This was erroneously also converting config disks, which
    are hard-coded to be raw even when other disks are qcow2.

    This has always been broken, but it wasn't previously exposed because
    a subsequent bug unconditionally overwrote the qcow2 config disk with
    a new, raw one. Change I03e08fae97416ebe5cdedcf238a06d1b90203c5d
    changed the order of these 2 events, so that the erroneously-converted
    config drive was no longer overwritten, resulting in a qcow2 file
    being presented to the instance as raw.

    This change explicitly filters out config disks from conversion.

    Change-Id: I6bf3cd4f9e0e152bf69732d9a17f93c86dedbd40
    Closes-Bug: #1558343
    (cherry picked from commit 29fb0aca1623eb0c4a565d728501667a4c5e8136)

Matt Riedemann (mriedem)
tags: removed: mitaka-rc-potential
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0rc2

This issue was fixed in the openstack/nova 13.0.0.0rc2 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/295583
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=de75afc5c5077651770c5a5fa85afe916ca46f50
Submitter: Jenkins
Branch: stable/liberty

commit de75afc5c5077651770c5a5fa85afe916ca46f50
Author: Matthew Booth <email address hidden>
Date: Fri Mar 18 12:12:51 2016 +0000

    Fix conversion of config disks to qcow2 during resize/migration

    finish_migration contains some code which converts raw disks to qcow2
    when moving from a host which uses raw disks to a host which uses
    qcow2 disks. This was erroneously also converting config disks, which
    are hard-coded to be raw even when other disks are qcow2.

    This has always been broken, but it wasn't previously exposed because
    a subsequent bug unconditionally overwrote the qcow2 config disk with
    a new, raw one. Change I03e08fae97416ebe5cdedcf238a06d1b90203c5d
    changed the order of these 2 events, so that the erroneously-converted
    config drive was no longer overwritten, resulting in a qcow2 file
    being presented to the instance as raw.

    This change explicitly filters out config disks from conversion.

    Conflicts:
            nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(mriedem): The test conflict is due to fbe31e461 not being in
    stable/liberty (converting legacy image_meta dicts to ImageMeta
    objects).

    Change-Id: I6bf3cd4f9e0e152bf69732d9a17f93c86dedbd40
    Closes-Bug: #1558343
    (cherry picked from commit 29fb0aca1623eb0c4a565d728501667a4c5e8136)
    (cherry picked from commit 23a202eee87a14b152e25eac1f9b6a0aa14a2bd3)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/302578

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/302578
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a8ebbebd4ee0c3bb1452ea32f92e1588a6b35067
Submitter: Jenkins
Branch: master

commit 7105f888ee1f52d2a462fc0ece3130dc0d3d49f5
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Mar 31 06:28:06 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: Ibe5d4d38834fbcb99c0332d3375659a21d94154e

commit 5de98cb2de2eca3d061488c55f96e6f7c9bc56a8
Author: OpenStack Proposal Bot <email address hidden>
Date: Wed Mar 30 06:41:25 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: Ia46d661560b1141c1c1522c9477c510d28a0d0e7

commit a9d55427b6e8d2472088e3d40a8a5151ce408283
Author: Moshe Levi <email address hidden>
Date: Wed Mar 23 10:59:04 2016 +0200

    Fix detach SR-IOV when using LibvirtConfigGuestHostdevPCI

    This patch fixes an issue which was introduced by this
    change If3edc1965c01a077eb61984a442e0d778d870d75.
    Usually the vif config is of type LibvirtConfigGuestInterface,
    but some vif use LibvirtConfigGuestHostdevPCI config
    (e.g. the ib_hostdev). The difference is that
    LibvirtConfigGuestInterface keeps the pci address in source_dev
    while LibvirtConfigGuestHostdevPCI has domain, bus, slot and
    function, instead of relying on the vif config type we can take the
    pci address for the neutron port.

    Closes-Bug: #1560860

    Change-Id: I62a7ff16f1c9c5da923451520fbeeabb5cc0c5c6
    (cherry picked from commit f15d9a9693b19393fcde84cf4bc6f044d39ffdca)

commit 5b6ee702df7ad901f68bec2ed8d43b66aa6d98c1
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Mar 29 06:37:30 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: Iad0e42a18bd3a7dcf216b4df17b9893e13382efe

commit 29042e06f7e570bd13607b62b997a6ae21db80c5
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Mar 28 06:34:19 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: If159133a2e32c6ef53ba104751a3eb054a95b733

commit 3e9819dab8249ec9993b0b9874e80a78f2ed1754
Author: Matt Riedemann <email address hidden>
Date: Sun Mar 27 19:31:32 2016 -0400

    Update cells blacklist regex for test_server_basic_ops

    Tempest change 9bee3b92f1559cb604c8bd74dcca57805a85a97a
    renamed a test in our blacklist so update the filter to
    handle the old and new name.

    The Tempest team is hesitant to revert the change so we
    should handle it ourselves and eventually move to using
    test uuids for our blacklist, but there might need to
    be work in devstack-gate for that fi...

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/nova 12.0.3

This issue was fixed in the openstack/nova 12.0.3 release.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/nova 14.0.0.0b1

This issue was fixed in the openstack/nova 14.0.0.0b1 development milestone.
