Volume attachment may fail after rescuing instance on an image with different hw_disk_bus

Bug #1835926 reported by Alexandre arents
This bug affects 1 person
Affects                    Status         Importance  Assigned to        Milestone
OpenStack Compute (nova)   Fix Released   Undecided   Alexandre arents
  Queens                   Fix Released   Undecided   Alexandre arents
  Rocky                    Fix Committed  Undecided   Alexandre arents
  Stein                    Fix Committed  Undecided   Alexandre arents
  Train                    Fix Released   Undecided   Alexandre arents

Bug Description

Description
===========

It looks like rescue may update instances.root_device_name if the rescue image has a different disk bus (image property hw_disk_bus) than the instance.
This introduces a mismatch between the device name and the disk bus driver used for the instance:

During instance config generation, nova guesses the disk bus driver from instance_system_metadata.image_hw_disk_bus
and gets the root device name from instances.root_device_name.
Because of this mismatch, cinder volume attachment may fail with the following error message in the compute log:
 unable to execute QEMU command 'device_add': Duplicate ID 'virtio-disk0' for device

A probable solution is to prevent the rescue action from updating instance.root_device_name.
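
For illustration, the mismatch described above can be modelled with a short Python sketch. This is a toy model and not nova code: the dictionary keys mirror the two database fields named in the report, while BUS_PREFIX, root_device_for_bus(), rescue(), guest_root_disk() and is_consistent() are hypothetical names introduced only for this example. It also assumes that an image without hw_disk_bus ends up on virtio.

```python
# Toy model of the mismatch; every helper name here is hypothetical.
BUS_PREFIX = {"virtio": "vd", "scsi": "sd"}


def root_device_for_bus(bus):
    """Return the first device name conventionally used on a disk bus."""
    return "/dev/" + BUS_PREFIX[bus] + "a"


def rescue(instance, rescue_image_bus):
    """Model the reported side effect: rescue overwrites root_device_name."""
    instance["root_device_name"] = root_device_for_bus(rescue_image_bus)


def guest_root_disk(instance):
    """Combine the two sources the report names: the bus comes from the
    image metadata, the device name from instances.root_device_name."""
    return {
        "bus": instance["image_hw_disk_bus"],
        "dev": instance["root_device_name"].replace("/dev/", ""),
    }


def is_consistent(disk):
    """True when the device name prefix matches the disk bus."""
    return disk["dev"].startswith(BUS_PREFIX[disk["bus"]])


instance = {"image_hw_disk_bus": "virtio", "root_device_name": "/dev/vda"}
before = guest_root_disk(instance)
rescue(instance, "scsi")          # rescue with the scsi image
after = guest_root_disk(instance)  # config generated at the next hard reboot
print(before, is_consistent(before))
print(after, is_consistent(after))
```

In this model the disk generated after the rescue carries a scsi-style name on a virtio bus, which matches the `<target dev='sda' bus='virtio'/>` element shown in the reproduction steps.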

Steps to reproduce
==================

On a fresh master devstack:
$ openstack image save cirros-0.4.0-x86_64-disk --file /tmp/cirros-0.4.0-x86_64-disk.disk
# create a new image, but a scsi one:
$ openstack image create --container-format bare --disk-format qcow2 --file /tmp/cirros-0.4.0-x86_64-disk.disk --property hw_disk_bus='scsi' --property hw_scsi_model='virtio-scsi' cirros-0.4.0-x86_64-scsi-disk
# create an instance with the default virtio driver:
$ openstack server create --flavor m1.small --image cirros-0.4.0-x86_64-disk --nic net-id=private test
mysql> select root_device_name from instances where uuid='xxx'
/dev/vda
# rescue the instance, but with the scsi image:
$ openstack server rescue xxxx --image cirros-0.4.0-x86_64-scsi-disk
mysql> select root_device_name from instances where uuid='xxx'
/dev/sda
$ openstack server unrescue xxxx
# root_device_name is still sda; it should be vda according to the instance metadata
mysql> select root_device_name from instances where uuid='xxx'
/dev/sda
$ virsh dumpxml instance-00000001 | grep -A 1 "bus='virtio"
   <target dev='vda' bus='virtio'/>
   <alias name='virtio-disk0'/>
# at the next hard reboot, new XML is generated with the scsi device name BUT with the virtio driver.
$ openstack server reboot --hard xxx
$ virsh dumpxml instance-00000001 | grep -A 1 "bus='virtio"
   <target dev='sda' bus='virtio'/>
   <alias name='virtio-disk0'/>
$ openstack volume create --size 10 test
$ openstack server add volume 1c9b1582-5fc7-417a-a8a0-387e8833731f 0621430c-b0d2-4cca-8868-f86f36f1ef29
$ sudo journalctl -u <email address hidden> | grep Duplicate
Jul 05 09:29:54 alex-devstack-compute2 nova-compute[28285]: ERROR nova.virt.libvirt.driver [None req-38714989-4deb-4a05-bdfc-3418edbda7e3 demo demo] [instance: 1c9b1582-5fc7-417a-a8a0-387e8833731f] Failed to attach volume at mountpoint: /dev/vda: libvirtError: internal error: unable to execute QEMU command 'device_add': Duplicate ID 'virtio-disk0' for device

The error probably comes from the fact that nova looks up the next available virtio device by name, which is vda, i.e. virtio-disk0 (as the root device is currently named sda),
but because the root device sda is already using virtio-disk0, the attachment fails.
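
The collision can be sketched in Python under one loud assumption: that the device ID is built as '<bus>-disk<index>', with the index taken from the letter suffix of the target name. That scheme is inferred from the XML and the error message above, not taken from libvirt source, and disk_index(), alias_for() and attach() are hypothetical helpers.

```python
import string


def disk_index(dev):
    """Map a target name such as 'vda' or 'sdb' to a zero-based index.
    Only the letters after the two-letter prefix are used."""
    index = 0
    for ch in dev[2:]:
        index = index * 26 + (string.ascii_lowercase.index(ch) + 1)
    return index - 1


def alias_for(bus, dev):
    """Assumed alias scheme: '<bus>-disk<index>'."""
    return f"{bus}-disk{disk_index(dev)}"


def attach(attached, bus, dev):
    """Register a disk, refusing an alias that is already in use."""
    alias = alias_for(bus, dev)
    if alias in attached:
        raise ValueError(f"Duplicate ID '{alias}' for device")
    attached[alias] = dev
    return alias


attached = {}
attach(attached, "virtio", "sda")      # root disk after the hard reboot
try:
    attach(attached, "virtio", "vda")  # name picked for the new volume
except ValueError as exc:
    print(exc)
```

Because 'sda' and 'vda' both map to index 0 in this model, the second virtio disk asks for an ID the root disk already holds.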

Expected result
===============
The instance root_device_name should remain the same as before rescue/unrescue, regardless of the image used for rescuing.

Actual result
=============
The instance root_device_name is updated according to the hw_disk_bus property of the image used during rescue (and is never set back to its original value).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/670000

Changed in nova:
assignee: nobody → Alexandre arents (aarents)
status: New → In Progress
tags: added: cinder libvirt volumes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/670000
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b
Submitter: Zuul
Branch: master

commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b
Author: Alexandre Arents <email address hidden>
Date: Tue Jul 9 16:13:01 2019 +0000

    Do not update root_device_name during guest config

    _get_guest_config() is currently updating instance.root_device_name
    and called in many ways like:

    _hard_reboot(), rescue(), spawn(), resume(), finish_migration(),
    finish_revert_migration()

    It is an issue because root_device_name is initally set during instance
    build and should remain the same after:

    manager.py: _do_build_and_run_instance()
                 ..
                   _default_block_device_names() <-here
                   ..
                   driver.spawn()

    This may lead to edge case, like in rescue where this value can be mistakenly
    updated to reflect disk bus property of rescue image (hw_disk_bus).
    Further more, a _get* method should not modify instance object.

    Note that test test_get_guest_config_bug_1118829 is removed because no more
    relevant with current code.

    Change-Id: I1787f9717618d0837208844e8065840d30341cf7
    Closes-Bug: #1835926

Changed in nova:
status: In Progress → Fix Released
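
The principle stated in the commit message (a _get* method should not modify the instance object) can be pictured with a small Python model. This is only a sketch of the idea, not the actual nova patch: root_name_for(), guest_config_with_side_effect() and guest_config_without_side_effect() are hypothetical names, and the instance is reduced to a dictionary.

```python
BUS_PREFIX = {"virtio": "vd", "scsi": "sd"}


def root_name_for(image_bus):
    """Device name the guest being configured would use on this bus."""
    return "/dev/" + BUS_PREFIX[image_bus] + "a"


def guest_config_with_side_effect(instance, image_bus):
    """Modelled old behaviour: the getter also rewrites the instance."""
    root = root_name_for(image_bus)
    instance["root_device_name"] = root  # persists after the call returns
    return {"root": root, "bus": image_bus}


def guest_config_without_side_effect(instance, image_bus):
    """Modelled new behaviour: the value stays local to the config."""
    root = root_name_for(image_bus)
    return {"root": root, "bus": image_bus}


instance = {"root_device_name": "/dev/vda"}
guest_config_without_side_effect(instance, "scsi")
print(instance["root_device_name"])  # value set at build time is kept
guest_config_with_side_effect(instance, "scsi")
print(instance["root_device_name"])  # value was replaced by the getter
```

Both functions return the same configuration; the only difference is whether the value chosen at build time survives the call.
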
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/696339

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/696351

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/696353

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/696469

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/696339
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e858d0cbd672639318543201e251ed00324a9c2
Submitter: Zuul
Branch: stable/train

commit 5e858d0cbd672639318543201e251ed00324a9c2
Author: Alexandre Arents <email address hidden>
Date: Tue Jul 9 16:13:01 2019 +0000

    Do not update root_device_name during guest config

    _get_guest_config() is currently updating instance.root_device_name
    and called in many ways like:

    _hard_reboot(), rescue(), spawn(), resume(), finish_migration(),
    finish_revert_migration()

    It is an issue because root_device_name is initally set during instance
    build and should remain the same after:

    manager.py: _do_build_and_run_instance()
                 ..
                   _default_block_device_names() <-here
                   ..
                   driver.spawn()

    This may lead to edge case, like in rescue where this value can be mistakenly
    updated to reflect disk bus property of rescue image (hw_disk_bus).
    Further more, a _get* method should not modify instance object.

    Note that test test_get_guest_config_bug_1118829 is removed because no more
    relevant with current code.

    Change-Id: I1787f9717618d0837208844e8065840d30341cf7
    Closes-Bug: #1835926
    (cherry picked from commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/696351
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9f9f8d330a50aec188e55ab8ae921db710e6cc83
Submitter: Zuul
Branch: stable/stein

commit 9f9f8d330a50aec188e55ab8ae921db710e6cc83
Author: Alexandre Arents <email address hidden>
Date: Tue Jul 9 16:13:01 2019 +0000

    Do not update root_device_name during guest config

    _get_guest_config() is currently updating instance.root_device_name
    and called in many ways like:

    _hard_reboot(), rescue(), spawn(), resume(), finish_migration(),
    finish_revert_migration()

    It is an issue because root_device_name is initally set during instance
    build and should remain the same after:

    manager.py: _do_build_and_run_instance()
                 ..
                   _default_block_device_names() <-here
                   ..
                   driver.spawn()

    This may lead to edge case, like in rescue where this value can be mistakenly
    updated to reflect disk bus property of rescue image (hw_disk_bus).
    Further more, a _get* method should not modify instance object.

    Note that test test_get_guest_config_bug_1118829 is removed because no more
    relevant with current code.

    Change-Id: I1787f9717618d0837208844e8065840d30341cf7
    Closes-Bug: #1835926
    (cherry picked from commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b)
    (cherry picked from commit 5e858d0cbd672639318543201e251ed00324a9c2)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/696353
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c075e3a76d07b8d8ccf201756810567ddf04db60
Submitter: Zuul
Branch: stable/rocky

commit c075e3a76d07b8d8ccf201756810567ddf04db60
Author: Alexandre Arents <email address hidden>
Date: Tue Jul 9 16:13:01 2019 +0000

    Do not update root_device_name during guest config

    _get_guest_config() is currently updating instance.root_device_name
    and called in many ways like:

    _hard_reboot(), rescue(), spawn(), resume(), finish_migration(),
    finish_revert_migration()

    It is an issue because root_device_name is initally set during instance
    build and should remain the same after:

    manager.py: _do_build_and_run_instance()
                 ..
                   _default_block_device_names() <-here
                   ..
                   driver.spawn()

    This may lead to edge case, like in rescue where this value can be mistakenly
    updated to reflect disk bus property of rescue image (hw_disk_bus).
    Further more, a _get* method should not modify instance object.

    Note that test test_get_guest_config_bug_1118829 is removed because no more
    relevant with current code.

    Change-Id: I1787f9717618d0837208844e8065840d30341cf7
    Closes-Bug: #1835926
    (cherry picked from commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b)
    (cherry picked from commit 5e858d0cbd672639318543201e251ed00324a9c2)
    (cherry picked from commit 9f9f8d330a50aec188e55ab8ae921db710e6cc83)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/696469
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4f4b5bd545b189ae1ef6e22c2fe7d08e4e9b402c
Submitter: Zuul
Branch: stable/queens

commit 4f4b5bd545b189ae1ef6e22c2fe7d08e4e9b402c
Author: Alexandre Arents <email address hidden>
Date: Tue Jul 9 16:13:01 2019 +0000

    Do not update root_device_name during guest config

    _get_guest_config() is currently updating instance.root_device_name
    and called in many ways like:

    _hard_reboot(), rescue(), spawn(), resume(), finish_migration(),
    finish_revert_migration()

    It is an issue because root_device_name is initally set during instance
    build and should remain the same after:

    manager.py: _do_build_and_run_instance()
                 ..
                   _default_block_device_names() <-here
                   ..
                   driver.spawn()

    This may lead to edge case, like in rescue where this value can be mistakenly
    updated to reflect disk bus property of rescue image (hw_disk_bus).
    Further more, a _get* method should not modify instance object.

    Note that test test_get_guest_config_bug_1118829 is removed because no more
    relevant with current code.

    Conflicts:
     nova/virt/libvirt/driver.py
        NOTE: conflict is due to small comment removal patch:
            I08916cf57d50f766126a99a479d79a27a1bca36f

    Change-Id: I1787f9717618d0837208844e8065840d30341cf7
    Closes-Bug: #1835926
    (cherry picked from commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b)
    (cherry picked from commit 5e858d0cbd672639318543201e251ed00324a9c2)
    (cherry picked from commit 9f9f8d330a50aec188e55ab8ae921db710e6cc83)
    (cherry picked from commit c075e3a76d07b8d8ccf201756810567ddf04db60)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.1.0

This issue was fixed in the openstack/nova 20.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.1.0

This issue was fixed in the openstack/nova 19.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.3.0

This issue was fixed in the openstack/nova 18.3.0 release.

Revision history for this message
Olivier Chaze (o.chaze) wrote :

Hi,

Could it be that this fix introduced the behavior below?

1. Build a VM with a Debian OS image: instances.root_device_name = /dev/sda
2. Attach a volume to the instance
3. Rebuild this VM with an Ubuntu OS image: instances.root_device_name is still /dev/sda, while in the VM itself it is now /dev/vda
4. The volume is no longer attached, and attaching it fails with:

Failed to attach f3c12921-66ab-47b4-99dc-ed2dc67c32ba at /dev/vda: libvirt.libvirtError: internal error: unable to execute QEMU command 'device_add': Duplicate ID 'virtio-disk0' for device

This fixes the issue:
MariaDB [novadb]> UPDATE instances SET root_device_name="/dev/vda" WHERE uuid='f6a607aa-7142-47ba-a87a-e027c0b58dbb' LIMIT 1;

Revision history for this message
Alexandre arents (aarents) wrote :

Hi Olivier,

I can confirm this behavior on a master devstack.
I guess your Debian image has the properties hw_disk_bus='scsi' and hw_scsi_model='virtio-scsi',
and the Ubuntu image doesn't have them (default virtio)?
Maybe the proper way to fix that is to add a step in the rebuild code that updates root_device_name.
I think it should be tracked in a separate bug.
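
As a rough sketch of that suggestion, and not a proposed patch, a rebuild step could recompute the root device name from the new image's properties. Everything below is hypothetical: the rebuild() helper, the dictionary-based instance, and the assumption that an image without hw_disk_bus defaults to virtio.

```python
BUS_PREFIX = {"virtio": "vd", "scsi": "sd"}
DEFAULT_BUS = "virtio"  # assumed default when hw_disk_bus is not set


def rebuild(instance, new_image_properties):
    """Model a rebuild that refreshes both the bus and the root name."""
    bus = new_image_properties.get("hw_disk_bus", DEFAULT_BUS)
    instance["image_hw_disk_bus"] = bus
    instance["root_device_name"] = "/dev/" + BUS_PREFIX[bus] + "a"
    return instance


# Debian-like image with scsi properties, rebuilt on an image without them.
instance = {"image_hw_disk_bus": "scsi", "root_device_name": "/dev/sda"}
rebuild(instance, {})
print(instance)
```

Unlike rescue, a rebuild replaces the instance's image permanently, so in this model the stored name is meant to follow the new image.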

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova queens-eol

This issue was fixed in the openstack/nova queens-eol release.
