problem specifying multiple "bus=scsi" block devices on nova boot

Bug #1792077 reported by Chris Friesen on 2018-09-12
This bug affects 2 people
Affects                   Importance  Assigned to
OpenStack Compute (nova)  Medium      melanie witt
Pike                      Medium      melanie witt
Queens                    Medium      melanie witt
Rocky                     Medium      melanie witt
Stein                     Medium      melanie witt

Bug Description

I'm using devstack stable/rocky on ubuntu 16.04.

When running this command

nova boot --flavor m1.small --nic net-name=public --block-device source=image,id=24e8e922-2687-48b5-a895-3134a650e00f,dest=volume,size=2,bootindex=0,shutdown=remove,bus=scsi --block-device source=blank,dest=volume,size=2,bootindex=1,shutdown=remove,bus=scsi --poll twovol

the instance fails to boot with the error:

libvirtError: unsupported configuration: Found duplicate drive address for disk with target name 'sda' controller='0' bus='0' target='0' unit='0'

For some background information, this works:

nova boot --flavor m1.small --nic net-name=public --block-device source=image,id=24e8e922-2687-48b5-a895-3134a650e00f,dest=volume,size=2,bootindex=0,shutdown=remove,bus=scsi --poll onevol

It also works if I have two block devices but don't specify "bus=scsi":

nova boot --flavor m1.small --nic net-name=public --block-device source=image,id=24e8e922-2687-48b5-a895-3134a650e00f,dest=volume,size=2,bootindex=0,shutdown=remove --block-device source=blank,dest=volume,size=2,bootindex=1,shutdown=remove --poll twovolnoscsi

This maps to the following XML:

Sep 12 05:05:22 devstack nova-compute[3062]: <devices>
Sep 12 05:05:22 devstack nova-compute[3062]: <disk type="block" device="disk">
Sep 12 05:05:22 devstack nova-compute[3062]: <driver name="qemu" type="raw" cache="none" io="native"/>
Sep 12 05:05:22 devstack nova-compute[3062]: <source dev="/dev/sda"/>
Sep 12 05:05:22 devstack nova-compute[3062]: <target bus="virtio" dev="vda"/>
Sep 12 05:05:22 devstack nova-compute[3062]: <serial>f16cb93d-7bf0-4da7-a804-b9539d64576a</serial>
Sep 12 05:05:22 devstack nova-compute[3062]: </disk>
Sep 12 05:05:22 devstack nova-compute[3062]: <disk type="block" device="disk">
Sep 12 05:05:22 devstack nova-compute[3062]: <driver name="qemu" type="raw" cache="none" io="native"/>
Sep 12 05:05:22 devstack nova-compute[3062]: <source dev="/dev/sdb"/>
Sep 12 05:05:22 devstack nova-compute[3062]: <target bus="virtio" dev="vdb"/>
Sep 12 05:05:22 devstack nova-compute[3062]: <serial>7d5de2b0-cb66-4607-a5f5-60fd40db51c3</serial>
Sep 12 05:05:22 devstack nova-compute[3062]: </disk>

In the failure case, the nova-compute logs include the following interesting bits. Note the additional '<address type="drive" controller="0"/>' lines in the XML.

Sep 12 04:48:43 devstack nova-compute[3062]: ERROR nova.virt.libvirt.guest [None req-a7c5f15c-1e44-4cd1-bf57-45b819676b20 admin admin] Error defining a guest with XML: <domain type="qemu">

Sep 12 04:48:43 devstack nova-compute[3062]: <devices>
Sep 12 04:48:43 devstack nova-compute[3062]: <disk type="block" device="disk">
Sep 12 04:48:43 devstack nova-compute[3062]: <driver name="qemu" type="raw" cache="none" io="native"/>
Sep 12 04:48:43 devstack nova-compute[3062]: <source dev="/dev/sda"/>
Sep 12 04:48:43 devstack nova-compute[3062]: <target bus="scsi" dev="sda"/>
Sep 12 04:48:43 devstack nova-compute[3062]: <serial>08561cc0-5cf2-4eb7-a3f9-956f945e6c24</serial>
Sep 12 04:48:43 devstack nova-compute[3062]: <address type="drive" controller="0"/>
Sep 12 04:48:43 devstack nova-compute[3062]: </disk>
Sep 12 04:48:43 devstack nova-compute[3062]: <disk type="block" device="disk">
Sep 12 04:48:43 devstack nova-compute[3062]: <driver name="qemu" type="raw" cache="none" io="native"/>
Sep 12 04:48:43 devstack nova-compute[3062]: <source dev="/dev/sdb"/>
Sep 12 04:48:43 devstack nova-compute[3062]: <target bus="scsi" dev="sdb"/>
Sep 12 04:48:43 devstack nova-compute[3062]: <serial>007fac3d-8800-4f45-9531-e3bab5c86a1e</serial>
Sep 12 04:48:43 devstack nova-compute[3062]: <address type="drive" controller="0"/>
Sep 12 04:48:43 devstack nova-compute[3062]: </disk>

Sep 12 04:48:43 devstack nova-compute[3062]: : libvirtError: unsupported configuration: Found duplicate drive address for disk with target name 'sda' controller='0' bus='0' target='0' unit='0'
Sep 12 04:48:43 devstack nova-compute[3062]: ERROR nova.virt.libvirt.driver [None req-a7c5f15c-1e44-4cd1-bf57-45b819676b20 admin admin] [instance: cf4f2c6f-7391-4a49-8f40-5e5cda98f78b] Failed to start libvirt guest: libvirtError: unsupported configuration: Found duplicate drive address for disk with target name 'sda' controller='0' bus='0' target='0' unit='0'

Here is the libvirtd log in the failure case:

2018-09-12 04:48:43.312+0000: 16889: error : virDomainDefCheckDuplicateDriveAddresses:5747 : unsupported configuration: Found duplicate drive address for disk with target name 'sda' controller='0' bus='0' target='0' unit='0'

Chris Friesen (cbf123) on 2018-09-12
description: updated
Garrett Mueller (mueller-2) wrote :

I'm seeing the same behavior with Rocky and CentOS 7.5. I've also tried specifying a different device name for the second disk, but that has no effect.

Ben Swartzlander (bswartz) wrote :

The issue seems to be that nova only handles "scsi" block device mappings correctly if you boot from an image that has the image metadata "hw_scsi_model" = "virtio-scsi". If you boot from disk rather than from an image, I'm not sure there is a workaround.

Chris Friesen, can you share your image metadata for glance image 24e8e922-2687-48b5-a895-3134a650e00f in the above bug report?

melanie witt (melwitt) wrote :

Indeed, we only set the initial disk_mapping['unit'] values if the image meta scsi_controller.model is 'virtio-scsi':

https://github.com/openstack/nova/blob/ce520ee789bf6a46f56de86769d74c095ce432cf/nova/virt/libvirt/driver.py#L3920-L3937

But then, we increment the 'unit' value if 'bus' == 'scsi' (NOT only 'virtio-scsi') when attaching a volume:

https://github.com/openstack/nova/blob/ce520ee789bf6a46f56de86769d74c095ce432cf/nova/virt/libvirt/driver.py#L1422-L1423

Or creating a guest config:

https://github.com/openstack/nova/blob/ce520ee789bf6a46f56de86769d74c095ce432cf/nova/virt/libvirt/driver.py#L3892-L3894

I think the intention was to control the 'unit' numbering of the disks in nova only if 'virtio-scsi', and to let other disk types' unit numbering be handled automatically by libvirt. But because the 'unit' number incrementing code only checks generically for 'bus' == 'scsi', it's controlling the disk unit numbering for non-'virtio-scsi' as well.

I'm not sure there's any way to check for 'virtio-scsi' from the disk_mapping. If there is, it seems like the fix would be to add that check to the disk unit increment conditionals.

melanie witt (melwitt) wrote :

Hm, but we only add 'unit' to the disk_mapping if image meta has scsi controller model 'virtio-scsi' and we only increment the 'unit' number if 'unit' is found in the disk_mapping. So it would seem like we're safely only setting and incrementing 'unit' if we have image meta. So I don't get how the duplicate address is happening.

Ben Swartzlander (bswartz) wrote :

@melanie: what happens is that the generated XML lacks the unit (for the disk address) altogether. If you omit the address tag, libvirt will autogenerate it, but if you send it and omit the unit field, it assumes zero, and that's where the conflict comes in.

Look at this part of the XML in the above report: <address type="drive" controller="0"/>
There is a controller number, but no unit.
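
To illustrate the point, here is a minimal Python sketch of the collision. It is not libvirt's actual check (that lives in virDomainDefCheckDuplicateDriveAddresses); it only assumes, as described above, that any field missing from a drive address is treated as 0. The helper names drive_addresses and duplicate_addresses are made up for this example, and the XML is a trimmed copy of the failing guest config from the report.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the failing <devices> section from the bug description.
DEVICES_XML = """
<devices>
  <disk type="block" device="disk">
    <source dev="/dev/sda"/>
    <target bus="scsi" dev="sda"/>
    <address type="drive" controller="0"/>
  </disk>
  <disk type="block" device="disk">
    <source dev="/dev/sdb"/>
    <target bus="scsi" dev="sdb"/>
    <address type="drive" controller="0"/>
  </disk>
</devices>
"""

ADDRESS_FIELDS = ("controller", "bus", "target", "unit")


def drive_addresses(devices_xml):
    """Return (target dev, address tuple) for disks with an explicit address.

    Assumption for this sketch: a field that is absent from the
    <address type="drive"> element counts as 0.
    """
    result = []
    for disk in ET.fromstring(devices_xml).iter("disk"):
        addr = disk.find("address")
        if addr is None or addr.get("type") != "drive":
            continue  # no explicit address, so nothing to collide with here
        key = tuple(int(addr.get(field, "0")) for field in ADDRESS_FIELDS)
        result.append((disk.find("target").get("dev"), key))
    return result


def duplicate_addresses(devices_xml):
    """Return the address tuples that are claimed by more than one disk."""
    counts = Counter(key for _, key in drive_addresses(devices_xml))
    return sorted(key for key, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    # Both disks resolve to controller=0 bus=0 target=0 unit=0.
    print(duplicate_addresses(DEVICES_XML))
```

Giving the second disk an explicit unit="1", or dropping both address elements, makes the duplicate list empty in this model.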

Ben Swartzlander (bswartz) wrote :

@melanie, I think the fix for this bug would be to always add the unit to the disk mapping if it is present.

Ben Swartzlander (bswartz) wrote :

If there's a workaround that would work with OpenStack Rocky I'd like to know about it though.

Chris Friesen (cbf123) wrote :

Ben: in reply to your earlier request, I don't have that image around anymore. But it sounds like you might have a handle on the problem.

melanie witt (melwitt) wrote :

Thanks Ben, that's helpful.

Historically, we have let libvirt autogenerate the address tag (and thus avoided issues like this) and AFAIK the 'virtio-scsi' manipulation of the disk unit number [1] is the first time we started setting any of it manually.

If we're not setting the unit for the disk address, we shouldn't be setting the address tag at all, but we are. I'm not 100% sure how/where that's happening. I think it might be from what's included in this commit:

https://github.com/openstack/nova/commit/724ca8227a23a918d1810f866af661ac2a0730a3

If there's a way to avoid setting the address tag manually unless 'virtio-scsi' (and keep in line with the original intent of the unit number setting), then I think that's what we want to do to fix it. But I don't yet know if that's possible.

[1] https://bugs.launchpad.net/nova/+bug/1686116

Ben Swartzlander (bswartz) wrote :

https://github.com/openstack/nova/commit/c25629f85feb53b5be0347f68c43b3b55fb9f137

This is the commit that I find suspicious. But I also can't pinpoint where the problem is.

melanie witt (melwitt) wrote :

Ah, I think you're right.

It seems to me that we shouldn't be creating the address tag at all unless we intend to set the unit as well. So, I wonder if we should do something like this then:

--- a/nova/virt/libvirt/volume/volume.py
+++ b/nova/virt/libvirt/volume/volume.py
@@ -94,16 +94,15 @@ class LibvirtBaseVolumeDriver(object):
         if data.get('discard', False) is True:
             conf.driver_discard = 'unmap'

-        if disk_info['bus'] == 'scsi':
+        if disk_info['bus'] == 'scsi' and 'unit' in disk_info:
             # The driver is responsible to create the SCSI controller
             # at index 0.
             conf.device_addr = vconfig.LibvirtConfigGuestDeviceAddressDrive()
             conf.device_addr.controller = 0
-            if 'unit' in disk_info:
-                # In order to allow up to 256 disks handled by one
-                # virtio-scsi controller, the device addr should be
-                # specified.
-                conf.device_addr.unit = disk_info['unit']
+            # In order to allow up to 256 disks handled by one
+            # virtio-scsi controller, the device addr should be
+            # specified.
+            conf.device_addr.unit = disk_info['unit']

         if connection_info.get('multiattach', False):
             # Note that driver_cache should be disabled (none) when using
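
As a standalone illustration of what the change above does, the following Python sketch models the old and new conditionals outside of nova. It is only a sketch under assumptions: the functions drive_address_before and drive_address_after are hypothetical names for this example, disk_info is a plain dict with the 'bus' and 'unit' keys used in the diff, and the address is built with xml.etree rather than nova's vconfig classes.

```python
import xml.etree.ElementTree as ET


def drive_address_before(disk_info):
    """Model of the old conditional: every SCSI disk gets an address
    element, and the unit is filled in only if it is known."""
    if disk_info["bus"] != "scsi":
        return None
    addr = ET.Element("address", type="drive", controller="0")
    if "unit" in disk_info:
        addr.set("unit", str(disk_info["unit"]))
    return addr


def drive_address_after(disk_info):
    """Model of the proposed conditional: emit an address element only
    when the bus is SCSI and a unit has been specified; otherwise return
    None so that libvirt can assign the address itself."""
    if disk_info["bus"] != "scsi" or "unit" not in disk_info:
        return None
    return ET.Element(
        "address", type="drive", controller="0", unit=str(disk_info["unit"])
    )


if __name__ == "__main__":
    # Two SCSI volumes without a 'unit', as in the failing boot request.
    disks = [{"bus": "scsi"}, {"bus": "scsi"}]
    for disk in disks:
        old = drive_address_before(disk)
        new = drive_address_after(disk)
        print("before:", old.attrib, "after:", new)
```

With the old logic both disks carry an identical unit-less address, which is the duplicate reported in the error; with the new logic neither disk carries an address, so numbering is left to libvirt.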

Ben Swartzlander (bswartz) wrote :

If you push a patch I'll be happy to test it. Maybe we can get a fix in Stein and a backport to Rocky and then I can have a happy customer.

Ben Swartzlander (bswartz) wrote :

FWIW the above patch (from comment 11) does work for the situation I care about. I would like to see it or something like it merged to Stein and backported to Rocky.

melanie witt (melwitt) wrote :

Thanks for confirming that, Ben. I'll cook up a patch for this.

Changed in nova:
assignee: nobody → melanie witt (melwitt)
importance: Undecided → Medium
status: New → Confirmed
tags: added: libvirt

Fix proposed to branch: master
Review: https://review.openstack.org/611974

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: melanie witt (melwitt) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) on 2019-04-17
Changed in nova:
assignee: Matt Riedemann (mriedem) → melanie witt (melwitt)

Reviewed: https://review.openstack.org/611974
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=48fd81648a7cadf2d147a6beabac067c28b288b0
Submitter: Zuul
Branch: master

commit 48fd81648a7cadf2d147a6beabac067c28b288b0
Author: melanie witt <email address hidden>
Date: Fri Oct 19 22:06:20 2018 +0000

    libvirt: set device address tag only if setting disk unit

    In Pike, we began setting disk unit values manually for the
    'virtio-scsi' controller model in order to allow up to 256 devices [1].
    We do this by setting the disk unit of the address tag manually for the
    guest config. If we do not set the address tag manually, libvirt would
    autogenerate it for us.

    A problem occurs when a user has a SCSI disk that is a volume or isn't
    using the 'virtio-scsi' controller model because we're not guarding our
    manual setting of the address tag in the guest config by the disk unit,
    in addition to the SCSI bus. This means that for a SCSI volume, we
    generate an address tag like '<address type="drive" controller="0"/>'
    for any SCSI volume, so a user with more than one device will get the
    following error when they try to boot an instance:

      Failed to start libvirt guest: libvirtError: unsupported
        configuration: Found duplicate drive address for disk with target name
        'sda' controller='0' bus='0' target='0' unit='0'

    This updates the conditionals to only manually set the address tag if
    the bus is SCSI _and_ the disk unit has been specified. Otherwise, let
    libvirt autogenerate the address tag and take care of avoiding
    collisions.

    [1] https://bugs.launchpad.net/nova/+bug/1686116

    Closes-Bug: #1792077

    Change-Id: Iefab05e84ccc0bf8f15bdbbf515a290d282dbc5d

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/653510
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7500c1910cf1cbe16ee48028b67aa919160f44da
Submitter: Zuul
Branch: stable/stein

commit 7500c1910cf1cbe16ee48028b67aa919160f44da
Author: melanie witt <email address hidden>
Date: Fri Oct 19 22:06:20 2018 +0000

    libvirt: set device address tag only if setting disk unit

    In Pike, we began setting disk unit values manually for the
    'virtio-scsi' controller model in order to allow up to 256 devices [1].
    We do this by setting the disk unit of the address tag manually for the
    guest config. If we do not set the address tag manually, libvirt would
    autogenerate it for us.

    A problem occurs when a user has a SCSI disk that is a volume or isn't
    using the 'virtio-scsi' controller model because we're not guarding our
    manual setting of the address tag in the guest config by the disk unit,
    in addition to the SCSI bus. This means that for a SCSI volume, we
    generate an address tag like '<address type="drive" controller="0"/>'
    for any SCSI volume, so a user with more than one device will get the
    following error when they try to boot an instance:

      Failed to start libvirt guest: libvirtError: unsupported
        configuration: Found duplicate drive address for disk with target name
        'sda' controller='0' bus='0' target='0' unit='0'

    This updates the conditionals to only manually set the address tag if
    the bus is SCSI _and_ the disk unit has been specified. Otherwise, let
    libvirt autogenerate the address tag and take care of avoiding
    collisions.

    [1] https://bugs.launchpad.net/nova/+bug/1686116

    Closes-Bug: #1792077

    Change-Id: Iefab05e84ccc0bf8f15bdbbf515a290d282dbc5d
    (cherry picked from commit 48fd81648a7cadf2d147a6beabac067c28b288b0)

Reviewed: https://review.opendev.org/653511
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8703282508954596e1d041340878c636693d8f5f
Submitter: Zuul
Branch: stable/rocky

commit 8703282508954596e1d041340878c636693d8f5f
Author: melanie witt <email address hidden>
Date: Fri Oct 19 22:06:20 2018 +0000

    libvirt: set device address tag only if setting disk unit

    In Pike, we began setting disk unit values manually for the
    'virtio-scsi' controller model in order to allow up to 256 devices [1].
    We do this by setting the disk unit of the address tag manually for the
    guest config. If we do not set the address tag manually, libvirt would
    autogenerate it for us.

    A problem occurs when a user has a SCSI disk that is a volume or isn't
    using the 'virtio-scsi' controller model because we're not guarding our
    manual setting of the address tag in the guest config by the disk unit,
    in addition to the SCSI bus. This means that for a SCSI volume, we
    generate an address tag like '<address type="drive" controller="0"/>'
    for any SCSI volume, so a user with more than one device will get the
    following error when they try to boot an instance:

      Failed to start libvirt guest: libvirtError: unsupported
        configuration: Found duplicate drive address for disk with target name
        'sda' controller='0' bus='0' target='0' unit='0'

    This updates the conditionals to only manually set the address tag if
    the bus is SCSI _and_ the disk unit has been specified. Otherwise, let
    libvirt autogenerate the address tag and take care of avoiding
    collisions.

    [1] https://bugs.launchpad.net/nova/+bug/1686116

    Closes-Bug: #1792077

    NOTE(melwitt): The difference in test_imagebackend.py from the Stein
    backport is because change I28c5bc23c0ea60d64153472d8937965f60f907c4
    is not in Rocky.

    Change-Id: Iefab05e84ccc0bf8f15bdbbf515a290d282dbc5d
    (cherry picked from commit 48fd81648a7cadf2d147a6beabac067c28b288b0)
    (cherry picked from commit 7500c1910cf1cbe16ee48028b67aa919160f44da)

Reviewed: https://review.opendev.org/653512
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c8ad9f4e1949ce6fdada565cc6660fbac7f75d6c
Submitter: Zuul
Branch: stable/queens

commit c8ad9f4e1949ce6fdada565cc6660fbac7f75d6c
Author: melanie witt <email address hidden>
Date: Fri Oct 19 22:06:20 2018 +0000

    libvirt: set device address tag only if setting disk unit

    In Pike, we began setting disk unit values manually for the
    'virtio-scsi' controller model in order to allow up to 256 devices [1].
    We do this by setting the disk unit of the address tag manually for the
    guest config. If we do not set the address tag manually, libvirt would
    autogenerate it for us.

    A problem occurs when a user has a SCSI disk that is a volume or isn't
    using the 'virtio-scsi' controller model because we're not guarding our
    manual setting of the address tag in the guest config by the disk unit,
    in addition to the SCSI bus. This means that for a SCSI volume, we
    generate an address tag like '<address type="drive" controller="0"/>'
    for any SCSI volume, so a user with more than one device will get the
    following error when they try to boot an instance:

      Failed to start libvirt guest: libvirtError: unsupported
        configuration: Found duplicate drive address for disk with target name
        'sda' controller='0' bus='0' target='0' unit='0'

    This updates the conditionals to only manually set the address tag if
    the bus is SCSI _and_ the disk unit has been specified. Otherwise, let
    libvirt autogenerate the address tag and take care of avoiding
    collisions.

    [1] https://bugs.launchpad.net/nova/+bug/1686116

    Closes-Bug: #1792077

    Change-Id: Iefab05e84ccc0bf8f15bdbbf515a290d282dbc5d
    (cherry picked from commit 48fd81648a7cadf2d147a6beabac067c28b288b0)
    (cherry picked from commit 7500c1910cf1cbe16ee48028b67aa919160f44da)
    (cherry picked from commit 8703282508954596e1d041340878c636693d8f5f)

Reviewed: https://review.opendev.org/653514
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b50b70a18286b20afa0dbf68bc4833ce5a6235ca
Submitter: Zuul
Branch: stable/pike

commit b50b70a18286b20afa0dbf68bc4833ce5a6235ca
Author: melanie witt <email address hidden>
Date: Fri Oct 19 22:06:20 2018 +0000

    libvirt: set device address tag only if setting disk unit

    In Pike, we began setting disk unit values manually for the
    'virtio-scsi' controller model in order to allow up to 256 devices [1].
    We do this by setting the disk unit of the address tag manually for the
    guest config. If we do not set the address tag manually, libvirt would
    autogenerate it for us.

    A problem occurs when a user has a SCSI disk that is a volume or isn't
    using the 'virtio-scsi' controller model because we're not guarding our
    manual setting of the address tag in the guest config by the disk unit,
    in addition to the SCSI bus. This means that for a SCSI volume, we
    generate an address tag like '<address type="drive" controller="0"/>'
    for any SCSI volume, so a user with more than one device will get the
    following error when they try to boot an instance:

      Failed to start libvirt guest: libvirtError: unsupported
        configuration: Found duplicate drive address for disk with target name
        'sda' controller='0' bus='0' target='0' unit='0'

    This updates the conditionals to only manually set the address tag if
    the bus is SCSI _and_ the disk unit has been specified. Otherwise, let
    libvirt autogenerate the address tag and take care of avoiding
    collisions.

    [1] https://bugs.launchpad.net/nova/+bug/1686116

    Closes-Bug: #1792077

     Conflicts:
         nova/tests/unit/virt/libvirt/volume/test_volume.py

    NOTE(melwitt): Conflict is due to not having change
    Ibfa64f18bbd2fb70db7791330ed1a64fe61c1355 in Pike.

    Change-Id: Iefab05e84ccc0bf8f15bdbbf515a290d282dbc5d
    (cherry picked from commit 48fd81648a7cadf2d147a6beabac067c28b288b0)
    (cherry picked from commit 7500c1910cf1cbe16ee48028b67aa919160f44da)
    (cherry picked from commit 8703282508954596e1d041340878c636693d8f5f)
    (cherry picked from commit c8ad9f4e1949ce6fdada565cc6660fbac7f75d6c)

melanie witt (melwitt) on 2019-04-24
no longer affects: nova/ocata

This issue was fixed in the openstack/nova 16.1.8 release.

This issue was fixed in the openstack/nova 19.0.1 release.

This issue was fixed in the openstack/nova 18.2.1 release.

This issue was fixed in the openstack/nova 17.0.11 release.
