rbd volume disk device config has wrong scsi unit address since pike

Bug #1753394 reported by yafeng on 2018-03-05
This bug affects 1 person
Affects                     Importance   Assigned to
Cinder                      Undecided    Unassigned
OpenStack Compute (nova)    High         Jay Pipes
  Ocata                     High         Artom Lifshitz
  Pike                      High         Artom Lifshitz
  Queens                    High         Artom Lifshitz

Bug Description

Heat template excerpt (YAML):
  vol_omu:
    type: OS::Heat::ResourceGroup
    properties:
      count: 2
      resource_def:
        type: OS::Cinder::Volume
        properties:
          name: OMU-Volume
          source_volid: { get_param: omu_volume_id }
          size: 22

  OMU-0:
    type: OS::Nova::Server
    properties:
      name: OMU-0
      image: { get_param: ipxe_image_id }
      flavor: { get_param: flavor_omu }
      key_name: { get_param: key_name }
      networks: { get_attr: [OMU_0_ports, port-map] }
      block_device_mapping_v2:
        - { "boot_index": -1, "disk_bus": "scsi", "volume_id": { get_attr: [vol_omu, resource.0] } }
        - { "boot_index": -1, "disk_bus": "scsi", "volume_id": { get_attr: [vol_omu, resource.1] } }

Related xml content for OpenStack Newton:
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='rbd' name='cinder-volumes/volume-20807bd7-d1e7-4f87-a94d-f8234945b433'>
    <host name='172.168.42.3' port='6789'/>
    <host name='172.168.42.4' port='6789'/>
    <host name='172.168.42.5' port='6789'/>
  </source>
  <target dev='sda' bus='scsi'/>
  <serial>20807bd7-d1e7-4f87-a94d-f8234945b433</serial>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='rbd' name='cinder-volumes/volume-cd63e6f9-87f9-4c6b-ba87-bd4ff3c113c4'>
    <host name='172.168.42.3' port='6789'/>
    <host name='172.168.42.4' port='6789'/>
    <host name='172.168.42.5' port='6789'/>
  </source>
  <target dev='sdb' bus='scsi'/>
  <serial>cd63e6f9-87f9-4c6b-ba87-bd4ff3c113c4</serial>
  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>

Related xml content for OpenStack Pike:
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none' discard='unmap'/>
  <auth username='cinder'>
    <secret type='ceph' uuid='6c373d85-665f-4fbf-8bb5-880181870709'/>
  </auth>
  <source protocol='rbd' name='volumes/volume-029d943d-60ad-4f90-87b7-a2e7a20394f9'>
    <host name='192.168.1.29' port='6789'/>
    <host name='192.168.1.30' port='6789'/>
    <host name='192.168.1.31' port='6789'/>
  </source>
  <target dev='sda' bus='scsi'/>
  <serial>029d943d-60ad-4f90-87b7-a2e7a20394f9</serial>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none' discard='unmap'/>
  <auth username='cinder'>
    <secret type='ceph' uuid='6c373d85-665f-4fbf-8bb5-880181870709'/>
  </auth>
  <source protocol='rbd' name='volumes/volume-0dc8e3a2-6fa9-4530-bd5b-db6cef83b9f3'>
    <host name='192.168.1.29' port='6789'/>
    <host name='192.168.1.30' port='6789'/>
    <host name='192.168.1.31' port='6789'/>
  </source>
  <target dev='sdb' bus='scsi'/>
  <serial>0dc8e3a2-6fa9-4530-bd5b-db6cef83b9f3</serial>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

The problem occurs on Pike, where both disks are assigned the same unit number:
<address type='drive' controller='0' bus='0' target='0' unit='0'/>

The resulting error in the log:
qemu-kvm: -object secret,id=scsi0-0-0-0-secret0,data=flYlAwpFVjBHcQ/NI3UUe+bahMmauiudlUbN3KW3RIk=,keyid=masterKey0,iv=7FE+TU876kfSu2Yrrm3uBQ==,format=base64:
Duplicate ID 'scsi0-0-0-0-secret0' for object
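
The object ID in the error appears to be derived from the disk's controller/bus/target/unit address, so two disks sharing one address would collide. A minimal sketch for spotting such collisions in a domain XML dump follows; the function name find_duplicate_drive_addresses and the trimmed SAMPLE document are illustrative assumptions, not part of nova or libvirt.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_drive_addresses(domain_xml):
    """Return {(controller, bus, target, unit): [dev, ...]} for reused addresses."""
    root = ET.fromstring(domain_xml)
    seen = defaultdict(list)
    for disk in root.iter("disk"):
        address = disk.find("address")
        target = disk.find("target")
        # Only <address type='drive'> carries a unit number; skip PCI and others.
        if address is None or address.get("type") != "drive":
            continue
        key = tuple(address.get(k) for k in ("controller", "bus", "target", "unit"))
        seen[key].append(target.get("dev") if target is not None else "?")
    return {key: devs for key, devs in seen.items() if len(devs) > 1}


# Trimmed-down version of the Pike XML above (hypothetical wrapper element).
SAMPLE = """
<devices>
  <disk type='network' device='disk'>
    <target dev='sda' bus='scsi'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>
  <disk type='network' device='disk'>
    <target dev='sdb' bus='scsi'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>
</devices>
"""

duplicates = find_duplicate_drive_addresses(SAMPLE)
print(duplicates)
```

For the sample, both sda and sdb are reported under the address ('0', '0', '0', '0').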

yafeng (yafeng) on 2018-03-06
description: updated
yafeng (yafeng) on 2018-03-06
affects: openstack-community → cinder
Matt Riedemann (mriedem) wrote :
tags: added: libvirt volumes
Matt Riedemann (mriedem) on 2018-03-06
summary: - block_device_mapping_v2 cannot work with Pike
+ rbd volume disk device config has wrong scsi unit address since pike
Changed in cinder:
status: New → Invalid
Changed in nova:
status: New → Confirmed

I would say it's a configuration issue, but it's difficult to say; we may still have to fix something.

The problem is that you need to indicate which SCSI model you want for your device.

Can you add the image property hw_scsi_model=virtio-scsi?

yafeng (yafeng) wrote :

The image is an iPXE image, used to set some extra parameters and load the boot program from a volume.
If I remove "image: { get_param: ipxe_image_id }" and let the VM boot from volume, the unit IDs are correct.
Does this mean virtio-blk and virtio-scsi cannot coexist?
  OMU-0:
    type: OS::Nova::Server
    properties:
      name: OMU-0
      flavor: { get_param: flavor_omu }
      key_name: { get_param: key_name }
      networks: { get_attr: [OMU_0_ports, port-map] }
      block_device_mapping_v2:
        - { "boot_index": 0, "disk_bus": "scsi", "volume_id": { get_attr: [vol_omu, resource.0] } }
        - { "boot_index": -1, "disk_bus": "scsi", "volume_id": { get_attr: [vol_omu, resource.1] } }

yafeng (yafeng) wrote :

But why does it work on Newton?

yafeng (yafeng) wrote :

I tried adding the image property hw_scsi_model=virtio-scsi. The unit IDs now look correct, but iPXE cannot boot from the SAN devices:
Boot from SAN device 0x81 failed: Exec format error(http://ipxe.org/2e852001)
Boot from SAN device 0x82 failed: Exec format error(http://ipxe.org/2e852001)

It seems iPXE cannot find the volume devices. Is that because SeaBIOS can only detect the first device, which in this case is the iPXE image?

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' discard='ignore'/>
      <source file='/mnt/nova/instances/89ed7307-68f6-4e20-8e5e-01ab5f11debe/disk'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='6c373d85-665f-4fbf-8bb5-880181870709'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-abf00165-6784-4ea0-aa08-d4fb90fbaa0e'>
        <host name='192.168.1.29' port='6789'/>
        <host name='192.168.1.30' port='6789'/>
        <host name='192.168.1.31' port='6789'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <serial>abf00165-6784-4ea0-aa08-d4fb90fbaa0e</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='6c373d85-665f-4fbf-8bb5-880181870709'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-91b742b2-1570-484e-abf7-2302c9c4beaf'>
        <host name='192.168.1.29' port='6789'/>
        <host name='192.168.1.30' port='6789'/>
        <host name='192.168.1.31' port='6789'/>
      </source>
      <target dev='sdb' bus='scsi'/>
      <serial>91b742b2-1570-484e-abf7-2302c9c4beaf</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>

Jay is working on a patch that should fix your issue as well.

Are you able to validate it?

  https://review.openstack.org/#/c/538310/

yafeng (yafeng) wrote :

OpenStack RPM packages are installed in my environment.
python-nova-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
openstack-nova-conductor-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
openstack-ansible-os_nova-16.0.0-1.el7.centos.ncir.1.noarch
openstack-nova-api-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
openstack-nova-novncproxy-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
openstack-nova-console-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
openstack-nova-scheduler-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
ansible-nova-nokia-c7.ge25659f-1.el7.centos.ncir.noarch
python2-novaclient-9.1.0-1.el7.noarch
openstack-nova-placement-api-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
openstack-nova-common-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch
openstack-nova-compute-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch

yafeng (yafeng) wrote :

I found driver.py, driver.pyc and driver.pyo under the directory /usr/lib/python2.7/site-packages/nova/virt/libvirt from the package python-nova-16.0.1-0.20170926081607.edd59ae.el7.centos.noarch.
How can I apply the patch?

Matt Riedemann (mriedem) wrote :

You can check out whatever tag of the code you're on (16.0.1):

1. git clone <url to nova repo>
2. git checkout 16.0.1
3. Cherry-pick Jay's patch:

git fetch https://git.openstack.org/openstack/nova refs/changes/10/538310/2 && git cherry-pick -x FETCH_HEAD

Resolve any merge conflicts and commit those changes.

4. Generate a patch:

git format-patch -1

That will give you a .patch file, which you can then apply to the directory where nova is installed (you probably also want to delete those .pyo and .pyc files):

https://linuxacademy.com/blog/linux/introduction-using-diff-and-patch/
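
For the bytecode cleanup mentioned above, a minimal sketch is shown here. The helper name remove_stale_bytecode is an assumption for illustration; it simply walks a directory tree and deletes compiled .pyc/.pyo files so that the patched .py sources are recompiled on the next import.

```python
import os


def remove_stale_bytecode(root_dir):
    """Delete .pyc/.pyo files under root_dir and return the removed paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            # Compiled files left over from the unpatched package.
            if name.endswith((".pyc", ".pyo")):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

It could be called with the directory named earlier in this thread, for example remove_stale_bytecode("/usr/lib/python2.7/site-packages/nova/virt/libvirt"), after the patch has been applied and before the nova services are restarted.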

Changed in nova:
assignee: nobody → melanie witt (melwitt)
status: Confirmed → In Progress
Changed in nova:
assignee: melanie witt (melwitt) → sahid (sahid-ferdjaoui)

Reviewed: https://review.openstack.org/538310
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2616b384e642b6eb58eef7da87b6e893f25a949e
Submitter: Zuul
Branch: master

commit 2616b384e642b6eb58eef7da87b6e893f25a949e
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500

    only increment disk address unit for scsi devices

    We were erroneously incrementing the disk address unit attribute for
    non-scsi devices, which resulted in inconsistent disk device naming and
    addresses when SCSI devices were used along with non-SCSI devices (like
    configdrive devices).

    Also, we ensure that we assign unit number 0 for the boot volume of a
    boot-from-volume instance.

    Change-Id: Ia91e2f9c316e25394a0f41dc341d903dfcff6921
    Co-authored-by: Mehdi Abaakouk <email address hidden>
    Closes-bug: #1729584
    Closes-bug: #1753394
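
The rule described in the commit message can be illustrated with a small sketch. This is not nova's actual implementation: the function assign_scsi_units and the dict shape with 'dev', 'bus' and 'boot' keys are hypothetical, chosen only to show a unit counter that advances for SCSI disks alone and gives the boot volume unit 0.

```python
def assign_scsi_units(disks):
    """Map each SCSI disk's dev name to a unique unit number.

    `disks` is a list of dicts with 'dev', 'bus' and an optional 'boot' flag
    (a hypothetical shape, not nova's internal objects).
    """
    # Put the boot volume first so it receives unit 0; sorted() is stable,
    # so the remaining disks keep their original relative order.
    ordered = sorted(disks, key=lambda d: not d.get("boot", False))
    next_unit = 0
    units = {}
    for disk in ordered:
        if disk["bus"] != "scsi":
            continue  # non-SCSI devices must not consume a unit number
        units[disk["dev"]] = next_unit
        next_unit += 1
    return units


disks = [
    {"dev": "vda", "bus": "virtio"},
    {"dev": "sda", "bus": "scsi"},
    {"dev": "sdb", "bus": "scsi"},
]
print(assign_scsi_units(disks))
```

With the virtio disk skipped, sda and sdb receive units 0 and 1 in this sketch.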

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem) on 2018-04-13
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Jay Pipes (jaypipes)

Reviewed: https://review.openstack.org/561196
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f9c66434eea245ae05a449059391515376f5a456
Submitter: Zuul
Branch: stable/queens

commit f9c66434eea245ae05a449059391515376f5a456
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500

    only increment disk address unit for scsi devices

    We were erroneously incrementing the disk address unit attribute for
    non-scsi devices, which resulted in inconsistent disk device naming and
    addresses when SCSI devices were used along with non-SCSI devices (like
    configdrive devices).

    Also, we ensure that we assign unit number 0 for the boot volume of a
    boot-from-volume instance.

    Change-Id: Ia91e2f9c316e25394a0f41dc341d903dfcff6921
    Co-authored-by: Mehdi Abaakouk <email address hidden>
    Closes-bug: #1729584
    Closes-bug: #1753394
    (cherry picked from commit 2616b384e642b6eb58eef7da87b6e893f25a949e)

Matt Riedemann (mriedem) on 2018-04-19
Changed in nova:
importance: Undecided → High

Reviewed: https://review.openstack.org/561611
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b255e16bd93d9891caff8ffc84b8d7bc2991f90a
Submitter: Zuul
Branch: stable/pike

commit b255e16bd93d9891caff8ffc84b8d7bc2991f90a
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500

    only increment disk address unit for scsi devices

    We were erroneously incrementing the disk address unit attribute for
    non-scsi devices, which resulted in inconsistent disk device naming and
    addresses when SCSI devices were used along with non-SCSI devices (like
    configdrive devices).

    Also, we ensure that we assign unit number 0 for the boot volume of a
    boot-from-volume instance.

    Co-authored-by: Mehdi Abaakouk <email address hidden>
    Closes-bug: #1729584
    Closes-bug: #1753394

    Conflicts:
        nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(artom) Conflicts in nova/tests/unit/virt/libvirt/test_driver.py
    because the surrounding _get_guest_config_with_graphics method isn't
    present in pike.

    Change-Id: Ia91e2f9c316e25394a0f41dc341d903dfcff6921
    (cherry picked from commit 2616b384e642b6eb58eef7da87b6e893f25a949e)
    (cherry picked from commit f9c66434eea245ae05a449059391515376f5a456)

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

Reviewed: https://review.openstack.org/561613
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1150d4a2af5b06c2df5dea053f5f7aea090c9145
Submitter: Zuul
Branch: stable/ocata

commit 1150d4a2af5b06c2df5dea053f5f7aea090c9145
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500

    only increment disk address unit for scsi devices

    We were erroneously incrementing the disk address unit attribute for
    non-scsi devices, which resulted in inconsistent disk device naming and
    addresses when SCSI devices were used along with non-SCSI devices (like
    configdrive devices).

    Also, we ensure that we assign unit number 0 for the boot volume of a
    boot-from-volume instance.

    Co-authored-by: Mehdi Abaakouk <email address hidden>
    Closes-bug: #1729584
    Closes-bug: #1753394

    Change-Id: Ia91e2f9c316e25394a0f41dc341d903dfcff6921
    (cherry picked from commit 2616b384e642b6eb58eef7da87b6e893f25a949e)
    (cherry picked from commit f9c66434eea245ae05a449059391515376f5a456)
    (cherry picked from commit b255e16bd93d9891caff8ffc84b8d7bc2991f90a)

This issue was fixed in the openstack/nova 16.1.2 release.

This issue was fixed in the openstack/nova 17.0.3 release.

This issue was fixed in the openstack/nova 15.1.1 release.
