boot from volume + configdrive with virtio-scsi broken (regression)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | | High | Jay Pipes | |
Ocata | | High | Artom Lifshitz | |
Pike | | High | Artom Lifshitz | |
Queens | | High | Artom Lifshitz | |
Bug Description
Hi,
Since the last Ocata update (2:15.0.
The generated libvirt XML looks wrong: on the first SCSI disk, the "unit" is 1 when it must be 0. The VM can't boot; KVM starts, but the boot screen shows the "No boot disk found" message.
The incorrect generated XML:
<disk type='network' device='cdrom'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<auth username=
<secret type='ceph' uuid='XXXXXXXXX
</auth>
<source protocol='rbd' name='disks/
<host name='XXX.
<host name='XXX.
<host name='XXX.
</source>
<target dev='hda' bus='ide'/>
<readonly/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
<auth username=
<secret type='ceph' uuid='XXXXXXXXX
</auth>
<source protocol='rbd' name='ssds/
<host name='XXX.
<host name='XXX.
<host name='XXX.
</source>
<target dev='sda' bus='scsi'/>
<
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
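To spot this symptom in a generated domain XML, the SCSI disks and their `<address>` units can be listed. Below is a minimal Python sketch, assuming the domain XML is available as a string (for example from `virsh dumpxml`). The sample is a trimmed, hypothetical version of the XML above with the truncated auth/source elements omitted, and `scsi_disk_units` is an illustrative helper name, not part of nova.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample modelled on the XML above: the config drive
# is an IDE cdrom and the boot volume is the only SCSI disk.
DOMAIN_XML = """
<domain>
  <devices>
    <disk type='network' device='cdrom'>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='network' device='disk'>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
  </devices>
</domain>
"""


def scsi_disk_units(domain_xml):
    """Return (target dev, unit) for every SCSI disk, in document order."""
    root = ET.fromstring(domain_xml)
    result = []
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("bus") != "scsi":
            continue  # non-SCSI disks are addressed on other controllers
        address = disk.find("address")
        unit = int(address.get("unit")) if address is not None else None
        result.append((target.get("dev"), unit))
    return result


units = scsi_disk_units(DOMAIN_XML)
print(units)  # [('sda', 1)]
if units and units[0][1] != 0:
    print("first SCSI disk is not at unit 0")
```

Per the report, the first SCSI disk needs unit 0 for the guest to find its boot disk, so a non-zero first unit is the red flag here.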
As a workaround, I fixed the issue with this patch:
--- a/nova/
+++ b/nova/
@@ -3576,7 +3576,7 @@
disk_info = disk_mapping[name]
- if 'unit' in disk_mapping:
+ if 'unit' in disk_mapping and disk_info["bus"] == "scsi":
conf = disk.libvirt_
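The workaround changes when the shared `unit` counter is consumed: only disks on the SCSI bus take (and advance) it. The sketch below is a simplified, standalone model, not nova's actual code. It assumes the counter lives under a `'unit'` key of `disk_mapping` next to per-disk info dicts that carry a `'bus'` key, mirroring the names in the diff; `assign_unit`, `build` and the sample mapping are hypothetical.

```python
def assign_unit(disk_mapping, name, scsi_only=True):
    """Return the SCSI unit for disk `name`, or None if it gets no unit.

    `disk_mapping` holds per-disk info dicts plus a shared 'unit' counter
    (the next free SCSI unit). With scsi_only=False this models the
    regression: every disk consumes a unit, whatever its bus.
    """
    disk_info = disk_mapping[name]
    if "unit" not in disk_mapping:
        return None
    if scsi_only and disk_info["bus"] != "scsi":
        return None  # e.g. an IDE config drive does not touch the counter
    unit = disk_mapping["unit"]
    disk_mapping["unit"] += 1  # reserve the next unit for the next SCSI disk
    return unit


def build(scsi_only):
    # Hypothetical mapping: IDE config drive processed before the SCSI disk.
    mapping = {
        "unit": 0,
        "disk.config": {"bus": "ide", "dev": "hda"},
        "sda": {"bus": "scsi", "dev": "sda"},
    }
    order = ["disk.config", "sda"]
    return {name: assign_unit(mapping, name, scsi_only) for name in order}


print(build(scsi_only=False))  # {'disk.config': 0, 'sda': 1}
print(build(scsi_only=True))   # {'disk.config': None, 'sda': 0}
```

With the bus check, the config drive no longer shifts the SCSI disk to unit 1, which matches the fix described in the report.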
Mehdi Abaakouk (sileht) wrote : | #1 |
Changed in nova: | |
assignee: | nobody → sahid (sahid-ferdjaoui) |
tags: | added: libvirt |
Changed in nova: | |
status: | New → Confirmed |
Logan V (loganv) wrote : | #2 |
The patch provided fixed the issue for me when doing virtio-scsi boot from RBD volume + config drive on Ocata.
However, I do see a lingering issue on an instance that has two virtio-scsi RBD volumes attached. One volume is the boot device and is marked bootable in Cinder. The other is a storage disk and is not marked bootable in Cinder. Nova keeps generating the libvirt XML with the storage disk at LUN 0 and the boot disk at LUN 1, which breaks the instance boot. Viewing the volume in Cinder shows the correct boot device, mounted at /dev/sda.
For reference below:
Boot volume: a63ea64e-
Data volume: e82a7785-
ubuntu@
+------
| Field | Value |
+------
| attachments | [{u'server_id': u'd83da61f-
| availability_zone | us-dfw-1 |
| bootable | false |
| consistencygroup_id | None ...
Logan V (loganv) wrote : | #3 |
The two out-of-order virtio-scsi issues described above, where secondary volumes and config drives end up at index 0, are probably a regression introduced in https:/
This seriously impacts our environment due to heavy RBD use in nova/cinder. Most of our instances have at least one virtio-scsi-backed disk. We've found that, when virtio-scsi is used, any instance with a config drive, or booted from volume with additional storage volume(s), is at risk of being unbootable in Ocata or later.
Logan V (loganv) wrote : | #4 |
Doing some more debugging, it looks like the disk_info[
disk_info=
I guess in _get_guest_
melanie witt (melwitt) wrote : | #5 |
Marking this as High because it's a regression.
Changed in nova: | |
importance: | Undecided → High |
summary: |
- boot from volume + configdrive broken
+ boot from volume + configdrive broken (regression)
tags: | added: volumes |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | sahid (sahid-ferdjaoui) → Jay Pipes (jaypipes) |
status: | Confirmed → In Progress |
Hi Logan,
Thank you for all the detailed info to help with debugging.
I noticed in your paste none of the volumes in your disk_info[
I'm trying to work out what's needed to address the remaining problem you're observing with boot-from-volume plus additional volumes and drives. Would it be possible for you to test the following patch that builds upon the first one?
--- a/nova/
+++ b/nova/
@@ -3763,6 +3763,8 @@ class LibvirtDriver(
# unit added and be able to increment it for each disk
# added.
+ if self._is_
+ disk_mapping[
def _get_ephemeral_
@@ -3852,8 +3854,11 @@ class LibvirtDriver(
info = disk_mapping[
if scsi_controller and scsi_controller
- info['unit'] = disk_mapping[
- disk_mapping[
+ if vol.get(
+ info['unit'] = 0
+ else:
+ info['unit'] = disk_mapping[
+ disk_mapping[
cfg = self._get_
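The second patch pins the boot volume to unit 0 and starts the shared counter at 1, so a data volume processed first can no longer take unit 0. The sketch below models that idea standalone. It assumes the boot volume is the one with `boot_index` 0 and approximates the truncated boot-from-volume check by looking for such a volume; `assign_volume_units`, `pin_boot_volume` and the `/dev/sdb` data volume are hypothetical.

```python
def assign_volume_units(volumes, pin_boot_volume=True):
    """Map each volume's mount device to a SCSI unit number.

    `volumes` is a list of dicts with 'mount_device' and 'boot_index'
    keys, in the order the driver happens to process them. When
    pin_boot_volume is True, the volume with boot_index 0 always gets
    unit 0 and the counter starts at 1 so no other disk takes it.
    When False, units follow processing order only (the comment #2 bug).
    """
    has_boot_volume = any(v.get("boot_index") == 0 for v in volumes)
    # Reserve unit 0 for the boot volume (assumed boot-from-volume check).
    next_unit = 1 if (pin_boot_volume and has_boot_volume) else 0
    units = {}
    for vol in volumes:
        if pin_boot_volume and vol.get("boot_index") == 0:
            units[vol["mount_device"]] = 0
        else:
            units[vol["mount_device"]] = next_unit
            next_unit += 1
    return units


# Hypothetical processing order from comment #2: data volume first.
volumes = [
    {"mount_device": "/dev/sdb", "boot_index": None},  # data volume
    {"mount_device": "/dev/sda", "boot_index": 0},     # boot volume
]
print(assign_volume_units(volumes, pin_boot_volume=False))
# {'/dev/sdb': 0, '/dev/sda': 1}
print(assign_volume_units(volumes))
# {'/dev/sdb': 1, '/dev/sda': 0}
```

Reserving unit 0 up front, rather than reordering the volumes, keeps the remaining units in processing order while guaranteeing the boot disk is first on the SCSI bus.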
melanie witt (melwitt) wrote : | #8 |
Sorry, that formatting came out all wrong. Here's a paste of it: https:/
summary: |
- boot from volume + configdrive broken (regression)
+ boot from volume + configdrive with virtio-scsi broken (regression)
Logan V (loganv) wrote : | #9 |
Sorry, I missed the patch you posted earlier until the bug update today. I tested the patch, and it fixes the situation in #2.
Confirmed @ the disk_info[
melanie witt (melwitt) wrote : | #10 |
Thanks very much Logan for confirming that. We'll work to get the proposed patch [1] fixed up with proper unit test coverage and reviewed.
Changed in nova: | |
assignee: | Jay Pipes (jaypipes) → melanie witt (melwitt) |
Changed in nova: | |
assignee: | melanie witt (melwitt) → sahid (sahid-ferdjaoui) |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 2616b384e642b6e
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500
only increment disk address unit for scsi devices
We were erroneously incrementing the disk address unit attribute for
non-scsi devices, which resulted in inconsistent disk device naming and
addresses when SCSI devices were used along with non-SCSI devices (like
configdrive devices).
Also, we ensure that we assign unit number 0 for the boot volume of a
boot-
Change-Id: Ia91e2f9c316e25
Co-authored-by: Mehdi Abaakouk <email address hidden>
Closes-bug: #1729584
Closes-bug: #1753394
Changed in nova: | |
status: | In Progress → Fix Released |
Fix proposed to branch: stable/queens
Review: https:/
Changed in nova: | |
assignee: | sahid (sahid-ferdjaoui) → Jay Pipes (jaypipes) |
Fix proposed to branch: stable/pike
Review: https:/
Fix proposed to branch: stable/ocata
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit f9c66434eea245a
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500
only increment disk address unit for scsi devices
We were erroneously incrementing the disk address unit attribute for
non-scsi devices, which resulted in inconsistent disk device naming and
addresses when SCSI devices were used along with non-SCSI devices (like
configdrive devices).
Also, we ensure that we assign unit number 0 for the boot volume of a
boot-
Change-Id: Ia91e2f9c316e25
Co-authored-by: Mehdi Abaakouk <email address hidden>
Closes-bug: #1729584
Closes-bug: #1753394
(cherry picked from commit 2616b384e642b6e
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit b255e16bd93d989
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500
only increment disk address unit for scsi devices
We were erroneously incrementing the disk address unit attribute for
non-scsi devices, which resulted in inconsistent disk device naming and
addresses when SCSI devices were used along with non-SCSI devices (like
configdrive devices).
Also, we ensure that we assign unit number 0 for the boot volume of a
boot-
Co-authored-by: Mehdi Abaakouk <email address hidden>
Closes-bug: #1729584
Closes-bug: #1753394
Conflicts:
nova/
NOTE(artom) Conflicts in nova/tests/
because the surrounding _get_guest_
present in pike.
Change-Id: Ia91e2f9c316e25
(cherry picked from commit 2616b384e642b6e
(cherry picked from commit f9c66434eea245a
This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ocata
commit 1150d4a2af5b06c
Author: Jay Pipes <email address hidden>
Date: Fri Jan 26 12:20:35 2018 -0500
only increment disk address unit for scsi devices
We were erroneously incrementing the disk address unit attribute for
non-scsi devices, which resulted in inconsistent disk device naming and
addresses when SCSI devices were used along with non-SCSI devices (like
configdrive devices).
Also, we ensure that we assign unit number 0 for the boot volume of a
boot-
Co-authored-by: Mehdi Abaakouk <email address hidden>
Closes-bug: #1729584
Closes-bug: #1753394
Change-Id: Ia91e2f9c316e25
(cherry picked from commit 2616b384e642b6e
(cherry picked from commit f9c66434eea245a
(cherry picked from commit b255e16bd93d989
This issue was fixed in the openstack/nova 16.1.2 release.
This issue was fixed in the openstack/nova 17.0.3 release.
This issue was fixed in the openstack/nova 15.1.1 release.
The issue is still present on the master branch.