Unable to attach more than 6 scsi volumes

Bug #1864279 reported by Pavel
This bug affects 8 people
Affects                     Status        Importance  Assigned to
OpenStack Compute (nova)    Won't Fix     Low         Unassigned
Ubuntu Cloud Archive        New           Undecided   Unassigned
libvirt (Ubuntu)            Fix Released  Undecided   Unassigned
  Bionic                    Won't Fix     Low         Unassigned
  Focal                     Fix Released  Undecided   Unassigned
  Groovy                    Fix Released  Undecided   Unassigned
  Hirsute                   Fix Released  Undecided   Unassigned

Bug Description

A SCSI volume with unit number 7 cannot be attached because of this libvirt check: https://github.com/libvirt/libvirt/blob/89237d534f0fe950d06a2081089154160c6c2224/src/conf/domain_conf.c#L4796
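For illustration, the linked check amounts to roughly the following (a Python sketch of an assumption about the C logic in domain_conf.c, not libvirt's actual code):

```python
# Rough Python rendering of the libvirt check linked above, for
# illustration only (the real logic is C code in domain_conf.c).
# On a SCSI bus, unit 7 is conventionally the controller's (HBA's)
# own target ID, so libvirt refuses a disk placed at that unit.

RESERVED_SCSI_UNIT = 7  # the HBA's own ID on a narrow SCSI bus


def disk_address_conflicts(bus: str, unit: int) -> bool:
    """Return True if a disk address would collide with the controller."""
    return bus == "scsi" and unit == RESERVED_SCSI_UNIT
```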

Nova automatically increases the volume unit number by 1, and when I attach the 7th volume to a VM I get this error:
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [req-156a4725-279d-4173-9f11-85125e4a3e47] [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] Failed to attach volume at mountpoint: /dev/sdh: libvirt.libvirtError: Requested operation is not valid: Domain already contains a disk with that address
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] Traceback (most recent call last):
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1810, in attach_volume
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] guest.attach_device(conf, persistent=True, live=live)
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 305, in attach_device
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] self._domain.attachDeviceFlags(device_xml, flags=flags)
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 190, in doit
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 148, in proxy_call
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] rv = execute(f, *args, **kwargs)
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 129, in execute
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] six.reraise(c, e, tb)
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] raise value
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 83, in tworker
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] rv = meth(*args, **kwargs)
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] File "/usr/lib/python3/dist-packages/libvirt.py", line 605, in attachDeviceFlags
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f] libvirt.libvirtError: Requested operation is not valid: Domain already contains a disk with that address
2020-02-21 09:12:53.309 3572 ERROR nova.virt.libvirt.driver [instance: 3532baf6-a0a4-4a81-84f9-3622c713435f]

After patching the Nova libvirt driver to skip unit 7, I can attach more than 6 volumes.
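A hypothetical sketch of the kind of patch described above (not the reporter's actual change): when picking the next free SCSI unit number, skip the reserved unit 7 so libvirt never sees a disk at the controller's own ID.

```python
# Hypothetical sketch of the "skip unit 7" workaround described above
# (not the actual nova patch). Unit 7 is treated as reserved because it
# is conventionally the SCSI controller's own target ID.

def next_scsi_unit(used_units, reserved=frozenset({7})):
    """Return the lowest unit number that is neither used nor reserved."""
    unit = 0
    while unit in used_units or unit in reserved:
        unit += 1
    return unit
```

With units 0-6 in use, this hands out 8 instead of 7, matching the behavior the reporter describes.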

ii nova-compute 2:20.0.0-0ubuntu1~cloud0
ii nova-compute-kvm 2:20.0.0-0ubuntu1~cloud0
ii nova-compute-libvirt 2:20.0.0-0ubuntu1~cloud0
ii libvirt0:amd64 5.4.0-0ubuntu5~cloud0
ii librbd1 14.2.4-1bionic
ii libvirt-daemon-driver-storage-rbd 5.4.0-0ubuntu5~cloud0
ii python-rbd 14.2.4-1bionic
ii python3-rbd 14.2.4-1bionic

Tags: libvirt
Revision history for this message
drico (cp-ows) wrote :

Hi,
We are also hitting this bug.

Revision history for this message
Henro (henro001) wrote :

Same here. Kubernetes cluster fails to attach Cinder volumes when the next unit number is 7.

Revision history for this message
Henro (henro001) wrote :

The workaround for me is to not use the scsi bus and to use virtio-blk instead.

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

This is an issue with libvirt itself, and it _appears_ to have been fixed since 5.10. If nova needs to do anything here, it's to add a warning about affected versions. That's a nice-to-have, though.

Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Looks like it's been fixed on RHEL 7.7 too [1]. If you're on a different OS, I'd suggest opening a bug against that distribution's libvirt component and requesting a backport. I don't think there's much to do here from a nova perspective.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1741782

Changed in nova:
status: Confirmed → Won't Fix
Revision history for this message
melanie witt (melwitt) wrote :

Just adding a couple more links that Stephen had mentioned in #openstack-nova today; this is the patch that fixed the issue in libvirt:

https://github.com/libvirt/libvirt/commit/c8007fdc5d2ce43fec2753cda60fb4963f55abd5

https://www.redhat.com/archives/libvir-list/2019-September/msg00407.html

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

As Stephen mentioned in comment#4 and the links Mel provided in comment#6, upstream libvirt v5.10.0 _does_ have these patches which should fix this problem.

Please make sure your Linux distribution has the libvirt package with those fixes.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

affects: cloud-archive → libvirt (Ubuntu)
Changed in libvirt (Ubuntu):
status: New → Confirmed
Revision history for this message
Dincer Celik (dincercelik) wrote :

Faced this issue on UCA Stein, which has libvirt 5.0.0; Train might be affected too, as it has libvirt 5.4.0.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

From the libvirt package POV this is fixed in >= Focal (20.04), and the only release left affected is Bionic (18.04, libvirt 4.0).

Fixing it in Bionic is IMHO worth considering but low priority (if users can influence the address, they can set it to 8-16 and it works; only the default no-address use case is broken).
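For illustration, an explicit address on a libvirt disk looks roughly like this (the device path and names are placeholders, not taken from this bug):

```xml
<!-- Illustrative only: a libvirt <disk> with an explicit SCSI address.
     Choosing unit 8 (or higher) sidesteps the reserved unit 7. -->
<disk type='block' device='disk'>
  <source dev='/dev/placeholder'/>
  <target dev='sdh' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='8'/>
</disk>
```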

Nevertheless, fixing it in Bionic won't help your cases on UCA with 5.0/5.4; those fixes would have to be done by the UCA team. Therefore I'm adding back a cloud-archive task.

Changed in libvirt (Ubuntu Groovy):
status: New → Fix Released
Changed in libvirt (Ubuntu Focal):
status: New → Fix Released
Changed in libvirt (Ubuntu Hirsute):
status: New → Fix Released
Changed in libvirt (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI, for Bionic this isn't a trivial backport.
The change as it was added later would need at least
  commit 932862b8 conf: Rework and rename virDomainDeviceFindControllerModel
  commit 6ae6ffd8 qemu: Introduce qemuDomainFindSCSIControllerModel
Maybe more.
It is not undoable, but it gets into an area where slowly but surely the regression risk might outweigh the benefit. 18.04 has been out for almost 3 years now; this coming up only now indicates it can't be the worst possible issue. Many (but sadly not all) use cases have workarounds by specifying explicit IDs.
And finally, at least for the case reported here, fixing it in Bionic wouldn't even help, as you want/need it for UCA based on later versions.

On the bright side, the versions in UCA that you ask for have those reworks applied. There the code should match more easily.

I'm willing to give this a deeper look IF someone actually cares about this in Bionic (not Bionic-UCA-*). So I'll set the bug task there to Won't Fix and would ask anyone affected to update the bug and make an argument for why we really need it (that reasoning will be needed for an SRU anyway).

Changed in libvirt (Ubuntu Bionic):
status: Triaged → Won't Fix
Revision history for this message
Szilard Cserey (szilard.cserey) wrote :

Hi Christian, we need this fixed in Bionic, can you please help us?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI Clarified offline as this would now go via an ESM upload, likely by SEG which Szilard was ok to escalate to. I also passed some hints on how these uploads work.
