nova API returns HTTP 500 error when interface attach fails due to lack of instance PCI slots

Bug #1881881 reported by Alexandre arents
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Description
===========

When no more PCI slots are available for a hot-pluggable network interface, the nova API returns an HTTP 500 internal error, which is not very helpful from the client's point of view.

It seems that nova catches all libvirt errors and raises:
nova.exception.InterfaceAttachFailed: Failed to attach network adapter device to 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53

Would it be better to handle "libvirt.libvirtError: internal error: No more available PCI slots" as a webob.exc.HTTPConflict() instead of a webob.exc.HTTPInternalServerError(), for example?
https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/attach_interfaces.py#L182
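The suggestion above could be sketched as follows. This is a minimal, hypothetical illustration only: the classes below are stand-ins for nova.exception.InterfaceAttachFailed and the webob.exc HTTP exceptions, and the string match on the libvirt message is exactly the fragile approach a later comment argues against in favor of a dedicated libvirt error code.

```python
# Hypothetical sketch of the reporter's suggestion: map the libvirt
# "No more available PCI slots" failure to HTTP 409 instead of 500.
# The classes below are illustrative stand-ins for nova/webob types.

class InterfaceAttachFailed(Exception):
    """Stand-in for nova.exception.InterfaceAttachFailed."""


class HTTPConflict(Exception):
    """Stand-in for webob.exc.HTTPConflict (HTTP 409)."""
    status_code = 409


class HTTPInternalServerError(Exception):
    """Stand-in for webob.exc.HTTPInternalServerError (HTTP 500)."""
    status_code = 500


def translate_attach_error(exc):
    """Pick an HTTP error for a failed interface attach.

    Matching on the libvirt message text is fragile; a dedicated
    libvirt error code would make this check reliable.
    """
    if "No more available PCI slots" in str(exc):
        return HTTPConflict("Instance has no free PCI slots")
    return HTTPInternalServerError(str(exc))


err = InterfaceAttachFailed(
    "Failed to attach network adapter device: internal error: "
    "No more available PCI slots")
print(translate_attach_error(err).status_code)  # prints 409
```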

Steps to reproduce
==================

On a fresh devstack master install

1) spawn one instance

2) create 30 network interfaces
for i in $(seq 2 30) ; do openstack port create --network private port-${i} ; done

3) Attach the interfaces to the instance:
for i in $(seq 2 30) ; do openstack server add port 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53 port-${i} ; done
The last attach attempts should fail with the following output:

openstack server add port 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53 port-26
Failed to attach network adapter device to 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53 (HTTP 500) (Request-ID: req-03476f4a-97ac-483d-bf15-e9f0bda776d4)

in Logs:

Jun 03 09:02:06 alex-devstack nova-compute[28931]: ERROR nova.virt.libvirt.driver [instance: 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53] File "/usr/local/lib/python3.6/dist-packages/libvirt.py", line 593, in attachDeviceFlags
Jun 03 09:02:06 alex-devstack nova-compute[28931]: ERROR nova.virt.libvirt.driver [instance: 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
Jun 03 09:02:06 alex-devstack nova-compute[28931]: ERROR nova.virt.libvirt.driver [instance: 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53] libvirt.libvirtError: internal error: No more available PCI slots
...
Jun 03 09:02:22 alex-devstack nova-compute[28931]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 7574, in attach_interface
Jun 03 09:02:22 alex-devstack nova-compute[28931]: ERROR oslo_messaging.rpc.server instance_uuid=instance.uuid)
Jun 03 09:02:22 alex-devstack nova-compute[28931]: ERROR oslo_messaging.rpc.server nova.exception.InterfaceAttachFailed: Failed to attach network adapter device to 0c6d2b7a-07d8-4f64-baa8-a6c05fb6ce53

Expected result
===============
An HTTP 409 Conflict or another explicit response indicating that a limit has been reached.

Actual result
=============
An HTTP 500 internal server error.

Tags: libvirt
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I can reproduce the problem.

Libvirt reports this as an internal error (VIR_ERR_INTERNAL_ERROR = 1 (0x1)), and only the error message gives a hint that the fault is due to running out of PCI slots. However, I don't like the idea of parsing the libvirt error message in nova. I suggest instead reaching out to the libvirt community to either remove this limit or at least return a specific error code, so nova can differentiate this slot-limit failure from other libvirt internal errors.

tags: added: libvirt
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

FYI, the limit of 28 slots comes from qemu [1] and can be avoided with the q35 machine type.

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg01271.html
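For reference, a hedged sketch of how the q35 machine type mentioned above can be requested; treat this as a configuration fragment, not a tested fix for this bug. `IMAGE` is a placeholder for an image name or ID, and the `hw_machine_type` names come from nova's image-metadata and libvirt configuration options.

```shell
# Per-image: instances booted from this image use the q35 machine type,
# avoiding the 28-slot limit of the default i440fx machine type.
openstack image set --property hw_machine_type=q35 IMAGE

# Deployment-wide alternative: set a default machine type per
# architecture in nova.conf on the compute nodes:
#   [libvirt]
#   hw_machine_type = x86_64=q35
```

Note that only newly created instances pick up the machine type; existing guests keep the machine type they were booted with.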

Changed in nova:
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote (last edit):

Interestingly, libvirt's behavior when attaching 30 new interfaces differs from the case where the VM is created with those 30 interfaces in the first place.

In the boot case [1], an extra device is added to the guest:

    <controller type='pci' index='1' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='1'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>

and this allows booting with more than 30 NICs.

In the attach case [2] there is no pci-bridge device.

[1] https://paste.opendev.org/show/811300/
[2] https://paste.opendev.org/show/811301/

Revision history for this message
do3meli (d-info-e) wrote :

The following workaround helped me (it of course requires downtime of the guest):

- power off the VM
- attach the interface
- power on the VM again
