Attaching interface with q35 machine type fails with "No more available PCI slots"

Bug #1831701 reported by Martin Schuppert
Affects: tripleo
Status: Fix Released
Importance: Undecided
Assigned to: Martin Schuppert
Milestone: (none)

Bug Description

Using q35 machine type, tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesTestJSON.test_create_list_show_delete_interfaces_by_network_port fails with:
tempest.lib.exceptions.ServerFault: Got server fault
Details: Failed to attach network adapter device to 483cfa3d-2af5-4a4e-9296-e4204b59fbd7

Seen in a deployment with 1 controller and 1 compute node, using ML2/OVS with VXLAN tunnels.

Digging in nova logs shows:
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [req-d8c533c7-81ea-4fcd-8713-8c97b8e6738d 806027625a2d4a54b47f8c6e522aa6bd aa381084f7e14421abef5e4378404b36 - default default] [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] attaching network adapter failed.: libvirt.libvirtError: internal error: No more available PCI slots
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] Traceback (most recent call last):
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1761, in attach_interface
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] guest.attach_device(cfg, persistent=True, live=live)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] self._domain.attachDeviceFlags(device_xml, flags=flags)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] result = proxy_call(self._autowrap, f, *args, **kwargs)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] rv = execute(f, *args, **kwargs)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] six.reraise(c, e, tb)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] raise value
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] rv = meth(*args, **kwargs)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] File "/usr/lib64/python3.6/site-packages/libvirt.py", line 605, in attachDeviceFlags
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] libvirt.libvirtError: internal error: No more available PCI slots

tl;dr: The immediate "fix" is to make TripleO set 'num_pcie_ports' to
        12 (or 16), because the 'q35' machine type by default allows
        hotplugging only _one_ PCIe device.

Long
----

(*) Firstly, the Tempest test[1],
    test_create_list_show_delete_interfaces_by_network_port(), is trying
    to hot-plug *three* network interfaces:

        [...]
        try:
            iface = self._test_create_interface(server)
        [...]
        iface = self._test_create_interface_by_network_id(server, ifs)
        ifs.append(iface)

        iface = self._test_create_interface_by_port_id(server, ifs)
        ifs.append(iface)
        [...]

(*) Here we're using the 'q35' machine type, which by default allows
    only a *single* PCIe device to be hotplugged. Nova currently sets
    'num_pcie_ports' to "0" (which means it falls back to libvirt's
    default of "1"), but as the previous point showed, the test
    hot-plugs _3_ interfaces.

    And as the libvirt document[2] states: "If you plan to hotplug more
    than a single PCI Express device, you should add a suitable number
    of pcie-root-port controllers when defining the guest".

(*) But the next question is: "Why does the test work with 'pc'
    machine type, then?" It works because, with 'pc' (or 'i440fx'),
    "each of the 31 slots (from 0x01 to 0x1f) on the pci-root controller
    is hotplug capable and can accept a legacy PCI device"[3].
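
The slot arithmetic behind the three points above can be sketched in a few lines of Python. This is only a simplified model, not libvirt's real address allocator: the function names, the `cold_plugged` parameter and the value 5 used for it are assumptions made for illustration. The figures 31 (hotpluggable slots on 'pc') and 1 (spare port on 'q35' by default) come from the text above.

```python
# Simplified model of hotplug capacity; NOT libvirt's actual allocator.
# free_hotplug_slots / attach_interfaces / cold_plugged are hypothetical names.

PC_HOTPLUG_SLOTS = 31        # slots 0x01..0x1f on pci-root, per the libvirt doc
Q35_DEFAULT_SPARE_PORTS = 1  # q35 default: a single hotpluggable PCIe port


def free_hotplug_slots(machine_type, cold_plugged, num_pcie_ports=0):
    """Estimate how many devices can still be hot-plugged.

    cold_plugged is the number of PCI(e) devices present at boot
    (an assumption of this model).
    """
    if machine_type == "pc":
        return max(PC_HOTPLUG_SLOTS - cold_plugged, 0)
    if machine_type == "q35":
        if num_pcie_ports == 0:
            # Nova's "0" means "use libvirt's default", i.e. one spare port.
            return Q35_DEFAULT_SPARE_PORTS
        # Some ports are taken by boot-time devices, the rest stay free.
        return max(num_pcie_ports - cold_plugged, 0)
    raise ValueError("unknown machine type: %s" % machine_type)


def attach_interfaces(free_slots, wanted):
    """Return how many of the wanted hot-plugs would succeed."""
    return min(free_slots, wanted)


if __name__ == "__main__":
    # The Tempest test hot-plugs three interfaces.
    for label, free in [
        ("q35, num_pcie_ports=0", free_hotplug_slots("q35", 5)),
        ("pc", free_hotplug_slots("pc", 5)),
        ("q35, num_pcie_ports=16", free_hotplug_slots("q35", 5, 16)),
    ]:
        print(label, "->", attach_interfaces(free, 3), "of 3 attached")
```

Under this model the default 'q35' guest accepts only the first of the three hot-plugs, which matches the point at which the test fails, while both 'pc' and 'q35' with 16 ports have room for all three.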

[1] https://github.com/openstack/tempest/blob/25f5d28f3c2c79d7d0abfaa48db5d53a41f5e40d/tempest/api/compute/servers/test_attach_interfaces.py#L219
[2] https://libvirt.org/pci-hotplug.html#x86_64-q35
[3] https://libvirt.org/pci-hotplug.html#x86_64-i440fx

Next Steps
----------

- Immediately, make TripleO raise 'num_pcie_ports' to 16.

- Long-term, write up a spec-less Blueprint for allowing this via
  flavor and image metadata property (e.g. "hw_num_pcie_ports").
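
To check whether a guest really has spare ports after 'num_pcie_ports' is raised, one option is to inspect its libvirt domain XML (for example the output of `virsh dumpxml`). The sketch below is a rough heuristic, not a libvirt API: it treats a pcie-root-port controller as free when no device address sits on the bus with the same number as the controller's index. The sample XML and the function name `free_root_ports` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a q35 domain definition (illustrative only).
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <disk type='file' device='disk'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
  </devices>
</domain>
"""


def free_root_ports(domain_xml):
    """Return indexes of pcie-root-port controllers with no device behind them."""
    devices = ET.fromstring(domain_xml).find("devices")
    root_ports = set()
    used_buses = set()
    for dev in devices:
        if dev.tag == "controller" and dev.get("model") == "pcie-root-port":
            root_ports.add(int(dev.get("index")))
        addr = dev.find("address")
        if addr is not None and addr.get("type") == "pci":
            # Bus numbers are written in hex ("0x01"); base 0 auto-detects that.
            used_buses.add(int(addr.get("bus"), 0))
    return sorted(root_ports - used_buses)


if __name__ == "__main__":
    print("free pcie-root-port indexes:", free_root_ports(SAMPLE_DOMAIN_XML))
```

In the sample, ports 1 and 2 are occupied by the interface and the disk, so only port 3 is left for hotplug. A guest that reports an empty list here would be expected to hit "No more available PCI slots" on the next attach.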

Changed in tripleo:
assignee: nobody → Martin Schuppert (mschuppert)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/663261

Changed in tripleo:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/663261
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0f6dabc725b16c975023a0e3f09433d7039922e2
Submitter: Zuul
Branch: master

commit 0f6dabc725b16c975023a0e3f09433d7039922e2
Author: Martin Schuppert <email address hidden>
Date: Wed Jun 5 09:28:10 2019 +0200

    Add new role parameter NovaLibvirtNumPciePorts

    Add role parameter NovaLibvirtNumPciePorts which sets `libvirt/num_pcie_ports`
    to specify the number of PCIe ports an instance will get.
    Libvirt allows a custom number of PCIe ports (pcie-root-port controllers) a
    target instance will get. Some will be used by default, rest will be available
    for hotplug use. When using the 'q35' machine type, by default, it allows only
    a *single* PCIe device to be hotplugged. And Nova currently sets
    'num_pcie_ports' to "0" (which means, it defaults to libvirt's "1"), which is
    not sufficient for hotplug use.

    Default for NovaLibvirtNumPciePorts is 16.

    Change-Id: Ida27b52a091640545aecc982fc1a509fb5107db8
    Closes-Bug: #1831701
    Depends-On: I16732c9d6013112381cfad999540dd41ec3d7ba3

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/663500

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/663500
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=b9992d9658754dfd50af12df38dee847b695c399
Submitter: Zuul
Branch: stable/stein

commit b9992d9658754dfd50af12df38dee847b695c399
Author: Martin Schuppert <email address hidden>
Date: Wed Jun 5 09:28:10 2019 +0200

    Add new role parameter NovaLibvirtNumPciePorts

    Add role parameter NovaLibvirtNumPciePorts which sets `libvirt/num_pcie_ports`
    to specify the number of PCIe ports an instance will get.
    Libvirt allows a custom number of PCIe ports (pcie-root-port controllers) a
    target instance will get. Some will be used by default, rest will be available
    for hotplug use. When using the 'q35' machine type, by default, it allows only
    a *single* PCIe device to be hotplugged. And Nova currently sets
    'num_pcie_ports' to "0" (which means, it defaults to libvirt's "1"), which is
    not sufficient for hotplug use.

    Default for NovaLibvirtNumPciePorts is 16.

    Change-Id: Ida27b52a091640545aecc982fc1a509fb5107db8
    Closes-Bug: #1831701
    Depends-On: I16732c9d6013112381cfad999540dd41ec3d7ba3
    (cherry picked from commit 0f6dabc725b16c975023a0e3f09433d7039922e2)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.0.0

This issue was fixed in the openstack/tripleo-heat-templates 11.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.6.0

This issue was fixed in the openstack/tripleo-heat-templates 10.6.0 release.
