Virtual function is being attached to port regardless of the exclude_devices configuration

Bug #2066989 reported by Felipe Figueroa Vergara
This bug affects 3 people
Affects: neutron
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

When I configure SR-IOV and exclude certain VFs through the exclude_devices entry of sriov_agent.ini, the excluded devices are still attached to newly created ports. When I create a VM from scratch with a port bound to an excluded device, VM creation eventually fails with "nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed." I assume this is the expected behavior, since the device should be excluded. However, when I create a VM first and then attach the port bound to the excluded device, the port attaches without any issues. The core problem seems to be that the port is created on the excluded device regardless of the exclusion.

I use the following config in sriov_agent.ini to exclude devices:

[sriov_nic]
exclude_devices = enp65s0f0np0:0000:41:01.0,enp65s0f1np1:0000:41:11.0

I can still create a port on that device, and the port's pci_slot entry shows pci_slot='0000:41:11.0'.
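To make the expected semantics concrete, here is a minimal Python sketch of how the exclude_devices value above could be read into per-device sets of excluded VF addresses and compared with the pci_slot seen on the port. It assumes the format given in the option's help text (comma-separated `<network_device>:<vfs_to_exclude>` entries, with several VFs for one device separated by semicolons); `parse_excluded_vfs` and `is_excluded` are hypothetical helpers, not Neutron's own code.

```python
from typing import Dict, Set


def parse_excluded_vfs(value: str) -> Dict[str, Set[str]]:
    """Map each network device to the set of VF PCI addresses excluded on it.

    Hypothetical helper; assumes the format
    '<device>:<vf>[;<vf>...][,<device>:<vf>...]'.
    """
    excluded: Dict[str, Set[str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only: PCI addresses contain colons too.
        device, sep, vfs = entry.partition(":")
        if not sep or not device or not vfs:
            raise ValueError(f"malformed exclude_devices entry: {entry!r}")
        excluded.setdefault(device.strip(), set()).update(
            vf.strip() for vf in vfs.split(";") if vf.strip()
        )
    return excluded


def is_excluded(pci_slot: str, excluded: Dict[str, Set[str]]) -> bool:
    """Return True if pci_slot appears in any device's exclusion list."""
    return any(pci_slot in vfs for vfs in excluded.values())


config_value = "enp65s0f0np0:0000:41:01.0,enp65s0f1np1:0000:41:11.0"
excluded = parse_excluded_vfs(config_value)
print(excluded)
print(is_excluded("0000:41:11.0", excluded))  # the slot reported on the port: True
```

With the report's configuration, 0000:41:11.0 lands in the excluded set, which is why the reporter expects a binding to that slot to be rejected.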

Tags: sriov-pci-pt
Brian Haley (brian-haley) wrote :

Can you give some more info on the commands you're running (openstack port create, etc.) and the versions? Looking at the code, I don't see any fixes in this area, but maybe there's a code path that fails to check the excluded devices. Thanks.

tags: added: sriov-pci-pt
Changed in neutron:
importance: Undecided → Medium
Felipe Figueroa Vergara (felipeafv) wrote :

Here is an example of what I'm doing for VMs created from scratch with the port attached:

1. Create the SR-IOV ports using openstack port create (for each port).
2. Create the server using openstack server create.

With this process, when I create the last VM (which is forced to use the excluded device), it stays in BUILD status for about 6 minutes and then fails. The nova-compute logs show the error "Virtual Interface creation failed."

The other process I use involves creating a single VM and then attaching the SR-IOV ports with the following steps:

1. Create the server using openstack server create.
2. Create the SR-IOV ports using openstack port create (for each port).
3. Attach the ports to the server using openstack server add port.

With this process, I can attach all the SR-IOV virtual functions to the VM, including the one excluded by the exclude_devices directive.

The versions I’m using are:
nova 26.2.3
neutron 21.2.1.dev58

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Felipe:

While I try to reserve an SR-IOV environment locally to test this, let me ask you:
- What is the DEBUG output of the SR-IOV agent in the first case, when VM creation fails with ``VirtualInterfaceCreateException``?
- What is the DEBUG output of the SR-IOV agent in the second case? Is there any exception? Is the port bound in Neutron?

Regards.

Felipe Figueroa Vergara (felipeafv) wrote :

In the first case, these are the SR-IOV agent logs:

2024-07-02 18:39:30.565 7 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [None req-2a098da0-3f37-463a-ad69-4b1e555fe29a - - - - - -] Agent rpc_loop - iteration:388 started daemon_loop /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:446
2024-07-02 18:39:38.749 7 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [None req-2dcbd196-a72e-4a0c-9b05-031836bc8ad8 c0ad75b3e80743aaa023cc85cd455d74 5a2793ea4c5a416d9bf954f2d14be095 - - - -] port_update received port_update /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:78
2024-07-02 18:39:38.749 7 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [None req-2dcbd196-a72e-4a0c-9b05-031836bc8ad8 c0ad75b3e80743aaa023cc85cd455d74 5a2793ea4c5a416d9bf954f2d14be095 - - - -] port_update RPC received for port: 62654989-e8ae-4c59-9cb2-ffd8df55b54e with MAC fa:16:3e:c9:56:27 and PCI slot 0000:1a:02.0 slot port_update /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:95
2024-07-02 18:39:45.032 7 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [None req-b3b1da38-5f93-4f83-99a0-77bb45815c46 c0ad75b3e80743aaa023cc85cd455d74 5a2793ea4c5a416d9bf954f2d14be095 - - - -] port_update received port_update /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:78
2024-07-02 18:39:45.033 7 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [None req-b3b1da38-5f93-4f83-99a0-77bb45815c46 c0ad75b3e80743aaa023cc85cd455d74 5a2793ea4c5a416d9bf954f2d14be095 - - - -] port_update RPC received for port: b4279cdb-1789-4f2f-a9c7-c199452e4860 with MAC fa:16:3e:69:74:df and PCI slot 0000:1a:02.1 slot port_update /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:95
2024-07-02 18:39:45.565 7 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [None req-2a098da0-3f37-463a-ad69-4b1e555fe29a - - - - - -] Agent rpc_loop - iteration:389 started daemon_loop /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:446
2024-07-02 18:39:45.574 1699 DEBUG oslo.privsep.daemon [-] privsep: reply[39522ee0-da5d-4300-8c86-718d30815df3]: (4, {0: {'mac': 'fa:16:3e:c9:56:27', 'link_state': 1, 'max_tx_rate': 0, 'min_tx_rate': 0}, 1: {'mac': 'aa:2c:9d:e7:c6:77', 'link_state': 1, 'max_tx_rate': 0, 'min_tx_rate': 0}, 2: {'mac': 'f2:3c:63:fb:a2:16', 'link_state': 1, 'max_tx_rate': 0, 'min_tx_rate': 0}, 3: {'mac': 'd6:ca:24:13:77:ce', 'link_state': 1, 'max_tx_rate': 0, 'min_tx_rate': 0}, 4: {'mac': 'aa:bc:f7:26:7f:58', 'link_state': 0, 'max_tx_rate': 0, 'min_tx_rate': 0}}) _call_back /var/lib/kolla/venv/lib/python3.10/site-packages/oslo_privsep/daemon.py:501
2024-07-02 18:39:45.577 7 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [None req-2a098da0-3f37-463a-ad69-4b1e555fe29a - - - - - -] Agent loop found changes! {'current':...

German E (gespinozat) wrote :

Hi, we were able to reproduce this behavior when using the exclude_devices parameter.

------------------------------------------------------------------------

Case 1: Booting servers with SR-IOV interfaces

When attaching an SR-IOV port during server creation, the port binding may refer to a PCI slot that has been defined as excluded (this can be seen in the pci_slot field within the port's binding profile).
In this case, the port ends up DOWN and the server ends up in ERROR state. nova-compute logs:
nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed

This seems to be working as expected, as it prevents the use of the excluded device.

------------------------------------------------------------------------

Case 2: Attaching SR-IOV interfaces to existing server

When attaching an SR-IOV port to an existing server, the port binding may refer to a PCI slot that has been defined as excluded. Even so, the port is successfully attached to the server (visible at both the server and host OS levels), and the `openstack server add port` command shows no error or warning. The port state in Neutron, however, ends up DOWN.

This, in fact, does not prevent the use of the excluded device.

German E (gespinozat) wrote :

Steps to reproduce.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

In both cases (VM creation, port attachment), the port is reported as DOWN because the SR-IOV agent never reports it as UP. What I was expecting to see in these logs (c#4) is a message like this:
  LOG.info("No device %s defined on agent.", device_info)

Without the Neutron API logs, debugging is harder, but I'm quite sure that in both cases the Neutron API does not declare the port UP or activated.

To reproduce this issue I need special hardware, and I'm waiting for it to be assigned. But as I said, I don't think this is a Neutron bug. Since you have this hardware, please check that the SR-IOV agent never declares the port as UP.
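One way to look for the message Rodolfo expected (from the agent's `LOG.info` call quoted above) is to filter the SR-IOV agent's DEBUG log. This is a minimal Python sketch, assuming the formatted line keeps the literal text "No device ... defined on agent." (how `device_info` is rendered is not shown in this thread); `find_undefined_device_lines` is a hypothetical helper.

```python
from typing import Iterable, List, Optional

# Tail of the LOG.info message quoted above; assumed to appear verbatim.
MESSAGE_FRAGMENT = "defined on agent."


def find_undefined_device_lines(
    lines: Iterable[str], needle: Optional[str] = None
) -> List[str]:
    """Return agent log lines reporting a device the agent does not know.

    If needle (for example a port's MAC or PCI slot) is given, keep only
    lines that also contain it.
    """
    hits = []
    for line in lines:
        if "No device" in line and MESSAGE_FRAGMENT in line:
            if needle is None or needle in line:
                hits.append(line.rstrip("\n"))
    return hits
```

For example, `find_undefined_device_lines(open(agent_log_path), "0000:41:11.0")`, where `agent_log_path` is a placeholder for wherever the deployment writes the SR-IOV agent log. An empty result for the excluded slot would suggest the message never appeared.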

PCI assignment is done by Nova. Did you check the Nova "pci.device_spec" config option? It must list the devices that nova-compute (the Nova agent) may assign to ports, and this list should NOT include this PCI address.
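As a sketch of the Nova-side workaround, the Python snippet below generates `[pci] device_spec` entries (one JSON object per allowed VF, using the `address` and `physical_network` keys) while skipping the excluded addresses. The VF list and the physical network name `physnet1` are hypothetical; it also assumes every VF is listed individually and belongs to the same physical network.

```python
import json
from typing import Iterable, List, Set


def build_device_spec_lines(
    vf_addresses: Iterable[str], excluded: Set[str], physnet: str
) -> List[str]:
    """Return one nova.conf 'device_spec = ...' line per usable VF.

    Illustrative assumptions: all VFs share one physical network and are
    allowed individually by PCI address.
    """
    lines = []
    for address in sorted(vf_addresses):
        if address in excluded:
            continue  # keep excluded VFs out of Nova's allowed set
        spec = {"address": address, "physical_network": physnet}
        lines.append("device_spec = " + json.dumps(spec))
    return lines


vfs = ["0000:41:01.0", "0000:41:01.1", "0000:41:01.2"]  # hypothetical VF list
print("[pci]")
for line in build_device_spec_lines(vfs, {"0000:41:01.0"}, "physnet1"):
    print(line)
```

Keeping the Nova list as the complement of exclude_devices is the redundancy Felipe objects to in the next comment: the same exclusion has to be maintained in two places.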

Regards.

Felipe Figueroa Vergara (felipeafv) wrote :

Tests were conducted using the pci.device_spec configuration, and indeed, this causes an explicit failure when attempting to attach a port with an excluded device to a server. We believe this behavior should be handled by Neutron, because the current setup additionally requires explicitly defining the allowed (whitelisted) devices in Nova. Perhaps exclude_devices should be the option that produces explicit failures in both Nova and Neutron, removing the need for redundant configuration.
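The explicit failure proposed here boils down to one comparison between the port's binding profile and the excluded slots. This Python sketch shows only that comparison; `ExcludedDeviceError` and `check_binding_profile` are hypothetical, neither Neutron nor Nova is claimed to expose them, and where such a check would run (port binding in Neutron or the PCI claim in Nova) is left open in this thread. It assumes the excluded slots have already been collected into a set.

```python
from typing import Mapping, Set


class ExcludedDeviceError(Exception):
    """Hypothetical error for a port bound to an excluded VF."""


def check_binding_profile(profile: Mapping[str, object], excluded: Set[str]) -> None:
    """Raise if the binding profile's pci_slot is in the excluded set."""
    pci_slot = profile.get("pci_slot")
    if pci_slot is None:
        return  # no pci_slot in the profile; nothing to check
    if pci_slot in excluded:
        raise ExcludedDeviceError(f"PCI slot {pci_slot} is listed in exclude_devices")


excluded = {"0000:41:01.0", "0000:41:11.0"}  # slots from the report's config
check_binding_profile({"pci_slot": "0000:1a:02.0"}, excluded)  # passes silently
try:
    check_binding_profile({"pci_slot": "0000:41:11.0"}, excluded)
except ExcludedDeviceError as exc:
    print(exc)
```

Raising at this point, rather than leaving the port DOWN, is what would make `openstack server add port` fail visibly in case 2.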
