Enhance SR-IOV agent to handle duplicate MAC addresses

Bug #1791159 reported by William Konitzer
This bug affects 7 people
Affects: neutron
Status: Fix Released
Importance: Low
Assigned to: Miguel Lavalle

Bug Description

There are scenarios where a VNF wishes to have the same MAC address in different VLANs or different provider_network types.

Tested so far:

1. Same MAC address used between 2 different VLAN networks ----> Success
2. Same MAC address used between 2 different SR-IOV networks ----> Success
3. Same MAC address used between 2 different VXLAN networks ----> Success
4. Same MAC address used between SR-IOV and VXLAN networks
 a) Created SR-IOV port first, then VXLAN port ----> Success
 b) Created VXLAN port first, then SR-IOV port ----> Sometimes works and sometimes fails. It seems to depend on the order of the values returned from the ports table.

Examining the code, the SR-IOV agent requests device info by MAC address, which calls into the function "get_port_from_device_mac" (https://github.com/openstack/neutron/blob/87223c10cbad33a3b75f22c20725ef9e01728b57/neutron/plugins/ml2/db.py#L146). This function simply returns the last item in the list, implicitly assuming that the MAC address is always unique and that there is therefore only one value.
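
To illustrate the ambiguity described above, here is a minimal sketch. It is not the actual Neutron code: the in-memory PORTS list, its field names, and the helper lookup_by_mac are hypothetical stand-ins for the ports table and for a MAC-only lookup that keeps the last match.

```python
from typing import Optional

# Hypothetical stand-in for rows of the ports table; not Neutron's real schema.
PORTS = [
    {"id": "port-vxlan", "mac": "fa:16:3e:00:00:01", "host": "compute-1"},
    {"id": "port-sriov", "mac": "fa:16:3e:00:00:01", "host": "compute-2"},
]


def lookup_by_mac(rows: list, mac: str) -> Optional[dict]:
    """Return the last row whose MAC matches, assuming MACs are unique."""
    found = None
    for row in rows:
        if row["mac"] == mac:
            found = row  # a later match silently replaces an earlier one
    return found


# Same data, two row orders: the caller gets a different port each time.
first_order = lookup_by_mac(PORTS, "fa:16:3e:00:00:01")
second_order = lookup_by_mac(list(reversed(PORTS)), "fa:16:3e:00:00:01")
```

With two ports sharing a MAC, the port returned depends only on row order, which matches the "sometimes works and sometimes fails" behaviour in scenario 4b.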

An enhancement is requested to allow the SR-IOV agent to request device info by MAC address, filtered on host_id, since the agent knows which host is making the request.

There is still an edge case where this won't work if ports are on the same host - in which case an informative error should be returned.
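
The requested behaviour could look roughly like the sketch below. This is only an illustration under assumptions: the ROWS data, the function lookup_by_mac_and_host, and the AmbiguousPortError exception are hypothetical names, not part of Neutron.

```python
from typing import Optional

# Hypothetical port rows; two ports share a MAC on the same host (the edge case).
ROWS = [
    {"id": "port-a", "mac": "fa:16:3e:00:00:01", "host": "compute-1"},
    {"id": "port-b", "mac": "fa:16:3e:00:00:01", "host": "compute-1"},
    {"id": "port-c", "mac": "fa:16:3e:00:00:01", "host": "compute-2"},
]


class AmbiguousPortError(Exception):
    """Raised when MAC plus host still matches more than one port."""


def lookup_by_mac_and_host(rows: list, mac: str, host: str) -> Optional[dict]:
    """Return the single port matching both MAC and host, or None."""
    matches = [r for r in rows if r["mac"] == mac and r["host"] == host]
    if not matches:
        return None
    if len(matches) > 1:
        ids = ", ".join(sorted(r["id"] for r in matches))
        raise AmbiguousPortError(
            f"MAC {mac} matches several ports on host {host}: {ids}"
        )
    return matches[0]
```

Filtering by host resolves the cross-host case (compute-2 above), while the same-host case raises an error that names the conflicting ports instead of silently picking one.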

Revision history for this message
Miguel Lavalle (minsel) wrote :

Hi William,

It seems to be a straightforward improvement. I wonder, though, why case 2 (2 different SR-IOV networks) worked. Isn't this case also subject to the same ambiguity? Any thoughts?

Slawek Kaplonski (slaweq) wrote :

Is it really an RFE? To me it sounds like a (valid) bug: get_port_from_device_mac always assumes that there is only one unique port to return.

Miguel Lavalle (minsel) wrote :

Yes, I agree with Slawek. Let's process this as a normal bug. I'll assign it to myself and propose a fix soon.

Changed in neutron:
status: New → Confirmed
importance: Undecided → Low
assignee: nobody → Miguel Lavalle (minsel)
William Konitzer (wkonitzer) wrote :

Hi, I'm not sure why case 2 works. I didn't spend a lot of time analyzing it, as it worked!

I registered it as an enhancement as it seemed the current design was for unique MAC addresses, but I can also see an argument for it being a bug. If you're able to fix it under the normal defect handling process, even better.

I didn't think the fix would be trivial, as it looks like we might need to update the RPC between the neutron server and the agents: there is no info about network id and host id in the context where get_port_from_device_mac() is called. Because of this, it seemed more like an enhancement than a bug.

Obviously, edge cases need some careful thought, e.g. how far does it scale? What happens if you live migrate an instance onto a node that already has another instance with the same MAC address, etc.

tags: removed: rfe
huangshan (huangshan) wrote :

The different networks are isolated from each other, so their IP address and MAC information can be reused.

Trygve Vea (trygve-vea-gmail) wrote :

I think we've run into the same bug, however - what we want to achieve differs from the original poster. We want to be able to bond two interfaces on our guests.

So far we have done the following:

- We have two PFs (X722) in our hypervisors - these have been mapped to different physnets.
- We have created two network objects, pointing to different physnets, but the same VLAN id.
- We create one port on each network, and set the same MAC address on both. (We need both to have the same MAC address for bonding to work, and the guest is not allowed to change the mac address of a VF)

If we do NOT set the same mac address on both interfaces, the guest will start up fine. If we do set the same mac address on both interfaces, the guest will throw an error after 300 seconds due to hitting the network-vif-plugged-timeout.

I have concluded that the cause is the SR-IOV agent using the MAC address as the key to look up the port ID for which it sends the network-vif-plugged notification to nova.

As a very hacky workaround for this, we can actually get this instance up and running by doing:

- Create the instance

On hypervisor:

- Wait for a paused instance to exist in libvirt
- virsh resume UUID
- Restart openstack-nova-compute (will put the instance in error state, but will not clean it up)

In nova:
- Reset the state of the instance to active

The two VFs are visible inside the instance, and can be successfully bonded.

Trygve Vea (trygve-vea-gmail) wrote :

I found a second, less ugly workaround:

1) Create the instance
2) List the ports belonging to the server, and identify the port ID of one or more ports in DOWN state
3) Using the credentials of the neutron user (maybe it will work with any admin user), do

SERVER_ID=<instance_id>
PORT_ID=$(openstack port list --server $SERVER_ID -c ID -c Status | grep DOWN | awk '{ print $2 }')
KS_TOKEN=$(openstack token issue -f value -c ID)

curl -H "x-auth-token: $KS_TOKEN" -H 'Content-Type: application/json' --data "{\"events\": [{\"status\": \"completed\", \"tag\": \"$PORT_ID\", \"name\": \"network-vif-plugged\", \"server_uuid\": \"$SERVER_ID\"}]}" NOVA_ENDPOINT/v2/$OS_PROJECT_ID/os-server-external-events

(This will send the network-vif-plugged event that nova is waiting for before it resumes the instance. The SR-IOV agent has already successfully configured the VF on the host.)
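
As a side note, the JSON body used in the curl command above can be built without hand-escaped quotes. The following is a minimal sketch that only constructs the payload; the function name build_vif_plugged_event and the placeholder IDs are illustrative assumptions, and the keys are taken from the command above.

```python
import json


def build_vif_plugged_event(server_id: str, port_id: str) -> str:
    """Serialize a network-vif-plugged event for one port of one server."""
    body = {
        "events": [
            {
                "status": "completed",
                "tag": port_id,
                "name": "network-vif-plugged",
                "server_uuid": server_id,
            }
        ]
    }
    return json.dumps(body)


# Placeholder values; substitute the real instance and port IDs.
payload = build_vif_plugged_event("SERVER_ID_PLACEHOLDER", "PORT_ID_PLACEHOLDER")
```

The resulting string can be passed to curl with --data in place of the escaped literal, which avoids quoting mistakes when the shell variables are substituted.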

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Miguel:

I've started coding a patch to solve the related issue. Do you mind if I take ownership of this bug?

Thank you in advance.

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/780055
Committed: https://opendev.org/openstack/neutron/commit/77ac42d2ee664263dd1dfd391dac4d8e062875e0
Submitter: "Zuul (22348)"
Branch: master

commit 77ac42d2ee664263dd1dfd391dac4d8e062875e0
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Mar 9 17:46:47 2021 +0000

    SR-IOV agent can handle ports with same MAC addresses

    SR-IOV agent can handle ports with same MAC address (located in
    different networks). The agent can retrieve, from the system, the
    MAC address and the PCI slot; because the PCI slot is unique per
    port in the same host, this parameter is used to match with the
    Neutron port ID stored in the database (published via RPC).

    RPC API bumped to version 1.9.

    Closes-Bug: #1791159

    Change-Id: Id8c3e0485bebc55c778ecaadaabca1c28ec56205
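
The matching idea in the commit message can be sketched as follows. This is an illustration only, not the merged patch: the SERVER_PORTS data, its field names, and the helper port_id_for_device are hypothetical.

```python
from typing import Optional

# Hypothetical data: what the server might publish per port (illustrative fields).
SERVER_PORTS = [
    {"port_id": "port-a", "mac": "fa:16:3e:00:00:01", "pci_slot": "0000:3b:02.0"},
    {"port_id": "port-b", "mac": "fa:16:3e:00:00:01", "pci_slot": "0000:3b:0a.1"},
]


def port_id_for_device(ports: list, mac: str, pci_slot: str) -> Optional[str]:
    """Match on (MAC, PCI slot); the slot disambiguates ports sharing a MAC."""
    index = {(p["mac"], p["pci_slot"]): p["port_id"] for p in ports}
    return index.get((mac, pci_slot))
```

Because the PCI slot is unique per port on a host, the (MAC, PCI slot) pair identifies exactly one port even when both ports carry the same MAC address.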

Changed in neutron:
status: In Progress → Fix Released
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Because the proposed fix [1] implies a change in the RPC version (1.9) between server and agent, I don't recommend backporting it.

[1]https://review.opendev.org/c/openstack/neutron/+/780055

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.0.0.0rc1

This issue was fixed in the openstack/neutron 19.0.0.0rc1 release candidate.
