Port status goes to BUILD when migrating a non-SR-IOV instance in an SR-IOV setup.

Bug #2072154 reported by Seyeong Kim
This bug affects 1 person
Affects                 Status          Importance  Assigned to
Ubuntu Cloud Archive (status tracked in Epoxy)
  Antelope              In Progress     Undecided   Seyeong Kim
  Bobcat                In Progress     Undecided   Seyeong Kim
  Caracal               In Progress     Undecided   Seyeong Kim
  Dalmatian             Fix Released    Undecided   Unassigned
  Epoxy                 Fix Released    Undecided   Unassigned
  Yoga                  In Progress     Undecided   Seyeong Kim
  Zed                   In Progress     Undecided   Seyeong Kim
neutron                 Fix Released    High        Seyeong Kim
neutron (Ubuntu)        Fix Released    Undecided   Seyeong Kim
  Jammy                 Fix Committed   Undecided   Seyeong Kim
  Noble                 Fix Committed   Undecided   Seyeong Kim
  Oracular              Fix Released    Undecided   Seyeong Kim

Bug Description

[ Impact ]

When a non-SR-IOV instance is live-migrated in a deployment with SR-IOV enabled, its port status frequently ends up stuck in BUILD instead of ACTIVE.

[ Test Plan ]

1. Deploy OpenStack using Juju and the charms (upstream has the same code).
2. Enable SR-IOV.
3. Create a VM without an SR-IOV NIC (named test).
4. Live-migrate it to another host:
- openstack server migrate --live-migration --os-compute-api-version 2.30 --host node-04.maas test
5. Check the port status:
- https://paste.ubuntu.com/p/RKGnP76MvB/
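Step 5 can be scripted so that a transient BUILD is distinguished from one that never clears. This is a minimal sketch, assuming openstacksdk is installed and credentials come from clouds.yaml or the usual OS_* environment variables; the helper names wait_for_ports_active and openstacksdk_fetcher are hypothetical, and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable, Dict, Optional

Statuses = Dict[str, str]  # port id -> status string as reported by Neutron


def wait_for_ports_active(
    fetch_statuses: Callable[[], Statuses],
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Statuses:
    """Poll until every port is ACTIVE or the timeout expires.

    Returns the last observed mapping, so a port stuck in BUILD is visible
    to the caller. An empty mapping never counts as success.
    """
    deadline = clock() + timeout
    while True:
        statuses = fetch_statuses()
        if statuses and all(s == "ACTIVE" for s in statuses.values()):
            return statuses
        if clock() >= deadline:
            return statuses
        sleep(interval)


def openstacksdk_fetcher(server_name: str, cloud: Optional[str] = None) -> Callable[[], Statuses]:
    """Return a fetch_statuses callable backed by openstacksdk."""
    import openstack  # lazy import: the polling logic above has no SDK dependency

    conn = openstack.connect(cloud=cloud)
    server = conn.compute.find_server(server_name, ignore_missing=False)

    def fetch() -> Statuses:
        # All ports attached to the instance, keyed by port id.
        return {port.id: port.status for port in conn.network.ports(device_id=server.id)}

    return fetch
```

After the migration in step 4, calling wait_for_ports_active(openstacksdk_fetcher("test")) returns the final statuses; any value other than ACTIVE after the timeout points at this bug. The sleep and clock parameters exist only so the loop can be exercised without a real cloud.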

[ Where problems could occur ]

This patch touches the SR-IOV agent: it adds a check for whether a port is an SR-IOV port before handling it. A regression could therefore cause SR-IOV ports to be handled improperly.

[ Other Info ]

The nova-compute nodes run neutron-sriov-nic-agent and neutron-ovn-metadata-agent.

So far, I have found that ovn_monitor changes the port status to ACTIVE, but sriov-nic-agent changes it back to BUILD by calling _get_new_status:

./plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py
binding_activate
- get_device_details_from_port_id
- get_device_details
- _get_new_status <- this sets the status to BUILD.

Because the order in which these run is not fixed, the port sometimes ends up ACTIVE and sometimes BUILD.
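The ordering problem can be illustrated with a deliberately simplified, hypothetical model: two independent writers update the same port status and the last one wins. ovn_monitor_port_up and sriov_binding_activate_unpatched are made-up stand-ins, not neutron functions, and the model assumes nothing later moves the port back to ACTIVE.

```python
from typing import Callable, Dict, List

PortDB = Dict[str, str]  # port id -> status; a stand-in for the Neutron DB


def ovn_monitor_port_up(db: PortDB, port_id: str) -> None:
    # OVN side: the logical switch port comes up on the destination host.
    db[port_id] = "ACTIVE"


def sriov_binding_activate_unpatched(db: PortDB, port_id: str) -> None:
    # SR-IOV agent before the fix: it requests device details for every
    # activated binding, and the server-side status computation resets the
    # port to BUILD even though this agent does not manage the port.
    db[port_id] = "BUILD"


def final_status(order: List[Callable[[PortDB, str], None]]) -> str:
    db: PortDB = {"port-1": "DOWN"}
    for handler in order:  # last writer wins
        handler(db, "port-1")
    return db["port-1"]


print(final_status([sriov_binding_activate_unpatched, ovn_monitor_port_up]))  # ACTIVE
print(final_status([ovn_monitor_port_up, sriov_binding_activate_unpatched]))  # BUILD
```

Both orderings are possible in a real deployment, which matches the intermittent ACTIVE/BUILD result described above.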


Seyeong Kim (seyeongkim)
tags: added: sts
Changed in neutron:
status: New → In Progress
Seyeong Kim (seyeongkim)
Changed in neutron:
assignee: nobody → Seyeong Kim (seyeongkim)
tags: added: ovn sriov-pci-pt
Lajos Katona (lajos-katona) wrote :

Thanks for reporting. Could you please give some extra details?
I assume you have read the suggested documentation on the limitations of SR-IOV with OVN (https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html).

If I understand correctly, you have two hosts where SR-IOV is enabled and the SR-IOV agent is running, right?

Changed in neutron:
importance: Undecided → High
Seyeong Kim (seyeongkim) wrote :

Yes.

I uploaded a tentative patch:

https://review.opendev.org/c/openstack/neutron/+/923467

It should help you understand the current situation better.

Thanks!

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/923467
Committed: https://opendev.org/openstack/neutron/commit/a311606fcdae488e76c29e0e5e4035f8da621a34
Submitter: "Zuul (22348)"
Branch: master

commit a311606fcdae488e76c29e0e5e4035f8da621a34
Author: Seyeong Kim <email address hidden>
Date: Thu Jul 4 06:23:59 2024 +0000

    Checking pci_slot to avoid changing staus to BUILD forever

    Currently when sriov agent is enabled and migrating a non-sriov
    instance, non-sriov port status is frequently set to BUILD
    instead of ACTIVE.
    This is because the 'binding_activate' function in sriov-nic-agent sets it
    BUILD with get_device_details_from_port_id(as it calls _get_new_status).

    This patch checks network_ports in binding_activate and
    skip binding port if it is not sriov port

    Closes-Bug: #2072154
    Change-Id: I2d7702e17c75c96ca2f29749dccab77cb2f4bcf4

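The idea in the commit message (check network_ports in binding_activate and skip ports that are not SR-IOV ports) can be sketched with a toy agent. This is a hypothetical model, not neutron's real class: SriovAgentSketch, the assumed shape of network_ports, and the rpc_calls list are illustrative only.

```python
import logging
from typing import Dict, List, Set, Tuple

LOG = logging.getLogger(__name__)


class SriovAgentSketch:
    """Toy stand-in for the SR-IOV NIC agent; not neutron's real class."""

    def __init__(self, network_ports: Dict[str, List[dict]]):
        # Assumed shape: network id -> ports known to this agent, each a dict
        # with "port_id" and "device" = (mac, pci_slot) of the bound VF.
        self.network_ports = network_ports
        self.activated_bindings: Set[Tuple[str, str]] = set()
        self.rpc_calls: List[str] = []  # records which ports reached the server

    def _is_sriov_port(self, port_id: str) -> bool:
        return any(
            port["port_id"] == port_id
            for ports in self.network_ports.values()
            for port in ports
        )

    def _get_device_details_from_port_id(self, port_id: str) -> dict:
        # Stand-in for the RPC whose server side recomputes the port status
        # (the step that used to flip the port back to BUILD).
        self.rpc_calls.append(port_id)
        for ports in self.network_ports.values():
            for port in ports:
                if port["port_id"] == port_id:
                    mac, pci_slot = port["device"]
                    return {"mac_address": mac, "profile": {"pci_slot": pci_slot}}
        return {}

    def binding_activate(self, context, **kwargs) -> None:
        port_id = kwargs.get("port_id")
        if not self._is_sriov_port(port_id):
            # Guard in the spirit of the fix: a port this agent does not
            # manage (e.g. an OVN port) is skipped, so no status reset happens.
            LOG.debug("Port %s is not an SR-IOV port; skipping", port_id)
            return
        details = self._get_device_details_from_port_id(port_id)
        mac = details["mac_address"]
        pci_slot = details["profile"]["pci_slot"]
        self.activated_bindings.add((mac, pci_slot))
```

With this guard, activating the binding of a non-SR-IOV port makes no RPC call at all, so the ACTIVE status set on the OVN side is left untouched.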
Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/neutron/+/923719

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/923773

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/923774

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2024.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/923719
Committed: https://opendev.org/openstack/neutron/commit/59bc8e476f17dee323973a0e6ea4cd4c343f77b6
Submitter: "Zuul (22348)"
Branch: stable/2024.1

commit 59bc8e476f17dee323973a0e6ea4cd4c343f77b6
Author: Seyeong Kim <email address hidden>
Date: Thu Jul 4 06:23:59 2024 +0000

    Checking pci_slot to avoid changing staus to BUILD forever

    Currently when sriov agent is enabled and migrating a non-sriov
    instance, non-sriov port status is frequently set to BUILD
    instead of ACTIVE.
    This is because the 'binding_activate' function in sriov-nic-agent sets it
    BUILD with get_device_details_from_port_id(as it calls _get_new_status).

    This patch checks network_ports in binding_activate and
    skip binding port if it is not sriov port

    Closes-Bug: #2072154
    Change-Id: I2d7702e17c75c96ca2f29749dccab77cb2f4bcf4
    (cherry picked from commit a311606fcdae488e76c29e0e5e4035f8da621a34)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/923773
Committed: https://opendev.org/openstack/neutron/commit/1da102e4024a1c7398179b29bd63e2ba42b19000
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 1da102e4024a1c7398179b29bd63e2ba42b19000
Author: Seyeong Kim <email address hidden>
Date: Thu Jul 4 06:23:59 2024 +0000

    Checking pci_slot to avoid changing staus to BUILD forever

    Currently when sriov agent is enabled and migrating a non-sriov
    instance, non-sriov port status is frequently set to BUILD
    instead of ACTIVE.
    This is because the 'binding_activate' function in sriov-nic-agent sets it
    BUILD with get_device_details_from_port_id(as it calls _get_new_status).

    This patch checks network_ports in binding_activate and
    skip binding port if it is not sriov port

    Closes-Bug: #2072154
    Change-Id: I2d7702e17c75c96ca2f29749dccab77cb2f4bcf4
    (cherry picked from commit a311606fcdae488e76c29e0e5e4035f8da621a34)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/923774
Committed: https://opendev.org/openstack/neutron/commit/27eee0b9e85ec23c54c4c907e2c80fe2d4609221
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 27eee0b9e85ec23c54c4c907e2c80fe2d4609221
Author: Seyeong Kim <email address hidden>
Date: Thu Jul 4 06:23:59 2024 +0000

    Checking pci_slot to avoid changing staus to BUILD forever

    Currently when sriov agent is enabled and migrating a non-sriov
    instance, non-sriov port status is frequently set to BUILD
    instead of ACTIVE.
    This is because the 'binding_activate' function in sriov-nic-agent sets it
    BUILD with get_device_details_from_port_id(as it calls _get_new_status).

    This patch checks network_ports in binding_activate and
    skip binding port if it is not sriov port

    Closes-Bug: #2072154
    Change-Id: I2d7702e17c75c96ca2f29749dccab77cb2f4bcf4
    (cherry picked from commit a311606fcdae488e76c29e0e5e4035f8da621a34)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (unmaintained/zed)

Fix proposed to branch: unmaintained/zed
Review: https://review.opendev.org/c/openstack/neutron/+/923838

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.1

This issue was fixed in the openstack/neutron 24.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.2.0

This issue was fixed in the openstack/neutron 22.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.2.0

This issue was fixed in the openstack/neutron 23.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (unmaintained/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/923838
Committed: https://opendev.org/openstack/neutron/commit/aa68e695e1b9cca8a45e772edab471cd28212f42
Submitter: "Zuul (22348)"
Branch: unmaintained/zed

commit aa68e695e1b9cca8a45e772edab471cd28212f42
Author: Seyeong Kim <email address hidden>
Date: Thu Jul 4 06:23:59 2024 +0000

    Checking pci_slot to avoid changing staus to BUILD forever

    Currently when sriov agent is enabled and migrating a non-sriov
    instance, non-sriov port status is frequently set to BUILD
    instead of ACTIVE.
    This is because the 'binding_activate' function in sriov-nic-agent sets it
    BUILD with get_device_details_from_port_id(as it calls _get_new_status).

    This patch checks network_ports in binding_activate and
    skip binding port if it is not sriov port

    Closes-Bug: #2072154
    Change-Id: I2d7702e17c75c96ca2f29749dccab77cb2f4bcf4
    (cherry picked from commit a311606fcdae488e76c29e0e5e4035f8da621a34)
    (cherry picked from commit 27eee0b9e85ec23c54c4c907e2c80fe2d4609221)

tags: added: in-unmaintained-zed
Seyeong Kim (seyeongkim)
description: updated
Seyeong Kim (seyeongkim)
Changed in neutron (Ubuntu Oracular):
status: New → In Progress
assignee: nobody → Seyeong Kim (seyeongkim)
Changed in neutron (Ubuntu Focal):
assignee: nobody → Seyeong Kim (seyeongkim)
Changed in neutron (Ubuntu Jammy):
assignee: nobody → Seyeong Kim (seyeongkim)
Changed in neutron (Ubuntu Noble):
assignee: nobody → Seyeong Kim (seyeongkim)
Seyeong Kim (seyeongkim)
Changed in neutron (Ubuntu Noble):
status: New → In Progress
Changed in neutron (Ubuntu Jammy):
status: New → In Progress
James Page (james-page)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 25.0.0.0rc1

This issue was fixed in the openstack/neutron 25.0.0.0rc1 release candidate.

no longer affects: neutron (Ubuntu Focal)
Changed in neutron (Ubuntu Oracular):
status: In Progress → Fix Released
Seyeong Kim (seyeongkim) wrote :
Dariusz Gadomski (dgadomski) wrote :

Hi Seyeong.
Thank you for the patches.

I have reviewed jammy & noble.

What I noticed is that in the noble debdiff, debian/patches/lp2072154_port_status_antelope.diff is missing the patch headers (e.g. From:, Subject:).

I believe this is important for anybody looking into that piece of code in the future.

Can you please update the noble debdiff to also contain the headers?

Thank you!
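A quick pre-upload check for the missing headers can be scripted. This is a minimal sketch under stated assumptions: it only looks for the two kinds of header mentioned above, treating Description/Subject and Author/From as equivalent (as the DEP-3 patch-tagging guidelines do), and it does not validate the other DEP-3 fields. The function names dep3_fields and missing_dep3_headers are hypothetical.

```python
import re
from typing import Dict, List

# Lines that mark the end of the header and the start of the diff itself.
_DIFF_START = re.compile(r"^(---|\+\+\+|diff |Index: )")
_FIELD = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def dep3_fields(patch_text: str) -> Dict[str, str]:
    """Collect 'Field: value' lines that appear before the diff starts."""
    fields: Dict[str, str] = {}
    for line in patch_text.splitlines():
        if _DIFF_START.match(line):
            break
        match = _FIELD.match(line)
        if match:
            # Keep the first occurrence; field names are case-insensitive.
            fields.setdefault(match.group(1).lower(), match.group(2))
    return fields


def missing_dep3_headers(patch_text: str) -> List[str]:
    """Report which of the two checked header groups are absent."""
    fields = dep3_fields(patch_text)
    missing = []
    if not any(k in fields for k in ("description", "subject")):
        missing.append("Description/Subject")
    if not any(k in fields for k in ("author", "from")):
        missing.append("Author/From")
    return missing
```

Running missing_dep3_headers over each file in debian/patches before building the debdiff would have flagged the bare diff, since a patch that starts directly with "--- a/..." yields no header fields at all.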

Seyeong Kim (seyeongkim) wrote :
Seyeong Kim (seyeongkim) wrote :
Seyeong Kim (seyeongkim) wrote :
Seyeong Kim (seyeongkim) wrote :
Seyeong Kim (seyeongkim) wrote :
Seyeong Kim (seyeongkim) wrote :
Seyeong Kim (seyeongkim) wrote :

@dgadomski

Thanks! I have uploaded it again.

Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Seyeong, or anyone else affected,

Accepted neutron into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:24.0.0-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu):
status: In Progress → Fix Released
Changed in neutron (Ubuntu Noble):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-noble
Timo Aaltonen (tjaalton) wrote :

Hello Seyeong, or anyone else affected,

Accepted neutron into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:20.5.0-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed-jammy
Seyeong Kim (seyeongkim) wrote :

I verified the patch for jammy.

I followed the steps in the description and confirmed that the port status is now ACTIVE.

root@node-01:/home/ubuntu# dpkg -l | grep neutron
ii neutron-common 2:20.5.0-0ubuntu2 all Neutron is a virtual network service for Openstack - common
ii neutron-ovn-metadata-agent 2:20.5.0-0ubuntu2 all Neutron is a virtual network service for Openstack - OVN metadata agent
ii neutron-sriov-agent 2:20.5.0-0ubuntu2 all Neutron is a virtual network service for Openstack - SR-IOV agent
ii python3-neutron 2:20.5.0-0ubuntu2 all Neutron is a virtual network service for Openstack - Python library
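Confirming that every neutron package really came from -proposed can also be automated from the dpkg -l output shown above. A minimal sketch, assuming the standard dpkg -l column layout (state, name, version, architecture, description); installed_versions and not_at_version are hypothetical helper names.

```python
from typing import Dict


def installed_versions(dpkg_l_output: str, prefix: str = "") -> Dict[str, str]:
    """Map package name -> version for lines in state 'ii' (installed)."""
    versions: Dict[str, str] = {}
    for line in dpkg_l_output.splitlines():
        # state, name, version, arch, description (description may contain spaces)
        parts = line.split(None, 4)
        if len(parts) >= 3 and parts[0] == "ii" and parts[1].startswith(prefix):
            versions[parts[1]] = parts[2]
    return versions


def not_at_version(dpkg_l_output: str, expected: str, prefix: str = "") -> Dict[str, str]:
    """Return the installed packages whose version differs from `expected`."""
    return {
        name: version
        for name, version in installed_versions(dpkg_l_output, prefix).items()
        if version != expected
    }
```

For example, feeding it subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout with prefix="neutron" and expected="2:20.5.0-0ubuntu2" should return an empty dict on a correctly upgraded jammy node. Multi-arch packages may appear as name:arch in dpkg -l; dpkg-query -W with a custom --showformat is a more robust source if that matters.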

tags: added: verification-done-jammy
removed: verification-needed-jammy
Seyeong Kim (seyeongkim) wrote :

I had to use this command to install the packages from -proposed:

sudo apt install neutron-common neutron-ovn-metadata-agent neutron-sriov-agent python3-neutron -t noble-proposed

and confirmed that the port status is now ACTIVE.

root@node-04:/etc/apt# dpkg -l | grep neutron
ii neutron-common 2:24.0.0-0ubuntu2 all Neutron is a virtual network service for Openstack - common
ii neutron-ovn-metadata-agent 2:24.0.0-0ubuntu2 all Neutron is a virtual network service for Openstack - OVN metadata agent
ii neutron-sriov-agent 2:24.0.0-0ubuntu2 all Neutron is a virtual network service for Openstack - SR-IOV agent
ii python3-neutron 2:24.0.0-0ubuntu2 all Neutron is a virtual network service for Openstack - Python library

tags: added: verification-done-noble
removed: verification-needed-noble
tags: added: verification-done
removed: verification-needed