Bug #1921150 “[QoS min bw] repeated ERROR log: Unable to save re...” : Bugs : neutron

Balazs Gibizer (balazs-gibizer) on 2021-03-24

tags:

added: qos

Balazs Gibizer (balazs-gibizer) on 2021-03-24

description:	updated
summary:
- Repeated ERROR log: Unable to save resource provider ... because: re-parenting a provider is not currently allowed
+ [QoS min bw] repeated ERROR log: Unable to save resource provider ... because: re-parenting a provider is not currently allowed

Balazs Gibizer (balazs-gibizer) wrote on 2021-03-24:

#1

This SQL will print all the wrongly parented device RPs.

SELECT *
FROM placement.resource_providers
WHERE
  (name LIKE '%:NIC Switch agent:%' OR
   name LIKE '%:Open vSwitch agent:%') AND
  parent_provider_id = root_provider_id;

I don't have enough SQL-fu to formulate an UPDATE statement that fixes them. If somebody can write one, it would be good to provide that SQL to admins on stable branches who have wrongly parented RPs and want to fix the tree structure, which would also get rid of the repeated logs and the extra placement load.
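To see what the query above selects, here is a minimal sketch that runs it against an in-memory SQLite table. The table is a simplified stand-in for placement.resource_providers with only the columns the query needs, the `placement.` schema prefix is dropped, and the host, bridge and interface names are made up for illustration.

```python
import sqlite3

# Simplified stand-in for placement.resource_providers (assumption: only
# the columns used by the query are modelled).
SCHEMA = """
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_provider_id INTEGER,
    root_provider_id INTEGER NOT NULL
)
"""

# Hypothetical sample tree for one compute host.
ROWS = [
    (1, "host1", None, 1),                          # hypervisor RP (root)
    (2, "host1:Open vSwitch agent", 1, 1),          # agent RP
    (3, "host1:Open vSwitch agent:br-phy0", 2, 1),  # device RP, correct parent
    (4, "host1:NIC Switch agent", 1, 1),            # agent RP
    (5, "host1:NIC Switch agent:ens5", 1, 1),       # device RP, wrong parent
]

WRONG_PARENT_QUERY = """
SELECT id, name
FROM resource_providers
WHERE
  (name LIKE '%:NIC Switch agent:%' OR
   name LIKE '%:Open vSwitch agent:%') AND
  parent_provider_id = root_provider_id
"""


def make_demo_db():
    """Create the in-memory table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO resource_providers VALUES (?, ?, ?, ?)", ROWS)
    return conn


def find_wrongly_parented(conn):
    """Return (id, name) of device RPs parented directly under the root."""
    return conn.execute(WRONG_PARENT_QUERY).fetchall()
```

Agent RPs are not matched because their names have no trailing `:<device>` part, and correctly parented device RPs are skipped because their parent differs from the root.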

Bence Romsics (bence-romsics) wrote on 2021-03-24:

#2

https://review.opendev.org/c/openstack/neutron/+/782553

Changed in neutron:
assignee:	nobody → Bence Romsics (bence-romsics)
status:	New → Triaged
importance:	Undecided → High

Balazs Gibizer (balazs-gibizer) wrote on 2021-03-25:

#3

Attachment: SQL script that can fix the wrong device RP parents caused by this bug (1.5 KiB, application/x-sql)

OK, I think I managed to create an SQL script that re-parents the device RPs to be under the agent RP. Admins can use this script to clean up _after_ the fix for bug 1921150 is applied to neutron.
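The attached script is not reproduced in this thread. As a rough illustration of the idea only, the sketch below re-parents device RPs in an in-memory SQLite table. It is not the attached script. It assumes that an agent RP's name equals the device RP's name with the trailing `:<device>` component removed and that device names contain no colon. The table layout and sample names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a simplified, hypothetical resource_providers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource_providers ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "parent_provider_id INTEGER, root_provider_id INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO resource_providers VALUES (?, ?, ?, ?)",
        [
            (1, "host1", None, 1),                     # hypervisor RP
            (4, "host1:NIC Switch agent", 1, 1),       # agent RP
            (5, "host1:NIC Switch agent:ens5", 1, 1),  # wrongly parented
        ])
    return conn


def reparent_device_rps(conn):
    """Point wrongly parented device RPs at their agent RP.

    Returns the number of rows that were re-parented.
    """
    wrong = conn.execute(
        "SELECT id, name, root_provider_id FROM resource_providers "
        "WHERE (name LIKE '%:NIC Switch agent:%' "
        "   OR name LIKE '%:Open vSwitch agent:%') "
        "AND parent_provider_id = root_provider_id").fetchall()
    fixed = 0
    for rp_id, name, root_id in wrong:
        # Assumption: agent RP name = device RP name minus ":<device>".
        agent_name = name.rsplit(":", 1)[0]
        agent = conn.execute(
            "SELECT id FROM resource_providers "
            "WHERE name = ? AND root_provider_id = ?",
            (agent_name, root_id)).fetchone()
        if agent is None:
            continue  # no matching agent RP in this tree; leave row as is
        conn.execute(
            "UPDATE resource_providers SET parent_provider_id = ? "
            "WHERE id = ?", (agent[0], rp_id))
        fixed += 1
    conn.commit()
    return fixed
```

Anyone adapting this to a real Placement database should take a backup first and review the selected rows before updating them.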

OpenStack Infra (hudson-openstack) wrote on 2021-05-02: Fix merged to neutron (master)

#4

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/782553
Committed: https://opendev.org/openstack/neutron/commit/7f35e4e857f7c6e83c635125ce9b42df6e10a510
Submitter: "Zuul (22348)"
Branch:    master

commit 7f35e4e857f7c6e83c635125ce9b42df6e10a510
Author: Bence Romsics <bence.romsics@gmail.com>
Date:   Tue Mar 23 14:07:36 2021 +0100

Physical NIC RP should be child of agent RP
    
    In the fix for #1853840 I made a mistake and since then we created
    the physical NIC resource providers as a child of the hypervisor
    resource provider instead of the agent resource provider. Here:
    
    https://review.opendev.org/c/openstack/neutron/+/696600/3/neutron/agent/common/placement_report.py#159
    
    This *did not* break the minimum bandwidth aware scheduling.
    But still there are multiple problems:
    
    1) If you created your physical NIC RPs before the fix for #1853840
       but upgraded to after the fix for #1853840, then resource syncs
       will throw an error in neutron-server at each physical NIC RP
       update. That pollutes the logs and wastes some resources since
       the prohibited update will be forever retried.
    
    2) If you created your physical NIC RPs after the fix for #1853840
       then your physical NIC RPs have the wrong parent. Which again
       does not break minimum bandwidth aware scheduling. But it may pose
       problems for later features wanting to build on the originally
       planned RP tree structure.
    
    3) Cleanup of decommissioned RPs is a bit different than expected.
       This cleanup was always left to the admin, so it only affects a
       manual process.
    
    The proper RP structure was and should be the following:
    
    The hypervisor RP(s) must be the root(s).
    As a child of each hypervisor RP, there should be an agent RP.
    The physical NIC RPs should be the children of the agent RPs.
    
    Unfortunately at the moment the Placement API generically prohibits
    update of the parent resource provider id in a PUT request:
    
    https://docs.openstack.org/api-ref/placement/?expanded=update-resource-provider-detail#update-resource-provider
    
    Therefore without a later Placement change we cannot fix the RPs
    already created with the wrong parent. However we can fix the RPs
    to be created later. We do that here. We also fix a bug in the unit
    tests that allowed the wrong parent to pass unnoticed. Plus we
    add an extra log message to direct the user seeing the pollution
    in the logs to the proper bug report.
    
    There may be a follow up patch later, because not all RP re-parenting
    operations are problematic, therefore we are thinking of relaxing
    this blanket prohibition in Placement. When Placement allows updates
    to the parent id we can fix RPs already created with the wrong parent
    too.
    
    Change-Id: I7caa8827d22103600ca685a58294640fc831dbd9
    Closes-Bug: #1921150
    Co-Authored-By: "Balazs Gibizer" <balazs.gibizer@est.tech>
    Related-Bug: #1853840
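
The tree described in the commit message above (hypervisor RP as root, agent RP below it, physical NIC RPs below the agent RP) can be checked against data shaped like a Placement resource provider listing. This is a minimal sketch: it assumes the `uuid`, `name` and `parent_provider_uuid` fields that the Placement API exposes from microversion 1.14, and the sample UUIDs and names are made up.

```python
AGENT_TYPES = ("NIC Switch agent", "Open vSwitch agent")

# Hypothetical listing for one compute host.
SAMPLE = [
    {"uuid": "hv-1", "name": "host1", "parent_provider_uuid": None},
    {"uuid": "ag-1", "name": "host1:NIC Switch agent",
     "parent_provider_uuid": "hv-1"},
    {"uuid": "dev-1", "name": "host1:NIC Switch agent:ens5",
     "parent_provider_uuid": "hv-1"},   # wrong: child of the hypervisor RP
    {"uuid": "dev-2", "name": "host1:NIC Switch agent:ens6",
     "parent_provider_uuid": "ag-1"},   # right: child of the agent RP
]


def is_device_rp(name):
    """Device RP names carry a ':<device>' suffix after the agent type."""
    return any(f":{agent}:" in name for agent in AGENT_TYPES)


def wrongly_parented(providers):
    """Return names of device RPs whose parent is a root (hypervisor) RP."""
    by_uuid = {rp["uuid"]: rp for rp in providers}
    bad = []
    for rp in providers:
        if not is_device_rp(rp["name"]):
            continue
        parent = by_uuid.get(rp["parent_provider_uuid"])
        # A root RP has no parent, so a device RP hanging off one
        # skips the agent level of the intended tree.
        if parent is not None and parent["parent_provider_uuid"] is None:
            bad.append(rp["name"])
    return bad
```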

Changed in neutron:
status:	Triaged → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2021-05-05: Fix proposed to neutron (stable/wallaby)

#5

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/789674

OpenStack Infra (hudson-openstack) wrote on 2021-05-08: Fix merged to neutron (stable/wallaby)

#6

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/789674
Committed: https://opendev.org/openstack/neutron/commit/d3be39433cb43bcaceb36a04d2accd6ff9a3aa8b
Submitter: "Zuul (22348)"
Branch:    stable/wallaby

commit d3be39433cb43bcaceb36a04d2accd6ff9a3aa8b
Author: Bence Romsics <bence.romsics@gmail.com>
Date:   Tue Mar 23 14:07:36 2021 +0100

Physical NIC RP should be child of agent RP
    
    In the fix for #1853840 I made a mistake and since then we created
    the physical NIC resource providers as a child of the hypervisor
    resource provider instead of the agent resource provider. Here:
    
    https://review.opendev.org/c/openstack/neutron/+/696600/3/neutron/agent/common/placement_report.py#159
    
    This *did not* break the minimum bandwidth aware scheduling.
    But still there are multiple problems:
    
    1) If you created your physical NIC RPs before the fix for #1853840
       but upgraded to after the fix for #1853840, then resource syncs
       will throw an error in neutron-server at each physical NIC RP
       update. That pollutes the logs and wastes some resources since
       the prohibited update will be forever retried.
    
    2) If you created your physical NIC RPs after the fix for #1853840
       then your physical NIC RPs have the wrong parent. Which again
       does not break minimum bandwidth aware scheduling. But it may pose
       problems for later features wanting to build on the originally
       planned RP tree structure.
    
    3) Cleanup of decommissioned RPs is a bit different than expected.
       This cleanup was always left to the admin, so it only affects a
       manual process.
    
    The proper RP structure was and should be the following:
    
    The hypervisor RP(s) must be the root(s).
    As a child of each hypervisor RP, there should be an agent RP.
    The physical NIC RPs should be the children of the agent RPs.
    
    Unfortunately at the moment the Placement API generically prohibits
    update of the parent resource provider id in a PUT request:
    
    https://docs.openstack.org/api-ref/placement/?expanded=update-resource-provider-detail#update-resource-provider
    
    Therefore without a later Placement change we cannot fix the RPs
    already created with the wrong parent. However we can fix the RPs
    to be created later. We do that here. We also fix a bug in the unit
    tests that allowed the wrong parent to pass unnoticed. Plus we
    add an extra log message to direct the user seeing the pollution
    in the logs to the proper bug report.
    
    There may be a follow up patch later, because not all RP re-parenting
    operations are problematic, therefore we are thinking of relaxing
    this blanket prohibition in Placement. When Placement allows updates
    to the parent id we can fix RPs already created with the wrong parent
    too.
    
    Change-Id: I7caa8827d22103600ca685a58294640fc831dbd9
    Closes-Bug: #1921150
    Co-Authored-By: "Balazs Gibizer" <balazs.gibizer@est.tech>
    Related-Bug: #1853840
    (cherry picked from commit 7f35e4e857f7c6e83c635125ce9b42df6e10a510)

tags:

added: in-stable-wallaby

OpenStack Infra (hudson-openstack) wrote on 2021-05-10: Fix proposed to neutron (stable/victoria)

#7

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/790270

OpenStack Infra (hudson-openstack) wrote on 2021-05-28: Fix merged to neutron (stable/victoria)

#8

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/790270
Committed: https://opendev.org/openstack/neutron/commit/11904b20ad6ce17904f2a685438d7985e32e2cd7
Submitter: "Zuul (22348)"
Branch:    stable/victoria

commit 11904b20ad6ce17904f2a685438d7985e32e2cd7
Author: Bence Romsics <bence.romsics@gmail.com>
Date:   Tue Mar 23 14:07:36 2021 +0100

Physical NIC RP should be child of agent RP
    
    In the fix for #1853840 I made a mistake and since then we created
    the physical NIC resource providers as a child of the hypervisor
    resource provider instead of the agent resource provider. Here:
    
    https://review.opendev.org/c/openstack/neutron/+/696600/3/neutron/agent/common/placement_report.py#159
    
    This *did not* break the minimum bandwidth aware scheduling.
    But still there are multiple problems:
    
    1) If you created your physical NIC RPs before the fix for #1853840
       but upgraded to after the fix for #1853840, then resource syncs
       will throw an error in neutron-server at each physical NIC RP
       update. That pollutes the logs and wastes some resources since
       the prohibited update will be forever retried.
    
    2) If you created your physical NIC RPs after the fix for #1853840
       then your physical NIC RPs have the wrong parent. Which again
       does not break minimum bandwidth aware scheduling. But it may pose
       problems for later features wanting to build on the originally
       planned RP tree structure.
    
    3) Cleanup of decommissioned RPs is a bit different than expected.
       This cleanup was always left to the admin, so it only affects a
       manual process.
    
    The proper RP structure was and should be the following:
    
    The hypervisor RP(s) must be the root(s).
    As a child of each hypervisor RP, there should be an agent RP.
    The physical NIC RPs should be the children of the agent RPs.
    
    Unfortunately at the moment the Placement API generically prohibits
    update of the parent resource provider id in a PUT request:
    
    https://docs.openstack.org/api-ref/placement/?expanded=update-resource-provider-detail#update-resource-provider
    
    Therefore without a later Placement change we cannot fix the RPs
    already created with the wrong parent. However we can fix the RPs
    to be created later. We do that here. We also fix a bug in the unit
    tests that allowed the wrong parent to pass unnoticed. Plus we
    add an extra log message to direct the user seeing the pollution
    in the logs to the proper bug report.
    
    There may be a follow up patch later, because not all RP re-parenting
    operations are problematic, therefore we are thinking of relaxing
    this blanket prohibition in Placement. When Placement allows updates
    to the parent id we can fix RPs already created with the wrong parent
    too.
    
    Change-Id: I7caa8827d22103600ca685a58294640fc831dbd9
    Closes-Bug: #1921150
    Co-Authored-By: "Balazs Gibizer" <balazs.gibizer@est.tech>
    Related-Bug: #1853840
    (cherry picked from commit 7f35e4e857f7c6e83c635125ce9b42df6e10a510)
    (cherry picked from commit d3be39433cb43bcaceb36a04d2accd6ff9a3aa8b)

tags:

added: in-stable-victoria

Bernard Cafarelli (bcafarel) on 2021-06-11

tags:

added: neutron-proactive-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2021-07-12: Fix included in openstack/neutron 17.2.0

#9

This issue was fixed in the openstack/neutron 17.2.0 release.

OpenStack Infra (hudson-openstack) wrote on 2021-07-12: Fix included in openstack/neutron 18.1.0

#10

This issue was fixed in the openstack/neutron 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote on 2021-07-30: Related fix merged to neutron-lib (master)

#11

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/785337
Committed: https://opendev.org/openstack/neutron-lib/commit/270184e936352c07c8325d88584a6d25d0a4c8cc
Submitter: "Zuul (22348)"
Branch: master

commit 270184e936352c07c8325d88584a6d25d0a4c8cc
Author: Bence Romsics <email address hidden>
Date: Wed Apr 7 13:35:18 2021 +0200

Use placement version allowing re-parenting RP update

    That is microversion 1.37.

    The next time a placement re-sync is triggered (for example by
    restarting the respective agents) this corrects the parents
    of wrongly created resource providers introduced by bug #1921150.

    Change-Id: I6b54aa9c21bf28de1d451c195e37efde6110258a
    Depends-On: https://review.opendev.org/c/openstack/placement/+/784020
    Related-Bug: #1921150
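
For illustration, a re-parenting update at microversion 1.37 is a PUT on the resource provider with a new `parent_provider_uuid`. The sketch below only builds the request pieces and sends nothing. The endpoint URL, UUIDs and helper name are hypothetical, and authentication headers are left out. It is not the code path neutron-lib uses.

```python
import json


def build_reparent_request(placement_url, rp_uuid, rp_name, new_parent_uuid):
    """Return (method, url, headers, body) for a re-parenting PUT.

    Hypothetical helper: the caller is expected to add an auth token and
    send the request with an HTTP client of their choice.
    """
    headers = {
        # Microversion 1.37 is the one named in the commit message above.
        "OpenStack-API-Version": "placement 1.37",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    body = {"name": rp_name, "parent_provider_uuid": new_parent_uuid}
    url = f"{placement_url.rstrip('/')}/resource_providers/{rp_uuid}"
    return "PUT", url, headers, json.dumps(body)
```

With an older microversion in the header, the same body is what triggers the "re-parenting a provider is not currently allowed" error from the bug title.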

OpenStack Infra (hudson-openstack) wrote on 2021-09-15: Fix included in openstack/neutron 19.0.0.0rc1

#12

This issue was fixed in the openstack/neutron 19.0.0.0rc1 release candidate.
