Late affinity check failure counted as failed build

Bug #1996732 reported by Balazs Gibizer
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Yusuke Okada

Bug Description

The late anti-affinity check runs in the compute manager to prevent parallel scheduling requests from invalidating the anti-affinity server group policy. When the check fails, the instance is re-scheduled. However, this failure is counted as a real instance boot failure of the compute host[1][2][3] and can lead to de-prioritization of the compute host in the scheduler via BuildFailureWeigher[4]. As the late anti-affinity check does not indicate any fault of the compute host itself, it should not be counted towards the build failure counter.

[1] https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/manager.py#L2496
[2] https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/manager.py#L1808
[3] https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/manager.py#L2265
[4] https://docs.openstack.org/nova/latest/configuration/config.html#compute.consecutive_build_service_disable_threshold
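
To illustrate why the extra count matters, here is a minimal Python sketch of failure-based weighing. It is not nova's code: `HostState`, `build_failure_weight`, and `rank_hosts` are hypothetical names, and the real BuildFailureWeigher is only assumed to behave roughly like this (more recorded failures, lower weight).

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Toy stand-in for the scheduler's view of a compute host."""
    name: str
    failed_builds: int = 0


def build_failure_weight(host: HostState, multiplier: float = 1.0) -> float:
    """More recorded failures give a lower (more negative) weight."""
    return -multiplier * host.failed_builds


def rank_hosts(hosts):
    """Return hosts ordered from most to least preferred."""
    return sorted(hosts, key=build_failure_weight, reverse=True)


healthy = HostState("host-a")
# host-b is just as healthy, but three late anti-affinity check
# failures were recorded against it as build failures.
penalized = HostState("host-b", failed_builds=3)

ranking = [h.name for h in rank_hosts([penalized, healthy])]
print(ranking)  # ['host-a', 'host-b']
```

In this toy model host-b is ranked last even though nothing is wrong with it, which is the de-prioritization the report describes.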

tags: added: compute scheduler
sean mooney (sean-k-mooney) wrote :

This is a bug, yes, and a trivially fixable one.

We just need to add a new exception that inherits from the existing build failed exception and skip incrementing the fail count if it's for affinity.

Setting this to low as the impact is low and this is more an optimisation than a bug.
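
The approach suggested in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `BuildRescheduled`, `BuildRescheduledByPolicy`, and `FailureCounter` are hypothetical names invented here, not nova's actual exception classes or compute manager code.

```python
class BuildRescheduled(Exception):
    """Hypothetical: the build failed on this host and must be retried."""


class BuildRescheduledByPolicy(BuildRescheduled):
    """Hypothetical: retried because of a server group policy, not a host fault."""


class FailureCounter:
    """Toy model of a per-host build failure counter."""

    def __init__(self):
        self.failed_builds = 0

    def run_build(self, build):
        """Run build(); return True on success, False if it must be rescheduled."""
        try:
            build()
        except BuildRescheduledByPolicy:
            # Subclass is caught first: reschedule, but do not count it.
            return False
        except BuildRescheduled:
            # Any other reschedule is treated as a real host failure.
            self.failed_builds += 1
            return False
        return True


def host_fault():
    raise BuildRescheduled("hypervisor error")


def policy_violation():
    raise BuildRescheduledByPolicy("anti-affinity policy violated")


counter = FailureCounter()
counter.run_build(host_fault)
counter.run_build(policy_violation)
print(counter.failed_builds)  # 1
```

Because the new exception inherits from the existing one, code that only knows about the base class would still reschedule the instance; only the counting logic needs to distinguish the two.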

Changed in nova:
importance: Undecided → Low
status: New → Triaged
Amit Uniyal (auniyal)
Changed in nova:
assignee: nobody → Amit Uniyal (auniyal)
Yusuke Okada (yusokada) wrote :

Hi Amit,

How is the bug fix going?
If you are too busy to fix this issue, I can propose a fix for it.
Could you let me know the current status?

Amit Uniyal (auniyal) wrote :

Hi yusokada,

sure, please reassign it to yourself.

Changed in nova:
assignee: Amit Uniyal (auniyal) → nobody
Yusuke Okada (yusokada)
Changed in nova:
assignee: nobody → Yusuke Okada (yusokada)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/873216

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/nova/+/885343

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/885344

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/885345

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/885347

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/885348

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/885349

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/885353

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/885355

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/873216
Committed: https://opendev.org/openstack/nova/commit/56d320a203a13f262a2e94e491af222032e453d3
Submitter: "Zuul (22348)"
Branch: master

commit 56d320a203a13f262a2e94e491af222032e453d3
Author: Yusuke Okada <email address hidden>
Date: Wed Feb 8 22:10:31 2023 -0500

    Fix failed count for anti-affinity check

    The late anti-affinity check runs in the compute manager to avoid
    parallel scheduling requests to invalidate the anti-affinity server
    group policy. When the check fails the instance is re-scheduled.
    However this failure counted as a real instance boot failure of the
    compute host and can lead to de-prioritization of the compute host
    in the scheduler via BuildFailureWeigher. As the late anti-affinity
    check does not indicate any fault of the compute host itself it
    should not be counted towards the build failure counter.
    This patch adds new build results to handle this case.

    Closes-Bug: #1996732
    Change-Id: I2ba035c09ace20e9835d9d12a5c5bee17d616718
    Signed-off-by: Yusuke Okada <email address hidden>
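
The "new build results" idea in the commit message above can be sketched as follows. This is a simplified illustration, not the merged patch: the `BuildResult` members and `update_failed_builds` are hypothetical names, and resetting the counter after a successful build is an assumption of this sketch.

```python
from enum import Enum


class BuildResult(Enum):
    """Hypothetical build outcomes reported by the compute host."""
    ACTIVE = "active"
    FAILED = "failed"
    RESCHEDULED = "rescheduled"
    # New outcome: rescheduled because a server group policy was violated.
    RESCHEDULED_BY_POLICY = "rescheduled_by_policy"


HOST_FAILURES = {BuildResult.FAILED, BuildResult.RESCHEDULED}


def update_failed_builds(current: int, result: BuildResult) -> int:
    """Return the new failure count for a host after one build attempt."""
    if result is BuildResult.RESCHEDULED_BY_POLICY:
        # Not the host's fault: leave the counter untouched.
        return current
    if result in HOST_FAILURES:
        return current + 1
    # Assumption in this sketch: a successful build resets the counter.
    return 0


count = 0
for outcome in (BuildResult.RESCHEDULED,
                BuildResult.RESCHEDULED_BY_POLICY,
                BuildResult.FAILED):
    count = update_failed_builds(count, outcome)
print(count)  # 2
```

Keeping the policy case as a separate result means the anti-affinity reschedule neither penalizes the host nor hides earlier genuine failures.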

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/885355
Reason: stable/train branch of nova projects' have been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 28.0.0.0rc1

This issue was fixed in the openstack/nova 28.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/nova/+/885344
Committed: https://opendev.org/openstack/nova/commit/2f1d65774fbcf5c25c4ba53583b6a802a03f4c4d
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 2f1d65774fbcf5c25c4ba53583b6a802a03f4c4d
Author: Yusuke Okada <email address hidden>
Date: Wed Feb 8 22:10:31 2023 -0500

    Fix failed count for anti-affinity check

    The late anti-affinity check runs in the compute manager to avoid
    parallel scheduling requests to invalidate the anti-affinity server
    group policy. When the check fails the instance is re-scheduled.
    However this failure counted as a real instance boot failure of the
    compute host and can lead to de-prioritization of the compute host
    in the scheduler via BuildFailureWeigher. As the late anti-affinity
    check does not indicate any fault of the compute host itself it
    should not be counted towards the build failure counter.
    This patch adds new build results to handle this case.

    Closes-Bug: #1996732
    Change-Id: I2ba035c09ace20e9835d9d12a5c5bee17d616718
    Signed-off-by: Yusuke Okada <email address hidden>
    (cherry picked from commit 56d320a203a13f262a2e94e491af222032e453d3)
    (cherry picked from commit 1b56714e9119ab4152e6f33985a499b2d83a491b)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/885345
Committed: https://opendev.org/openstack/nova/commit/cd0403dd3b1099bd13da503500a50249db8e49ea
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit cd0403dd3b1099bd13da503500a50249db8e49ea
Author: Yusuke Okada <email address hidden>
Date: Wed Feb 8 22:10:31 2023 -0500

    Fix failed count for anti-affinity check

    The late anti-affinity check runs in the compute manager to avoid
    parallel scheduling requests to invalidate the anti-affinity server
    group policy. When the check fails the instance is re-scheduled.
    However this failure counted as a real instance boot failure of the
    compute host and can lead to de-prioritization of the compute host
    in the scheduler via BuildFailureWeigher. As the late anti-affinity
    check does not indicate any fault of the compute host itself it
    should not be counted towards the build failure counter.
    This patch adds new build results to handle this case.

    Closes-Bug: #1996732
    Change-Id: I2ba035c09ace20e9835d9d12a5c5bee17d616718
    Signed-off-by: Yusuke Okada <email address hidden>
    (cherry picked from commit 56d320a203a13f262a2e94e491af222032e453d3)
    (cherry picked from commit 1b56714e9119ab4152e6f33985a499b2d83a491b)
    (cherry picked from commit 2f1d65774fbcf5c25c4ba53583b6a802a03f4c4d)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.3.0

This issue was fixed in the openstack/nova 25.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.2.0

This issue was fixed in the openstack/nova 27.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.2.1

This issue was fixed in the openstack/nova 26.2.1 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/885353
Reason: stable/ussuri branch of openstack/nova transitioned to End of Life and is about to be deleted. To be able to do that, all open patches need to be abandoned.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/885349
Reason: stable/victoria branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/victoria if you want to further work on this patch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/wallaby)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/885348
Reason: stable/wallaby branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/wallaby if you want to further work on this patch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/xena)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/885347
Reason: stable/xena branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/xena if you want to further work on this patch.
