Bug #1869050 “migration of anti-affinity server fails due to stale scheduler instance info” : Bugs : OpenStack Compute (nova)

Balazs Gibizer (balazs-gibizer) on 2020-03-25

Changed in nova:
status:	New → Triaged
importance:	Undecided → Low
assignee:	nobody → Balazs Gibizer (balazs-gibizer)
tags:	added: compute scheduler

OpenStack Infra (hudson-openstack) wrote on 2020-03-25: Fix proposed to nova (master)

#1

Fix proposed to branch: master
Review: https://review.opendev.org/714997

Changed in nova:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2020-03-25:

#2

Fix proposed to branch: master
Review: https://review.opendev.org/714998

OpenStack Infra (hudson-openstack) wrote on 2020-05-15: Fix merged to nova (master)

#3

Reviewed: https://review.opendev.org/714997
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b52c483308f32f3744dd8a5df424b9f518c13155
Submitter: Zuul
Branch: master

commit b52c483308f32f3744dd8a5df424b9f518c13155
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:38:14 2020 +0100

Reproduce bug 1869050

This patch adds a functional test that reproduces the bug where stale
scheduler instance info prevents booting a server with anti-affinity.

Change-Id: If485330b48ae2671651aafabc93f92a8999f7ca2
Related-Bug: #1869050

OpenStack Infra (hudson-openstack) wrote on 2020-05-16:

#4

Reviewed: https://review.opendev.org/714998
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=738110db7492b1360f5f197e8ecafd69a3b141b4
Submitter: Zuul
Branch: master

commit 738110db7492b1360f5f197e8ecafd69a3b141b4
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:48:23 2020 +0100

Update scheduler instance info at confirm resize

    When a resize is confirmed, the instance no longer belongs to the source
    compute. Previously, the scheduler instance info was only updated by the
    _sync_scheduler_instance_info periodic task. As a result, boots of servers
    with anti-affinity did not consider the source host, because the scheduler
    still believed the instance was there. Now, at the end of the
    confirm_resize call, the compute also updates the scheduler about the move.

Change-Id: Ic50e72e289b56ac54720ad0b719ceeb32487b8c8
Closes-Bug: #1869050
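
The failure mode and the fix described in this commit can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not nova code: HostState, ToyScheduler, confirm_resize and hosts_for_new_group_member are hypothetical names, and the scheduler is reduced to a per-host cache of instance UUIDs that an anti-affinity check reads. It shows that without an explicit update at confirm time the source host stays "occupied" in the cache until the periodic resync runs.

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    """Hypothetical, simplified scheduler-side cache entry for one host."""
    name: str
    instances: set = field(default_factory=set)  # UUIDs the scheduler believes are here


class ToyScheduler:
    """Toy scheduler keeping a per-host cache of instance UUIDs (not nova's API)."""

    def __init__(self, hosts):
        self.host_states = {h: HostState(h) for h in hosts}

    def update_instance_info(self, host, uuid):
        # A compute reports that an instance now lives on it.
        self.host_states[host].instances.add(uuid)

    def delete_instance_info(self, host, uuid):
        # A compute reports that an instance has left it.
        self.host_states[host].instances.discard(uuid)

    def sync_instance_info(self, host, uuids):
        # Models the periodic full resync: replace the cached set.
        self.host_states[host].instances = set(uuids)

    def anti_affinity_hosts(self, group_members):
        # Hosts whose *cached* instance set shares no member with the group.
        return sorted(
            name
            for name, state in self.host_states.items()
            if not state.instances & set(group_members)
        )


def confirm_resize(scheduler, uuid, source, dest, notify_scheduler):
    """Hypothetical model of confirming a resize from source to dest."""
    scheduler.update_instance_info(dest, uuid)
    if notify_scheduler:
        # Post-fix behaviour: the scheduler learns immediately that the
        # instance left the source host instead of waiting for the periodic.
        scheduler.delete_instance_info(source, uuid)


def hosts_for_new_group_member(notify_scheduler):
    sched = ToyScheduler(["host1", "host2"])
    sched.update_instance_info("host1", "vm-a")  # vm-a is in an anti-affinity group
    confirm_resize(sched, "vm-a", "host1", "host2", notify_scheduler)
    return sched.anti_affinity_hosts({"vm-a"})  # candidates for a new group member


print(hosts_for_new_group_member(notify_scheduler=False))  # []
print(hosts_for_new_group_member(notify_scheduler=True))   # ['host1']

# Without the fix, only the periodic resync clears the stale entry.
stale = ToyScheduler(["host1", "host2"])
stale.update_instance_info("host1", "vm-a")
confirm_resize(stale, "vm-a", "host1", "host2", notify_scheduler=False)
stale.sync_instance_info("host1", [])
print(stale.anti_affinity_hosts({"vm-a"}))  # ['host1']
```

With two hosts, the stale cache leaves no valid host for a second member of the group, which matches the "prevents booting a server with anti-affinity" symptom reproduced by the functional test.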

Changed in nova:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2020-05-18: Fix proposed to nova (stable/ussuri)

#5

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/728781

OpenStack Infra (hudson-openstack) wrote on 2020-05-18:

#6

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/728782

OpenStack Infra (hudson-openstack) wrote on 2020-05-18: Fix merged to nova (stable/ussuri)

#7

Reviewed: https://review.opendev.org/728781
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=016eeec9841116bbbbc6c3019850c18012e3781a
Submitter: Zuul
Branch: stable/ussuri

commit 016eeec9841116bbbbc6c3019850c18012e3781a
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:38:14 2020 +0100

Reproduce bug 1869050

This patch adds a functional test that reproduces the bug where stale
scheduler instance info prevents booting a server with anti-affinity.

    Change-Id: If485330b48ae2671651aafabc93f92a8999f7ca2
    Related-Bug: #1869050
    (cherry picked from commit b52c483308f32f3744dd8a5df424b9f518c13155)

tags:	added: in-stable-ussuri

OpenStack Infra (hudson-openstack) wrote on 2020-05-18:

#8

Reviewed: https://review.opendev.org/728782
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e8b3927c92d29c74fd0c79b5a51b7a34e9d66236
Submitter: Zuul
Branch: stable/ussuri

commit e8b3927c92d29c74fd0c79b5a51b7a34e9d66236
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:48:23 2020 +0100

Update scheduler instance info at confirm resize

    When a resize is confirmed, the instance no longer belongs to the source
    compute. Previously, the scheduler instance info was only updated by the
    _sync_scheduler_instance_info periodic task. As a result, boots of servers
    with anti-affinity did not consider the source host, because the scheduler
    still believed the instance was there. Now, at the end of the
    confirm_resize call, the compute also updates the scheduler about the move.

    Change-Id: Ic50e72e289b56ac54720ad0b719ceeb32487b8c8
    Closes-Bug: #1869050
    (cherry picked from commit 738110db7492b1360f5f197e8ecafd69a3b141b4)

OpenStack Infra (hudson-openstack) wrote on 2020-05-19: Fix proposed to nova (stable/train)

#9

Fix proposed to branch: stable/train
Review: https://review.opendev.org/729162

OpenStack Infra (hudson-openstack) wrote on 2020-05-19:

#10

Fix proposed to branch: stable/train
Review: https://review.opendev.org/729163

OpenStack Infra (hudson-openstack) wrote on 2020-05-19: Fix merged to nova (stable/train)

#11

Reviewed: https://review.opendev.org/729162
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=66e4d8218133a5e4a68f68a3017446cb585675c4
Submitter: Zuul
Branch: stable/train

commit 66e4d8218133a5e4a68f68a3017446cb585675c4
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:38:14 2020 +0100

Reproduce bug 1869050

This patch adds a functional test that reproduces the bug where stale
scheduler instance info prevents booting a server with anti-affinity.

Some adjustment was needed because I8c96b337f32148f8f5899c9b87af331b1fa41424
is missing from stable/train.

    Change-Id: If485330b48ae2671651aafabc93f92a8999f7ca2
    Related-Bug: #1869050
    (cherry picked from commit b52c483308f32f3744dd8a5df424b9f518c13155)
    (cherry picked from commit 016eeec9841116bbbbc6c3019850c18012e3781a)

tags:	added: in-stable-train

OpenStack Infra (hudson-openstack) wrote on 2020-05-19:

#12

Reviewed: https://review.opendev.org/729163
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e34b375a6161b15d92beba64fa281f40634ffeab
Submitter: Zuul
Branch: stable/train

commit e34b375a6161b15d92beba64fa281f40634ffeab
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:48:23 2020 +0100

Update scheduler instance info at confirm resize

    When a resize is confirmed, the instance no longer belongs to the source
    compute. Previously, the scheduler instance info was only updated by the
    _sync_scheduler_instance_info periodic task. As a result, boots of servers
    with anti-affinity did not consider the source host, because the scheduler
    still believed the instance was there. Now, at the end of the
    confirm_resize call, the compute also updates the scheduler about the move.

    Change-Id: Ic50e72e289b56ac54720ad0b719ceeb32487b8c8
    Closes-Bug: #1869050
    (cherry picked from commit 738110db7492b1360f5f197e8ecafd69a3b141b4)
    (cherry picked from commit e8b3927c92d29c74fd0c79b5a51b7a34e9d66236)

OpenStack Infra (hudson-openstack) wrote on 2020-05-20: Fix proposed to nova (stable/stein)

#13

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/729505

OpenStack Infra (hudson-openstack) wrote on 2020-05-20: Fix proposed to nova (stable/rocky)

#14

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/729527

OpenStack Infra (hudson-openstack) wrote on 2020-05-20: Change abandoned on nova (stable/stein)

#15

Change abandoned by Qiu Fossen (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/729505

OpenStack Infra (hudson-openstack) wrote on 2020-05-20: Change abandoned on nova (stable/rocky)

#16

Change abandoned by Qiu Fossen (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/729527

OpenStack Infra (hudson-openstack) wrote on 2020-05-20: Fix proposed to nova (stable/stein)

#17

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/729530

OpenStack Infra (hudson-openstack) wrote on 2020-05-20:

#18

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/729538

OpenStack Infra (hudson-openstack) wrote on 2020-05-22: Fix merged to nova (stable/stein)

#19

Reviewed: https://review.opendev.org/729530
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2a15e0096bc87234a930bb75b73d4874f0f7ec87
Submitter: Zuul
Branch: stable/stein

commit 2a15e0096bc87234a930bb75b73d4874f0f7ec87
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:38:14 2020 +0100

Reproduce bug 1869050

This patch adds a functional test that reproduces the bug where stale
scheduler instance info prevents booting a server with anti-affinity.

    Change-Id: If485330b48ae2671651aafabc93f92a8999f7ca2
    Related-Bug: #1869050
    (cherry picked from commit b52c483308f32f3744dd8a5df424b9f518c13155)
    (cherry picked from commit 016eeec9841116bbbbc6c3019850c18012e3781a)
    (cherry picked from commit 66e4d8218133a5e4a68f68a3017446cb585675c4)

tags:	added: in-stable-stein

OpenStack Infra (hudson-openstack) wrote on 2020-05-22:

#20

Reviewed: https://review.opendev.org/729538
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e1116ee3b776ec84e4ce7d6ac9346fa0d43269b5
Submitter: Zuul
Branch: stable/stein

commit e1116ee3b776ec84e4ce7d6ac9346fa0d43269b5
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:48:23 2020 +0100

Update scheduler instance info at confirm resize

    When a resize is confirmed, the instance no longer belongs to the source
    compute. Previously, the scheduler instance info was only updated by the
    _sync_scheduler_instance_info periodic task. As a result, boots of servers
    with anti-affinity did not consider the source host, because the scheduler
    still believed the instance was there. Now, at the end of the
    confirm_resize call, the compute also updates the scheduler about the move.

    Conflicts:
          nova/tests/unit/compute/test_compute_mgr.py
          because Ib50b6b02208f5bd2972de8a6f8f685c19745514c and
          Ia6d8a7909081b0b856bd7e290e234af7e42a2b38 are missing from
          stable/stein

    Change-Id: Ic50e72e289b56ac54720ad0b719ceeb32487b8c8
    Closes-Bug: #1869050
    (cherry picked from commit 738110db7492b1360f5f197e8ecafd69a3b141b4)
    (cherry picked from commit e8b3927c92d29c74fd0c79b5a51b7a34e9d66236)
    (cherry picked from commit e34b375a6161b15d92beba64fa281f40634ffeab)

OpenStack Infra (hudson-openstack) wrote on 2020-05-22: Fix proposed to nova (stable/rocky)

#21

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/730343

OpenStack Infra (hudson-openstack) wrote on 2020-05-22:

#22

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/730344

OpenStack Infra (hudson-openstack) wrote on 2020-05-27: Fix merged to nova (stable/rocky)

#23

Reviewed: https://review.opendev.org/730343
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=44c47421785ba07d4b238bba06eacc42827c84ab
Submitter: Zuul
Branch: stable/rocky

commit 44c47421785ba07d4b238bba06eacc42827c84ab
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:38:14 2020 +0100

Reproduce bug 1869050

This patch adds a functional test that reproduces the bug where stale
scheduler instance info prevents booting a server with anti-affinity.

    Change-Id: If485330b48ae2671651aafabc93f92a8999f7ca2
    Related-Bug: #1869050
    (cherry picked from commit b52c483308f32f3744dd8a5df424b9f518c13155)
    (cherry picked from commit 016eeec9841116bbbbc6c3019850c18012e3781a)
    (cherry picked from commit 66e4d8218133a5e4a68f68a3017446cb585675c4)
    (cherry picked from commit 2a15e0096bc87234a930bb75b73d4874f0f7ec87)

tags:	added: in-stable-rocky

OpenStack Infra (hudson-openstack) wrote on 2020-05-28:

#24

Reviewed: https://review.opendev.org/730344
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=abe04f048c432fed5726af8244bb055e6e44657e
Submitter: Zuul
Branch: stable/rocky

commit abe04f048c432fed5726af8244bb055e6e44657e
Author: Balazs Gibizer <email address hidden>
Date: Wed Mar 25 17:48:23 2020 +0100

Update scheduler instance info at confirm resize

    When a resize is confirmed, the instance no longer belongs to the source
    compute. Previously, the scheduler instance info was only updated by the
    _sync_scheduler_instance_info periodic task. As a result, boots of servers
    with anti-affinity did not consider the source host, because the scheduler
    still believed the instance was there. Now, at the end of the
    confirm_resize call, the compute also updates the scheduler about the move.

    Conflicts:
      nova/compute/manager.py because
      I933687891abef4878de09481937d576ce5899511 is a stable-only patch
      nova/tests/unit/compute/test_compute_mgr.py because
      35ce77835bb271bad3c18eaf22146edac3a42ea0 is missing from stable/rocky

    Change-Id: Ic50e72e289b56ac54720ad0b719ceeb32487b8c8
    Closes-Bug: #1869050
    (cherry picked from commit 738110db7492b1360f5f197e8ecafd69a3b141b4)
    (cherry picked from commit e8b3927c92d29c74fd0c79b5a51b7a34e9d66236)
    (cherry picked from commit e34b375a6161b15d92beba64fa281f40634ffeab)
    (cherry picked from commit e1116ee3b776ec84e4ce7d6ac9346fa0d43269b5)

OpenStack Infra (hudson-openstack) wrote on 2020-05-28: Fix proposed to nova (stable/queens)

#25

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/731563

OpenStack Infra (hudson-openstack) wrote on 2020-05-28:

#26

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/731564

Balazs Gibizer (balazs-gibizer) wrote on 2020-06-03:

#27

The problem cannot be reproduced on stable/queens. The Rocky patch [1] changed the affinity filter to use the host_state. Because the host_state can be stale, this bug has existed since Rocky. On Queens, however, the filter queries instance.host from the database, and that information is already up to date after the migration, so the bug is not reproducible there.

[1] https://review.opendev.org/#/c/571166/27/nova/scheduler/filters/affinity_filter.py@101
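
The difference described above can be sketched with toy data. This is a minimal illustration under stated assumptions, not the actual nova filter code: DB_INSTANCE_HOST, CACHED_HOST_INSTANCES, passes_with_host_state and passes_with_db_lookup are hypothetical names, and both checks are reduced to "reject the host if a group member is on it". The only difference is where the location of the group members comes from: a cached host state that may be stale, or the database, which is already correct after the resize is confirmed.

```python
# Hypothetical toy data: after the resize is confirmed, the database already
# says vm-a runs on host2, but the scheduler's cached host state has not been
# refreshed yet, so host1 still lists vm-a.
DB_INSTANCE_HOST = {"vm-a": "host2"}
CACHED_HOST_INSTANCES = {"host1": {"vm-a"}, "host2": {"vm-a"}}


def passes_with_host_state(host, group_members):
    """Rocky-and-later style (simplified): trust the cached host_state."""
    return not (CACHED_HOST_INSTANCES.get(host, set()) & set(group_members))


def passes_with_db_lookup(host, group_members):
    """Queens style (simplified): look up each member's instance.host in the DB."""
    group_hosts = {DB_INSTANCE_HOST[u] for u in group_members if u in DB_INSTANCE_HOST}
    return host not in group_hosts


for check in (passes_with_host_state, passes_with_db_lookup):
    print(check.__name__, check("host1", {"vm-a"}))
# passes_with_host_state False   (stale cache wrongly rejects host1)
# passes_with_db_lookup True
```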

Balazs Gibizer (balazs-gibizer) wrote on 2020-06-03:

#28

The same is true for stable/pike: the bug cannot be reproduced there either.

OpenStack Infra (hudson-openstack) wrote on 2020-06-03: Change abandoned on nova (stable/queens)

#29

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/731563
Reason: Bug is not valid for stable/queens

OpenStack Infra (hudson-openstack) wrote on 2020-06-03:

#30

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/731564
Reason: Bug is not valid for stable/queens

OpenStack Infra (hudson-openstack) wrote on 2022-11-11: Fix included in openstack/nova rocky-eol

#31

This issue was fixed in the openstack/nova rocky-eol release.

OpenStack Compute (nova)

migration of anti-affinity server fails due to stale scheduler instance info

Affects                   Status        Importance  Assigned to
OpenStack Compute (nova)  Fix Released  Low         Balazs Gibizer
Pike                      Invalid       Low         Unassigned
Queens                    Invalid       Low         Balazs Gibizer
Rocky                     Fix Released  Low         Balazs Gibizer
Stein                     Fix Released  Low         Elod Illes
Train                     Fix Released  Low         Elod Illes
Ussuri                    Fix Released  Low         Balazs Gibizer