RR patch failed when trying to lock a host because of a failed live-migration/stop instance
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Fix Released | Medium | Heitor Matsui | |
Bug Description
Brief Description
-----------------
RR patch could not be removed because of an error when trying to lock a host: Lock of host(s) compute-1 failed because instance(s) tenant2-virtio26 were not migrated or stopped.
Severity
-----------------
Major
Steps to Reproduce
-----------------
Launch 60 VMs on DX+
Apply RR patch using orchestration (Alarm: relaxed, VM: migrate)
Remove RR patch using orchestration (same settings)
Observe that the RR patch removal fails because a VM could not be migrated
Expected Behavior
-----------------
Patch application and removal should both complete without errors.
Actual Behavior
-----------------
RR patch removal failed because a host could not be locked as a VM did not migrate/stop.
Reproducibility
-----------------
Intermittent, although migration failures occur frequently. In this run, the developer's patch and the RR patch were applied successfully, but the RR patch removal failed.
System Configuration
-----------------
DX+ workers
Branch/Pull Time/Commit
-----------------
2021-06-09_18-58-11
Last Pass
-----------------
Not applicable
Timestamp/Logs
-----------------
Moment of failure:
log-id = 29
event-id = sw-patch-
event-type = action-event
event-context = admin
importance = high
entity = orchestration=
reason_text = Software patch auto-apply failed, reason = Lock of host(s) compute-1 failed because instance(s) tenant2-virtio26 were not migrated or stopped
additional_text =
timestamp = 2021-08-10 16:53:49.149041
Last successful tenant2-virtio26 migration:
2021-08-10 16:28:27.501378 \N \N 3538 c4cf6f2a-
Test Activity
-----------------
Developer testing
Workaround
-----------------
Not applicable
Changed in starlingx:
status: New → In Progress
Changed in starlingx:
assignee: nobody → Heitor Matsui (heitormatsui)
importance: Undecided → Medium
tags: added: stx.7.0 stx.nfv
Reviewed: https://review.opendev.org/c/starlingx/nfv/+/825879
Committed: https://opendev.org/starlingx/nfv/commit/1c4e0484659f5b98c09485e135fd02386eee2dac
Submitter: "Zuul (22348)"
Branch: master
commit 1c4e0484659f5b98c09485e135fd02386eee2dac
Author: Heitor Matsui <email address hidden>
Date: Fri Jan 21 17:14:08 2022 -0300
Add migrate steps for hosts without instances
During the patch strategy creation the migrate-instances step
only happens for hosts who have instances running at that moment.
As a consequence, if an instance is migrated, during patching
operation, to a host that didn't have any instances running
previously, the patch operation will fail as it will try to lock
the host directly, without migrating its instances previously.
This issue can happen either during patch application or removal.
This commit changes the patching build strategy adding the
migrate-instances-from-host step that will be applied to all
worker hosts unconditionally (given they are OpenStack compute
nodes), and because the previous step (migrate-instances) was built
for a list of instances, some implementations had to take place to
allow building it for a list of hosts.
Test Plan
PASS: serial patch application runs successfully outside
Openstack context;
PASS: parallel patch application runs successfully outside
Openstack context;
PASS: serial patch application runs successfully with a host
not having instances before patch operation begins and
having an instance migrated to it during patch application;
PASS: parallel patch application runs successfully with a host
not having instances before patch operation begins and
having an instance migrated to it during patch application;
Closes-bug: 1960833
Change-Id: I99675ea0b5d0c75bc84c78864b118debc265ceb4
Signed-off-by: Heitor Matsui <email address hidden>
Co-authored-by: Rafael Falcão <email address hidden>
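
The behavioural change described in the commit message above can be illustrated with a small sketch. This is not the StarlingX nfv code: the `Host` class, both `build_steps_*` functions, and the `lock-host` step name are hypothetical and exist only for illustration. Only the step names `migrate-instances` and `migrate-instances-from-host` come from the commit message. The sketch assumes a strategy is a flat list of (step, host) pairs built once, before patching starts.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical stand-in for a worker host known to the strategy builder.
    name: str
    is_openstack_compute: bool
    instances: list = field(default_factory=list)


def build_steps_before_fix(hosts):
    """Old behaviour: add a migrate step only for hosts that have
    instances at the moment the strategy is built."""
    steps = []
    for host in hosts:
        if host.instances:
            # In the real code this step was built from a list of instances.
            steps.append(("migrate-instances", host.name))
        steps.append(("lock-host", host.name))
    return steps


def build_steps_after_fix(hosts):
    """New behaviour: add a per-host migrate step for every OpenStack
    compute node, regardless of what is running at build time."""
    steps = []
    for host in hosts:
        if host.is_openstack_compute:
            steps.append(("migrate-instances-from-host", host.name))
        steps.append(("lock-host", host.name))
    return steps


if __name__ == "__main__":
    hosts = [
        Host("compute-0", True, ["tenant2-virtio26"]),
        Host("compute-1", True),  # empty when the strategy is built
    ]
    print(build_steps_before_fix(hosts))
    print(build_steps_after_fix(hosts))
```

In the old plan, compute-1 goes straight to the lock step, so an instance migrated onto it from compute-0 during patching is never moved off and the lock fails, as reported above. In the new plan, every compute host gets a migrate step that can act on whatever is running there when the step executes.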