Cannot pin/unpin cpus during migration

Bug #1751873 reported by Andrey Epifanov
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Fix Released
Importance: Critical
Assigned to: MOS Nova

Bug Description

Detailed bug description:
 After migration, some VMs move to the ERROR state because confirm_resize() fails.
 The VM is actually running on the destination compute node.
Steps to reproduce:
 Run a migration that takes longer than update_resources_interval (this condition still needs to be verified).
Reproducibility:
 100%
Workaround:
 Reset the VM state to ACTIVE.
Impact:
 Customers' VMs often end up in the ERROR state after migration.
Description of the environment:
- Operating system: Ubuntu 14.04
- Versions of MOS: 9.2

Additional information:
- We suspect that the issue also affects Nova master.

I found similar bugs [1][2] in OpenStack, but their fixes are already present in this environment.

[1]: https://bugs.launchpad.net/nova/+bug/1585214
[2]: https://bugs.launchpad.net/nova/+bug/1545675
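
The workaround above (resetting the VM state to ACTIVE) could be scripted roughly as follows. This is only a sketch: the helper name reset_error_servers_to_active is hypothetical, and the nova argument is assumed to be a python-novaclient style client exposing servers.get() and servers.reset_state(). It touches only the servers that are passed in explicitly and that are currently in ERROR.

```python
from typing import Iterable, List


def reset_error_servers_to_active(nova, server_ids: Iterable[str]) -> List[str]:
    """Reset the given servers to ACTIVE if they are currently in ERROR.

    `nova` is assumed to behave like a python-novaclient client object
    (servers.get and servers.reset_state). Returns the IDs that were reset.
    """
    reset_ids = []
    for server_id in server_ids:
        server = nova.servers.get(server_id)
        # Leave servers alone unless they are in the ERROR state.
        if server.status != "ERROR":
            continue
        nova.servers.reset_state(server, "active")
        reset_ids.append(server_id)
    return reset_ids
```

Before resetting, it is worth confirming on the destination compute node that the VM is really running, as in this report; the sketch does not check that.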

Anton Matveev (amatveev)
Changed in mos:
importance: Undecided → Critical
tags: added: sla1
Changed in mos:
assignee: nobody → MOS Nova (mos-nova)
Alexander Rubtsov (arubtsov) wrote :
Changed in mos:
status: New → In Progress
Changed in mos:
milestone: none → 9.2-mu-5
milestone: 9.2-mu-5 → 9.x-updates
milestone: 9.x-updates → 9.2-mu-5
Dmitry Sutyagin (dsutyagin) wrote :

Please see latest updates in https://mirantis.jira.com/browse/PROD-15177

Alexander Rubtsov (arubtsov) wrote :

The customer has confirmed that migration works well after applying the patch.
Please merge the fix to the Mitaka branch.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/37963
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: 1e9f18dabcbb1cd143966c31bccd2e7586a5cac2
Author: Andrey Volkov <email address hidden>
Date: Thu Mar 22 12:47:40 2018

Skip numa usage update for update_available_resource task

The current theory is that update_available_resource could be the source
of the CPU pin/unpin problems. The new CPU pinning of a migrated instance
can be incompatible with the old compute node. This causes failures
in periodic tasks such as update_available_resource, because they check
resources against the old compute node. Effectively, this means
that the instance NUMA topology is updated but the instance host is not.

This patch skips the NUMA usage update while an instance is in a migration
state; for other states the behavior is unchanged.

PROD: https://mirantis.jira.com/browse/PROD-15177
Change-Id: I542be52636d6aabc4ef93ed338ca89958ccb4cf2
Closes-Bug: #1751873
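
The idea described in the commit message can be illustrated with a small standalone model. This is not the Nova code and not the actual patch: the classes Instance and HostNumaUsage, the function update_numa_usage, and the choice of task states treated as "in migration" are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional, Set

# Assumed set of task states that count as "migration in progress".
# The strings mirror Nova task state names, but which states the real
# patch checks is not stated in this report.
MIGRATION_TASK_STATES = frozenset(
    {"resize_prep", "resize_migrating", "resize_migrated", "resize_finish"}
)


@dataclass
class Instance:
    uuid: str
    pinned_cpus: FrozenSet[int]
    task_state: Optional[str] = None


@dataclass
class HostNumaUsage:
    available_cpus: FrozenSet[int]
    pinned_cpus: Set[int] = field(default_factory=set)

    def pin(self, cpus: Iterable[int]) -> None:
        cpus = set(cpus)
        # Pinning fails if the host does not have these CPUs at all,
        # which is what happens when the pinning was computed for the
        # destination host but is checked against the source host.
        if not cpus <= self.available_cpus:
            raise ValueError("CPUs %s are not available on this host" % sorted(cpus))
        if cpus & self.pinned_cpus:
            raise ValueError("CPUs %s are already pinned" % sorted(cpus & self.pinned_cpus))
        self.pinned_cpus |= cpus


def update_numa_usage(host: HostNumaUsage, instances: Iterable[Instance]) -> List[str]:
    """Account pinned CPUs on the host, skipping instances in migration.

    Returns the UUIDs of the instances that were skipped.
    """
    skipped = []
    for inst in instances:
        if inst.task_state in MIGRATION_TASK_STATES:
            skipped.append(inst.uuid)
            continue
        host.pin(inst.pinned_cpus)
    return skipped
```

In this model, an instance whose pinning was computed for another host would raise in pin() during the periodic update; skipping it while it is migrating avoids that failure and leaves accounting for all other instances unchanged.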

Changed in mos:
status: In Progress → Fix Committed
Vladimir Jigulin (vjigulin) wrote :

Closing the bug because we have confirmation (#3) that the patch works.

Changed in mos:
status: Fix Committed → Fix Released