Live migration does not update numa hugepages info in xml

Bug #1607996 reported by Tina Kevin
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: High
Assigned to: Stephen Finucane

Bug Description

Description
===========
Live migration does not update the instance's NUMA hugepages info in the XML.
If the NUMA hugepages info of the source host differs from the
NUMA hugepages info of the destination host, the instance cannot start
normally on the destination host, and as a result the live migration fails.

Steps to reproduce
==================
A chronological list of steps that reproduce the
issue:
* There are two compute nodes (host1 and host2).
  Both hosts have the same NUMA topology: two NUMA nodes,
  each with eight CPUs.

* I boot two instances (A and B) on the compute nodes; instance A
  is located on host1 and instance B is located on host2.
  Both instances use the dedicated cpu_policy and hugepages.
  Each instance has eight CPUs. Instance A is placed on NUMA node1
  of host1 and instance B is placed on NUMA node1 of host2.

* I then live migrate instance A. The scheduler selects NUMA node2
  of host2, but because the NUMA hugepages info in the XML is not
  updated, the instance fails to start on the destination host.

Expected result
===============
The live migration of the instance succeeds.

Actual result
=============
The live migration of the instance fails and something resembling the following error is produced:

    ERROR nova.virt.libvirt.driver [req-6d4ca272-f20e-428e-8ca6-48ea7de57e58 ebe821cc991f4657aa3002054739933c 71acb857b6e34df6bfa2da07b0ce7902 - - -] [instance: a84691da-1831-4825-a933-83c46bb9ba4d] Live Migration failure: hugepages: node 0 not found
    ERROR nova.virt.libvirt.driver [req-6d4ca272-f20e-428e-8ca6-48ea7de57e58 ebe821cc991f4657aa3002054739933c 71acb857b6e34df6bfa2da07b0ce7902 - - -] [instance: a84691da-1831-4825-a933-83c46bb9ba4d] Migration operation has aborted

The reason is that the NUMA hugepages info in the instance XML is not updated:

    <numatune>
      <memory mode='strict' nodeset='0'/>
      <memnode cellid='0' mode='strict' nodeset='0'/>
    </numatune>

In the above, numatune/memnode/nodeset should be updated from the source info to the destination info.
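
As a minimal sketch of the update described above (not the fix adopted in nova), the following Python rewrites the nodeset attributes of a numatune block for a given destination placement. The function name `retarget_numatune`, the `cell_to_host_node` mapping, the wrapping `<domain>` element, and the choice of destination host node 1 are all assumptions made for illustration; only the `<numatune>` content is taken from this report.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed instance XML. Only the <numatune> block is from the report.
SOURCE_XML = """\
<domain type='kvm'>
  <name>instance-a</name>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
</domain>
"""


def retarget_numatune(domain_xml, cell_to_host_node):
    """Return domain XML whose numatune nodesets point at destination nodes.

    cell_to_host_node is a hypothetical mapping of guest cell id to the
    destination host NUMA node id chosen for that cell.
    """
    root = ET.fromstring(domain_xml)
    numatune = root.find("numatune")
    if numatune is None:
        # Nothing to rewrite for instances without NUMA memory tuning.
        return domain_xml

    memnodes = numatune.findall("memnode")
    for memnode in memnodes:
        cell = int(memnode.get("cellid"))
        if cell in cell_to_host_node:
            memnode.set("nodeset", str(cell_to_host_node[cell]))

    # Keep the overall <memory> nodeset consistent with the per-cell entries.
    memory = numatune.find("memory")
    if memory is not None and memnodes:
        seen = []
        for memnode in memnodes:
            value = memnode.get("nodeset")
            if value not in seen:
                seen.append(value)
        memory.set("nodeset", ",".join(seen))

    return ET.tostring(root, encoding="unicode")


# Assumption: guest cell 0 lands on destination host node 1.
updated = retarget_numatune(SOURCE_XML, {0: 1})
print(updated)
```

With the assumed mapping, both `memory` and `memnode` end up with `nodeset='1'` while `cellid` stays `0`, since the guest topology itself does not change.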

Environment
===========
1. Exact version of OpenStack
 Mitaka

2. Which hypervisor did you use?
 Libvirt + KVM

3. Which networking type did you use?
 Neutron with OpenVSwitch

Tina Kevin (song-ruixia)
tags: added: hugepages live-migration numa
description: updated
description: updated
summary: - Live migraion is not update numa hugepages info in xml
+ Live migration does not update numa hugepages info in xml
Changed in nova:
status: New → Confirmed
Changed in nova:
importance: Undecided → High
Changed in nova:
assignee: nobody → Stephen Finucane (stephenfinucane)
Stephen Finucane (stephenfinucane) wrote :

Could you provide some logs demonstrating the errors that you receive?

Tina Kevin (song-ruixia) wrote :

@Stephen Finucane
The nova-compute.log error log:
2016-07-01 19:33:23.548 35512 ERROR nova.virt.libvirt.driver [req-6d4ca272-f20e-428e-8ca6-48ea7de57e58 ebe821cc991f4657aa3002054739933c 71acb857b6e34df6bfa2da07b0ce7902 - - -] [instance: a84691da-1831-4825-a933-83c46bb9ba4d] Live Migration failure: hugepages: node 0 not found
2016-07-01 19:33:23.810 35512 ERROR nova.virt.libvirt.driver [req-6d4ca272-f20e-428e-8ca6-48ea7de57e58 ebe821cc991f4657aa3002054739933c 71acb857b6e34df6bfa2da07b0ce7902 - - -] [instance: a84691da-1831-4825-a933-83c46bb9ba4d] Migration operation has aborted

The reason is that the hugepages info in the XML is not updated, which leads libvirt to raise the exception.
Below is the relevant section of the XML:
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>

In the memnode element, nodeset holds the source info and should be updated to the destination info.

When I manually updated the nodeset to the destination info, the live migration of the instance succeeded.
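
A rough way to spot this mismatch before migrating is to compare the host nodes named in the instance XML with the nodes the instance is expected to use on the destination. The sketch below is illustrative only: `parse_nodeset`, `missing_host_nodes`, and the destination node set `{1}` are hypothetical, and the parser handles only plain ids and ranges such as `0-1,3` (not the `^` exclusion form).

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed instance XML. Only the <numatune> block is from the report.
INSTANCE_XML = """\
<domain type='kvm'>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
</domain>
"""


def parse_nodeset(nodeset):
    """Expand a simple nodeset string such as '0-1,3' into a set of ints."""
    nodes = set()
    for part in nodeset.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            nodes.update(range(int(low), int(high) + 1))
        else:
            nodes.add(int(part))
    return nodes


def missing_host_nodes(domain_xml, dest_nodes):
    """List host nodes referenced by memnode elements but absent from dest_nodes."""
    root = ET.fromstring(domain_xml)
    wanted = set()
    for memnode in root.findall("numatune/memnode"):
        wanted |= parse_nodeset(memnode.get("nodeset", ""))
    return sorted(wanted - set(dest_nodes))


# Assumption: only host node 1 is usable for this instance on the destination.
stale = missing_host_nodes(INSTANCE_XML, {1})
print(stale)
```

A non-empty result means the XML still carries source-host node ids; here the stale entry is node 0, the same node named in the libvirt error.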

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → nobody
Stephen Finucane (stephenfinucane) wrote :

> The reason is that the hugepage info of xml does not updated, lead
> to libvirt raise the exception.
> Below this section of the XML:
> <numatune>
> <memory mode='strict' nodeset='0'/>
> <memnode cellid='0' mode='strict' nodeset='0'/> nodeset is the source info ,should to update the destination info
> </numatune>

Is this in the instance XML or the host XML?

Stephen

Changed in nova:
status: In Progress → Confirmed
Tina Kevin (song-ruixia) wrote :

@Stephen Finucane

It is in the instance XML.

Stephen Finucane (stephenfinucane) wrote :

OK, I'll take another shot at it

Changed in nova:
assignee: nobody → Stephen Finucane (stephenfinucane)
status: Confirmed → In Progress
Stephen Finucane (stephenfinucane) wrote :

Are you in a position to experiment? AFAICT, the below patch (including dependencies) should resolve this issue:

    https://review.openstack.org/#/c/286744

description: updated