Resize instance will not change the NUMA topology of a running instance to the one from the new flavor

Bug #1370390 reported by Nikola Đipanov
This bug affects 11 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Stephen Finucane

Bug Description

When we resize (change the flavor of) an instance that has a NUMA topology defined, the NUMA info from the new flavor will not be considered during scheduling. The instance will get re-scheduled based on the old NUMA information, but the claiming on the host will use the new flavor data. Once the instance successfully lands on a host, we will still use the old data when provisioning it on the new host.

We should be considering only the new flavor information in resizes.
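
A minimal reproduction sketch (the flavor names, IDs and hw:numa_nodes extra specs below are illustrative, not taken from the original report):

    $ openstack flavor create numa.one --id 200 --ram 1024 --disk 1 --vcpus 2
    $ openstack flavor set numa.one --property "hw:numa_nodes=1"
    $ openstack flavor create numa.two --id 201 --ram 2048 --disk 1 --vcpus 4
    $ openstack flavor set numa.two --property "hw:numa_nodes=2"
    $ openstack server create --flavor numa.one --image cirros-0.3.4-x86_64-uec --wait test1
    $ openstack server resize --flavor numa.two test1

With the bug present, scheduling and provisioning would still be driven by the single-node topology from numa.one rather than the two-node topology requested by numa.two.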

Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: High → Medium
Tiago Mello (timello)
Changed in nova:
assignee: nobody → Tiago Rodrigues de Mello (timello)
Revision history for this message
Bart Wensley (bartwensley) wrote :

This bug essentially means that resize is not usable for any instances that have a NUMA topology. Is anyone working on this?

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

This is basically the same as https://bugs.launchpad.net/nova/+bug/1417667 but this one is slightly more general, so I will mark the other one as a duplicate of this.

So after investigating this - it seems that there is really not that much work that needs to be done: all the information is passed in to the filter. It's just that we mangle the request_spec and filter_properties dicts so much, and the keys are so generic, that it is really difficult to make sense of it without following the code all the way from the API.

Because of this, it would probably be good to add a method that basically says: when inside a filter, give me the flavor I should be looking at right now.

Revision history for this message
Chris Friesen (cbf123) wrote :

While it's true that this bug would cover the resize case that I mentioned in note #1 of bug #1417667, I think that we still need to keep that bug open for the more general case of live-migration, evacuate, rebuild, etc.

The key difference for that bug is that when using dedicated CPUs we need to recalculate which CPUs to use on the destination compute node (and claim those resources) before actually doing the migration/evacuation/rebuild. As it stands, we'll continue to use the originally-specified vCPU/pCPU mapping, even though it may not be valid on the new host.
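
For example, the stale mapping can be spotted on the destination host with something like the following (domain name and output are illustrative, assuming a libvirt-based compute node):

    $ sudo virsh dumpxml instance-00000001 | grep vcpupin
        <vcpupin vcpu='0' cpuset='2'/>
        <vcpupin vcpu='1' cpuset='3'/>

If the cpuset values still point at the CPUs chosen on the source host, the mapping was carried over rather than recalculated.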

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/158245

Changed in nova:
assignee: Tiago Rodrigues de Mello (timello) → Nikola Đipanov (ndipanov)
status: Confirmed → In Progress
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

@Chris - well from the POV of the code, fixing this for the general case of CPU pinning is a sub-problem of fixing it for NUMA as such really, since CPU pinning uses the same code paths as NUMA does and relies on the same filter.

Fixing it for live migration with a specified host likely requires a different bug anyway - so we might want to open that and leave this one closed?

Revision history for this message
zhangtralon (zhangchunlong1) wrote :

This is a big problem; I think we need to think through every feature related to NUMA. Right now, when using the huge pages feature, I hit the same problem.
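
For reference, the huge page case can be set up via the hw:mem_page_size extra spec (flavor name illustrative); a resize between flavors with different page sizes exercises the same stale-topology path:

    $ openstack flavor set test.hugepages --property "hw:mem_page_size=2048"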

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/160484

Changed in nova:
assignee: Nikola Đipanov (ndipanov) → Ed Leafe (ed-leafe)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/160484
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Nikola Dipanov (<email address hidden>) on branch: master
Review: https://review.openstack.org/158245

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

I undertook some research into this. My findings are below, but tl;dr: it appears that this now works as expected and the bug can be closed.

---

# Problem

There were reports that resizing an instance from a pinned flavor to an unpinned one did not result in the pinning being removed. The opposite is also reportedly true.
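
The resize under test can be driven with something like the following, using the flavors created below:

    $ openstack server resize --flavor test.unpinned test1
    $ openstack server resize --confirm test1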

# Steps

## Create the required flavors

    $ openstack flavor create test.unpinned --id 100 --ram 2048 --disk 0 --vcpus 2
    $ openstack flavor create test.pinned --id 101 --ram 2048 --disk 0 --vcpus 2
    $ openstack flavor set test.pinned --property "hw:cpu_policy=dedicated"
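
As a quick sanity check, the extra spec can be confirmed with (output omitted here):

    $ openstack flavor show test.pinned -c properties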

## Ensure these are available

    $ openstack flavor list
    +-----+---------------+-------+------+-----------+-------+-----------+
    | ID  | Name          | RAM   | Disk | Ephemeral | VCPUs | Is Public |
    +-----+---------------+-------+------+-----------+-------+-----------+
    | 1   | m1.tiny       | 512   | 1    | 0         | 1     | True      |
    | 100 | test.unpinned | 2048  | 0    | 0         | 2     | True      |
    | 101 | test.pinned   | 2048  | 0    | 0         | 2     | True      |
    | 2   | m1.small      | 2048  | 20   | 0         | 1     | True      |
    | 3   | m1.medium     | 4096  | 40   | 0         | 2     | True      |
    | 4   | m1.large      | 8192  | 80   | 0         | 4     | True      |
    | 42  | m1.nano       | 64    | 0    | 0         | 1     | True      |
    | 5   | m1.xlarge     | 16384 | 160  | 0         | 8     | True      |
    | 84  | m1.micro      | 128   | 0    | 0         | 1     | True      |
    +-----+---------------+-------+------+-----------+-------+-----------+

    $ openstack image list
    +--------------------------------------+---------------------------------+--------+
    | ID                                   | Name                            | Status |
    +--------------------------------------+---------------------------------+--------+
    | c44bba29-653e-4ddf-963d-442af4c33a13 | cirros-0.3.4-x86_64-uec         | active |
    | 8b0284ee-ae6c-4e80-b5ee-26895d574717 | cirros-0.3.4-x86_64-uec-ramdisk | active |
    | 855c2971-aedc-4d5f-a366-73bb14707965 | cirros-0.3.4-x86_64-uec-kernel  | active |
    +--------------------------------------+---------------------------------+--------+

## Boot an instance

    $ openstack server create --flavor=test.pinned \
        --image=cirros-0.3.4-x86_64-uec --wait test1

## Validate that the instance is pinned

    $ openstack server list
    +--------------------------------------+-------+--------+--------------------------------------------------------+
    | ID                                   | Name  | Status | Networks                                               |
    +--------------------------------------+-------+--------+--------------------------------------------------------+
    | 857597cb-266b-4032-8030-e3cc76ebf0e7 | test1 | ACTIVE | private=10.0.0.3, fd2a:ec16:99e1:0:f816:3eff:fe99:df9f |
    +--------------------------------------+-------+--------+--------------------------------------------------------+

    $ sudo virsh list
     Id    Name                           State
    ----------------------------------------------------
    ...

Read more...

Changed in nova:
assignee: Ed Leafe (ed-leafe) → Stephen Finucane (sfinucan)
Changed in nova:
status: In Progress → Invalid
Revision history for this message
Tony Walker (tony-walker-h) wrote :

I'm seeing this on Kilo @ 2015.1.0. I have 2 NUMA flavors - one double the size of the other in terms of CPU and memory.
If I boot a new instance of the large type, all is well. If I boot the small and resize to the large, the cputune section gets the correct shares for the large, but the <vcpupin> entries for the old one. To compound the issue, the <numa> section contains the memory size of the smaller flavor, resulting in:

qemu-system-x86_64: total memory for NUMA nodes (0x1c00000000) should equal RAM size (0x3800000000)
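
The mismatch shows up in the guest XML; e.g. (domain name illustrative):

    $ sudo virsh dumpxml instance-000003e8 | grep vcpupin
    $ sudo virsh dumpxml instance-000003e8 | grep -A 3 '<numa>'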

@sfinucan - what version did you find this fixed on?

Revision history for this message
liuxiuli (liu-lixiu) wrote :

@Stephen Finucane - This problem still exists on master. Do you have time to deal with this bug? I hope to see your fix as soon as possible. Thank you.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Jay Pipes (<email address hidden>) on branch: master
Review: https://review.openstack.org/160484
Reason: The bug appears to now be fixed and Nikola is no longer working on Nova. Abandoning...
