live migration of instance should claim resources on target compute node

Bug #1289064 reported by Chris Friesen on 2014-03-06
This bug affects 10 people
Affects: OpenStack Compute (nova)
Importance: Medium
Assigned to: Artom Lifshitz

Bug Description

I'm looking at the current Icehouse code, but this applies to previous versions as well.

When we create a new instance via _build_instance() or _build_and_run_instance(), in both cases we call instance_claim() to test for resources and reserve them.

During a cold migration we call prep_resize() which calls resize_claim() to reserve resources.

However, when we live-migrate or evacuate an instance we don't do this. As far as I can see, the current code just spawns the new instance, and the resource usage isn't updated until the audit runs at some unknown time in the future. Only at that point does the audit add the new instance to self.tracked_instances and update the resource usage.

This means that until the audit runs the scheduler has a stale view of system resources.
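
To illustrate the gap described above, here is a minimal toy model, not Nova's actual code. The class ToyResourceTracker and its claim() and audit() methods are hypothetical stand-ins for the claim calls and the periodic audit mentioned in the report; only the attribute name tracked_instances is taken from the description. It shows usage staying stale when an instance arrives without a claim, until the audit reconciles it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceTracker:
    """Toy stand-in for a per-node resource tracker (hypothetical, not Nova's class)."""
    total_mb: int
    used_mb: int = 0
    tracked_instances: dict = field(default_factory=dict)

    def free_mb(self) -> int:
        # What a scheduler would see as available on this node.
        return self.total_mb - self.used_mb

    def claim(self, uuid: str, memory_mb: int) -> None:
        """Test for resources and reserve them immediately."""
        if memory_mb > self.free_mb():
            raise RuntimeError(f"insufficient memory for {uuid}")
        self.tracked_instances[uuid] = memory_mb
        self.used_mb += memory_mb

    def audit(self, running: dict) -> None:
        """Periodic reconciliation: rebuild usage from what actually runs here."""
        self.tracked_instances = dict(running)
        self.used_mb = sum(running.values())


# Build path: the claim happens first, so usage is current right away.
dest = ToyResourceTracker(total_mb=4096)
dest.claim("built-here", 1024)
free_after_claim = dest.free_mb()

# Live-migration path as described in the report: the instance starts
# running on the destination, but nothing claims resources for it.
running_on_dest = {"built-here": 1024, "migrated-in": 2048}
stale_free = dest.free_mb()  # unchanged: the tracker has not seen "migrated-in"

# Only the later audit brings the tracker back in line with reality.
dest.audit(running_on_dest)
free_after_audit = dest.free_mb()
```

In this sketch, between the migration and the audit the tracker still reports the pre-migration free memory, so a scheduler relying on it could place another instance that no longer fits.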

Michael Still (mikal) on 2014-03-07
tags: added: compute
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Rohan (kanaderohan) on 2014-03-07
Changed in nova:
assignee: nobody → Rohan (kanaderohan)
Chris Friesen (cbf123) on 2014-03-11
Changed in nova:
assignee: Rohan (kanaderohan) → Chris Friesen (cbf123)

Fix proposed to branch: master
Review: https://review.openstack.org/79806

Changed in nova:
status: Triaged → In Progress
Sean Dague (sdague) wrote :

The upstream patch is stalled. New owner welcomed.

Changed in nova:
assignee: Chris Friesen (cbf123) → nobody
status: In Progress → Confirmed

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/79806
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Fix proposed to branch: master
Review: https://review.openstack.org/142001

Changed in nova:
assignee: nobody → jichenjc (jichenjc)
status: Confirmed → In Progress

Change abandoned by jichenjc (<email address hidden>) on branch: master
Review: https://review.openstack.org/142001
Reason: wrong direction

Fix proposed to branch: master
Review: https://review.openstack.org/142740

Changed in nova:
assignee: jichenjc (jichenjc) → Alex Xu (xuhj)

Reviewed: https://review.openstack.org/142739
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=737fb8e7a7db775e937fe8b8a5f0ca148e1641be
Submitter: Jenkins
Branch: master

commit 737fb8e7a7db775e937fe8b8a5f0ca148e1641be
Author: jichenjc <email address hidden>
Date: Thu Dec 18 18:23:57 2014 +0800

    Enhance EvacuateHostTestCase test cases

    Currently even if the EvacuateHostTestCase test cases pass,
    there are some error log in the logs, it may lead to
    confusion when debug the problem, and more important,
    it will fail if the 'node' variable is used in the
    compute layer code since the 'node' is None and the
    cases will fail.
    Use stub by purpose because don't want to change current
    test structure.

    2014-12-18 18:20:23,694 ERROR [nova.compute.manager] Failed to get compute_info for fake-mini
    Traceback (most recent call last):
      File "/home/jichen/git/nova/nova/compute/manager.py", line 2797, in rebuild_instance
        compute_node = self._get_compute_info(context, self.host)
      File "/home/jichen/git/nova/nova/compute/manager.py", line 4859, in _get_compute_info
        service = objects.Service.get_by_compute_host(context, host)
      File "/home/jichen/git/nova/nova/objects/base.py", line 156, in wrapper
        result = fn(cls, context, *args, **kwargs)
      File "/home/jichen/git/nova/nova/objects/service.py", line 111, in get_by_compute_host
        db_service = db.service_get_by_compute_host(context, host)
      File "/home/jichen/git/nova/nova/db/api.py", line 131, in service_get_by_compute_host
        use_slave=use_slave)
      File "/home/jichen/git/nova/nova/db/sqlalchemy/api.py", line 127, in wrapper
        return f(*args, **kwargs)
      File "/home/jichen/git/nova/nova/db/sqlalchemy/api.py", line 431, in service_get_by_compute_host
        raise exception.ComputeHostNotFound(host=host)
    ComputeHostNotFound: Compute host fake-mini could not be found.

    Change-Id: I5541fc27afc23346ddcd685667737548b2a813c7
    Partial-Bug: #1289064

Changed in nova:
assignee: Alex Xu (xuhj) → jichenjc (jichenjc)
Bart Wensley (bartwensley) wrote :

It looks to me like the fixes being delivered against this bug are for evacuate - not live migration. The bug is specifically for the live migration case.

Note that as part of the work I am doing to fix bug 1417667, I plan to add resource claims for both evacuate and live migration. We could mark 1289064 as a duplicate of 1417667.

Bart, the title itself mentions only live migration, but the description also contains some information about the evacuate operation. Also, I'm already working on a fix for the live migration issue.

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/142740
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

tags: added: live-migrate

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/142740
Reason: This patch has been stalled for quite a while, so I am going to abandon it to keep the code review queue sane. Please restore the change when it is ready for review.

Change abandoned by John Garbutt (<email address hidden>) on branch: master
Review: https://review.openstack.org/142740
Reason: this seems like a duplicate, so abandoning this for now. If that's not true, feel free to bring it back again.

Chris Friesen (cbf123) wrote :

Not a duplicate...as far as I know this is still a problem for live migration though Nikola has done some work for other scenarios.

Paul Murray (pmurray) on 2015-11-06
tags: added: live-migration
removed: live-migrate

Nikola will probably fix this issue so assigning him there - https://review.openstack.org/#/q/topic:bug/1417667,n,z

Changed in nova:
assignee: jichenjc (jichenjc) → Nikola Đipanov (ndipanov)
Changed in nova:
assignee: Nikola Đipanov (ndipanov) → Sylvain Bauza (sylvain-bauza)
Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → sahid (sahid-ferdjaoui)

Change abandoned by Daniel Berrange (<email address hidden>) on branch: master
Review: https://review.openstack.org/286742
Reason: Abandoning since it's obsolete & Nikola no longer works on nova

Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Sylvain Bauza (sylvain-bauza)
Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → sahid (sahid-ferdjaoui)
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Stephen Finucane (stephenfinucane)
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → Pawel Koniszewski (pawel-koniszewski)
Changed in nova:
assignee: Pawel Koniszewski (pawel-koniszewski) → sahid (sahid-ferdjaoui)
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Pawel Koniszewski (pawel-koniszewski)
Changed in nova:
assignee: Pawel Koniszewski (pawel-koniszewski) → Andrey Volkov (avolkov)
Sean Dague (sdague) wrote :

Automatically discovered version icehouse in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.icehouse

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/244489
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/286744
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
assignee: Andrey Volkov (avolkov) → Stephen Finucane (stephenfinucane)
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → sahid (sahid-ferdjaoui)

Change abandoned by Stephen Finucane (<email address hidden>) on branch: master
Review: https://review.openstack.org/244489
Reason: Safe to say this is dead in the water and should finally be put out of its misery. artom: your turn.

Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Artom Lifshitz (notartom)

Reviewed: https://review.openstack.org/611088
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ae2e5650d14a2c81dd397727d67b60f9b8dd0dd7
Submitter: Zuul
Branch: master

commit ae2e5650d14a2c81dd397727d67b60f9b8dd0dd7
Author: Stephen Finucane <email address hidden>
Date: Tue Oct 16 17:41:17 2018 +0100

    Fail to live migration if instance has a NUMA topology

    Live migration is currently totally broken if a NUMA topology is
    present. This affects everything that's been regrettably stuffed in with
    NUMA topology including CPU pinning, hugepage support and emulator
    thread support. Side effects can range from simple unexpected
    performance hits (due to instances running on the same cores) to
    complete failures (due to instance cores or huge pages being mapped to
    CPUs/NUMA nodes that don't exist on the destination host).

    Until such a time as we resolve these issues, we should alert users to
    the fact that such issues exist. A workaround option is provided for
    operators that _really_ need the broken behavior, but it's defaulted to
    False to highlight the brokenness of this feature to unsuspecting
    operators.

    Change-Id: I217fba9138132b107e9d62895d699d238392e761
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1289064
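
The behaviour this commit message describes (refuse the live migration when a NUMA topology is present, unless an operator opts in through a workaround option that defaults to False) can be sketched roughly as below. This is an illustrative guess at the shape of such a check, not the code from the commit: the function name, the exception class, and the allow_numa_live_migration parameter are all hypothetical.

```python
from typing import Optional


class MigrationPreCheckError(Exception):
    """Hypothetical error raised when a live migration must be refused."""


def check_numa_live_migration(
    numa_topology: Optional[dict],
    allow_numa_live_migration: bool = False,  # hypothetical flag; off by default
) -> None:
    """Refuse to live-migrate an instance that has a NUMA topology.

    The opt-in flag models the operator workaround described in the
    commit message; the real option name is not given in this report.
    """
    if numa_topology is not None and not allow_numa_live_migration:
        raise MigrationPreCheckError(
            "instance has a NUMA topology; live migration is refused "
            "unless the operator workaround is enabled"
        )


# An instance without a NUMA topology passes the check.
check_numa_live_migration(None)

# An instance with a NUMA topology is refused by default...
try:
    check_numa_live_migration({"cells": [0, 1]})
    refused = False
except MigrationPreCheckError:
    refused = True

# ...but passes when the operator has explicitly opted in.
check_numa_live_migration({"cells": [0, 1]}, allow_numa_live_migration=True)
```

Defaulting the flag to False mirrors the stated intent of surfacing the problem to operators rather than silently allowing a migration that may misplace CPUs or huge pages.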

Reviewed: https://review.openstack.org/625880
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=52b89734426253f64b6d4797ba4d849c3020fb52
Submitter: Zuul
Branch: stable/rocky

commit 52b89734426253f64b6d4797ba4d849c3020fb52
Author: Stephen Finucane <email address hidden>
Date: Tue Oct 16 17:41:17 2018 +0100

    Fail to live migration if instance has a NUMA topology

    Live migration is currently totally broken if a NUMA topology is
    present. This affects everything that's been regrettably stuffed in with
    NUMA topology including CPU pinning, hugepage support and emulator
    thread support. Side effects can range from simple unexpected
    performance hits (due to instances running on the same cores) to
    complete failures (due to instance cores or huge pages being mapped to
    CPUs/NUMA nodes that don't exist on the destination host).

    Until such a time as we resolve these issues, we should alert users to
    the fact that such issues exist. A workaround option is provided for
    operators that _really_ need the broken behavior, but it's defaulted to
    False to highlight the brokenness of this feature to unsuspecting
    operators.

    Conflicts:
     nova/conf/workarounds.py
     nova/tests/unit/api/openstack/compute/admin_only_action_common.py
     nova/tests/unit/api/openstack/compute/test_migrate_server.py

    NOTE(stephenfin): Conflicts due to removal of
    'report_ironic_standard_resource_class_inventory' option and addition of
    change Iaea1cb4ed93bb98f451de4f993106d7891ca3682 on master.

    Change-Id: I217fba9138132b107e9d62895d699d238392e761
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1289064
    (cherry picked from commit ae2e5650d14a2c81dd397727d67b60f9b8dd0dd7)

tags: added: in-stable-rocky

Reviewed: https://review.opendev.org/629597
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9999bce00f5bea5f3e90ab9e16625d4237504bcb
Submitter: Zuul
Branch: stable/queens

commit 9999bce00f5bea5f3e90ab9e16625d4237504bcb
Author: Stephen Finucane <email address hidden>
Date: Tue Oct 16 17:41:17 2018 +0100

    Fail to live migration if instance has a NUMA topology

    Live migration is currently totally broken if a NUMA topology is
    present. This affects everything that's been regrettably stuffed in with
    NUMA topology including CPU pinning, hugepage support and emulator
    thread support. Side effects can range from simple unexpected
    performance hits (due to instances running on the same cores) to
    complete failures (due to instance cores or huge pages being mapped to
    CPUs/NUMA nodes that don't exist on the destination host).

    Until such a time as we resolve these issues, we should alert users to
    the fact that such issues exist. A workaround option is provided for
    operators that _really_ need the broken behavior, but it's defaulted to
    False to highlight the brokenness of this feature to unsuspecting
    operators.

    Conflicts:
     nova/conf/workarounds.py
     nova/tests/unit/api/openstack/compute/admin_only_action_common.py
     nova/tests/unit/api/openstack/compute/test_migrate_server.py
     nova/tests/unit/conductor/tasks/test_live_migrate.py

    NOTE(stephenfin): stable/rocky conflicts due to removal of
    'report_ironic_standard_resource_class_inventory' option and addition of
    change Iaea1cb4ed93bb98f451de4f993106d7891ca3682 on master.

    NOTE(stephenfin): stable/queens conflicts due to presence of
    the 'enable_consoleauth' configuration option and change
    I83b473e9ba557545b5c186f979e068e442de2424 (Mox to mock) in stable/rocky.
    A hyperlink is removed from the config option help text as the version
    of 'oslo.config' used here does not parse help text as rST (bug 1755783).

    Change-Id: I217fba9138132b107e9d62895d699d238392e761
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1289064
    (cherry picked from commit ae2e5650d14a2c81dd397727d67b60f9b8dd0dd7)
    (cherry picked from commit 52b89734426253f64b6d4797ba4d849c3020fb52)

tags: added: in-stable-queens