Live migration should use the same memory over subscription logic as instance boot

Bug #1214943 reported by XiaoLiang Hu on 2013-08-21
This bug affects 14 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
High
Sylvain Bauza

Bug Description

I encountered an issue when live migrating an instance to a specified target host. I expected the operation to succeed, but it failed with the following error:

MigrationPreCheckError: Migration pre-check error: Unable to migrate a34f9b88-1e07-4798-af46-ca3b3dbaceda to hchenos2: Lack of memory(host:336 <= instance:512)

1. My OpenStack cluster information:

1) There are two compute nodes in my cluster, and I created 4 instances (1 vCPU / 512 MB memory) on these hosts:

-----------
mysql> select hypervisor_hostname,vcpus,vcpus_used,running_vms,memory_mb,memory_mb_used,free_ram_mb,deleted from compute_nodes where deleted=0;
+----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
| hypervisor_hostname | vcpus | vcpus_used | running_vms | memory_mb | memory_mb_used | free_ram_mb | deleted |
+----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
| hchenos1.eng.platformlab.ibm.com | 2 | 2 | 2 | 1872 | 1536 | 336 | 0 |
| hchenos2.eng.platformlab.ibm.com | 2 | 2 | 2 | 1872 | 1536 | 336 | 0 |
+----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
2 rows in set (0.00 sec)

mysql>
------------------------
[root@hchenos ~]# nova list
+--------------------------------------+------+--------+----------+
| ID | Name | Status | Networks |
+--------------------------------------+------+--------+----------+
| a34f9b88-1e07-4798-af46-ca3b3dbaceda | vm1 | ACTIVE | | >>>> on host 'hchenos1'
| f6aaeff9-2220-4693-8e5a-710f4c52b774 | vm2 | ACTIVE | | >>>> on host 'hchenos2'
| bbee57a2-81cd-4933-a943-1c2272f7f550 | vm4 | ACTIVE | | >>>> on host 'hchenos1'
| 74fe26ec-919c-4fa7-890f-f59abe09ef4f | vm5 | ACTIVE | | >>>> on host 'hchenos2'
+--------------------------------------+------+--------+----------+
[root@hchenos ~]#

2) I also enabled the ComputeFilter, RamFilter and CoreFilter in nova.conf, but did not configure an overcommit ratio for either vCPU or memory, so the default ratios are used.
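For reference, the filter setup described above would look something like this in nova.conf (a sketch only; exact option names depend on the Nova release, and the commented-out ratios show the defaults that apply when the options are left unset):

```ini
[DEFAULT]
scheduler_default_filters = ComputeFilter,RamFilter,CoreFilter
# ram_allocation_ratio = 1.5    <- RamFilter default when unset
# cpu_allocation_ratio = 16.0   <- CoreFilter default when unset
```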

2. Under the above conditions, live migrating instance vm1 to hchenos2 fails:

[root@hchenos ~]# nova live-migration vm1 hchenos2
ERROR: Live migration of instance a34f9b88-1e07-4798-af46-ca3b3dbaceda to host hchenos2 failed (HTTP 400) (Request-ID: req-68244b99-e438-4000-8bdb-cc43b275c018)

 conductor log:
...
ckages/nova/conductor/tasks/live_migrate.py", line 87, in _check_requested_destination
    self._check_destination_has_enough_memory()
  File "/usr/lib/python2.6/site-packages/nova/conductor/tasks/live_migrate.py", line 108, in _check_destination_has_enough_memory
    mem_inst=mem_inst))
MigrationPreCheckError: Migration pre-check error: Unable to migrate a34f9b88-1e07-4798-af46-ca3b3dbaceda to hchenos2: Lack of memory(host:336 <= instance:512)

I think the reason is as follows:

The free_ram_mb for 'hchenos2' is 336 MB, while the requested memory is 512 MB, so the operation fails.

free_ram_mb = memory_mb (1872) - reserved_host_memory_mb (512) - instance consumption (2 * 512) = 336
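The conductor-side arithmetic can be sketched in a few lines (a toy model of the check, not Nova's actual code; the function name is illustrative):

```python
# Toy model of the conductor pre-check: free RAM ignores any overcommit ratio.
def conductor_free_ram_mb(memory_mb, reserved_host_memory_mb, instances_mb):
    return memory_mb - reserved_host_memory_mb - sum(instances_mb)

avail = conductor_free_ram_mb(1872, 512, [512, 512])  # 336, as in the table above
mem_inst = 512
passes = avail > mem_inst  # 336 > 512 is False, so the pre-check raises
print(avail, passes)
```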

3. However, booting an instance on 'hchenos2' succeeds:

[root@hchenos ~]# nova boot --image cirros-0.3.0-x86_64 --flavor 1 --availability-zone nova:hchenos2 xhu

[root@hchenos ~]# nova list
+--------------------------------------+------+--------+----------+
| ID | Name | Status | Networks |
+--------------------------------------+------+--------+----------+
| a34f9b88-1e07-4798-af46-ca3b3dbaceda | vm1 | ACTIVE | |
| f6aaeff9-2220-4693-8e5a-710f4c52b774 | vm2 | ACTIVE | |
| bbee57a2-81cd-4933-a943-1c2272f7f550 | vm4 | ACTIVE | |
| 74fe26ec-919c-4fa7-890f-f59abe09ef4f | vm5 | ACTIVE | |
| 364d1a01-67ed-4966-bbfd-d21b6bc3067c | xhu | ACTIVE | | >>>> is active
+--------------------------------------+------+--------+----------+
[root@hchenos ~]#

mysql> select hypervisor_hostname,vcpus,vcpus_used,running_vms,memory_mb,memory_mb_used,free_ram_mb,deleted from compute_nodes where deleted=0;
+----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
| hypervisor_hostname | vcpus | vcpus_used | running_vms | memory_mb | memory_mb_used | free_ram_mb | deleted |
+----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
| hchenos1.eng.platformlab.ibm.com | 2 | 2 | 2 | 1872 | 1536 | 336 | 0 |
| hchenos2.eng.platformlab.ibm.com | 2 | 3 | 3 | 1872 | 2048 | -176 | 0 |
+----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
2 rows in set (0.00 sec)

mysql>

So I'm very confused by the above test results: why does booting an instance succeed on 'hchenos2', while live migrating an instance to the same host fails due to "not enough memory"?

After carefully going through the Nova source code (live_migrate.py: execute()), I think the following causes this issue:

1) The function '_check_destination_has_enough_memory' doesn't consider the RAM allocation ratio (default value 1.5) when calculating the host's free memory ('free_ram_mb'), which is inconsistent with the 'RamFilter' memory check performed when booting an instance.

I think the free memory of host 'hchenos2' should be:

free_ram_mb = memory_mb (1872) * ram_allocation_ratio (1.5) - memory_mb_used (1536) = 1272
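The RamFilter-style arithmetic, by contrast, scales total memory by the allocation ratio before subtracting what is already consumed (again a toy sketch with illustrative names, not the filter's actual code):

```python
# Toy model of RamFilter's boot-time check: total RAM is multiplied by the
# allocation ratio before subtracting memory already in use.
def ram_filter_passes(memory_mb, memory_mb_used, requested_mb,
                      ram_allocation_ratio=1.5):
    usable = memory_mb * ram_allocation_ratio - memory_mb_used
    return usable >= requested_mb

# hchenos2: 1872 * 1.5 - 1536 = 1272 MB usable, so a 512 MB boot passes
ok = ram_filter_passes(1872, 1536, 512)
print(ok)
```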

2) Why does the check for the live migration target host cover only memory? Shouldn't vCPUs be checked as well?

live_migrate.py: execute

        self._check_instance_is_running()
        self._check_host_is_up(self.source)

        if not self.destination:
            self.destination = self._find_destination()
        else:
            self._check_requested_destination() >>>>

    def _check_requested_destination(self):
        self._check_destination_is_not_source()
        self._check_host_is_up(self.destination)
        self._check_destination_has_enough_memory() >>>> Only check memory, why not check vcpu together?
        self._check_compatible_with_source_hypervisor(self.destination)
        self._call_livem_checks_on_host(self.destination)

3) The VM status needs to be considered as well. For example, if an instance is powered off, it no longer consumes compute node resources on the KVM platform (unlike IBM PowerVM), but in resource_tracker.py:_update_usage_from_instances() only the instance's 'deleted' flag is taken into account when calculating resource usage.

XiaoLiang Hu (xlhuxa) on 2013-08-21
Jake Liu (jake-liu) wrote :

There is no need to check memory for live migration: if the live migration fails, nova-compute will roll back the live migration operation.

Also, some hypervisors such as KVM and VMware natively support resource overcommit, so we can remove the memory check and let nova-compute handle this case.

Changed in nova:
assignee: nobody → Jake Liu (jake-liu)
XiaoLiang Hu (xlhuxa) wrote :

I agree with you; remove the memory check from nova-conductor. Thanks.

I don't think removing the check is appropriate; live migration and scheduler decisions should be in sync with each other, and this would put them out of sync: a cloud configured for no overcommit could end up overcommitted this way, couldn't it?

Thanks all for the input here.

Robert, I think we should only do the checking in the nova scheduler, not in both the scheduler and the conductor, as that might make the code difficult to maintain (we would need to check in the conductor whether RamFilter was enabled and then fetch the RAM allocation ratio there).

It is better to keep the logic the same as in Grizzly: do not check memory in the conductor, but leave it to nova-compute.

Jian Wen (wenjianhn) wrote :

Why not migrate the instance to another host?
I don't think migrating/booting an instance to/on a host with high memory usage is a good practice.

The cost of doing live migration first and rolling back live migration later is high.

I think we need to keep the check.

Lingxian Kong (kong) wrote :

Why not delete the 'host' parameter once and for all? Users generally do not know anything about the underlying host information, so why not leave the decision to the system itself? I think you would succeed more often that way.

Hans Lindgren (hanlind) wrote :

The same issue existed in Grizzly and was fixed to align with the scheduler behavior here: https://review.openstack.org/#/c/19369/
It probably broke when the migration path was moved from the scheduler to the conductor.

Jake Liu (jake-liu) wrote :

@Lingxian, yes, we are using the OpenStack Nova scheduler to select the host for migration. But the problem here is that the host selected by the scheduler does not meet the requirements of the memory check in the conductor. So my fix was to remove the check and leave the decision to the hypervisor itself, as the Nova scheduler has already selected the best host for such a case.

@Hans, I just reviewed the code at https://review.openstack.org/#/c/19369/ and noticed that live_migrate.py uses the same logic as your fix. But I think your fix might still not resolve this issue.

def _check_destination_has_enough_memory(self):
        avail = self._get_compute_info(self.destination)['free_ram_mb'] << This comes from the DB: free_ram_mb = memory_mb - reserved_host_memory_mb - n * instance memory. Note that 'avail' does not consider the ram allocation ratio even if RamFilter is enabled.
        mem_inst = self.instance.memory_mb

        if not mem_inst or avail <= mem_inst:
            instance_uuid = self.instance.uuid
            dest = self.destination
            reason = _("Unable to migrate %(instance_uuid)s to %(dest)s: "
                       "Lack of memory(host:%(avail)s <= "
                       "instance:%(mem_inst)s)")
            raise exception.MigrationPreCheckError(reason=reason % dict(
                    instance_uuid=instance_uuid, dest=dest, avail=avail,
                    mem_inst=mem_inst))

Changed in nova:
status: New → In Progress
Michael Still (mikal) on 2014-03-13
summary: - The destination host check for live migration is not correct
+ Live migration should use the same memory over subscription logic as
+ instance boot
Changed in nova:
status: In Progress → Triaged
importance: Undecided → Medium
Andrew Bogott (andrewbogott) wrote :

This bug means that in order to support evacuation from a given (possibly overprovisioned) host, I need to keep TWO empty hosts at the ready. Expensive!

Jon Proulx (jproulx) wrote :

I can't quite believe that this bug is still open and only triaged as Medium; it certainly seems 'High' to me.

My users typically demand more memory than they actually use, so while I may be allocating at 1.5:1 or even 2:1, actual utilization is usually more like 25% or 50%. If my cluster is evenly loaded and all systems are at 1:1 RAM allocation, I still have lots of headroom, but cannot do any migrations.

I also want to keep the host option for testing purposes though agree best practice is to let the scheduler schedule.

Adrian Gherasim (gherasim-a) wrote :

I don't understand: the bug status is Abandoned; what does that mean?
I have the same problem.

Adrian

This is a painful bug. We add these lines to nova-compute.conf on each compute node in order to allow migrations:

reserved_host_disk_mb=-2097152
reserved_host_memory_mb=-32768

So we are cheating the scheduler by telling it we have -2 TB of storage and -32 GB of RAM. Then we can migrate more VMs to each node.
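Plugging the negative reserve into the conductor's formula shows why this works (back-of-the-envelope arithmetic only, reusing the 1872 MB node from the original report as an example):

```python
# Effect of a negative reserved_host_memory_mb on the reported free RAM.
memory_mb = 1872
instances_mb = 2 * 512  # two 512 MB instances

honest = memory_mb - 512 - instances_mb        # 336 MB with a 512 MB reserve
cheated = memory_mb - (-32768) - instances_mb  # 33616 MB of apparent free RAM
print(honest, cheated)
```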

It's dangerous if you don't monitor RAM and storage in your compute nodes...

And yes, I can't understand how this bug can still be open after more than 2 years...

Changed in nova:
assignee: Jake Liu (jake-liu) → nobody
Sylvain Bauza (sylvain-bauza) wrote :

To be clear, the bug report states that the migration fails when a destination host is specified. That's because specifying a destination host totally bypasses the scheduler logic and talks directly to the nova-compute service.

Unfortunately, overcommit ratios are at the moment scheduler config flags, defined and used by RamFilter.

Conceptually, it would make sense to consider those flags as n-cpu ones so that they could be specified per compute host.
Following that path, compute hosts would report their allocation ratios to the scheduler as a resource item (so the scheduler could still schedule based on that logic).
Computes could then check instance claims based on those ratios and raise an exception if either the scheduler was wrong (potentially due to a race condition or whatever else) or the operator was wrong when explicitly giving a destination host in the request.

That's heavily tied to an approved spec, https://review.openstack.org/#/c/98664/, but I would prefer an approach where allocation ratios are a compute metric, not something that the scheduler can just import_opt.

tags: added: compute
Sylvain Bauza (sylvain-bauza) wrote :

Thinking about it some more: until we move the allocation ratio logic to the computes, an easy workaround would be to validate the destination host by calling the scheduler even when a destination host is specified.
Provided the destination host appears in the result set, we could consider the host valid.

Changed in nova:
assignee: nobody → Sylvain Bauza (sylvain-bauza)
tags: added: live-migrate
removed: live migration
Sean Dague (sdague) on 2015-03-30
Changed in nova:
status: Triaged → Confirmed
importance: Medium → High
Marian Krcmarik (mkrcmari) wrote :

I am still observing the bug, and it took me some time to figure out that the migration logic differs depending on whether a destination host is specified. Is it so difficult to take the logic from the host_passes() method in ram_filter.py (scheduler) and place it in the _check_destination_is_not_source() method of live_migrate.py in the conductor?

Marian, the blueprint that moves allocation ratios from the scheduler to the resource tracker has been accepted. Implementing this blueprint should finally fix the memory oversubscription problem. Please refer to https://review.openstack.org/#/c/173252/

Changed in nova:
status: Confirmed → In Progress
Paul Murray (pmurray) on 2015-11-06
tags: added: live-migration
removed: live-migrate

Reviewed: https://review.openstack.org/180151
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f5e35dcfe8ee586106438bcfa551426babc75bf6
Submitter: Jenkins
Branch: master

commit f5e35dcfe8ee586106438bcfa551426babc75bf6
Author: Sylvain Bauza <email address hidden>
Date: Thu Sep 17 17:51:06 2015 +0200

    Correct memory validation for live migration

    Since live migration has been moved to the conductor, there was no
    possibility for the conductor to verify if the destination had
    enough RAM just because it didn't know the allocation ratios given
    by the scheduler.

    Now that ComputeNodes provide a ram_allocation_ratio field, we can fix
    that check and provide the same validation as RamFilter to make
    sure that the destination is good.

    Closes-Bug: #1451831
    Closes-Bug: #1214943

    Change-Id: Ie6c768fc915553da73160ea51961078bfbacec77
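Conceptually, the committed fix makes the conductor apply the node's reported ratio the same way RamFilter does; a simplified sketch under that reading (not the actual patch):

```python
# Simplified sketch of the corrected check: available RAM is derived from the
# compute node's ram_allocation_ratio, matching RamFilter's arithmetic.
def destination_has_enough_memory(memory_mb, memory_mb_used,
                                  ram_allocation_ratio, mem_inst):
    avail = memory_mb * ram_allocation_ratio - memory_mb_used
    return avail > mem_inst

# With the report's numbers the pre-check now passes: 1872 * 1.5 - 1536 = 1272
ok = destination_has_enough_memory(1872, 1536, 1.5, 512)
print(ok)
```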

Changed in nova:
status: In Progress → Fix Committed

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

Changed in nova:
status: Fix Committed → Fix Released

Has there been any discussion about solving this same problem for disk? I have disk_allocation_ratio set to >1 in my Nova configuration, but the same type of issue occurs when trying a live migration. There is plenty of space on the actual disk where the instance data is stored, but it won't let me migrate because disk_available_least is far below that:

+----------------------+-------+
| Property | Value |
+----------------------+-------+
| count | 1 |
| current_workload | 0 |
| disk_available_least | 12 |
| free_disk_gb | 57 |
| free_ram_mb | 15612 |
| local_gb | 399 |
| local_gb_used | 342 | <- even this isn't right; according to df I have 193 GB used.
| memory_mb | 64252 |
| memory_mb_used | 48640 |
| running_vms | 14 |
| vcpus | 24 |
| vcpus_used | 25 |
+----------------------+-------+
