Pre-migration memory check: invalid error message if memory value is 0

Bug #1413119 reported by Ravishankar Patil
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The logic and error handling of the pre-migration memory check can be improved for the case where the instance memory value is zero ('0').
This check is in the source file nova/conductor/tasks/live_migrate.py

Below is the current code snippet:
        if not mem_inst or avail <= mem_inst:
            instance_uuid = self.instance.uuid
            dest = self.destination
            reason = _("Unable to migrate %(instance_uuid)s to %(dest)s: "
                       "Lack of memory(host:%(avail)s <= "
                       "instance:%(mem_inst)s)")
            raise exception.MigrationPreCheckError(reason=reason % dict(
                    instance_uuid=instance_uuid, dest=dest, avail=avail,
                    mem_inst=mem_inst))

When mem_inst is 0, 'not mem_inst' evaluates to true, so the if condition is met and control enters the block. The result is a 'Lack of memory' error whose message reads 'host:<avail> <= instance:0', even though the destination host is not short of memory. The message is misleading.

Sample error message when memory is zero:
2014-10-13 19:52:09.441 3907 INFO nova.api.openstack.wsgi [req-8430dd30-4f17-4094-bbec-ec9cf3593c79 502 eec6e74886804b79b78ac4fceed5b685] NV-4EB7C79 HTTP exception thrown:
NV-78D5611 Migration pre-check error: NV-37B7976 Unable to migrate 352122ae-1ca1-43b3-8ba6-709d93fd580c to 9117MMB_100DBCP: Lack of memory(host:65536 <= instance:0)

The trailing part of the error, (host:65536 <= instance:0), makes little sense: 65536 is clearly not less than or equal to 0.
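The behaviour can be reproduced outside nova with a few lines of plain Python. This is only a sketch: old_precheck_message is a hypothetical helper that copies the quoted condition and message text, and it returns the string instead of raising MigrationPreCheckError.

```python
def old_precheck_message(avail, mem_inst):
    """Return the text the quoted check would put in the error, or None.

    Hypothetical helper: it mirrors only the condition and the message
    format from the snippet above, not the surrounding nova code.
    """
    # 'not mem_inst' is true for 0 (and for None), so a zero memory value
    # takes the same branch as a genuine lack of memory.
    if not mem_inst or avail <= mem_inst:
        return "Lack of memory(host:%(avail)s <= instance:%(mem_inst)s)" % dict(
            avail=avail, mem_inst=mem_inst)
    return None


print(old_precheck_message(65536, 0))     # zero memory: misleading message
print(old_precheck_message(65536, 2048))  # enough memory: None
print(old_precheck_message(1024, 2048))   # genuine lack of memory
```

The first call yields the same "host:65536 <= instance:0" text that appears in the log above.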

Possible correction:
The check should handle the special case of a zero memory value separately. If the memory value is zero, a different error should be raised that reports the invalid memory value.

Sample fix:
instance_uuid = self.instance.uuid
dest = self.destination
# A missing (None), zero or negative memory value is invalid instance data.
if not mem_inst or mem_inst <= 0:
    reason = _("Unable to migrate %(instance_uuid)s to %(dest)s: "
               "Invalid value for existing instance memory: %(mem_inst)s.")
    raise exception.MigrationPreCheckError(reason=reason % dict(
        instance_uuid=instance_uuid, dest=dest, mem_inst=mem_inst))
# Otherwise report a genuine lack of memory at the destination.
elif avail <= mem_inst:
    reason = _("Unable to migrate %(instance_uuid)s to %(dest)s: "
               "Lack of memory available at destination. (host:%(avail)s <= "
               "instance:%(mem_inst)s)")
    raise exception.MigrationPreCheckError(reason=reason % dict(
        instance_uuid=instance_uuid, dest=dest, avail=avail,
        mem_inst=mem_inst))
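For anyone who wants to try the two-branch idea without a nova checkout, here is a minimal standalone sketch. It makes several assumptions: MigrationPreCheckError below is a plain stand-in class, not nova's exception; check_destination_memory is a hypothetical function name; and the translation wrapper _() is left out.

```python
class MigrationPreCheckError(Exception):
    """Simplified stand-in for nova's exception.MigrationPreCheckError."""


def check_destination_memory(instance_uuid, dest, avail, mem_inst):
    """Raise an error whose text matches the actual problem (hypothetical helper)."""
    details = dict(instance_uuid=instance_uuid, dest=dest,
                   avail=avail, mem_inst=mem_inst)
    prefix = "Unable to migrate %(instance_uuid)s to %(dest)s: "

    # Branch 1: the instance data itself is bad (None, zero or negative).
    if not mem_inst or mem_inst <= 0:
        raise MigrationPreCheckError(
            (prefix + "Invalid value for existing instance memory: "
                      "%(mem_inst)s.") % details)

    # Branch 2: the instance data is sane, the destination is too small.
    if avail <= mem_inst:
        raise MigrationPreCheckError(
            (prefix + "Lack of memory available at destination. "
                      "(host:%(avail)s <= instance:%(mem_inst)s)") % details)


for mem in (0, 131072, 2048):
    try:
        check_destination_memory("352122ae-1ca1-43b3-8ba6-709d93fd580c",
                                 "9117MMB_100DBCP", 65536, mem)
        print("mem_inst=%s: pre-check passed" % mem)
    except MigrationPreCheckError as err:
        print("mem_inst=%s: %s" % (mem, err))
```

Keeping 'not mem_inst' in the first branch means a None value is reported as invalid rather than failing on the comparison with 0.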

Padmakanth (padmakanth-chandrapati) wrote :

Hi Ravishankar Patil,

How can I reproduce that error message? Which commands should I use to trigger the bug? Could you please provide the relevant information?

Changed in nova:
assignee: nobody → Padmakanth (padmakanth-chandrapati)
Ravishankar Patil (ravispat) wrote :

Hi Padmakanth,

Somehow the virtual instance ended up in a corrupt state in which its memory value was zero.
I mean the value of 'memory_mb' was zero.
We have not been able to find out why the instance reached that state.

In this state, when the user attempted a Live Partition Migration, the pre-migration check that verifies whether the destination has enough memory (as detailed above) failed.

But this check does not handle the special case of a zero memory value correctly. This bug asks for that to be corrected so that a more apt error message is logged in this case.

Padmakanth (padmakanth-chandrapati) wrote :

Hi Ravishankar,

I added the code you suggested, but I could not reproduce the error you described earlier.

Ravishankar Patil (ravispat) wrote :

To recreate it, you may need to somehow force the memory stat (the memory_mb property) of the virtual instance to zero '0'.
Then try to migrate that instance so that the pre-migration check above is hit.

We just want a minor change in the error handling here so that it is correct in the rare case when memory_mb is zero (the mem_inst variable in the code above).
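One way to mimic that state without a corrupt instance is a small unit test with a fake object. The sketch below is an assumption-heavy illustration: PreCheckError and check_memory are hypothetical stand-ins written for this example, and the fake instance is a plain SimpleNamespace carrying only uuid and memory_mb. It does not exercise the real nova code path.

```python
import unittest
from types import SimpleNamespace


class PreCheckError(Exception):
    """Hypothetical stand-in for nova's MigrationPreCheckError."""


def check_memory(instance, dest, avail):
    """Hypothetical check that reads memory_mb from the instance object."""
    mem_inst = instance.memory_mb
    if not mem_inst or mem_inst <= 0:
        raise PreCheckError(
            "Unable to migrate %s to %s: Invalid value for existing "
            "instance memory: %s." % (instance.uuid, dest, mem_inst))
    if avail <= mem_inst:
        raise PreCheckError(
            "Unable to migrate %s to %s: Lack of memory available at "
            "destination. (host:%s <= instance:%s)"
            % (instance.uuid, dest, avail, mem_inst))


class ZeroMemoryTest(unittest.TestCase):
    def test_zero_memory_reports_invalid_value(self):
        # Fake instance whose memory_mb is forced to zero.
        instance = SimpleNamespace(
            uuid="352122ae-1ca1-43b3-8ba6-709d93fd580c", memory_mb=0)
        with self.assertRaises(PreCheckError) as ctx:
            check_memory(instance, "9117MMB_100DBCP", 65536)
        self.assertIn("Invalid value", str(ctx.exception))
        self.assertNotIn("Lack of memory", str(ctx.exception))
```

Saved as a module, the test can be run with python -m unittest.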

vishal yadav (vishalcdac07) wrote :

> Somehow the virtual instance moved to a corrupt state with memory values zero.

What was the instance state at that time, ERROR or ACTIVE?
IMO the instance should additionally be moved to the ERROR state, because there is no normal use case in which mem_inst can become <= 0.

Joe Gordon (jogo) wrote :

It sounds like the deeper issue is how the memory stat went to 0 for the instance, and whether the instance went to the ERROR state or not. This should never happen. I am marking this bug as invalid; we should instead open a separate bug on how to prevent us from getting stuck in that state. If memory goes to 0, the instance should be in the ERROR state.

Changed in nova:
status: New → Invalid
Ravishankar Patil (ravispat) wrote :

Agreed that the instance might have gone to the ERROR state. Or it seemed to be toggling between the ERROR and ACTIVE states (see the log below).
But in any case, if this code path is being hit while memory_mb is zero, then I reckon a minor change should be implemented to handle it.

Please move the bug back to working state.

Log:
2014-10-06 17:19:07.936 3882 WARNING NV-43FB562 Failed to compute_task_migrate_server: NV-78D5611 Migration pre-check error: NV-37B7976 Unable to migrate 352122ae-1ca1-43b3-8ba6-709d93fd580c to 9117MMB_100DBCP: Lack of memory(host:65536 <= instance:0)
2014-10-06 17:19:07.938 3882 WARNING NV-EAF7DD6 Setting instance to ACTIVE state.

Changed in nova:
status: Invalid → New
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Changed in nova:
assignee: Padmakanth (padmakanth-chandrapati) → nobody