Bug #1291246 “race problem when checking expected_task_state in ...” : Bugs : OpenStack Compute (nova)

Alex Xu (xuhj) on 2014-03-12

Changed in nova:
assignee:	nobody → Alex Xu (xuhj)

Tracy Jones (tjones-i) on 2014-03-12

tags:	added: compute
tags:	added: api removed: compute
tags:	added: db unified-objects
tags:	removed: api

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-03-13: Fix proposed to nova (master)

#1

Fix proposed to branch: master
Review: https://review.openstack.org/80142

Changed in nova:
status:	New → In Progress

Revision history for this message

Jay Pipes (jaypipes) wrote on 2014-03-19:

#2

Alex, is this something that you've seen crop up in practice, or is this a theoretical bug?

I ask because there are already a number of locks in place around instance updates, and I'm wondering if adding lockmode=update is overkill.

Revision history for this message

Alex Xu (xuhj) wrote on 2014-03-25:

#3

Jay, yes, this is found by reading code. But I have tried to add some fake code to produce the race case.

def _instance_update(context, instance_uuid, values, copy_old_instance=False,
columns_to_join=None):
session = get_session()

if not uuidutils.is_uuid_like(instance_uuid):
raise exception.InvalidUUID(instance_uuid)

    with session.begin():
        instance_ref = _instance_get_by_uuid(context, instance_uuid,
                                             session=session,
                                             columns_to_join=columns_to_join)
        if "expected_task_state" in values:
            # it is not a db column so always pop out
            expected = values.pop("expected_task_state")
            if not isinstance(expected, (tuple, list, set)):
                expected = (expected,)
            actual_state = instance_ref["task_state"]
            if actual_state not in expected:
                if actual_state == task_states.DELETING:
                    raise exception.UnexpectedDeletingTaskStateError(
                            actual=actual_state, expected=expected)
                else:
                    raise exception.UnexpectedTaskStateError(
                            actual=actual_state, expected=expected)

       # I already removed the original code. I just quick rewrite that fake at this comment. I think you can understand the meaning
       # of that code, and make it work easily when you want to try
        import os.path
        if not os.path.exists('/tmp/test'):
            f = open('/tmp/test', 'w+')
            f.write(' ')
            f.close()
            for i in xrange(0, 9999):
                print 'I am busy'

        if "expected_vm_state" in values:
            expected = values.pop("expected_vm_state")
            if not isinstance(expected, (tuple, list, set)):
                expected = (expected,)
            actual_state = instance_ref["vm_state"]
            if actual_state not in expected:
                raise exception.UnexpectedVMStateError(actual=actual_state,
                                                       expected=expected)

..........

When you send the first request, the code get into the 'busy' code. Then you can send second request, you find the second request pass the check, and wait a moment, the first request pass the check also.

I have tried with shelve vm. After two request finished, you will find there are two vm snapshot uploaded to glance. That isn't the behavior we expected.

Jay, yes, this is found by reading code. But I have tried to add some fake code to produce the race case.

def _instance_update(context, instance_uuid, values, copy_old_instance=False,
                     columns_to_join=None):
    session = get_session()

if not uuidutils.is_uuid_like(instance_uuid):
        raise exception.InvalidUUID(instance_uuid)

with session.begin():
        instance_ref = _instance_get_by_uuid(context, instance_uuid,
                                             session=session,
                                             columns_to_join=columns_to_join)
        if "expected_task_state" in values:
            # it is not a db column so always pop out
            expected = values.pop("expected_task_state")
            if not isinstance(expected, (tuple, list, set)):
                expected = (expected,)
            actual_state = instance_ref["task_state"]
            if actual_state not in expected:
                if actual_state == task_states.DELETING:
                    raise exception.UnexpectedDeletingTaskStateError(
                            actual=actual_state, expected=expected)
                else:
                    raise exception.UnexpectedTaskStateError(
                            actual=actual_state, expected=expected)

# I already removed the original code. I just quick rewrite that fake at this comment. I think you can understand the meaning 
       # of that code, and make it work easily when you want to try
        import os.path
        if not os.path.exists('/tmp/test'):
            f = open('/tmp/test', 'w+')
            f.write(' ')
            f.close()
            for i in xrange(0, 9999):
                print 'I am busy'

if "expected_vm_state" in values:
            expected = values.pop("expected_vm_state")
            if not isinstance(expected, (tuple, list, set)):
                expected = (expected,)
            actual_state = instance_ref["vm_state"]
            if actual_state not in expected:
                raise exception.UnexpectedVMStateError(actual=actual_state,
                                                       expected=expected)

..........

When you send the first request, the code get into the 'busy' code. Then you can send second request, you find the second request pass the check, and wait a moment, the first request pass the check also.

I have tried with shelve vm. After two request finished, you will find there are two vm snapshot uploaded to glance. That isn't the behavior we expected.

Revision history for this message

Dan Smith (danms) wrote on 2014-06-04:

#4

This has been a known race for a long time, and has nothing to do with objects. So, I'm going to remove that tag and also mark it as wishlist. It would be nice if this race didn't exist, but the performance implications of locking or doing something different are large.

tags:	removed: unified-objects
Changed in nova:
importance:	Undecided → Wishlist

OpenStack Compute (nova)

race problem when checking expected_task_state in instance updating

Bug Description

Other bug subscribers

Remote bug watches