race problem when checking expected_task_state in instance updating

Bug #1291246 reported by Alex Xu
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
In Progress
Wishlist
Alex Xu

Bug Description

When execute some action on instance, instance will update task state first. For example:

        instance.task_state = task_states.SHELVING
        instance.save(expected_task_state=[None])

When update task_state, it will check the expected_task_state first.
But there is race problem when check the expected task state.

def _instance_update(context, instance_uuid, values, copy_old_instance=False,
                     columns_to_join=None):
    session = get_session()

    if not uuidutils.is_uuid_like(instance_uuid):
        raise exception.InvalidUUID(instance_uuid)

    with session.begin():
        instance_ref = _instance_get_by_uuid(context, instance_uuid,
                                             session=session,
                                             columns_to_join=columns_to_join)
        if "expected_task_state" in values:
            # it is not a db column so always pop out
            expected = values.pop("expected_task_state")
            if not isinstance(expected, (tuple, list, set)):
                expected = (expected,)

<!-----
# If there are two request concurrence for this transaction, second request already modify the instance's task state, but the first request's still get the old state.
# we need use lock for get instance from db
----!>

            actual_state = instance_ref["task_state"]
            if actual_state not in expected:
                if actual_state == task_states.DELETING:
                    raise exception.UnexpectedDeletingTaskStateError(
                            actual=actual_state, expected=expected)
                else:
                    raise exception.UnexpectedTaskStateError(
                            actual=actual_state, expected=expected)

Tags: db
Alex Xu (xuhj)
Changed in nova:
assignee: nobody → Alex Xu (xuhj)
Tracy Jones (tjones-i)
tags: added: compute
tags: added: api
removed: compute
tags: added: db unified-objects
tags: removed: api
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/80142

Changed in nova:
status: New → In Progress
Revision history for this message
Jay Pipes (jaypipes) wrote :

Alex, is this something that you've seen crop up in practice, or is this a theoretical bug?

I ask because there are already a number of locks in place around instance updates, and I'm wondering if adding lockmode=update is overkill.

Revision history for this message
Alex Xu (xuhj) wrote :

Jay, yes, this is found by reading code. But I have tried to add some fake code to produce the race case.

def _instance_update(context, instance_uuid, values, copy_old_instance=False,
                     columns_to_join=None):
    session = get_session()

    if not uuidutils.is_uuid_like(instance_uuid):
        raise exception.InvalidUUID(instance_uuid)

    with session.begin():
        instance_ref = _instance_get_by_uuid(context, instance_uuid,
                                             session=session,
                                             columns_to_join=columns_to_join)
        if "expected_task_state" in values:
            # it is not a db column so always pop out
            expected = values.pop("expected_task_state")
            if not isinstance(expected, (tuple, list, set)):
                expected = (expected,)
            actual_state = instance_ref["task_state"]
            if actual_state not in expected:
                if actual_state == task_states.DELETING:
                    raise exception.UnexpectedDeletingTaskStateError(
                            actual=actual_state, expected=expected)
                else:
                    raise exception.UnexpectedTaskStateError(
                            actual=actual_state, expected=expected)

       # I already removed the original code. I just quick rewrite that fake at this comment. I think you can understand the meaning
       # of that code, and make it work easily when you want to try
        import os.path
        if not os.path.exists('/tmp/test'):
            f = open('/tmp/test', 'w+')
            f.write(' ')
            f.close()
            for i in xrange(0, 9999):
                print 'I am busy'

        if "expected_vm_state" in values:
            expected = values.pop("expected_vm_state")
            if not isinstance(expected, (tuple, list, set)):
                expected = (expected,)
            actual_state = instance_ref["vm_state"]
            if actual_state not in expected:
                raise exception.UnexpectedVMStateError(actual=actual_state,
                                                       expected=expected)

     ..........

When you send the first request, the code get into the 'busy' code. Then you can send second request, you find the second request pass the check, and wait a moment, the first request pass the check also.

I have tried with shelve vm. After two request finished, you will find there are two vm snapshot uploaded to glance. That isn't the behavior we expected.

Revision history for this message
Dan Smith (danms) wrote :

This has been a known race for a long time, and has nothing to do with objects. So, I'm going to remove that tag and also mark it as wishlist. It would be nice if this race didn't exist, but the performance implications of locking or doing something different are large.

tags: removed: unified-objects
Changed in nova:
importance: Undecided → Wishlist
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.