Bug #1821373 “Most instance actions can be called concurrently” : Bugs : OpenStack Compute (nova)

Takashi Natsume (natsume-takashi) on 2019-03-26

tags:	added: api
tags:	added: race-condition

Revision history for this message

Matthew Booth (mbooth-9) wrote on 2019-03-26:

#1

I intend to fix this by changing instance.save() such that it will:

1. return the expected InstanceUpdateConflict directly without hitting the db if instance.task_state doesn't match expected_task_state.

2. assert that task state is being changed if expected_task_state is set (this would be a programming error).

3. ditto for vm_state.

I believe unit test coverage in test_instance will be sufficient for this change.

Revision history for this message

Matthew Booth (mbooth-9) wrote on 2019-05-13:

#2

When writing tests for the above I discovered that this is not the issue. Specifically, OVO doesn't check the previous value when setting an attribute, so when we do:

instance.task_state = SHELVING

this always makes the object dirty, regardless of the previous value of task_state. Therefore in the above case we always hit the DB, and this isn't the problem.

Revision history for this message

Matthew Booth (mbooth-9) wrote on 2019-05-13:

#3

The root cause of this issue appears to be in sqlalchemy.api._instance_update. This method modifies the contents of the values dict when called, including popping the value of expected_task_state. If this results in db conflict due a concurrent update and it is retried, the second invocation is called without expected_task_state in values, and therefore it isn't checked.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-13: Fix proposed to nova (master)

#4

Fix proposed to branch: master
Review: https://review.opendev.org/658845

Changed in nova:
assignee:	nobody → Matthew Booth (mbooth-9)
status:	New → In Progress

Sylvain Bauza (sylvain-bauza) on 2019-05-14

Changed in nova:
importance:	Undecided → Low

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-15: Fix proposed to nova (stable/stein)

#5

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/659317

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-15: Fix proposed to nova (stable/rocky)

#6

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/659318

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-15: Fix proposed to nova (stable/queens)

#7

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/659320

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-16: Fix merged to nova (master)

#8

Reviewed: https://review.opendev.org/658845
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aae5c7aa3819ad9161fd2effed3872d540099230
Submitter: Zuul
Branch: master

commit aae5c7aa3819ad9161fd2effed3872d540099230
Author: Matthew Booth <email address hidden>
Date: Mon May 13 16:04:39 2019 +0100

Fix retry of instance_update_and_get_original

    _instance_update modifies its 'values' argument. Consequently if it is
    retried due to an update conflict, the second invocation has the wrong
    arguments.

    A specific issue this causes is that if we called it with
    expected_task_state a concurrent modification to task_state will cause
    us to fail and retry. However, expected_task_state will have been popped
    from values on the first invocation and will not be present for the
    second. Consequently the second invocation will fail to perform the
    task_state check and therefore succeed, resulting in a race.

We rewrite the old race unit test which wasn't testing the correct
thing for 2 reasons:

    1. Due to the bug fixed in this patch, although we were calling
       update_on_match() twice, the second call didn't check the task state.
    2. side_effect=iterable returns function items without executing them,
       but we weren't hitting this due to the bug fixed in this patch.

Closes-Bug: #1821373
Change-Id: I01c63e685113bf30e687ccb14a4d18e344b306f6

Changed in nova:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-21: Fix merged to nova (stable/stein)

#9

Reviewed: https://review.opendev.org/659317
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=61fef49b1555c6509398fc47c21319c1b5e50505
Submitter: Zuul
Branch: stable/stein

commit 61fef49b1555c6509398fc47c21319c1b5e50505
Author: Matthew Booth <email address hidden>
Date: Mon May 13 16:04:39 2019 +0100

Fix retry of instance_update_and_get_original

    _instance_update modifies its 'values' argument. Consequently if it is
    retried due to an update conflict, the second invocation has the wrong
    arguments.

    A specific issue this causes is that if we called it with
    expected_task_state a concurrent modification to task_state will cause
    us to fail and retry. However, expected_task_state will have been popped
    from values on the first invocation and will not be present for the
    second. Consequently the second invocation will fail to perform the
    task_state check and therefore succeed, resulting in a race.

We rewrite the old race unit test which wasn't testing the correct
thing for 2 reasons:

    1. Due to the bug fixed in this patch, although we were calling
       update_on_match() twice, the second call didn't check the task state.
    2. side_effect=iterable returns function items without executing them,
       but we weren't hitting this due to the bug fixed in this patch.

    Closes-Bug: #1821373
    Change-Id: I01c63e685113bf30e687ccb14a4d18e344b306f6
    (cherry picked from commit aae5c7aa3819ad9161fd2effed3872d540099230)

tags:

added: in-stable-stein

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-30: Fix merged to nova (stable/rocky)

#10

Reviewed: https://review.opendev.org/659318
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4540cd6ef9731837cf4ed1e56850bcfa3512ae1c
Submitter: Zuul
Branch: stable/rocky

commit 4540cd6ef9731837cf4ed1e56850bcfa3512ae1c
Author: Matthew Booth <email address hidden>
Date: Mon May 13 16:04:39 2019 +0100

Fix retry of instance_update_and_get_original

    _instance_update modifies its 'values' argument. Consequently if it is
    retried due to an update conflict, the second invocation has the wrong
    arguments.

    A specific issue this causes is that if we called it with
    expected_task_state a concurrent modification to task_state will cause
    us to fail and retry. However, expected_task_state will have been popped
    from values on the first invocation and will not be present for the
    second. Consequently the second invocation will fail to perform the
    task_state check and therefore succeed, resulting in a race.

We rewrite the old race unit test which wasn't testing the correct
thing for 2 reasons:

    1. Due to the bug fixed in this patch, although we were calling
       update_on_match() twice, the second call didn't check the task state.
    2. side_effect=iterable returns function items without executing them,
       but we weren't hitting this due to the bug fixed in this patch.

    Closes-Bug: #1821373
    Change-Id: I01c63e685113bf30e687ccb14a4d18e344b306f6
    (cherry picked from commit aae5c7aa3819ad9161fd2effed3872d540099230)
    (cherry picked from commit 61fef49b1555c6509398fc47c21319c1b5e50505)

tags:

added: in-stable-rocky

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-06-06: Fix included in openstack/nova 19.0.1

#11

This issue was fixed in the openstack/nova 19.0.1 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-06-18: Fix included in openstack/nova 18.2.1

#12

This issue was fixed in the openstack/nova 18.2.1 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-07-01: Fix merged to nova (stable/queens)

#13

Reviewed: https://review.opendev.org/659320
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f9c2503609bfad0b871d9e87227aa2bb5c137467
Submitter: Zuul
Branch: stable/queens

commit f9c2503609bfad0b871d9e87227aa2bb5c137467
Author: Matthew Booth <email address hidden>
Date: Mon May 13 16:04:39 2019 +0100

Fix retry of instance_update_and_get_original

    _instance_update modifies its 'values' argument. Consequently if it is
    retried due to an update conflict, the second invocation has the wrong
    arguments.

    A specific issue this causes is that if we called it with
    expected_task_state a concurrent modification to task_state will cause
    us to fail and retry. However, expected_task_state will have been popped
    from values on the first invocation and will not be present for the
    second. Consequently the second invocation will fail to perform the
    task_state check and therefore succeed, resulting in a race.

We rewrite the old race unit test which wasn't testing the correct
thing for 2 reasons:

    1. Due to the bug fixed in this patch, although we were calling
       update_on_match() twice, the second call didn't check the task state.
    2. side_effect=iterable returns function items without executing them,
       but we weren't hitting this due to the bug fixed in this patch.

    Closes-Bug: #1821373
    Change-Id: I01c63e685113bf30e687ccb14a4d18e344b306f6
    (cherry picked from commit aae5c7aa3819ad9161fd2effed3872d540099230)
    (cherry picked from commit 61fef49b1555c6509398fc47c21319c1b5e50505)
    (cherry picked from commit 4540cd6ef9731837cf4ed1e56850bcfa3512ae1c)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-07-10: Fix included in openstack/nova 17.0.11

#14

This issue was fixed in the openstack/nova 17.0.11 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-09-27: Fix included in openstack/nova 20.0.0.0rc1

#15

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

OpenStack Compute (nova)

Most instance actions can be called concurrently

Bug Description

Other bug subscribers

Remote bug watches

	Status	Importance	Assigned to
OpenStack Compute (nova)	Fix Released	Low	Matthew Booth
Queens	Fix Committed	Low	Matthew Booth
Rocky	Fix Released	Low	Matthew Booth
Stein	Fix Released	Low	Matthew Booth