Database transactions can fail with "TypeError: Can't upgrade a READER transaction to a WRITER mid-transaction" because of scatter_gather_cells

Bug #1722404 reported by melanie witt
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
melanie witt
Ocata
Confirmed
Undecided
Unassigned
Pike
Fix Committed
High
Matt Riedemann

Bug Description

I found this while working on a change to improve the usage of the CellDatabases test fixture by defaulting to the 'cell0' database instead of the 'cell1' database.

In the set_target_cell method, we synchronize access to the cached cell
database transaction context manager objects to prevent more than one
thread from using a cell's transaction context manager at the same
time.

In scatter_gather_cells, we're calling target_cell in such a way that
the lock is acquired and released BEFORE the green thread actually
accesses the database in the spawned function. So multiple threads can
access a cell's database transaction context manager and it's possible
for a database transaction to fail with the error:

  TypeError: Can't upgrade a READER transaction to a WRITER
             mid-transaction

because the in-scope transaction context might be in the middle of a
read when a concurrent green thread tries to do a write.

I saw this happen in the test:

nova.tests.unit.compute.test_compute.DisabledInstanceTypesTestCase.test_can_resize_to_visible_instance_type

where a parallel read of instances during a quota check accessed the cell's database transaction context manager at the same time as an instance.save() in the compute/api, causing the instance.save() to fail with "TypeError: Can't upgrade a READER transaction to a WRITER mid-transaction."

Tags: cells
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/510691

Changed in nova:
status: New → In Progress
melanie witt (melwitt)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/510691
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3cc3cc453dc16e22365bea597c1be5bb0be57aeb
Submitter: Jenkins
Branch: master

commit 3cc3cc453dc16e22365bea597c1be5bb0be57aeb
Author: melanie witt <email address hidden>
Date: Mon Oct 9 20:36:59 2017 +0000

    Fix target_cell usage for scatter_gather_cells

    In the set_target_cell method, we synchronize access to the cached cell
    database transaction context manager objects to prevent more than one
    thread from using a cell's transaction context manager at the same
    time.

    In scatter_gather_cells, we're calling target_cell in such a way that
    the lock is acquired and released BEFORE the green thread actually
    accesses the database in the spawned function. So multiple threads can
    access a cell's database transaction context manager and it's possible
    for a database transaction to fail with the error:

      TypeError: Can't upgrade a READER transaction to a WRITER
                 mid-transaction

    because the in-scope transaction context might be in the middle of a
    read when a concurrent green thread tries to do a write.

    Closes-Bug: #1722404

    Change-Id: I07dd4d5aebdc82e343ec2035dc94c744e4754c96

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/511538

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/511651
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=42b70509c6e632b0553e92d8ff04cb4cecaa563c
Submitter: Zuul
Branch: master

commit 42b70509c6e632b0553e92d8ff04cb4cecaa563c
Author: Dan Smith <email address hidden>
Date: Thu Oct 12 16:37:14 2017 -0700

    Regenerate context during targeting

    We are only doing a copy.copy() on context when we do target_cell(),
    which may mean we're sharing more across threads than we intend to.
    This makes that a full regeneration of the context (like we would
    do over RPC). This will necessarily dump the targeting, if already
    set, and any per-thread database context.

    Closes-Bug: #1722404
    Change-Id: Id24dea7465bafc1f6f58c4a121c4ffb35b7a634a

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/512456

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by melanie witt (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/511538
Reason: Turns out that this didn't really fix anything, so we won't backport it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/512456
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=42e5b7549e8b920e682204f5745454c0680ec0fb
Submitter: Zuul
Branch: stable/pike

commit 42e5b7549e8b920e682204f5745454c0680ec0fb
Author: Dan Smith <email address hidden>
Date: Thu Oct 12 16:37:14 2017 -0700

    Regenerate context during targeting

    We are only doing a copy.copy() on context when we do target_cell(),
    which may mean we're sharing more across threads than we intend to.
    This makes that a full regeneration of the context (like we would
    do over RPC). This will necessarily dump the targeting, if already
    set, and any per-thread database context.

    Closes-Bug: #1722404
    Change-Id: Id24dea7465bafc1f6f58c4a121c4ffb35b7a634a
    (cherry picked from commit 42b70509c6e632b0553e92d8ff04cb4cecaa563c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.2

This issue was fixed in the openstack/nova 16.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/540145

Revision history for this message
Saulo (sauloaugustosilva) wrote :

HI ,

I have been running the nova 16.0.3 and it still show me the same bug .

Revision history for this message
Ionuț Bîru (ionut-3) wrote :

@Saulo, check https://bugs.launchpad.net/nova/+bug/1746509/comments/4

That tip helped me to resolve this issue on Pike.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ocata)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/540145

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/653894

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/653894
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0ae53f2bf0a753edfacf3ed8b45bcc40b69428d1
Submitter: Zuul
Branch: master

commit 0ae53f2bf0a753edfacf3ed8b45bcc40b69428d1
Author: melanie witt <email address hidden>
Date: Fri Apr 19 00:59:25 2019 +0000

    Revert "Fix target_cell usage for scatter_gather_cells"

    This reverts commit 3cc3cc453dc16e22365bea597c1be5bb0be57aeb.

    We had determined that this commit had not actually fixed any bug or
    issue with:

      TypeError: Can't upgrade a READER transaction to a WRITER
                 mid-transaction

    and the real fix was change Id24dea7465bafc1f6f58c4a121c4ffb35b7a634a.

    This reverts the unnecessary commit in order to simplify the code and
    it's possible that targeting the context before spawning a thread will
    result in decreased cell cache lock contention once the threads are
    spawned.

    Related-Bug: #1722404

     Conflicts:
     nova/context.py

    NOTE(melwitt): The conflict is due to change
    I861b223ee46b0f0a31f646a4b45f8a02410253cf which is new since the
    original commit had landed.

    Change-Id: I3890d14d8679b96cfebcd735f3b03ec808730b5c

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.