db.sqlalchemy.api.Connection.backfill_version_column has race condition

Bug #1715190 reported by Ruby Loo on 2017-09-05
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ironic
Fix Released
Critical
Ruby Loo

Bug Description

db.sqlalchemy.api.Connection.backfill_version_column() has a race condition. It first does a 'read' to get the list of objects that need their version column (if value is None) to be updated. Then it does the update to set the value. The problem is that the update part will set the value without checking that it is still None -- some other DB request may have changed the value between the time of the read and the actual update. We don't want to update it if it is not None.

Discovered by Dmitry Tantsur while reviewing https://review.openstack.org/#/c/497666/1/ironic/db/sqlalchemy/api.py

Ruby Loo (rloo) wrote :

Some background -- this is needed for populating the newly added 'version' column in Pike. It is invoked via the 'ironic-dbsync online_data_migrations' command in Pike (not in master).

Changed in ironic:
assignee: nobody → Ruby Loo (rloo)
status: New → Triaged

Fix proposed to branch: master
Review: https://review.openstack.org/500955

Changed in ironic:
status: Triaged → In Progress
Ruby Loo (rloo) wrote :

I took a look at the versions of objects, between Ocata and Pike. There is only one object that has a different version, the Port object is version 1.6 in Ocata, and 1.7 (added 'physical_network' field) in Pike.

After someone upgrades (rolling or cold) from Ocata to Pike, at some point (before they upgrade to Queens), they will have to run 'ironic-dbsync online_data_migrations' which invokes the backfill_column_version() code (from the Pike release). Without this fix, there is the possibility that *if* one or more ports' physical_network field are being set at 'the same time' as the dbsync call, that the port's version will end up being set to 1.6 (instead of 1.7) and the physical_network field will be set to None.

Since the ironic-dbsync call is being done in the Pike release, this fix needs to be backported to stable/pike.

Running ironic-dbsync online_data_migrations from master (Queens) doesn't make sense; it'll be too late. Because in order to upgrade to master (or Queens), they would have had to run the online_data_migrations cmd from Pike, and if they did that, all the version columns in the DB would already be populated, so there would be no need to run it in Queens/master (which is why we can delete the backfill code in master*)

* assuming we also backport the 'set conductor version' patch to stable/pike too.

Changed in ironic:
importance: High → Critical

Reviewed: https://review.openstack.org/500955
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=d44a210a507c731b66c87cfdf1987716df37d0e3
Submitter: Jenkins
Branch: master

commit d44a210a507c731b66c87cfdf1987716df37d0e3
Author: Ruby Loo <email address hidden>
Date: Tue Sep 5 14:34:46 2017 -0400

    Fix race condition in backfill_version_column()

    Fixes a race condition in backfill_version_column(). It was fetching
    the objects whose version==None. Then it did an update to set the
    value of those object versions. However, it is possible for one of
    the object versionss to have been updated after the fetch but before
    the update operation, in which case the update operation might set
    the version to be an older version.

    Change-Id: I882fdd3e83582a4d7110c68b0d84f243234ea7dd
    Closes-Bug: #1715190

Changed in ironic:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/501816
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=00c26e85ba81fb886b9dc7e9434ba6b40ac2b939
Submitter: Jenkins
Branch: stable/pike

commit 00c26e85ba81fb886b9dc7e9434ba6b40ac2b939
Author: Ruby Loo <email address hidden>
Date: Tue Sep 5 14:34:46 2017 -0400

    Fix race condition in backfill_version_column()

    Fixes a race condition in backfill_version_column(). It was fetching
    the objects whose version==None. Then it did an update to set the
    value of those object versions. However, it is possible for one of
    the object versionss to have been updated after the fetch but before
    the update operation, in which case the update operation might set
    the version to be an older version.

    Change-Id: I882fdd3e83582a4d7110c68b0d84f243234ea7dd
    Closes-Bug: #1715190
    (cherry picked from commit d44a210a507c731b66c87cfdf1987716df37d0e3)

tags: added: in-stable-pike

Reviewed: https://review.openstack.org/501784
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=3e386b4e6f5a855e764122ebbf6da1ac0e30d702
Submitter: Jenkins
Branch: master

commit 3e386b4e6f5a855e764122ebbf6da1ac0e30d702
Author: Ruby Loo <email address hidden>
Date: Thu Sep 7 11:50:24 2017 -0400

    Update upgrade guide to use new pike release

    This adds a note to the upgrade guide for Ocata -> Pike,
    to use the 'ironic-dbsync online_data_migrations'
    command from the latest (9.1.1 or newer) pike releases,
    due to a race-condition bug.

    Change-Id: I4e9bf902c7f7083278831060cde4cd9d3f152e90
    Related-Bug: 1715190

This issue was fixed in the openstack/ironic 9.1.1 release.

This issue was fixed in the openstack/ironic 9.2.0 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers