races in assignment sql backend cause spurious 404s and transient errors while granting roles

Bug #1246489 reported by Peter Feiner
This bug affects 1 person
Affects: OpenStack Identity (keystone)
Status: Fix Released
Importance: High
Assigned to: Adam Young

Bug Description

There are numerous schedules (interleavings of concurrent requests) that lead to inconsistent data in the various grant tables and the role table. These races arise from a lack of synchronization during grant table modification (i.e., grant record creation and deletion, and role deletion) and during deletion of the records that grant records refer to (i.e., user, group, domain, and project records).

For example, consider a project and a role that are related by a UserProjectGrant and are deleted concurrently. In the implementation of delete_role, the project ids that are enumerated in one transaction (in keystone.assignment.backends.sql.Assignment.delete_role) are asserted to exist in another transaction (the call to self._get_project in delete_grant in the same class) and are referred to in two more transactions (_get_metadata and either _create_metadata or _update_metadata). Suppose that after a project_id is enumerated, but before its existence is asserted, a transaction that deletes the project is committed. The result is a 404 (project not found), and the role isn't deleted. Note that a subsequent delete_role API call will remove the Role record without hitting a 404, since the UserProjectGrant record will have been deleted by the delete_project transaction.
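
The schedule above can be replayed deterministically with a small in-memory stand-in. This is only a sketch, not keystone's code: the FakeBackend class, the ProjectNotFound exception, and the between_steps hook (which marks the point where the concurrent delete_project commits) are all hypothetical names introduced for illustration.

```python
class ProjectNotFound(Exception):
    """Hypothetical stand-in for the 404 (project not found) response."""


class FakeBackend:
    """In-memory model of the role, project, and grant records (illustrative only)."""

    def __init__(self):
        self.projects = {"p1"}
        self.roles = {"r1"}
        # (user_id, project_id) -> set of role ids, one record per pair
        self.grants = {("u1", "p1"): {"r1"}}

    def delete_project(self, project_id):
        # Deleting a project also removes the grant records that refer to it.
        self.projects.discard(project_id)
        self.grants = {
            key: roles for key, roles in self.grants.items()
            if key[1] != project_id
        }

    def delete_grant(self, user_id, project_id, role_id):
        # Existence assertion, performed separately from the enumeration.
        if project_id not in self.projects:
            raise ProjectNotFound(project_id)
        self.grants.get((user_id, project_id), set()).discard(role_id)

    def delete_role(self, role_id, between_steps=lambda: None):
        # Step 1: enumerate the grants that mention the role.
        targets = [key for key, roles in self.grants.items() if role_id in roles]
        # A concurrent request may commit here, between the two steps.
        between_steps()
        # Step 2: remove each grant, asserting that the project still exists.
        for user_id, project_id in targets:
            self.delete_grant(user_id, project_id, role_id)
        self.roles.discard(role_id)


def replay_schedule():
    backend = FakeBackend()
    try:
        backend.delete_role(
            "r1", between_steps=lambda: backend.delete_project("p1")
        )
        first_attempt = "ok"
    except ProjectNotFound:
        first_attempt = "404"
    # The retry finds no grants left to enumerate, so it succeeds.
    backend.delete_role("r1")
    return first_attempt, "r1" in backend.roles
```

In this model, replay_schedule() reports a 404 on the first delete_role and a successful removal of the role on the retry, which matches the behavior described above.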

Suppose two roles are being granted to the same user on the same project concurrently. Further suppose that prior to these two grants, the user didn't have any roles on that project. In Assignment.create_grant, a UserProjectGrant record is either updated or created anew depending on whether a UserProjectGrant record already exists for the (user, project) pair passed in. Since the existence check and the creation of a new record are performed in separate transactions, both create_grant executions can attempt to create a (user, project) UserProjectGrant record given the right interleaving. Having the same user and project values, these two records would violate the primary key constraint on the UserProjectGrant table. Assuming the DBMS enforces the primary key constraint, one of the create_grant requests will fail.
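
A minimal sketch of that check-then-create interleaving, written against Python's standard sqlite3 module rather than keystone's SQLAlchemy models. The table layout and the helper names (make_db, grant_exists, insert_grant, simulate_racy_grants) are assumptions made for illustration; the two "requests" are interleaved by hand on one connection so that the outcome is deterministic.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # One record per (user, project) pair, as in the description above.
    conn.execute(
        "CREATE TABLE user_project_grant ("
        " user_id TEXT, project_id TEXT, roles TEXT,"
        " PRIMARY KEY (user_id, project_id))"
    )
    conn.commit()
    return conn


def grant_exists(conn, user_id, project_id):
    row = conn.execute(
        "SELECT 1 FROM user_project_grant"
        " WHERE user_id = ? AND project_id = ?",
        (user_id, project_id),
    ).fetchone()
    return row is not None


def insert_grant(conn, user_id, project_id, role_id):
    conn.execute(
        "INSERT INTO user_project_grant VALUES (?, ?, ?)",
        (user_id, project_id, role_id),
    )
    conn.commit()


def simulate_racy_grants():
    conn = make_db()
    # Both requests run their existence check before either one writes.
    a_sees_record = grant_exists(conn, "u1", "p1")
    b_sees_record = grant_exists(conn, "u1", "p1")
    # Neither saw a record, so both take the "create anew" path.
    insert_grant(conn, "u1", "p1", "role_a")
    try:
        insert_grant(conn, "u1", "p1", "role_b")
        outcome = "both created"
    except sqlite3.IntegrityError:
        # The primary key constraint rejects the second record.
        conn.rollback()
        outcome = "second grant failed"
    return a_sees_record, b_sees_record, outcome
```

Here the losing request fails outright and role_b is never granted, even though each request on its own was valid.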

Another example is described in https://review.openstack.org/#/c/50767/.

Although I haven't been able to come up with an example yet, I suspect that grants or roles might become undeletable because of inconsistent data.

The races pertaining to data owned by the keystone.assignment module (i.e., tenants, roles & grants) can be fixed by judicious use of transactions in keystone.assignment.backends.sql.Assignment. In particular, each API-level operation should run in a single transaction. The races pertaining to data owned by another module (i.e., groups and users, which are owned by keystone.identity) can't be fixed with transactions since the other module might be using a different backend (e.g., LDAP). Those races can't be eliminated outright, but they can be turned into transient problems by removing existence assertions from deletion operations (such as https://review.openstack.org/#/c/50767/) and adding existence assertions to creation operations.
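
To illustrate the "single transaction per API-level operation" idea, here is a sketch using Python's standard sqlite3 module; it is not the patch that was eventually merged. The transaction helper, the table layout, and the function names are assumptions. BEGIN IMMEDIATE is SQLite-specific and takes the write lock up front; with another DBMS, whether the check and the write are atomic with respect to concurrent writers depends on its isolation level and locking.

```python
import json
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Run the enclosed statements as one transaction (hypothetical helper)."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def connect(path=":memory:"):
    # isolation_level=None turns off the sqlite3 module's implicit
    # transactions, so the BEGIN/COMMIT above are the only ones issued.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_project_grant ("
        " user_id TEXT, project_id TEXT, data TEXT,"
        " PRIMARY KEY (user_id, project_id))"
    )
    return conn


def create_grant(conn, user_id, project_id, role_id):
    key = (user_id, project_id)
    # The existence check and the create-or-update share one transaction.
    with transaction(conn):
        row = conn.execute(
            "SELECT data FROM user_project_grant"
            " WHERE user_id = ? AND project_id = ?",
            key,
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO user_project_grant VALUES (?, ?, ?)",
                key + (json.dumps([role_id]),),
            )
        else:
            roles = json.loads(row[0])
            if role_id not in roles:
                roles.append(role_id)
            conn.execute(
                "UPDATE user_project_grant SET data = ?"
                " WHERE user_id = ? AND project_id = ?",
                (json.dumps(roles),) + key,
            )


def list_roles(conn, user_id, project_id):
    row = conn.execute(
        "SELECT data FROM user_project_grant"
        " WHERE user_id = ? AND project_id = ?",
        (user_id, project_id),
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Calling create_grant twice for the same (user, project) pair with different roles leaves one record that holds both roles. The in-memory database only shows the shape of the code; exercising real concurrency would need a shared database file and several connections.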

Running multiple keystone processes, which can be done either via apache or https://review.openstack.org/#/c/42967/, increases the likelihood of these races.

Dolph Mathews (dolph)
Changed in keystone:
status: New → Triaged
importance: Undecided → High
Peter Feiner (pete5)
Changed in keystone:
assignee: nobody → Peter Feiner (pete5)
Peter Feiner (pete5) wrote :
Peter Feiner (pete5)
Changed in keystone:
status: Triaged → In Progress
Changed in keystone:
assignee: Peter Feiner (pete5) → Adam Young (ayoung)
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/56430
Committed: http://github.com/openstack/keystone/commit/8f685962a1d761107653f3a55757b588d0a3a67e
Submitter: Jenkins
Branch: master

commit 8f685962a1d761107653f3a55757b588d0a3a67e
Author: Peter Feiner <email address hidden>
Date: Tue Nov 12 10:57:09 2013 -0500

    One transaction per call to sql assignment backend

    There are numerous schedules that lead to inconsistent data in the
    various grant tables and role table. These races arose from a lack of
    synchronization during grant table modification (i.e., during grant
    record creation + deletion and role deletion) and deletions of records
    that grant records refer to (i.e., user, group, domain, and project
    records). The races manifested as 404 errors in response to API
    requests. This patch adds the necessary synchronization by way of
    transactions.

    Also removed pointless session.flush() calls at the ends of transactions.

    Closes-Bug: #1246489
    Change-Id: I958d48bafc7fcb95f1d9ea71e408b4cefc2469c9

Changed in keystone:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in keystone:
milestone: none → icehouse-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in keystone:
milestone: icehouse-1 → 2014.1