Placement incomplete consumers online migration fails

Bug #1798163 reported by Mohammed Naser
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Critical    Matt Riedemann
Rocky                     Fix Committed  Critical    Matt Riedemann

Bug Description

When upgrading a cloud from Queens to Rocky, the cloud will likely fail to start up properly under the following conditions (at least, that was the case here):

- 3 instances with allocations against the same resource provider
- 3 resource classes allocated per instance

The INSERT INTO database query ends up being the following:

INSERT INTO consumers (uuid, project_id, user_id, created_at, generation) \
SELECT allocations.consumer_id, 535, 536, NOW(), 0 \
FROM allocations LEFT OUTER JOIN consumers ON allocations.consumer_id = consumers.uuid \
WHERE allocations.resource_provider_id = 4 AND consumers.id IS NULL

However, the query fails because the SELECT returns duplicate consumer IDs, which violates the unique constraint on consumers.uuid. This also breaks the allocations API, and you end up with this traceback:

2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap [<snip>] Placement API unexpected error: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 'fbee657b-6a60-4525-b7c8-b070643404ec' for key 'uniq_consumers0uuid'") [SQL: u'INSERT INTO consumers (uuid, project_id, user_id, created_at, generation) SELECT allocations.consumer_id, 535, 536, %s AS anon_1, %s AS anon_2 \nFROM allocations LEFT OUTER JOIN consumers ON allocations.consumer_id = consumers.uuid \nWHERE allocations.resource_provider_id = %s AND consumers.id IS NULL'] [parameters: (datetime.datetime(2018, 10, 16, 16, 43, 57, 583715), 0, 4)] (Background on this error at: http://sqlalche.me/e/gkpj): DBDuplicateEntry: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 'fbee657b-6a60-4525-b7c8-b070643404ec' for key 'uniq_consumers0uuid'") [SQL: u'INSERT INTO consumers (uuid, project_id, user_id, created_at, generation) SELECT allocations.consumer_id, 535, 536, %s AS anon_1, %s AS anon_2 \nFROM allocations LEFT OUTER JOIN consumers ON allocations.consumer_id = consumers.uuid \nWHERE allocations.resource_provider_id = %s AND consumers.id IS NULL'] [parameters: (datetime.datetime(2018, 10, 16, 16, 43, 57, 583715), 0, 4)] (Background on this error at: http://sqlalche.me/e/gkpj)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap Traceback (most recent call last):
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/fault_wrap.py", line 40, in __call__
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return self.application(environ, start_response)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap resp = self.call_func(req, *args, **kw)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return self.func(req, *args, **kwargs)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/microversion_parse/middleware.py", line 80, in __call__
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap response = req.get_response(self.application)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap application, catch_exc_info=False)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in call_application
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap app_iter = application(self.environ, start_response)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 209, in __call__
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return dispatch(environ, start_response, self._map)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 146, in dispatch
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return handler(environ, start_response)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap resp = self.call_func(req, *args, **kw)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py", line 29, in call_func
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap super(PlacementWsgify, self).call_func(req, *args, **kwargs)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return self.func(req, *args, **kwargs)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/util.py", line 81, in decorated_function
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return f(req)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handlers/allocation.py", line 250, in list_for_resource_provider
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap allocs = rp_obj.AllocationList.get_all_by_resource_provider(context, rp)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 2063, in get_all_by_resource_provider
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap _create_incomplete_consumers_for_provider(context, rp.id)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 993, in wrapper
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return fn(*args, **kwargs)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 1923, in _create_incomplete_consumers_for_provider
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap res = ctx.session.execute(ins_stmt)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1176, in execute
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap bind, close_with_result=True).execute(clause, params or {})
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return meth(self, multiparams, params)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap return connection._execute_clauseelement(self, multiparams, params)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap compiled_sql, distilled_params
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap context)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap util.raise_from_cause(newraise, exc_info)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap reraise(type(exception), exception, tb=exc_tb, cause=cause)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap context)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap cursor.execute(statement, parameters)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap self.errorhandler(self, exc, value)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap raise errorclass, errorvalue
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap DBDuplicateEntry: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 'fbee657b-6a60-4525-b7c8-b070643404ec' for key 'uniq_consumers0uuid'") [SQL: u'INSERT INTO consumers (uuid, project_id, user_id, created_at, generation) SELECT allocations.consumer_id, 535, 536, %s AS anon_1, %s AS anon_2 \nFROM allocations LEFT OUTER JOIN consumers ON allocations.consumer_id = consumers.uuid \nWHERE allocations.resource_provider_id = %s AND consumers.id IS NULL'] [parameters: (datetime.datetime(2018, 10, 16, 16, 43, 57, 583715), 0, 4)] (Background on this error at: http://sqlalche.me/e/gkpj)
2018-10-16 18:43:57.612 39463 ERROR nova.api.openstack.placement.fault_wrap
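
The failure above can be reproduced in miniature. The following is a minimal sketch, not the nova code: it assumes a simplified, hypothetical version of the consumers and allocations tables and uses an in-memory SQLite database instead of MySQL. A single consumer has three allocations (VCPU, MEMORY_MB, DISK_GB) against provider 4, so the SELECT yields the same consumer ID three times and the insert-from-select trips the unique constraint on consumers.uuid.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the placement tables (assumption:
# only the columns needed to show the problem are modelled here).
SCHEMA = """
CREATE TABLE consumers (
    id INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL UNIQUE,
    project_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    created_at TEXT,
    generation INTEGER NOT NULL
);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY,
    resource_provider_id INTEGER NOT NULL,
    consumer_id TEXT NOT NULL,
    resource_class TEXT NOT NULL,
    used INTEGER NOT NULL
);
"""

# The SELECT half of the statement from the bug report, on its own.
SELECT_MISSING = """
SELECT allocations.consumer_id
FROM allocations LEFT OUTER JOIN consumers
    ON allocations.consumer_id = consumers.uuid
WHERE allocations.resource_provider_id = ? AND consumers.id IS NULL
"""

# Same shape as the failing statement; NOW() becomes CURRENT_TIMESTAMP in SQLite.
INSERT_MISSING = """
INSERT INTO consumers (uuid, project_id, user_id, created_at, generation)
SELECT allocations.consumer_id, 535, 536, CURRENT_TIMESTAMP, 0
FROM allocations LEFT OUTER JOIN consumers
    ON allocations.consumer_id = consumers.uuid
WHERE allocations.resource_provider_id = ? AND consumers.id IS NULL
"""


def reproduce_duplicate_insert():
    """Return (rows the SELECT yields, whether the INSERT hit the constraint)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        consumer = "fbee657b-6a60-4525-b7c8-b070643404ec"
        # One consumer, three resource classes, all on provider 4.
        conn.executemany(
            "INSERT INTO allocations "
            "(resource_provider_id, consumer_id, resource_class, used) "
            "VALUES (4, ?, ?, ?)",
            [(consumer, "VCPU", 1), (consumer, "MEMORY_MB", 512),
             (consumer, "DISK_GB", 10)],
        )
        selected = conn.execute(SELECT_MISSING, (4,)).fetchall()
        try:
            conn.execute(INSERT_MISSING, (4,))
            failed = False
        except sqlite3.IntegrityError:
            failed = True
        return len(selected), failed
    finally:
        conn.close()
```

Calling reproduce_duplicate_insert() reports how many rows the SELECT produced for the single consumer and whether the insert was rejected, which mirrors the IntegrityError (1062) in the traceback.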

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/611113

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/611115

Changed in nova:
assignee: nobody → Mohammed Naser (mnaser)
status: New → In Progress
Changed in nova:
assignee: Mohammed Naser (mnaser) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Critical
assignee: Matt Riedemann (mriedem) → Mohammed Naser (mnaser)
tags: added: placement upgrade
Changed in nova:
assignee: Mohammed Naser (mnaser) → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/611113
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=618b47627d8dc4f071032e3e63c530d3fe199b39
Submitter: Zuul
Branch: master

commit 618b47627d8dc4f071032e3e63c530d3fe199b39
Author: Matt Riedemann <email address hidden>
Date: Tue Oct 16 13:47:05 2018 -0400

    Add recreate test for bug 1798163

    Change Icae5038190ab8c7bbdb38d54ae909fcbf9048912 in Rocky
    attempts to online migrate missing consumers table records
    when listing allocations for a given resource provider. The
    problem is when it's doing an insert-from-select, it's not
    handling multiple allocations on the same provider for the
    same consumer, like you'd have with a compute instance that
    has VCPU, MEMORY_MB and DISK_GB allocations against a single
    compute node resource provider. As a result, the insert
    statement has duplicate consumer IDs in it which results in
    a unique constraint violation.

    The existing tests never caught this because they tested with
    3 unique consumers with a single allocation each.

    The functional test added here hits both online data migration
    routines: via the API when listing allocations for a resource
    provider and the direct online data migration CLI.

    Change-Id: Iba56aa6b227b6455d2437e4fabcd296b1b0f06ee
    Related-Bug: #1798163

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/611115
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=730936e535e67127c76d4f27649a16d8cf05efc9
Submitter: Zuul
Branch: master

commit 730936e535e67127c76d4f27649a16d8cf05efc9
Author: Mohammed Naser <email address hidden>
Date: Tue Oct 16 19:54:40 2018 +0200

    Use unique consumer_id when doing online data migration

    If there are multiple consumers having allocations to the same
    resource provider, with different classes, it will attempt
    multiple INSERTs with the same consumer_id which is not allowed
    because of the database constraints.

    This patch adds a simple GROUP BY in order to ensure that the
    database server only provides us with unique values to avoid
    trying to INSERT duplicate values.

    Change-Id: I1acba5e65cd562472f29e354c6077f82844fa87d
    Closes-Bug: #1798163
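
The effect of the GROUP BY described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the merged patch: the tables are simplified, hypothetical stand-ins, the database is in-memory SQLite rather than MySQL, the statement is raw SQL rather than the SQLAlchemy expression nova builds, and grouping on allocations.consumer_id is assumed. The second consumer UUID is made up for the example.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the placement tables.
SCHEMA = """
CREATE TABLE consumers (
    id INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL UNIQUE,
    project_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    created_at TEXT,
    generation INTEGER NOT NULL
);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY,
    resource_provider_id INTEGER NOT NULL,
    consumer_id TEXT NOT NULL,
    resource_class TEXT NOT NULL,
    used INTEGER NOT NULL
);
"""

# Same statement as in the bug report, plus a GROUP BY so that each
# consumer_id is returned (and therefore inserted) only once.
INSERT_MISSING_GROUPED = """
INSERT INTO consumers (uuid, project_id, user_id, created_at, generation)
SELECT allocations.consumer_id, 535, 536, CURRENT_TIMESTAMP, 0
FROM allocations LEFT OUTER JOIN consumers
    ON allocations.consumer_id = consumers.uuid
WHERE allocations.resource_provider_id = ? AND consumers.id IS NULL
GROUP BY allocations.consumer_id
"""

CONSUMER_A = "fbee657b-6a60-4525-b7c8-b070643404ec"
CONSUMER_B = "11111111-2222-3333-4444-555555555555"  # made up for this sketch


def migrate_with_group_by():
    """Run the grouped insert twice and return the resulting consumer UUIDs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany(
            "INSERT INTO allocations "
            "(resource_provider_id, consumer_id, resource_class, used) "
            "VALUES (4, ?, ?, ?)",
            [(CONSUMER_A, "VCPU", 1), (CONSUMER_A, "MEMORY_MB", 512),
             (CONSUMER_A, "DISK_GB", 10),
             (CONSUMER_B, "VCPU", 2), (CONSUMER_B, "MEMORY_MB", 1024)],
        )
        # First run creates one consumers row per distinct consumer_id.
        conn.execute(INSERT_MISSING_GROUPED, (4,))
        # Second run finds nothing to insert: the LEFT OUTER JOIN now matches.
        conn.execute(INSERT_MISSING_GROUPED, (4,))
        rows = conn.execute("SELECT uuid FROM consumers ORDER BY uuid").fetchall()
        return [uuid for (uuid,) in rows]
    finally:
        conn.close()
```

With five allocation rows spread over two consumers, the grouped statement creates exactly two consumers records, and running it again is a no-op because the join no longer finds missing consumers.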

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/611314

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/611315

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/611314
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=83396b325428332bd51c57f89599cf4cc074c7fb
Submitter: Zuul
Branch: stable/rocky

commit 83396b325428332bd51c57f89599cf4cc074c7fb
Author: Matt Riedemann <email address hidden>
Date: Tue Oct 16 13:47:05 2018 -0400

    Add recreate test for bug 1798163

    Change Icae5038190ab8c7bbdb38d54ae909fcbf9048912 in Rocky
    attempts to online migrate missing consumers table records
    when listing allocations for a given resource provider. The
    problem is when it's doing an insert-from-select, it's not
    handling multiple allocations on the same provider for the
    same consumer, like you'd have with a compute instance that
    has VCPU, MEMORY_MB and DISK_GB allocations against a single
    compute node resource provider. As a result, the insert
    statement has duplicate consumer IDs in it which results in
    a unique constraint violation.

    The existing tests never caught this because they tested with
    3 unique consumers with a single allocation each.

    The functional test added here hits both online data migration
    routines: via the API when listing allocations for a resource
    provider and the direct online data migration CLI.

    Conflicts:
          nova/tests/functional/api/openstack/placement/db/test_consumer.py

    NOTE(mriedem): The conflict was due to not having change
    I7f5f08691ca3f73073c66c29dddb996fb2c2b266 in Rocky.

    Change-Id: Iba56aa6b227b6455d2437e4fabcd296b1b0f06ee
    Related-Bug: #1798163
    (cherry picked from commit 618b47627d8dc4f071032e3e63c530d3fe199b39)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/611315
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6bd7e8a958f53e2288b41a27528a008e3cfb2802
Submitter: Zuul
Branch: stable/rocky

commit 6bd7e8a958f53e2288b41a27528a008e3cfb2802
Author: Mohammed Naser <email address hidden>
Date: Tue Oct 16 19:54:40 2018 +0200

    Use unique consumer_id when doing online data migration

    If there are multiple consumers having allocations to the same
    resource provider, with different classes, it will attempt
    multiple INSERTs with the same consumer_id which is not allowed
    because of the database constraints.

    This patch adds a simple GROUP BY in order to ensure that the
    database server only provides us with unique values to avoid
    trying to INSERT duplicate values.

    Conflicts:
          nova/tests/functional/api/openstack/placement/db/test_consumer.py

    NOTE(mriedem): The conflict is due to not having change
    I7f5f08691ca3f73073c66c29dddb996fb2c2b266 in Rocky.

    Change-Id: I1acba5e65cd562472f29e354c6077f82844fa87d
    Closes-Bug: #1798163
    (cherry picked from commit 730936e535e67127c76d4f27649a16d8cf05efc9)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.3

This issue was fixed in the openstack/nova 18.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
