[queens-only] race in nova host discovery task

Bug #1862321 reported by Martin Schuppert
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Martin Schuppert

Bug Description

TASK [Discovering nova hosts] **************************************************
Thursday 06 February 2020 06:24:10 -0500 (0:00:00.694) 0:00:05.342 *****
ok: [192.168.24.11 -> 192.168.24.11]
ok: [192.168.24.10 -> 192.168.24.11]
ok: [192.168.24.17 -> 192.168.24.11]
ok: [192.168.24.14 -> 192.168.24.11]
ok: [192.168.24.7 -> 192.168.24.11]
ok: [192.168.24.6 -> 192.168.24.11]

This task can be triggered multiple times in parallel on the same host, and it then fails because nova-manage commands are not meant to run concurrently:
TASK [Discovering nova hosts] **************************************************
Tuesday 04 February 2020 12:50:12 +0000 (0:00:00.473) 0:00:04.655 ******
fatal: [192.168.24.10 -> 192.168.24.11]: FAILED! => {"changed": false, "cmd": ["docker", "exec", "nova_compute", "nova-manage", "cell_v2", "discover_hosts", "--by-service"], "delta": "0:00:03.084736", "end": "2020-02-04 12:50:15.893532", "msg": "non-zero return code", "rc": 1, "start": "2020-02-04 12:50:12.808796", "stderr": "", "stderr_lines": [], "stdout": "An error has occurred:\
...
DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\\"Duplicate entry \'filo-cpt01.vim.ecocenter.fr\' for key \'uniq_host_mappings0host\'\\") [SQL: u\'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)\'] [parameters: {\'created_at\': datetime.datetime(2020, 2, 4, 12, 50, 15, 751805), \'cell_id\': 5, \'host\': u\'filo-cpt01.vim.ecocenter.fr\', \'updated_at\': None}] (Background on this error at: http://sqlalche.me/e/gkpj)", "stdout_lines": ["An error has occurred:", "Traceback (most recent call last):", " File \\"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\\", line 1657, in main", " ret = fn(*fn_args, **fn_kwargs)", " File \\"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\\", line 1323, in discover_hosts", " by_service)", " File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 265, in discover_hosts", " by_service)", " File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 224, in _check_and_create_host_mappings", " status_fn)", " File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 211, in _check_and_create_service_host_mappings", " host_mapping.create()", " File \\"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\\", line 226, in wrapper", " return fn(self, *args, **kwargs)", " File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 114, in create", " db_mapping = self._create_in_db(self._context, changes)", " File \\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\\", line 988, in wrapper", " return fn(*args, **kwargs)", " File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 107, in _create_in_db", " return _apply_updates(context, db_mapping, updates)", " File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 33, in _apply_updates", " db_mapping.save(context.session)", " File 
\\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py\\", line 50, in save", " session.flush()", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\\", line 2243, in flush", " self._flush(objects)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\\", line 2369, in _flush", " transaction.rollback(_capture_exception=True)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py\\", line 66, in __exit__", " compat.reraise(exc_type, exc_value, exc_tb)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\\", line 2333, in _flush", " flush_context.execute()", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py\\", line 391, in execute", " rec.execute(self)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py\\", line 556, in execute", " uow", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py\\", line 181, in save_obj", " mapper, table, insert)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py\\", line 866, in _emit_insert_statements", " execute(statement, params)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 948, in execute", " return meth(self, multiparams, params)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py\\", line 269, in _execute_on_connection", " return connection._execute_clauseelement(self, multiparams, params)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1060, in _execute_clauseelement", " compiled_sql, distilled_params", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1200, in _execute_context", " context)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1409, in _handle_dbapi_exception", " util.raise_from_cause(newraise, exc_info)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py\\", 
line 203, in raise_from_cause", " reraise(type(exception), exception, tb=exc_tb, cause=cause)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1193, in _execute_context", " context)", " File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\\", line 507, in do_execute", " cursor.execute(statement, parameters)", " File \\"/usr/lib/python2.7/site-packages/pymysql/cursors.py\\", line 166, in execute", " result = self._query(query)", " File \\"/usr/lib/python2.7/site-packages/pymysql/cursors.py\\", line 322, in _query", " conn.query(q)", " File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 856, in query", " self._affected_rows = self._read_query_result(unbuffered=unbuffered)", " File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 1057, in _read_query_result", " result.read()", " File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 1340, in read", " first_packet = self.connection._read_packet()", " File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 1014, in _read_packet", " packet.check_error()", " File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 393, in check_error", " err.raise_mysql_exception(self._data)", " File \\"/usr/lib/python2.7/site-packages/pymysql/err.py\\", line 107, in raise_mysql_exception", " raise errorclass(errno, errval)", "DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\\"Duplicate entry \'filo-cpt01.vim.ecocenter.fr\' for key \'uniq_host_mappings0host\'\\") [SQL: u\'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)\'] [parameters: {\'created_at\': datetime.datetime(2020, 2, 4, 12, 50, 15, 751805), \'cell_id\': 5, \'host\': u\'filo-cpt01.vim.ecocenter.fr\', \'updated_at\': None}] (Background on this error at: http://sqlalche.me/e/gkpj)"]}
ok: [192.168.24.11 -> 192.168.24.11]
ok: [192.168.24.17 -> 192.168.24.11]
ok: [192.168.24.14 -> 192.168.24.11]
ok: [192.168.24.7 -> 192.168.24.11]
ok: [192.168.24.6 -> 192.168.24.11]

We should run this task [1] only once.

[1] https://github.com/openstack/tripleo-common/blob/stable/queens/playbooks/nova_cellv2_host_discover.yaml#L15
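
For illustration only, one way to keep the delegated command from running once per compute node is Ansible's `run_once` keyword. This is a sketch under assumptions, not the change merged in the review for this bug: the `nova_compute` group name and the delegation target expression are hypothetical, and only the command itself is taken from the failure output above.

```yaml
# Hypothetical sketch; not the actual content of nova_cellv2_host_discover.yaml.
- hosts: nova_compute                      # assumed inventory group name
  gather_facts: false
  tasks:
    - name: Discovering nova hosts
      # Command as shown in the "cmd" field of the failed task output.
      command: docker exec nova_compute nova-manage cell_v2 discover_hosts --by-service
      become: true
      # Assumed target: the first host of the group runs the command.
      delegate_to: "{{ groups['nova_compute'] | first }}"
      # Run the task for a single host of the play only, so just one
      # nova-manage process is started on the delegation target.
      run_once: true
```

Without `run_once`, every host in the play delegates its own copy of the task to the same target, which matches the six results per task shown in the logs above.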

Changed in tripleo:
assignee: nobody → Martin Schuppert (mschuppert)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/706450

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/queens)

Reviewed: https://review.opendev.org/706450
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=fa7401afcc27c6c725330e7e0669ba2f35abb872
Submitter: Zuul
Branch: stable/queens

commit fa7401afcc27c6c725330e7e0669ba2f35abb872
Author: Martin Schuppert <email address hidden>
Date: Fri Feb 7 10:30:35 2020 +0100

    [Queens-only] make sure cellv2 host discovery task only run once

    The discovery task is correctly triggered via delegate only on a single node,
    but all computes delegate the job to this single host. It is possible that
    this runs multiple times in parallel and fails, as nova-manage commands are
    not meant to run multiple times at the same time.

    This makes sure the task only gets triggered once.

    Change-Id: I072baf357836c55ad37257158b83f25c16b7f46f
    Closes-Bug: #1862321

tags: added: in-stable-queens
wes hayutin (weshayutin)
Changed in tripleo:
status: New → Fix Released
importance: Undecided → High
milestone: none → ussuri-3
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common queens-eol

This issue was fixed in the openstack/tripleo-common queens-eol release.
