DB migration is interrupted and next execution will fail

Bug #1932373 reported by uchenily
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Invalid
Low
Unassigned

Bug Description

Sometimes, alembic migration is interrupted in the middle, the db table structure has changed, but alembic_version table version number in the table has not been updated, as a result, the next database migration will fail.

DB: mariadb

MariaDB [chen_test]> select * from alembic_version;
+--------------+
| version_num |
+--------------+
| c613d0b82681 |
+--------------+

(venv) root@dev1:~/workspace/neutron/neutron/db/migration# neutron-db-manage upgrade +1
DEBUG [oslo_concurrency.lockutils] Lock "context-manager" acquired by "neutron_lib.db.api._create_context_manager" :: waited 0.000s
DEBUG [oslo_concurrency.lockutils] Lock "context-manager" released by "neutron_lib.db.api._create_context_manager" :: held 0.000s
DEBUG [neutron_lib.callbacks.manager] Subscribe: <bound method RbacNeutronDbObjectMixin.validate_rbac_policy_change of <class 'neutron.objects.network.Network'>> rbac-policy before_create 55550000
DEBUG [neutron_lib.callbacks.manager] Subscribe: <bound method RbacNeutronDbObjectMixin.validate_rbac_policy_change of <class 'neutron.objects.network.Network'>> rbac-policy before_update 55550000
DEBUG [neutron_lib.callbacks.manager] Subscribe: <bound method RbacNeutronDbObjectMixin.validate_rbac_policy_change of <class 'neutron.objects.network.Network'>> rbac-policy before_delete 55550000
DEBUG [neutron_lib.callbacks.manager] Subscribe: <function _update_segment_host_mapping_for_agent at 0x7f3c97f9ea60> agent after_create 55550000
DEBUG [neutron_lib.callbacks.manager] Subscribe: <function _update_segment_host_mapping_for_agent at 0x7f3c97f9ea60> agent after_update 55550000
DEBUG [neutron_lib.callbacks.manager] Subscribe: <function _add_segment_host_mapping_for_segment at 0x7f3c97f9eaf0> segment precommit_create 55550000
DEBUG [neutron_lib.callbacks.manager] Subscribe: <function _delete_segments_for_network at 0x7f3c97f9eb80> network precommit_delete 55550000
DEBUG [neutron_lib.callbacks.manager] Subscribe: <function _prevent_segment_delete_with_port_bound at 0x7f3c97fa4d30> segment before_delete 55550000
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  Running upgrade for neutron ...
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade c613d0b82681 -> c3e9d13c4367
Traceback (most recent call last):
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1770, in _execute_context
    self.dialect.do_execute(
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
    cursor.execute(statement, parameters)
  File "/root/venv/lib/python3.8/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/root/venv/lib/python3.8/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 548, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 775, in _read_query_result
    result.read()
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 1156, in read
    first_packet = self.connection._read_packet()
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 725, in _read_packet
    packet.raise_for_error()
  File "/root/venv/lib/python3.8/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/root/venv/lib/python3.8/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1060, "Duplicate column name 'binding_index'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/venv/bin/neutron-db-manage", line 10, in <module>
    sys.exit(main())
  File "/root/workspace/neutron/neutron/db/migration/cli.py", line 687, in main
    return_val |= bool(CONF.command.func(config, CONF.command.name))
  File "/root/workspace/neutron/neutron/db/migration/cli.py", line 184, in do_upgrade
    do_alembic_command(config, cmd, revision=revision,
  File "/root/workspace/neutron/neutron/db/migration/cli.py", line 86, in do_alembic_command
    getattr(alembic_command, cmd)(config, *args, **kwargs)
  File "/root/venv/lib/python3.8/site-packages/alembic/command.py", line 294, in upgrade
    script.run_env()
  File "/root/venv/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/root/venv/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
    module = load_module_py(module_id, path)
  File "/root/venv/lib/python3.8/site-packages/alembic/util/compat.py", line 184, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/root/workspace/neutron/neutron/db/migration/alembic_migrations/env.py", line 120, in <module>
    run_migrations_online()
  File "/root/workspace/neutron/neutron/db/migration/alembic_migrations/env.py", line 114, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/root/venv/lib/python3.8/site-packages/alembic/runtime/environment.py", line 813, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/root/venv/lib/python3.8/site-packages/alembic/runtime/migration.py", line 561, in run_migrations
    step.migration_fn(**kw)
  File "/root/workspace/neutron/neutron/db/migration/alembic_migrations/versions/ussuri/expand/c3e9d13c4367_add_binding_index_to_.py", line 32, in upgrade
    op.add_column(NETWORK_DHCP_AGENT_BINDING,
  File "<string>", line 8, in add_column
  File "<string>", line 3, in add_column
  File "/root/venv/lib/python3.8/site-packages/alembic/operations/ops.py", line 1801, in add_column
    return operations.invoke(op)
  File "/root/venv/lib/python3.8/site-packages/alembic/operations/base.py", line 354, in invoke
    return fn(self, operation)
  File "/root/venv/lib/python3.8/site-packages/alembic/operations/toimpl.py", line 135, in add_column
    operations.impl.add_column(table_name, column, schema=schema, **kw)
  File "/root/venv/lib/python3.8/site-packages/alembic/ddl/impl.py", line 256, in add_column
    self._exec(base.AddColumn(table_name, column, schema=schema))
  File "/root/venv/lib/python3.8/site-packages/alembic/ddl/impl.py", line 146, in _exec
    return conn.execute(construct, multiparams)
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1262, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/sql/ddl.py", line 77, in _execute_on_connection
    return connection._execute_ddl(
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1352, in _execute_ddl
    ret = self._execute_context(
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1813, in _execute_context
    self._handle_dbapi_exception(
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1992, in _handle_dbapi_exception
    util.raise_(newraise, with_traceback=exc_info[2], from_=e)
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1770, in _execute_context
    self.dialect.do_execute(
  File "/root/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
    cursor.execute(statement, parameters)
  File "/root/venv/lib/python3.8/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/root/venv/lib/python3.8/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 548, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 775, in _read_query_result
    result.read()
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 1156, in read
    first_packet = self.connection._read_packet()
  File "/root/venv/lib/python3.8/site-packages/pymysql/connections.py", line 725, in _read_packet
    packet.raise_for_error()
  File "/root/venv/lib/python3.8/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/root/venv/lib/python3.8/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1060, "Duplicate column name 'binding_index'")
[SQL: ALTER TABLE networkdhcpagentbindings ADD COLUMN binding_index INTEGER NOT NULL DEFAULT '1']
(Background on this error at: http://sqlalche.me/e/14/e3q8)

The log shows that, when we try to execute this code[1], we get an error due to a conflict.

Manually changing the database is an available way, but, maybe there's a better way?

[1] https://github.com/openstack/neutron/blob/18.0.0/neutron/db/migration/alembic_migrations/versions/ussuri/expand/c3e9d13c4367_add_binding_index_to_.py#L38-L41

Tags: db
tags: added: db
Revision history for this message
Lajos Katona (lajos-katona) wrote :

Thanks for reporting this issue.
There was some debate around this and CI issues:
https://review.opendev.org/q/69b3762dda47272513e02340e7942d5a39f825c5

The bug at that time is this:
https://bugs.launchpad.net/neutron/+bug/1849976

There was a fix which included the move of the migration script from train to ussuri folder.
By the "Duplicate column name 'binding_index'" I have the feeling You run into this migration script move/rename.

Revision history for this message
Lajos Katona (lajos-katona) wrote :

I left out the fix:
https://review.opendev.org/c/openstack/neutron/+/692285

Strange is that it was merged before the release, so there should be no problem with this migration issue.
Could you tell from which version you upgrade to which one?

Revision history for this message
uchenily (uchenily) wrote :

Thanks for your reply, Lajos.
I'm upgrading from rocky to victoria, and I can find this commit[1] from the git record.
Actually, I'm using the source code of Neutron tag 17.0.0)

[1] https://opendev.org/openstack/neutron/commit/0a3c53e411769141adc44b14af5efb854359af1a

Revision history for this message
Lajos Katona (lajos-katona) wrote :

I tried to manually reproduce the issue with migration on mysql:
1.) Created a test db (from [0]):
mysql> drop database testdb; create database testdb;

2.) used Rocky from neutron and neutron-lib and run neutron-db-manage to test-db

$ neutron-db-manage --database-connection mysql+pymysql://root:stackdb@127.0.0.1/testdb?charset=utf8 upgrade heads

3.) I changed neutron to 17.0.0 & neutron-lib to Victoria and do the same:
$ neutron-db-manage --database-connection mysql+pymysql://root:stackdb@127.0.0.1/testdb?charset=utf8 upgrade heads

Exactly how often this issue comes in Your environment?

[0] https://docs.openstack.org/neutron/latest/contributor/alembic_migrations.html#running-neutron-db-manage-without-devstack

Changed in neutron:
importance: Undecided → Low
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello uchenily:

This is because you probably tried to upgrade using a buggy branch (the one adding this column in Train) and then you are using the correct one, that tries to add it in Ussuri again.

Remove manually the column "binding_index" from "networkdhcpagentbindings" and then try again to continue with the DB migration.

Regards.

Revision history for this message
Brian Haley (brian-haley) wrote :

Will close based on last comment and since it's been idle for over a year. Please re-open if necessary.

Changed in neutron:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.