fill_virtual_interface_list migration fails on second attempt

Bug #1824435 reported by iain MacDonnell on 2019-04-11
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Unassigned
Stein
Medium
Unassigned

Bug Description

On attempting to run online_data_migrations on Stein for the second time (and beyond), fill_virtual_interface_list fails as below. I find two rows in the security_groups table which have name='default' and project_id NULL.

2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage [req-6cb533e2-58b5-41db-a455-29dae8efef31 - - - - -] Error attempting to run <function fill_virtual_interface_list at 0x7f1ed1b6f500>: TypeError: 'NoneType' object has no attribute '__getitem__'
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage Traceback (most recent call last):
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 686, in _run_migration
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage found, done = migration_meth(ctxt, count)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1012, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return fn(*args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/objects/virtual_interface.py", line 279, in fill_virtual_interface_list
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage _set_or_delete_marker_for_migrate_instances(cctxt, marker)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, *args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/objects/virtual_interface.py", line 305, in _set_or_delete_marker_for_migrate_instances
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage instance.create()
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return fn(self, *args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 600, in create
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage db_inst = db.instance_create(self._context, updates)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 748, in instance_create
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return IMPL.instance_create(context, values)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(*args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage ectxt.value = e.inner_exc
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage self.force_reraise()
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage six.reraise(self.type_, self.value, self.tb)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(*args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, *args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1728, in instance_create
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage security_group_ensure_default(context)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4039, in security_group_ensure_default
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return _security_group_ensure_default(context)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, *args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4050, in _security_group_ensure_default
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage default_group = _security_group_get_by_names(context, ['default'])[0]
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage TypeError: 'NoneType' object has no attribute '__getitem__'

mysql> select * from security_groups;
+---------------------+------------+------------+----+---------+-------------+----------------------------------+----------------------------------+---------+
| created_at | updated_at | deleted_at | id | name | description | user_id | project_id | deleted |
+---------------------+------------+------------+----+---------+-------------+----------------------------------+----------------------------------+---------+
| 2019-04-09 23:23:10 | NULL | NULL | 1 | default | default | 7ed07a8869044a49b03a6e303e548198 | 1b57d6726e574f3db7fb94ceb3955591 | 0 |
| 2019-04-11 01:11:05 | NULL | NULL | 2 | default | default | NULL | NULL | 0 |
| 2019-04-11 01:11:05 | NULL | NULL | 3 | default | default | NULL | NULL | 0 |
+---------------------+------------+------------+----+---------+-------------+----------------------------------+----------------------------------+---------+
3 rows in set (0.00 sec)

Matt Riedemann (mriedem) wrote :
tags: added: db nova-manage upgrade
Changed in nova:
status: New → Triaged
importance: Undecided → High
Matt Riedemann (mriedem) wrote :

(5:33:08 PM) mriedem: hmm yeah i'm not sure how https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/db/sqlalchemy/api.py#L3874 can't either return at least 1 or raise
(5:35:47 PM) efried: oh, that part is because there's two rows in the database with the "right" project_id (NULL)
(5:36:05 PM) efried: mriedem: So the initial check (==) fails because there's *more* db rows than expected
(5:36:18 PM) efried: and then the for loop doesn't hit because all the names match (because they're the same)
(5:36:23 PM) mriedem: ah yup
(5:36:35 PM) mriedem: i don't know why the unique constraint doesn't blow up - because NULL isn't considered unique?
(5:36:39 PM) efried: so the problem is that we're somehow creating two rows, and I don't know how that ....
(5:36:42 PM) mriedem: schema.UniqueConstraint('project_id', 'name', 'deleted',
(5:36:42 PM) mriedem: name='uniq_security_groups0project_id0'
(5:36:42 PM) mriedem: 'name0deleted'),

Matt Riedemann (mriedem) wrote :

I'm not exactly sure how to reproduce this though. I was able to recreate the unique constraint issue in the database after creating a server and then running the online_data_migrations command a few times:

http://paste.openstack.org/show/749218/

But I don't get errors from the command.

iain MacDonnell (imacdonn) wrote :

Seems there are two issues here...

1) Duplicate marker row created due to unique constraint ignoring null value

2) _security_group_get_by_names() takes a list of group names, does a DB query, and expects that the result will have the same number of items. In this case, the list of group names is just '[default'], but there are two items with the name "default", so the list length comparison fails. The subsequent attempt to find a missing item iterates over the list of group names in search of one that is not present in the query result, but there is none, so it falls out the bottom (ironically, past the "Not Reached" comment).

I wonder if (2) should be addressed, or if it's reasonable to assume that there will never be multiple query results with the same name (and make it so by fixing (1))?

Matt Riedemann (mriedem) wrote :

Iain reports that https://review.openstack.org/653065 at least works around the issue. Also a potential recreate:

(2:44:55 PM) imacdonn: mriedem: FWIW, I can reproduce my original problem with this sequence: 1) create an instance 2) run migrations 3) archive 4) run migrations

Which I'm doing in this functional test https://review.openstack.org/#/c/653098/ but that might be missing something (and uses sqlite) so I'll try Iain's recreate in devstack.

Matt Riedemann (mriedem) wrote :

imacdonn's recreate works for me in devstack: http://paste.openstack.org/show/749391/

Surya Seetharaman (tssurya) wrote :

I ran into the same bug today on my devstack.

Matt Riedemann (mriedem) on 2019-05-23
Changed in nova:
importance: High → Medium

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/653065
Reason: I'm not working on this, someone else will have to pick up working on the bug.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers