xen: live-migration without '--block-migrate" failed with "No sql_connection parameter is established" (cells v2 aggregate up-call in superconductor mode)

Bug #1709594 reported by Jianghua Wang
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

The test is on XenServer.
nova live-migration <VM>

If we run the live-migration without the "--block-migrate" option, it fails with an error like:
   RemoteError: Remote error: RemoteError Remote error: CantStartEngineError No sql_connection parameter is established"

- The trace for nova-conductor:
Aug 09 10:12:55 DevStackOSDomU nova-conductor[13365]:
  ... line 990, in wrapper
    with self._transaction_scope(context):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1040, in _transaction_scope
    context=context) as resource:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 645, in _session
    bind=self.connection, mode=self.mode)
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 402, in _create_session
    self._start()
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 496, in _start
    engine_args, maker_args)
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 518, in _setup_for_connection
    "No sql_connection parameter is established")
CantStartEngineError: No sql_connection parameter is established

- The trace from nova-compute:
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 211, in decorated_function
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server self.force_reraise()
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 199, in decorated_function
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 5253, in check_can_live_migrate_destination
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server disk_over_commit)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 5264, in _do_check_can_live_migrate_destination
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server block_migration, disk_over_commit)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/xenapi/driver.py", line 465, in check_can_live_migrate_destination
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server disk_over_commit)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2281, in check_can_live_migrate_destination
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server self._ensure_host_in_aggregate(ctxt, src)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2203, in _ensure_host_in_aggregate
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server self._get_host_uuid_from_aggregate(context, hostname)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2189, in _get_host_uuid_from_aggregate
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server aggregate_list = objects.AggregateList.get_by_host(context, CONF.host)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server args, kwargs)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server args=args, kwargs=kwargs)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server retry=self.retry)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 123, in _send
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server timeout=timeout, retry=retry)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 578, in send
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server retry=retry)
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 569, in _send
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server raise result
Aug 09 03:26:17 DevStackOSDomU nova-compute[1762]: ERROR oslo_messaging.rpc.server RemoteError: Remote error: CantStartEngineError No sql_connection parameter is established

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

I have no idea why the connection is None when trying to query AggregateList:
  aggregate_list = objects.AggregateList.get_by_host(context, CONF.host)

stack@DevStackOSDomU:~/tempest$ nova-manage cell_v2 list_cells
+-------+--------------------------------------+------------------------------------------------------+-------------------------------------------------------------+
| Name | UUID | Transport URL | Database Connection |
+-------+--------------------------------------+------------------------------------------------------+-------------------------------------------------------------+
| cell0 | 00000000-0000-0000-0000-000000000000 | none:/ | mysql+pymysql://root:****@127.0.0.1/nova_cell0?charset=utf8 |
| cell1 | f057988d-91a0-4869-9549-b237d6e17cf6 | rabbit://stackrabbit:****@10.62.34.3:5672/nova_cell1 | mysql+pymysql://root:****@127.0.0.1/nova_cell1?charset=utf8 |
+-------+--------------------------------------+------------------------------------------------------+-------------------------------------------------------------+

- The database configuration in /etc/nova/nova_cell1.conf:
[database]
connection = mysql+pymysql://root:citrix@127.0.0.1/nova_cell1?charset=utf8

- The database configuration in /etc/nova/nova.conf:
connection = mysql+pymysql://root:citrix@127.0.0.1/nova_cell0?charset=utf8

description: updated
Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

At this line: https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/enginefacade.py#L471
it loads the [database] section. For the test above, the connection option comes back as None, even though the configuration in the nova*.conf files looks correct. Other database accesses do get the correct connection.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Is this a devstack configuration? I'm assuming it's running in superconductor mode (the default)? That is set via the CELLSV2_SETUP variable in stackrc.

Aggregates were migrated to the API database, so do you have the correct settings for the nova_api database connection in nova.conf? That would be the [api_database]/connection field.

tags: added: openstack-version.pike
Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

Matt, thanks for the comments.

Yes, this is a devstack configuration and it's running in superconductor mode.

The database-related settings in nova.conf are:

[database]
connection = mysql+pymysql://root:citrix@127.0.0.1/nova_cell0?charset=utf8

[api_database]
connection = mysql+pymysql://root:citrix@127.0.0.1/nova_api?charset=utf8

The settings look correct to me. Could you advise what else I should check? Thanks for the help.

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

I tried to print the stack in enginefacade.py.
Checking source code:
  https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/enginefacade.py#L471
  conf.register_opts(options.database_opts, 'database')

It seems it tries to get the connection from [database]/connection, not [api_database]/connection. Is that wrong, considering the aggregates were migrated to the API database?

See the following stack

  File "/opt/stack/nova/nova/objects/aggregate.py", line 541, in get_by_host
    _get_by_host_from_db(context, host, key=key)]
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 992, in wrapper
    with self._transaction_scope(context):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1042, in _transaction_scope
    context=context) as resource:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 647, in _session
    bind=self.connection, mode=self.mode)
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 402, in _create_session
    self._start()
  File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 477, in _start
    traceback.print_stack(file=f)
conf is not None: conf=<oslo_config.cfg.ConfigOpts object at 0x7f279c064950>

url_args={'connection': None, 'slave_connection': None}
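
For context, a rough sketch (not nova's exact code; the names are illustrative) of how the enginefacade contexts get configured: nova keeps one oslo.db transaction context per config group, so a service whose config file has no [api_database]/connection ends up with connection=None for API-database queries, which is exactly the CantStartEngineError above.

from oslo_db.sqlalchemy import enginefacade

# Two independent transaction contexts, one per config group.
main_context_manager = enginefacade.transaction_context()   # [database]/connection
api_context_manager = enginefacade.transaction_context()    # [api_database]/connection

def configure(conf):
    main_context_manager.configure(connection=conf.database.connection)
    # If [api_database]/connection is missing from this service's config
    # file, this context is configured with connection=None and the first
    # query through it fails with:
    #   CantStartEngineError: No sql_connection parameter is established
    api_context_manager.configure(connection=conf.api_database.connection)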

Sean Dague (sdague)
tags: added: cellsv2 xenserver
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Matt Riedemann (mriedem)
tags: added: pike-rc-potential
Revision history for this message
Matt Riedemann (mriedem) wrote :

That's your nova.conf, but what about the nova-cpu.conf? That's not going to have [api_database]/connection set, and that's probably your issue. The AggregateList.get_by_host code looks up aggregates in the API database, and there is no connection configured for that: it's an up-call from the compute service, which runs isolated in the cell, to the API DB, and that's blocked in superconductor mode.

e.g. http://logs.openstack.org/42/492242/1/check/gate-tempest-dsvm-neutron-multinode-full-ubuntu-xenial-nv/cfb2cad/logs/subnode-2/etc/nova/nova-cpu.conf.txt.gz

There is no database connection in there. The compute service is talking to the local cell conductor, which is using nova_cell1.conf:

http://logs.openstack.org/42/492242/1/check/gate-tempest-dsvm-neutron-multinode-full-ubuntu-xenial-nv/cfb2cad/logs/etc/nova/nova_cell1.conf.txt.gz

Which only has access to the cell database, not the API database.

The up-call limitations are described here:

https://docs.openstack.org/nova/latest/user/cellsv2_layout.html#operations-requiring-upcalls

So to work around this in devstack, set CELLSV2_SETUP=singleconductor in your local.conf.
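
For reference, a minimal local.conf fragment for that workaround would look something like this (illustrative):

[[local|localrc]]
CELLSV2_SETUP=singleconductor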

Changed in nova:
status: Confirmed → Triaged
Matt Riedemann (mriedem)
tags: removed: pike-rc-potential
Matt Riedemann (mriedem)
summary: live-migration without '--block-migrate" failed with "No sql_connection
- parameter is established"
+ parameter is established" (cells v2 aggregate up-call in superconductor
+ mode)
summary: - live-migration without '--block-migrate" failed with "No sql_connection
- parameter is established" (cells v2 aggregate up-call in superconductor
- mode)
+ xen: live-migration without '--block-migrate" failed with "No
+ sql_connection parameter is established" (cells v2 aggregate up-call in
+ superconductor mode)
Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

Matt, thanks for your analysis and suggestions. It works well if we put the [api_database] connection in /etc/nova/nova_cell1.conf so the local conductor can access the API database.
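
As an illustration, that workaround amounts to adding an [api_database] section like the following to /etc/nova/nova_cell1.conf, using the API-database connection URL already shown above:

[api_database]
connection = mysql+pymysql://root:citrix@127.0.0.1/nova_api?charset=utf8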

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/493006

Revision history for this message
Matt Riedemann (mriedem) wrote :

> Matt, thanks for your analysis and suggestions. It works well if we put the [api_database] connection in /etc/nova/nova_cell1.conf so the local conductor can access the API database.

That defeats the purpose of running in superconductor mode, so you might as well run devstack in singleconductor mode as I pointed out. The point of superconductor mode is that the cell is isolated from the top-level API database. See:

https://docs.openstack.org/nova/latest/user/cellsv2_layout.html

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/493006
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=904c4a1d9ac25119bb3100a349e19fef8d318527
Submitter: Jenkins
Branch: master

commit 904c4a1d9ac25119bb3100a349e19fef8d318527
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 11 09:05:14 2017 -0400

    doc: add another up-call caveat for cells v2 for xenapi aggregates

    There is an up-call from the xenapi driver when performing a live
    migration and --block-migrate is not specified such that the driver
    checks to see if the source and destination hosts are in the same
    aggregate. This fails in a super-conductor setup because the
    aggregates are now in the API database and the cell conductor
    won't be able to access that database by design.

    Change-Id: I6c880c72d87eb0116cb57371e5d600dced2915f7
    Related-Bug: #1709594

Revision history for this message
Matt Riedemann (mriedem) wrote :

Was there a reason that we needed this change?

I85047d44e388c89938f80ad8be1724bd11ed0225

In other words, why don't the other virt drivers have this same change to auto-detect if block migration should be performed during live migration?

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

Matt,
Understood. It works well after switching to singleconductor mode.
Regarding I85047d44e388c89938f80ad8be1724bd11ed0225:
Live migration without block migration requires that the source and destination hosts be on the same shared storage. For the xenapi driver, the implementation in the above commit checks for shared storage based on the aggregate.
See the spec: https://specs.openstack.org/openstack/nova-specs/specs/mitaka/implemented/making_live_migration_api_friendly.html#the-detection-of-block-migration

Instead of querying the API database, I think we can use the XenAPI to determine whether the hosts are in the same host pool.
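
A rough sketch of that idea, assuming the standard XenAPI (XAPI) Python bindings; this is illustrative only, not the driver's actual implementation, and the helper name is hypothetical:

import XenAPI  # XenServer / XAPI Python bindings

def hosts_in_same_pool(session, src_hostname, dst_hostname):
    # A XenAPI session is bound to a single pool, so it is enough to check
    # that both hostnames appear among the pool's host records.
    hostnames = {rec['hostname']
                 for rec in session.xenapi.host.get_all_records().values()}
    return src_hostname in hostnames and dst_hostname in hostnames

# Usage (URL and credentials are placeholders):
#   session = XenAPI.Session('http://xapi-master')
#   session.xenapi.login_with_password('root', 'password')
#   hosts_in_same_pool(session, 'host-a', 'host-b')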

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

This bug is expected to be resolved after we complete this BP:
https://blueprints.launchpad.net/nova/+spec/live-migration-in-xapi-pool
