Compute component jobs in master branch are failing with ERROR nova nova.exception.DBNotAllowed: nova-compute attempted direct database access which is not allowed by policy

Bug #1903655 reported by Sandeep Yadav
Affects                        Status        Importance  Assigned to     Milestone
puppet-nova                    Won't Fix     Undecided   Unassigned
puppet-openstack-integration   Won't Fix     Undecided   Unassigned
tripleo                        Fix Released  Critical    Oliver Walsh

Bug Description

Description:-

Compute component jobs in master branch are failing with ERROR nova nova.exception.DBNotAllowed: nova-compute attempted direct database access which is not allowed by policy.

Jobs have been failing since 8th Nov 2020.

Logs:-

https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-compute-master/b8ce7e2/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz

~~~
2020-11-09 08:34:36.962241 | fa163e7d-71c4-2223-cd8b-0000000046fc | TASK | Check containers status
 [ERROR]: Container(s) which finished with wrong return code:
['nova_wait_for_compute_service']
 [ERROR]: Container(s) which failed to be created by podman_container module:
['nova_wait_for_compute_service']
2020-11-09 08:34:38.744478 | fa163e7d-71c4-2223-cd8b-0000000046fc | FATAL | Check containers status | standalone | error={"changed": false, "msg": "Failed container(s): ['nova_wait_for_compute_service'], check logs in /var/log/containers/stdouts/"}
2020-11-09 08:34:38.746197 | fa163e7d-71c4-2223-cd8b-0000000046fc | TIMING | tripleo_container_manage : Check containers status | standalone | 0:41:25.278908 | 1.78s

PLAY RECAP *********************************************************************
standalone : ok=640 changed=292 unreachable=0 failed=1 skipped=209 rescued=0 ignored=0
~~~

* The final log line is INFO:nova_wait_for_compute_service:Waiting for nova-compute service to register; the expected "Nova-compute service registered" message never appears in the logs.

https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-compute-master/b8ce7e2/logs/undercloud/var/log/extra/podman/containers/nova_wait_for_compute_service/stdout.log.txt.gz

~~~
DEBUG:keystoneauth.session:GET call to compute for http://192.168.24.3:8774/v2.1/os-services?binary=nova-compute used request id req-13c1c788-99ce-49c1-a0ae-25dc23a410dc
INFO:nova_wait_for_compute_service:Waiting for nova-compute service to register
~~~
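The wait container loops on the os-services API until a nova-compute record shows up. A minimal sketch of that polling pattern (function names and structure are illustrative, not the actual tripleo script):

```python
import time

def compute_service_registered(list_services):
    """Return True once a nova-compute service appears in the parsed
    GET /os-services?binary=nova-compute response body (illustrative)."""
    body = list_services()
    return any(s.get("binary") == "nova-compute"
               for s in body.get("services", []))

def wait_for_compute(list_services, attempts=30, delay=1.0):
    """Poll until the service registers or the attempts run out."""
    for _ in range(attempts):
        if compute_service_registered(list_services):
            return True
        time.sleep(delay)
    return False
```

In the failing job the service list stays empty because nova-compute never starts, so a loop like this times out and the container exits with a non-zero code.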

* https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-compute-master/b8ce7e2/logs/undercloud/var/log/extra/errors.txt.txt.gz

~~~
2020-11-09 08:24:02.769 ERROR /var/log/containers/nova/nova-compute.log: 8 ERROR nova Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nova/context.py", line 350, in get_or_set_cached_cell_and_set_connections
    cell_tuple = CELL_CACHE[cell_mapping.uuid]
KeyError: '00000000-0000-0000-0000-000000000000'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/nova-compute", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3.6/site-packages/nova/cmd/compute.py", line 59, in main
    topic=compute_rpcapi.RPC_TOPIC)
  File "/usr/lib/python3.6/site-packages/nova/service.py", line 252, in create
    utils.raise_if_old_compute()
  File "/usr/lib/python3.6/site-packages/nova/utils.py", line 1068, in raise_if_old_compute
    ctxt, ['nova-compute'])
  File "/usr/lib/python3.6/site-packages/nova/objects/service.py", line 554, in get_minimum_version_all_cells
    binaries)
  File "/usr/lib/python3.6/site-packages/nova/context.py", line 545, in scatter_gather_all_cells
    fn, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/nova/context.py", line 433, in scatter_gather_cells
    with target_cell(context, cell_mapping) as cctxt:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/lib/python3.6/site-packages/nova/context.py", line 393, in target_cell
    set_target_cell(cctxt, cell_mapping)
  File "/usr/lib/python3.6/site-packages/nova/context.py", line 366, in set_target_cell
    get_or_set_cached_cell_and_set_connections()
  File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
    return f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/nova/context.py", line 354, in get_or_set_cached_cell_and_set_connections
    db_connection_string)
  File "/usr/lib/python3.6/site-packages/nova/db/api.py", line 79, in create_context_manager
    return IMPL.create_context_manager(connection=connection)
  File "/usr/lib/python3.6/site-packages/nova/cmd/common.py", line 48, in __call__
    raise exception.DBNotAllowed(binary=service_name)
~~~

Another example:-

https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario012-standalone-compute-master/cde6346/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz

Observation:-

It looks like this started after the merge of https://review.opendev.org/#/c/738482

Revision history for this message
yatin (yatinkarel) wrote :
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Something in this error is confusing.

In the stack trace, nova-compute calls service.get_minimum_version_all_cells(ctxt, ['nova-compute']) and fails. It should fail, as that is not a remotable db method. But nova-compute should not call service.get_minimum_version_all_cells() at all; it should only call the remotable service.Service.get_minimum_version(ctxt, 'nova-compute'). The code decides which call to make based on the config [1]. If the api database is configured for the service, the code assumes it is a controller service that can access the database directly. So I assume the tripleo config sets the [api_database]/connection option for the compute service.

[1] https://github.com/openstack/nova/blob/dc93e3b510f53d5b2198c8edd22528f0c899617e/nova/utils.py#L1064-L1072
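The branch Balazs describes can be sketched roughly like this (a simplified illustration of the logic behind [1], not nova's actual code; the function and parameter names are made up):

```python
def minimum_compute_version(conf, db_lookup, rpc_lookup):
    """Pick the version-lookup path the way nova's startup check does.

    conf:       mapping such as {"api_database": {"connection": "mysql://..."}}
    db_lookup:  direct-DB path, valid only for controller services
    rpc_lookup: remotable path, safe for nova-compute
    """
    if conf.get("api_database", {}).get("connection"):
        # API DB configured: assumed to be a controller service, so query
        # the minimum service versions across all cells directly.
        return db_lookup()
    # No API DB credentials: use the remotable RPC call instead.
    return rpc_lookup()
```

With [api_database]/connection set in the compute node's nova.conf, the first branch is taken and the direct DB access is what trips DBNotAllowed.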

Oliver Walsh (owalsh)
Changed in tripleo:
assignee: nobody → Oliver Walsh (owalsh)
Revision history for this message
Oliver Walsh (owalsh) wrote :

I'll work on landing the patches for https://bugs.launchpad.net/tripleo/+bug/1871482 ASAP

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

We discussed it on #openstack-nova [1]. Here is the summary of our understanding.

* What you see is the expected behavior of the nova-compute service if [api_database]/connection is configured.
* Configuring [api_database]/connection for a nova-compute service is invalid, and it already makes the service fail except if you pin the rpc version to 'auto'. So tripleo was already broken before the recent nova change, but it was worked around in [2]. The real fix for tripleo is not to add db credentials to the nova-compute service; there is already a patch proposed for it [3].
* The failure is not obvious from the nova logs, so I will propose patches to nova (and link them here as related fixes) that make the error more understandable.
* Please also note that the nova service level check is being backported to stable branches, but we will make sure that neither the service level incompatibility nor the api db credentials config will make nova-compute fail on startup on the stable branches.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2020-11-10.log.html#t2020-11-10T15:16:32
[2] https://review.opendev.org/#/c/737287/
[3] https://review.opendev.org/#/c/718552/

Revision history for this message
Oliver Walsh (owalsh) wrote :

> And it already makes the service fail except if you pin the rpc version to 'auto'.

Other way around - fails when rpc version pinning is set to 'auto'.

Revision history for this message
Oliver Walsh (owalsh) wrote :

Also requires changes to puppet-nova as it includes nova::db in init.pp

Changed in tripleo:
status: Triaged → In Progress
Changed in puppet-nova:
status: New → In Progress
assignee: nobody → Oliver Walsh (owalsh)
Revision history for this message
Takashi Kajinami (kajinamit) wrote :

I'd agree that we should remove all api_database parameters from nova.conf on compute nodes, but I'm a bit concerned about the current nova implementation, because this would cause issues for standalone deployments where nova-api and nova-compute run on the same node. Also, when we use ironic we usually run nova-compute on the controller nodes where nova-api is running.

This is not a problem for a containerized deployment like tripleo, but in a non-containerized deployment all of the nova processes on the same node share the same config, and it sounds like these deployments are not supported because we can't start the nova-compute process on the same node as the nova-api process.

It would be nice if I can get some clarifications about this.

Revision history for this message
Takashi Kajinami (kajinamit) wrote :

> it sounds like these deployments are not supported
I meant to say;
"it sounds like these deployments are not supported in non-containerized deployment"

Revision history for this message
Artom Lifshitz (notartom) wrote :

@Takashi nova-compute can be started with a different config file than the other services. This is how devstack does it in its all-in-one deployments: the config file for n-cpu does not contain any database connection info, whereas the separate config file for the other nova services does.
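As a rough illustration of the split Artom describes (file names, sections shown, and values are examples, not the exact devstack layout):

```ini
# /etc/nova/nova.conf - controller services (api, conductor, scheduler)
[database]
connection = mysql+pymysql://nova:secret@controller/nova
[api_database]
connection = mysql+pymysql://nova:secret@controller/nova_api

# /etc/nova/nova-cpu.conf - nova-compute only: no [database] or
# [api_database] sections, so the service never has DB credentials
# and cannot take the direct-DB code path
[DEFAULT]
transport_url = rabbit://guest:guest@controller:5672/
```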

Revision history for this message
Marios Andreou (marios-b) wrote :

I am CI rover this week and am adding a note on the current status; I'm not sure if anything further is needed.

For tripleo the job is now green at https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-compute-master

The relevant fixes were https://review.opendev.org/c/openstack/nova/+/738482 & https://review.opendev.org/c/openstack/tripleo-heat-templates/+/718552

There is some discussion in the bug however... do we need to keep it open? I am moving it to fix-released but please move back if you disagree.

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
Takashi Kajinami (kajinamit) wrote :

The change in nova which triggered this problem was reverted.
I understand the suggestion to use a different config file for nova-compute, but that setup is completely different from what is installed by packages. If the nova team forces this direction then the change should be communicated to all distributions (rdo, ubuntu, debian) so that they can prepare a proper config layout.

I'll close this as Won't Fix for now because the next step is not clear, and we first need to understand how each distribution reacts to the new requirement.

Changed in puppet-nova:
assignee: Oliver Walsh (owalsh) → nobody
status: In Progress → Won't Fix
Changed in puppet-openstack-integration:
status: New → Won't Fix