nova-compute service doesn't come up

Bug #1721390 reported by José Donoso
This bug affects 8 people
Affects: kolla-ansible
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm having this problem with a Kolla-Ansible OpenStack deployment:

TASK [nova : Waiting for nova-compute service up] *****************************************************************
FAILED - RETRYING: Waiting for nova-compute service up (20 retries left).
...
FAILED - RETRYING: Waiting for nova-compute service up (1 retries left).
fatal: [172.30.220.3 -> 172.30.220.3]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["docker", "exec", "kolla_toolbox", "openstack", "--os-interface", "internal", "--os-auth-url", "http://172.30.230.3:35357", "--os-identity-api-version", "3", "--os-project-domain-name", "default", "--os-tenant-name", "admin", "--os-username", "admin", "--os-password", "3PxtKnvjKDTbPg2QT3llwig08efLoAgkdEY5VVoY", "--os-user-domain-name", "default", "compute", "service", "list", "-f", "json", "--service", "nova-compute"], "delta": "0:00:02.405043", "end": "2017-10-04 16:21:35.742909", "failed": true, "rc": 0, "start": "2017-10-04 16:21:33.337866", "stderr": "", "stderr_lines": [], "stdout": "[]", "stdout_lines": ["[]"]}

Judging by nova-api.log, nova-api isn't getting any response from the nova-compute service: when I run the same command for nova-conductor it works fine, while the nova-compute query returns nothing.

---

docker exec kolla_toolbox openstack --os-interface internal --os-auth-url http://172.30.230.3:35357 --os-identity-api-version 3 --os-project-domain-name default --os-tenant-name admin --os-username admin --os-password 3PxtKnvjKDTbPg2QT3llwig08efLoAgkdEY5VVoY --os-user-domain-name default compute service list -f json --service nova-compute
[]

docker exec kolla_toolbox openstack --os-interface internal --os-auth-url http://172.30.230.3:35357 --os-identity-api-version 3 --os-project-domain-name default --os-tenant-name admin --os-username admin --os-password 3PxtKnvjKDTbPg2QT3llwig08efLoAgkdEY5VVoY --os-user-domain-name default compute service list -f json --service nova-conductor
[
  {
    "Status": "enabled",
    "Binary": "nova-conductor",
    "Zone": "internal",
    "State": "down",
    "Host": "server1.domain.local",
    "Updated At": null,
    "ID": 10
  },
  {
    "Status": "enabled",
    "Binary": "nova-conductor",
    "Zone": "internal",
    "State": "down",
    "Host": "server2.domain.local",
    "Updated At": null,
    "ID": 12
  }
]

---
In nova-api.log:
---
## nova-conductor ##

2017-10-04 17:09:34.938 39 DEBUG nova.osapi_compute.wsgi.server [req-5a3572f7-6ce6-42e2-8715-6c10b9ede9db - - - - -] (39) accepted ('172.30.230.3', 44344) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883
2017-10-04 17:09:35.160 39 DEBUG nova.api.openstack.wsgi [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Calling method '<bound method ServiceController.index of <nova.api.openstack.compute.services.ServiceController object at 0x7145650>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:612
2017-10-04 17:09:35.165 39 DEBUG oslo_concurrency.lockutils [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "00000000-0000-0000-0000-000000000000" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2017-10-04 17:09:35.165 39 DEBUG oslo_concurrency.lockutils [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "00000000-0000-0000-0000-000000000000" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
2017-10-04 17:09:35.193 39 DEBUG oslo_concurrency.lockutils [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "02be5acb-a4d9-4b89-8245-8a1fc689012c" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2017-10-04 17:09:35.194 39 DEBUG oslo_concurrency.lockutils [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "02be5acb-a4d9-4b89-8245-8a1fc689012c" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
2017-10-04 17:09:35.221 39 DEBUG nova.servicegroup.drivers.db [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Seems service nova-conductor on host server1.domain.local is down. Last heartbeat was 2017-10-04 19:16:42. Elapsed time is 3173.221375 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:79
2017-10-04 17:09:35.222 39 DEBUG nova.servicegroup.drivers.db [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Seems service nova-conductor on host server2.domain.local is down. Last heartbeat was 2017-10-04 19:16:42. Elapsed time is 3173.222094 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:79
2017-10-04 17:09:35.223 39 INFO nova.osapi_compute.wsgi.server [req-9605b348-0260-4c49-9826-9ff726b75138 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] 172.30.230.3 "GET /v2.1/b75d457f41d848858717c7ab443820ee/os-services?binary=nova-conductor HTTP/1.1" status: 200 len: 766 time: 0.2835221

## nova-compute ##

2017-10-04 17:11:05.550 39 DEBUG nova.osapi_compute.wsgi.server [req-5a3572f7-6ce6-42e2-8715-6c10b9ede9db - - - - -] (39) accepted ('172.30.230.3', 44446) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883
2017-10-04 17:11:05.625 39 DEBUG nova.api.openstack.wsgi [req-fcc097d5-a1ac-4a55-9bfb-3d5b6f42d0c5 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Calling method '<bound method ServiceController.index of <nova.api.openstack.compute.services.ServiceController object at 0x7145650>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:612
2017-10-04 17:11:05.630 39 DEBUG oslo_concurrency.lockutils [req-fcc097d5-a1ac-4a55-9bfb-3d5b6f42d0c5 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "00000000-0000-0000-0000-000000000000" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2017-10-04 17:11:05.631 39 DEBUG oslo_concurrency.lockutils [req-fcc097d5-a1ac-4a55-9bfb-3d5b6f42d0c5 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "00000000-0000-0000-0000-000000000000" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
2017-10-04 17:11:05.656 39 DEBUG oslo_concurrency.lockutils [req-fcc097d5-a1ac-4a55-9bfb-3d5b6f42d0c5 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "02be5acb-a4d9-4b89-8245-8a1fc689012c" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2017-10-04 17:11:05.656 39 DEBUG oslo_concurrency.lockutils [req-fcc097d5-a1ac-4a55-9bfb-3d5b6f42d0c5 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] Lock "02be5acb-a4d9-4b89-8245-8a1fc689012c" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
2017-10-04 17:11:05.682 39 INFO nova.osapi_compute.wsgi.server [req-fcc097d5-a1ac-4a55-9bfb-3d5b6f42d0c5 e3c98600a43841b88aad3ff06808a36f b75d457f41d848858717c7ab443820ee - default default] 172.30.230.3 "GET /v2.1/b75d457f41d848858717c7ab443820ee/os-services?binary=nova-compute HTTP/1.1" status: 200 len: 414 time: 0.1309791

It looks like I'm not getting heartbeats from the nova-compute service.

Tags: nova-compute
Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

This is usually an issue with nova-compute connecting to RabbitMQ, or with the cells not being created properly. It is normal for the conductor to appear in the service list while compute does not, if the compute host has not been synced into the main cell.

Please share the logs from nova-compute. Also, which release are you on: 5.0.0 from pip, or master / stable/pike from GitHub? And which distro and version are you using?
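
Since cell mappings come up here, a quick way to check whether the compute host has been registered in a cell is to run nova-manage inside the nova_api container. This is only a sketch: the container name nova_api is the kolla-ansible default and may differ in customised deployments.

$ docker exec nova_api nova-manage cell_v2 list_cells
$ docker exec nova_api nova-manage cell_v2 discover_hosts --verbose

list_cells should show both cell0 and the main cell; discover_hosts maps any compute services that have registered in the database but are not yet assigned to a cell.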

Revision history for this message
José Donoso (jose.manuel.akainix) wrote :

I am using CentOS 7 and the pip installation of kolla-ansible. I have already resolved the issue, though: I was using the same two hosts as both control and compute nodes, and that seemed to interfere with the connection. I separated them so that one host acts as the control node and the other as the compute node, and that works fine.

Another question: can the deployment host be used as the monitoring and network node, separate from the control node? Or is it better to have the network and control roles on the same host?

Revision history for this message
Han Manjong (aksmj8855) wrote :

Did you check that kolla and kolla-ansible are the same version?
$ pip show kolla
$ pip show kolla-ansible

Revision history for this message
José Donoso (jose.manuel.akainix) wrote :

Yes, kolla and kolla-ansible are both at version 5.0.0.

Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

You can run the control, network, compute, storage and monitoring roles on separate nodes or all on the same node (all-in-one). I don't know what was causing the issue with compute on the same host as control; that should not be failing. At least it works for me, and in CI too.
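
For reference, the role split described above corresponds directly to the group names in the multinode inventory shipped with kolla-ansible. A minimal sketch with hypothetical hostnames (adjust to your own hosts; a host can take on several roles simply by being listed in more than one group):

[control]
controller01.example.local

[network]
network01.example.local

[compute]
compute01.example.local

[monitoring]
monitor01.example.local

[storage]
storage01.example.local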

Changed in kolla-ansible:
status: New → Invalid
Revision history for this message
Sabbir Sakib (sakibsys) wrote :

Not working for me.

TASK [nova : Waiting for nova-compute service up] **************************************************************************************************************************************************************************************************************************************************************************
FAILED - RETRYING: Waiting for nova-compute service up (20 retries left).
...
FAILED - RETRYING: Waiting for nova-compute service up (1 retries left).
fatal: [oscontroller01.xyz.pvt -> oscontroller01.xyz.pvt]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["docker", "exec", "kolla_toolbox", "openstack", "--os-interface", "internal", "--os-auth-url", "http://10.50.164.20:35357", "--os-identity-api-version", "3", "--os-project-domain-name", "default", "--os-tenant-name", "admin", "--os-username", "admin", "--os-password", "7x9cgtL8Th6tfJvJlfdNTTyALlwrcwnOdEeUT0dX", "--os-user-domain-name", "default", "compute", "service", "list", "-f", "json", "--service", "nova-compute"], "delta": "0:00:01.689926", "end": "2018-03-07 12:26:56.514003", "rc": 0, "start": "2018-03-07 12:26:54.824077", "stderr": "", "stderr_lines": [], "stdout": "[]", "stdout_lines":

Revision history for this message
cally725 (christian-ally) wrote :

Hi, I have the same problem with kolla Queens 6.0.0.

Any ideas on this problem?

Revision history for this message
Mark Goddard (mgoddard) wrote :

Can you provide the nova-compute logs? Is the nova_compute container running? We also have newer Queens releases which you might wish to try; the latest is 6.2.0.
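
To gather that information, something along these lines usually works on the compute host. This is a sketch: the log path below assumes Docker's default volume location and the standard kolla_logs layout, which may differ on your system.

$ docker ps -a | grep nova_compute
$ docker logs --tail 100 nova_compute
$ tail -n 100 /var/lib/docker/volumes/kolla_logs/_data/nova/nova-compute.log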

Revision history for this message
Pride Njukia (cnjukia) wrote :

Hi, I have the same problem with Stein.

Revision history for this message
Pride Njukia (cnjukia) wrote :

It's 2019 and I'm still getting this error on Ubuntu 16.04 with kolla-ansible: *FAILED - RETRYING: Waiting for nova-compute service up*. Any help or pointers would be much appreciated.

Revision history for this message
Pride Njukia (cnjukia) wrote :

2019-05-22 09:12:35.124 7 ERROR oslo.messaging._drivers.impl_rabbit [req-748bbd7a-6d94-4a6b-90f1-8a6862c62a15 - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 32.0 seconds): error: [Errno 111] ECONNREFUSED
2019-05-22 09:13:07.164 7 ERROR oslo.messaging._drivers.impl_rabbit [req-748bbd7a-6d94-4a6b-90f1-8a6862c62a15 - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 32.0 seconds): error: [Errno 111] ECONNREFUSED
...
2019-05-22 09:20:03.695 7 ERROR oslo.messaging._dr...


Revision history for this message
Mark Goddard (mgoddard) wrote :

The error message here is quite generic, and simply means that a nova-compute service has not started correctly. There could be many causes.

Pride, in the logs you have posted, nova-compute cannot connect to RabbitMQ. I would recommend trying to work out whether RabbitMQ is working. Is the rabbitmq container up? Is there anything unexpected in 'docker logs rabbitmq' or in the rabbitmq log file in the kolla_logs volume?
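
A few quick checks along those lines, assuming the default kolla container name:

$ docker ps -a | grep rabbitmq
$ docker logs --tail 100 rabbitmq
$ docker exec rabbitmq rabbitmqctl cluster_status
$ ss -tlnp | grep 5672

cluster_status confirms the broker (or cluster) is actually running, and the last command checks that the AMQP port 5672 is listening on the address nova-compute is trying to reach.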

Revision history for this message
Farid Lahdiri (faridl) wrote :

Hi
I had the same issue: TASK [nova : Waiting for nova-compute service up]... then a timeout.
I was using:
  ansible version 2.5.1
  kolla-ansible version 6.0.0

I upgraded to:
  ansible 2.6.0
  kolla-ansible 6.2.1

Problem solved.
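
For anyone in the same situation, the upgrade itself is just a pip install on the deployment host. A sketch pinning the versions mentioned above; pick whatever versions match your target branch:

$ pip install --upgrade 'ansible>=2.6' 'kolla-ansible==6.2.1'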
