**Bug Report**
What happened:
Multinode OpenStack was deployed with the Zun service. The private (demo-net) and public networks were created using the init-runonce script. An instance was created and started. A Zun container was created and started without specifying a network; it started successfully and was attached to the private demo-net network. However, on the Docker side the network shows up under the name 7629c76e6b80443e033554fb9f3098937e311934e2650586f7c895a64bebcd75, with ID 7629c76e6b80 in `docker network ls`. Everything worked fine, and I could create other instances and containers as well. After the hosts were rebooted, errors started appearing: I could not start some instances, and some containers could not be started either. The sequence and the errors are shown below.
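For reference, the setup was roughly the following (a sketch; the image, flavor, and resource names are illustrative placeholders, not necessarily the ones actually used):

```console
# Instance on the private network created by init-runonce
openstack server create --image cirros --flavor m1.tiny --network demo-net demo-vm

# Container created and started without specifying a network;
# Zun attached it to demo-net on its own
zun create --name demo-container cirros sleep infinity
zun start demo-container

# On the compute host, the Neutron network appears under a hashed Docker name
docker network ls
```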
VM (nova-conductor.log):

```
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager [req-0ef3fa2c-1db7-4588-9e30-df729a6c64dd db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e - default default] Failed to schedule instances: MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager Traceback (most recent call last):
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 1356, in schedule_and_build_instances
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager instance_uuids, return_alternates=True)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 810, in _schedule_instances
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager return_alternates=return_alternates)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager return cctxt.call(ctxt, 'select_destinations', **msg_args)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 178, in call
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager retry=self.retry)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/transport.py", line 128, in _send
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager retry=retry)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 645, in send
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager call_monitor_timeout, retry=retry)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 634, in _send
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager call_monitor_timeout)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 520, in wait
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager message = self.waiters.get(msg_id, timeout=timeout)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 397, in get
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager 'to message ID %s' % msg_id)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager
2019-11-01 16:03:06.514 19 WARNING nova.scheduler.utils [req-0ef3fa2c-1db7-4588-9e30-df729a6c64dd db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e - default default] Failed to compute_task_build_instances: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58: MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
2019-11-01 16:03:06.519 19 WARNING nova.scheduler.utils [req-0ef3fa2c-1db7-4588-9e30-df729a6c64dd db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e - default default] [instance: 7d4f4b95-515a-4efc-85ef-9d6019c0a34d] Setting instance to ERROR state.: MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
```

Zun (zun-compute.log):

```
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager [req-3b945ea8-db1d-42e3-bfeb-0746877cd711 db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e default - -] Unexpected exception: Cannot act on container in 'Error' state: Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager Traceback (most recent call last):
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/compute/manager.py", line 730, in container_logs
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager since=since)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/common/utils.py", line 243, in decorated_function
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager return function(*args, **kwargs)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 99, in decorated_function
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager handle_not_found(e, context, container)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 86, in handle_not_found
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager "Cannot act on container in '%s' state") % container.status)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server [req-3b945ea8-db1d-42e3-bfeb-0746877cd711 db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e default - -] Exception during message handling: Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/common/utils.py", line 222, in decorated_function
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/compute/manager.py", line 730, in container_logs
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server since=since)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/common/utils.py", line 243, in decorated_function
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server return function(*args, **kwargs)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 99, in decorated_function
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server handle_not_found(e, context, container)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 86, in handle_not_found
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server "Cannot act on container in '%s' state") % container.status)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server
```

Moreover, I tried to create a new instance and got the same error:

```
Traceback (most recent call last):
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 1356, in schedule_and_build_instances
    instance_uuids, return_alternates=True)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 810, in _schedule_instances
    return_alternates=return_alternates)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
    instance_uuids, return_objects, return_alternates)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
    return cctxt.call(ctxt, 'select_destinations', **msg_args)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 178, in call
    retry=self.retry)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/transport.py", line 128, in _send
    retry=retry)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 645, in send
    call_monitor_timeout, retry=retry)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 634, in _send
    call_monitor_timeout)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 520, in wait
    message = self.waiters.get(msg_id, timeout=timeout)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 397, in get
    'to message ID %s' % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
```
However, when I created a new container, it started without any issues; it was only the old containers that showed the errors.
In conclusion, once I ran `kolla-ansible -i multinode reconfigure` and it finished without any issues, I was able to create and start new VM instances. However, I still could not start the instances that had gone into error after the host reboot, so they cannot be restored. Likewise, I could not start the containers that were showing errors after the reboot, but I could create new containers, just as before the reconfiguration.
P.S. Maybe I did not configure the networks correctly, although I could not find documentation on the right way to prepare the environment/networking for Zun and Nova so that they work together without issues.
Thank you
What you expected to happen: Instances and containers created before the reboot can be started again once the hosts come back up.
How to reproduce it (minimal and precise): Create an instance on the private network. Create a container without specifying a network. Reboot the hosts. Check the results (see the sketch below).
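A minimal command sequence along those lines (assuming the cirros image and the demo-net from init-runonce; names are illustrative):

```console
# 1. Create an instance on the private network
openstack server create --image cirros --flavor m1.tiny --network demo-net test-vm

# 2. Create and start a container without specifying a network
zun create --name test-container cirros sleep infinity
zun start test-container

# 3. Reboot the hosts
sudo reboot

# 4. Once the hosts are back up, try to start the pre-reboot resources
openstack server start test-vm   # fails with MessagingTimeout
zun start test-container         # fails: Cannot act on container in 'Error' state
```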
**Environment**:
* OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS
* Kernel (e.g. `uname -a`): 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
* Docker version if applicable (e.g. `docker version`): 19.03.4
* Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release): 2.8
* Docker image Install type (source/binary): source
* Docker image distribution: ubuntu
* Are you using official images from Docker Hub or self built? official
* Share your inventory file, globals.yml and other configuration files if relevant:

globals.yml:

```yaml
kolla_base_distro: "ubuntu"
kolla_install_type: "source"
openstack_release: "stein"
kolla_internal_vip_address: "10.0.225.254"
network_interface: "enp2s0f0"
neutron_external_interface: "enp2s0f1"
enable_barbican: "yes"
enable_cinder: "yes"
enable_cinder_backup: "yes"
enable_fluentd: "yes"
enable_zun: "yes"
enable_kuryr: "yes"
enable_etcd: "yes"
docker_configure_for_zun: "yes"
enable_magnum: "yes"
enable_ceph: "no"
glance_backend_ceph: "yes"
cinder_backend_ceph: "yes"
nova_backend_ceph: "yes"
glance_enable_rolling_upgrade: "no"
barbican_crypto_plugin: "simple_crypto"
barbican_library_path: "/usr/lib/libCryptoki2_64.so"
ironic_dnsmasq_dhcp_range:
tempest_image_id:
tempest_flavor_ref_id:
tempest_public_network_id:
tempest_floating_network_name:
horizon_port: 48000
```

Inventory (multinode):

```ini
[control]
localhost ansible_connection=local become=true
[network]
localhost ansible_connection=local become=true
[compute]
localhost ansible_connection=local become=true
10.0.2.1 ansible_user=root ansible_become=true
10.0.3.1 ansible_user=root ansible_become=true
[monitoring]
localhost ansible_connection=local become=true
[storage]
localhost ansible_connection=local become=true
10.0.2.1 ansible_user=root ansible_become=true
10.0.3.1 ansible_user=root ansible_become=true
[deployment]
localhost ansible_connection=local become=true
```
---

Thanks for a comprehensive bug report. Notifying Zun about a possible bug. The error from Nova Conductor, however, suggests the issue is somewhere else. Could you check whether all containers are up on all the nodes (`docker ps -a`) and not constantly restarting or down? Then attach logs from the broken containers.
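For example, something like the following on each node (a sketch; the service names passed to `docker logs` are illustrative):

```console
# List all containers and their state; look for Restarting/Exited entries
docker ps -a --format 'table {{.Names}}\t{{.Status}}'

# Grab logs from any container that is down or restart-looping, e.g.:
docker logs --tail 200 nova_scheduler
docker logs --tail 200 rabbitmq

# Kolla also writes service logs under /var/log/kolla on each host
tail -n 200 /var/log/kolla/nova/nova-scheduler.log
```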