Zun

VM and zun container errors after host reboot

Bug #1850937 reported by BN
Affects: Zun
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

**Bug Report**

What happened:

Multinode OpenStack was deployed with the Zun service. A private network (demo-net) and a public network were created. An instance was created and started. A Zun container was then created and started without specifying a network; the container started successfully and was attached to the private demo-net network. However, in docker network ls the network appears under the name 7629c76e6b80443e033554fb9f3098937e311934e2650586f7c895a64bebcd75 (network ID 7629c76e6b80) rather than demo-net. Everything was working fine and I could create other instances and containers as well. After the hosts were rebooted, errors started coming up: some instances could not be started, and some containers failed to start with errors as well.
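For reference, the sequence was roughly the following (the image, flavor, and resource names below are illustrative placeholders rather than the exact values used):

openstack network create demo-net
openstack subnet create --network demo-net --subnet-range 10.0.0.0/24 demo-subnet
openstack server create --image cirros --flavor m1.tiny --network demo-net vm1
openstack appcontainer run --name container1 cirros ping 8.8.8.8   # no network specified; Zun attached it to demo-net
docker network ls                                                  # demo-net shows up under the long hash name above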

VM (nova-conductor.log) -

2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager [req-0ef3fa2c-1db7-4588-9e30-df729a6c64dd db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e - default default] Failed to schedule instances: MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager Traceback (most recent call last):
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 1356, in schedule_and_build_instances
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager instance_uuids, return_alternates=True)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 810, in _schedule_instances
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager return_alternates=return_alternates)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager return cctxt.call(ctxt, 'select_destinations', **msg_args)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 178, in call
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager retry=self.retry)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/transport.py", line 128, in _send
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager retry=retry)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 645, in send
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager call_monitor_timeout, retry=retry)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 634, in _send
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager call_monitor_timeout)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 520, in wait
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager message = self.waiters.get(msg_id, timeout=timeout)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 397, in get
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager 'to message ID %s' % msg_id)
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
2019-11-01 16:03:05.919 19 ERROR nova.conductor.manager
2019-11-01 16:03:06.514 19 WARNING nova.scheduler.utils [req-0ef3fa2c-1db7-4588-9e30-df729a6c64dd db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e - default default] Failed to compute_task_build_instances: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58: MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58
2019-11-01 16:03:06.519 19 WARNING nova.scheduler.utils [req-0ef3fa2c-1db7-4588-9e30-df729a6c64dd db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e - default default] [instance: 7d4f4b95-515a-4efc-85ef-9d6019c0a34d] Setting instance to ERROR state.: MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58

Zun (zun-compute.log) -

2019-11-01 16:09:04.272 6 ERROR zun.compute.manager [req-3b945ea8-db1d-42e3-bfeb-0746877cd711 db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e default - -] Unexpected exception: Cannot act on container in 'Error' state: Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager Traceback (most recent call last):
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/compute/manager.py", line 730, in container_logs
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager since=since)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/common/utils.py", line 243, in decorated_function
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager return function(*args, **kwargs)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 99, in decorated_function
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager handle_not_found(e, context, container)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 86, in handle_not_found
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager "Cannot act on container in '%s' state") % container.status)
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.272 6 ERROR zun.compute.manager
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server [req-3b945ea8-db1d-42e3-bfeb-0746877cd711 db41ed54317a4f6e96ebbaf14a750ba0 02f1fcd1831845ff9c89cdb6906d052e default - -] Exception during message handling: Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/common/utils.py", line 222, in decorated_function
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/compute/manager.py", line 730, in container_logs
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server since=since)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/common/utils.py", line 243, in decorated_function
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server return function(*args, **kwargs)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 99, in decorated_function
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server handle_not_found(e, context, container)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/zun/container/docker/driver.py", line 86, in handle_not_found
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server "Cannot act on container in '%s' state") % container.status)
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server Conflict: Cannot act on container in 'Error' state
2019-11-01 16:09:04.275 6 ERROR oslo_messaging.rpc.server

Moreover, I tried to create a new instance and got the same error:

Traceback (most recent call last):
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 1356, in schedule_and_build_instances
    instance_uuids, return_alternates=True)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 810, in _schedule_instances
    return_alternates=return_alternates)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
    instance_uuids, return_objects, return_alternates)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
    return cctxt.call(ctxt, 'select_destinations', **msg_args)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 178, in call
    retry=self.retry)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/transport.py", line 128, in _send
    retry=retry)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 645, in send
    call_monitor_timeout, retry=retry)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 634, in _send
    call_monitor_timeout)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 520, in wait
    message = self.waiters.get(msg_id, timeout=timeout)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 397, in get
    'to message ID %s' % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID 06e283038df54a43a4dc21626eff0b58

However, when I created a new container, it started without any issues; only the old containers showed the errors.
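Roughly what I ran to check, from memory (container names below are placeholders):

openstack appcontainer list                                        # old containers are in Error state after the reboot
openstack appcontainer start container1                            # old container; fails with the 'Error' state conflict
openstack appcontainer logs container1                             # same Conflict error as in zun-compute.log above
openstack appcontainer run --name container2 cirros ping 8.8.8.8   # new container; starts without issues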

P.S. Maybe I did not configure the networks correctly; I could not find documentation on the right way to prepare the environment/networking for Zun and Nova so they can work together without issues.

Thank you

hongbin (hongbin034) wrote:

@BN,

First, the VM error looks like a separate issue, and I don't have much insight into it.

For the container error, I need to find time to reproduce the problem. So, the error will happen if I perform the following steps (roughly sketched below)?

* Use Kolla to install Zun (master? or a stable branch)
* Create some containers
* Reboot the compute node? controller node?
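In shell terms, I'm assuming something like this (the inventory name, image, and container names below are just my assumptions):

kolla-ansible -i multinode deploy                                  # master? or a stable branch?
openstack appcontainer run --name test1 cirros ping 8.8.8.8
openstack appcontainer run --name test2 cirros ping 8.8.8.8
sudo reboot                                                        # on the compute node? the controller? both?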

Please confirm.

BN (zatoichy) wrote:

Hi hongbin,

Thank you very much for your response. I have described everything here as well, with more details: https://bugs.launchpad.net/kolla-ansible/+bug/1850936

P.S. I have 3 servers: one of them is the controller/network/monitoring node, and the other two work as compute/storage nodes.

Regards
