cannot nova reboot VM

Bug #1478450 reported by hiyonger-ZTE_TECS
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have 10 VMs and ran nova reboot on all of them repeatedly in a loop (roughly as sketched below). After 285 reboot rounds, nova reboot stopped working. The detailed information follows.
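The stress pattern appears to be a tight loop that reboots every VM repeatedly. The script below is a hypothetical sketch of such a loop using the nova CLI; the actual test script is not attached to this report, and helper names such as list_servers are illustrative only.

#!/usr/bin/env python
# Hypothetical sketch of the reboot stress loop described above.
# Assumes the nova CLI is installed and the usual OS_* credentials are exported.
import subprocess
import time

REBOOT_ROUNDS = 300   # the failure described above appeared after ~285 rounds
POLL_SECONDS = 5

def list_servers():
    """Return (id, status, task_state) tuples parsed from `nova list` output."""
    out = subprocess.check_output(["nova", "list"]).decode()
    rows = []
    for line in out.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 6 and len(parts[1]) == 36:   # crude UUID check skips header rows
            rows.append((parts[1], parts[3], parts[4]))
    return rows

def main():
    for round_no in range(1, REBOOT_ROUNDS + 1):
        for server_id, _, _ in list_servers():
            subprocess.check_call(["nova", "reboot", server_id])
        # Wait until every instance is ACTIVE again with no pending task state.
        while True:
            rows = list_servers()
            if all(status == "ACTIVE" and task in ("-", "None", "")
                   for _, status, task in rows):
                break
            time.sleep(POLL_SECONDS)
        print("completed reboot round %d" % round_no)

if __name__ == "__main__":
    main()

In the failure shown below, the instances never leave REBOOT/reboot_started, so a loop like this simply hangs in the polling step.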

nova list:
+--------------------------------------+-------------------------------------------+--------+----------------+-------------+--------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------------------------+--------+----------------+-------------+--------------------+
| 26d74bba-2e9a-4594-b6a1-1b27f337759b | test-26d74bba-2e9a-4594-b6a1-1b27f337759b | REBOOT | reboot_started | Running | net01=192.168.0.8 |
| 7d129f4d-0af7-4ea8-9de5-3483cdcb7a68 | test-7d129f4d-0af7-4ea8-9de5-3483cdcb7a68 | REBOOT | reboot_started | Running | net01=192.168.0.13 |
| 8d2e097b-e478-4a7f-b06d-7b578dfaf7c0 | test-8d2e097b-e478-4a7f-b06d-7b578dfaf7c0 | REBOOT | reboot_started | Running | net01=192.168.0.9 |
| a2d264f0-c1d0-449c-9ba3-32041247490c | test-a2d264f0-c1d0-449c-9ba3-32041247490c | REBOOT | reboot_started | Running | net01=192.168.0.12 |
| a3d9448d-25d8-4c01-925c-825a19164970 | test-a3d9448d-25d8-4c01-925c-825a19164970 | REBOOT | reboot_started | Running | net01=192.168.0.11 |
| a993222f-4e41-483e-8e91-300be0d525a4 | test-a993222f-4e41-483e-8e91-300be0d525a4 | REBOOT | reboot_started | Running | net01=192.168.0.5 |
| ace0736a-0371-4a3d-8a7f-525e299a924b | test-ace0736a-0371-4a3d-8a7f-525e299a924b | REBOOT | reboot_started | Running | net01=192.168.0.6 |
| e5515b31-8edb-4558-bb9f-f6f8b1142db2 | test-e5515b31-8edb-4558-bb9f-f6f8b1142db2 | REBOOT | reboot_started | Running | net01=192.168.0.4 |
| f380346a-bc24-4d63-a6e0-e36b5a508a59 | test-f380346a-bc24-4d63-a6e0-e36b5a508a59 | REBOOT | reboot_started | Running | net01=192.168.0.10 |
| fc396066-7ec6-4fe9-b7bf-70f162e293fe | test-fc396066-7ec6-4fe9-b7bf-70f162e293fe | REBOOT | reboot_started | Running | net01=192.168.0.7 |
+--------------------------------------+-------------------------------------------+--------+----------------+-------------+--------------------+

The nova-compute.log:

2015-07-22 14:51:43.808 15940 WARNING nova.openstack.common.deadlock_monitor [-] service function nova.servicegroup.drivers.db._report_state is deadlocked!
2015-07-22 14:51:43.984 15940 WARNING nova.openstack.common.deadlock_monitor [-] deadlock traceback is [' File "/usr/lib/python2.7/site-packages/nova/openstack/common/loopingcall.py", line 78, in _inner\n self.f(*self.args, **self.kw)\n', ' File "/usr/lib/python2.7/site-packages/nova/openstack/common/deadlock_monitor.py", line 58, in __deco\n ret = func(*args, **kwargs)\n', ' File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 116, in _report_state\n service.service_ref, state_catalog)\n', ' File "/usr/lib/python2.7/site-packages/nova/conductor/api.py", line 218, in service_update\n return self._manager.service_update(context, service, values)\n', ' File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 330, in service_update\n service=service_p, values=values)\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call\n wait_for_reply=True, timeout=timeout)\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send\n timeout=timeout)\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send\n return self._send(target, ctxt, message, wait_for_reply, timeout)\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 400, in _send\n conn.topic_send(topic, msg, timeout=timeout)\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqp.py", line 147, in __exit__\n self._done()\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqp.py", line 136, in _done\n self.connection.reset()\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 656, in reset\n self.channel.close()\n', ' File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 163, in close\n (20, 41), # Channel.close_ok\n', ' File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 69, in wait\n self.channel_id, allowed_methods)\n', ' File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 204, in _wait_method\n self.method_reader.read_method()\n', ' File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method\n self._next_method()\n', ' File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 112, in _next_method\n frame_type, channel, payload = read_frame()\n', ' File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 147, in read_frame\n frame_type, channel, size = unpack(\'>BHI\', read(7, True))\n', ' File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 245, in _read\n s = recv(131072)\n', ' File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 264, in recv\n timeout_exc=socket.timeout("timed out"))\n', ' File "/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py", line 155, in trampoline\n return hub.switch()\n', ' File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 187, in switch\n return self.greenlet.switch()\n']
2015-07-22 14:51:44.007 15940 WARNING nova.openstack.common.deadlock_monitor [-] nova compute exit!

Yet the openstack-nova-compute service status still shows active:
[root@slot13 drivers(keystone_admin)]# systemctl status openstack-nova-compute.service
openstack-nova-compute.service - OpenStack Nova Compute Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled)
   Active: active (running) since Tue 2015-07-21 22:18:38 CST; 19h ago
 Main PID: 15940 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           └─15940 /usr/bin/python /usr/bin/nova-compute

Jul 22 14:44:25 slot13 sudo[22639]: nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/nova-rootwrap /etc/nova/rootwrap.conf ovs-vsctl --tim...8af615-3b
Jul 22 14:44:25 slot13 sudo[22642]: nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/nova-rootwrap /etc/nova/rootwrap.conf ip link set qbr...1fa-53 up
Jul 22 14:44:25 slot13 sudo[22646]: nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/nova-rootwrap /etc/nova/rootwrap.conf ovs-vsctl --tim...48a15b-ae
Jul 22 14:44:25 slot13 sudo[22666]: nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/nova-rootwrap /etc/nova/rootwrap.conf ovs-vsctl --tim...2021fa-53

Tags: reboot
Matt Riedemann (mriedem) wrote :

What release of nova? Are there errors in the nova-compute log?

Changed in nova:
status: New → Incomplete
Davanum Srinivas (DIMS) (dims-v) wrote :

I can't seem to find nova/openstack/common/deadlock_monitor.py anywhere....

hiyonger-ZTE_TECS (huanghuayong) wrote :

It is the I (Icehouse) release, and nova is 2.17.0:
[root@slot13 qemu(keystone_admin)]# nova --version
2.17.0

nova/openstack/common/deadlock_monitor.py is code we added ourselves to check the nova-compute service status; it is not part of upstream nova. The nova-compute process is not actually killed, though. If it is not killed, maybe our added deadlock_monitor.py has a bug of its own, but the first problem comes from nova itself. A rough sketch of what the monitor does is below, followed by the log. Thanks.
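Roughly, the added deadlock_monitor.py amounts to a watchdog of the following shape: a decorator records when the wrapped periodic task last completed, and a background loop warns when it has been stuck for longer than a threshold. This is a generic, hypothetical sketch using plain threading rather than nova's eventlet-based loopingcall; none of these names exist in upstream nova.

import threading
import time

_last_completed = {}      # function name -> timestamp of last successful return
_lock = threading.Lock()

def monitored(func):
    """Record a timestamp every time the wrapped periodic task completes."""
    with _lock:
        _last_completed[func.__name__] = time.time()   # baseline at registration
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        with _lock:
            _last_completed[func.__name__] = time.time()
        return result
    return wrapper

def start_watchdog(threshold, interval=10, on_deadlock=None):
    """Warn when a monitored task has not completed within `threshold` seconds."""
    def run():
        while True:
            time.sleep(interval)
            now = time.time()
            with _lock:
                stale = [name for name, ts in _last_completed.items()
                         if now - ts > threshold]
            for name in stale:
                print("WARNING: service function %s is deadlocked!" % name)
                if on_deadlock is not None:
                    on_deadlock(name)   # e.g. dump stacks and terminate the process
    t = threading.Thread(target=run)
    t.daemon = True
    t.start()
    return t

As the log earlier in this report shows, the monitor printed "nova compute exit!" but the process kept running, so the exit handling may itself be buggy; the deadlock it detected, however, is the blocked AMQP recv inside _report_state shown in the traceback.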

2015-07-22 14:39:13.553 15940 AUDIT nova.compute.manager [req-5747a10e-74b6-4dc2-aaa5-429fbb6c3a9b 92c4efc35a0a431b963e10f4552f668f 7ca2c6a04b45496cab9a41849af0263c] [instance: e5515b31-8edb-4558-bb9f-f6f8b1142db2] Rebooting instance
2015-07-22 14:39:13.571 15940 AUDIT nova.compute.manager [req-0ab6aa09-1ead-427e-a251-a633fd083260 92c4efc35a0a431b963e10f4552f668f 7ca2c6a04b45496cab9a41849af0263c] [instance: ace0736a-0371-4a3d-8a7f-525e299a924b] Rebooting instance
2015-07-22 14:39:13.587 15940 AUDIT nova.compute.manager [req-2e757b4f-200e-41e7-b8ac-e5ca3e4b1b6f 92c4efc35a0a431b963e10f4552f668f 7ca2c6a04b45496cab9a41849af0263c] [instance: 26d74bba-2e9a-4594-b6a1-1b27f337759b] Rebooting instance
2015-07-22 14:39:13.593 15940 AUDIT nova.compute.manager [req-30e78818-7801-4800-8726-c28b0f08276a 92c4efc35a0a431b963e10f4552f668f 7ca2c6a04b45496cab9a41849af0263c] [instance: 7d129f4d-0af7-4ea8-9de5-3483cdcb7a68] Rebooting instance
2015-07-22 14:39:13.812 15940 AUDIT nova.compute.manager [req-8172d0d4-c1a3-465e-8039-90f5a4dae6e8 92c4efc35a0a431b963e10f4552f668f 7ca2c6a04b45496cab9a41849af0263c] [instance: f380346a-bc24-4d63-a6e0-e36b5a508a59] Rebooting instance
2015-07-22 14:39:13.819 15940 AUDIT nova.compute.manager [req-baf4bc5a-8f92-455b-9027-7e88372b4eb8 92c4efc35a0a431b963e10f4552f668f 7ca2c6a04b45496cab9a41849af0263c] [instance: a993222f-4e41-483e-8e91-300be0d525a4] Rebooting instance
2015-07-22 14:39:40.669 15940 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2015-07-22 14:39:41.742 15940 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 18334
2015-07-22 14:39:41.742 15940 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 85
2015-07-22 14:39:41.742 15940 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 6
2015-07-22 14:39:41.786 15940 INFO nova.compute.resource_tracker [-] Compute_service record updated for slot13:slot13
2015-07-22 14:40:01.624 15940 INFO nova.compute.manager [-] [instance: e5515b31-8edb-4558-bb9f-f6f8b1142db2] During sync_power_state the instance has a pending task. Skip.
2015-07-22 14:40:01.625 15940 INFO nova.compute.manager [-] [instance: a993222f-4e41-483e-8e91-300be0d525a4] During sync_power_state the instance has a pending task. Skip.
2015-07-22 14:40:01.625 15940 INFO nova.compute.manager [-] [instance: ace0736a-0371-4a3d-8a7f-525e299a924b] During sync_power_state the instance has a pending task. Skip.
2015-07-22 14:40:01.625 15940 INFO nova.compute.manager [-] [instance: fc396066-7ec6-4fe9-b7bf-70f162e293fe] During sync_power_state the instance has a pending task. Skip.
2015-07-22 14:40:01.625 15940 INFO nova.compute.manager [-] [instance: 26d74bba-2e9a-4594-b6a1-1b27f337759b] During sync_power_state the instance has a pending task. Skip.
2015-07-22 14:40:01....

Changed in nova:
status: Incomplete → New
Sean Dague (sdague) wrote :

So it looks like if you tight-loop reboot all of your instances continuously, we're hitting an update lock issue. That's not great, and investigating it further would be good, but it's also a low-likelihood issue in the field, so marking as Low.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Sean Dague (sdague) wrote :

Closing this out as Invalid, as it's a really low-likelihood event that requires stressing the whole environment to the point of breakdown. We know that if you do that, stuff will shake out; exactly why isn't hugely clear here. Without a tighter reproduction or better problem determination, this bug just isn't actionable.

Changed in nova:
status: Confirmed → Won't Fix