Server cannot launch while there is a new nova compute node that failed to register

Bug #1515870 reported by lingchen
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

1. Exact version of Nova/OpenStack you are running:
Kilo 2015.1.0
2. Relevant log files:
2015-11-03 16:00:29.990 3568 ERROR oslo_messaging.rpc.dispatcher [req-ce8d5d3d-6a79-4827-b472-02940be546bc 60ca5cf0e1bf44b985ee5ceae440fcfc b2a5638f40fd43a59a9be1e9c12f7d89 - - -] Exception during message handling: unsupported operand type(s) for *: 'NoneType' and 'int'
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher return func(*args, **kwargs)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 86, in select_destinations
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher filter_properties)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 67, in select_destinations
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher filter_properties)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 131, in _schedule
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher hosts = self._get_all_host_states(elevated)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 176, in _get_all_host_states
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher return self.host_manager.get_all_host_states(context)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 561, in get_all_host_states
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher host_state = self.host_state_cls(host, node, compute=compute)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 318, in host_state_cls
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher return HostState(host, node, **kwargs)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 163, in __init__
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher self.update_from_compute_node(compute)
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 208, in update_from_compute_node
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher free_disk_mb = free_gb * 1024
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
2015-11-03 16:00:29.990 3568 TRACE oslo_messaging.rpc.dispatcher
2015-11-03 16:00:29.991 3568 ERROR oslo_messaging._drivers.common [req-ce8d5d3d-6a79-4827-b472-02940be546bc 60ca5cf0e1bf44b985ee5ceae440fcfc b2a5638f40fd43a59a9be1e9c12f7d89 - - -] Returning exception unsupported operand type(s) for *: 'NoneType' and 'int' to caller
2015-11-03 16:00:29.991 3568 ERROR oslo_messaging._drivers.common [req-ce8d5d3d-6a79-4827-b472-02940be546bc 60ca5cf0e1bf44b985ee5ceae440fcfc b2a5638f40fd43a59a9be1e9c12f7d89 - - -] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply\n executor_callback))\n', ' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch\n executor_callback)\n', ' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch\n result = func(ctxt, **new_args)\n', ' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner\n return func(*args, **kwargs)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 86, in select_destinations\n filter_properties)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 67, in select_destinations\n filter_properties)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 131, in _schedule\n hosts = self._get_all_host_states(elevated)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 176, in _get_all_host_states\n return self.host_manager.get_all_host_states(context)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 561, in get_all_host_states\n host_state = self.host_state_cls(host, node, compute=compute)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 318, in host_state_cls\n return HostState(host, node, **kwargs)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 163, in __init__\n self.update_from_compute_node(compute)\n', ' File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 208, in update_from_compute_node\n free_disk_mb = free_gb * 1024\n', "TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'\n"]

3. Reproduction steps:
I can launch a VM successfully, then I add a new nova-compute node.

However, the information registered in MariaDB for the new nova-compute node is incomplete.

e.g. free_ram_mb in the compute_nodes table is NULL:

MariaDB [nova]> select created_at ,updated_at ,deleted_at ,service_id ,free_ram_mb , free_disk_gb ,id ,current_workload , hypervisor_hostname from compute_nodes where host='NailSBCJslot3rackN9';
+---------------------+------------+------------+------------+-------------+--------------+----+------------------+---------------------+
| created_at | updated_at | deleted_at | service_id | free_ram_mb | free_disk_gb | id | current_workload | hypervisor_hostname |
+---------------------+------------+------------+------------+-------------+--------------+----+------------------+---------------------+
| 2015-11-03 07:31:57 | NULL | NULL | 8 | NULL | NULL | 2 | NULL | NailSBCJslot3rackN9 |
+---------------------+------------+------------+------------+-------------+--------------+----+------------------+---------------------+
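
The NULL columns above are what the scheduler later trips over: they come back to Python as None, and update_from_compute_node multiplies the free-disk value by 1024, as the traceback shows. A minimal standalone sketch (not the actual Nova code) of that failure mode:

    # Mimic the compute_nodes row above: the resource columns are NULL in the DB,
    # so they arrive in Python as None.
    compute = {'free_ram_mb': None, 'free_disk_gb': None}

    free_gb = compute['free_disk_gb']  # None, because free_disk_gb is NULL
    try:
        free_disk_mb = free_gb * 1024  # same kind of operation as in update_from_compute_node
    except TypeError as exc:
        print(exc)  # unsupported operand type(s) for *: 'NoneType' and 'int'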

Expected result:
* The VM launches successfully.
There are other nova-compute nodes in the system on which the VM could be launched.
The "bad" nova-compute node should not cause the new VM launch to fail.

Actual result:
* The VM fails to launch.

4. Bonus points for a reproducing shell script / test:
When launching a VM, check whether the resource columns in the compute_nodes table are NULL (a rough check is sketched below).
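
A rough sketch of such a pre-launch check, assuming PyMySQL is installed; the connection credentials are placeholders for your environment, not values from this report:

    # Hypothetical check: flag compute_nodes rows whose resource columns are NULL.
    import pymysql

    conn = pymysql.connect(host='localhost', user='nova', password='secret',  # placeholder credentials
                           database='nova')
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT hypervisor_hostname, free_ram_mb, free_disk_gb "
                "FROM compute_nodes WHERE deleted = 0 "
                "AND (free_ram_mb IS NULL OR free_disk_gb IS NULL)")
            bad_nodes = cur.fetchall()
    finally:
        conn.close()

    if bad_nodes:
        for hostname, free_ram_mb, free_disk_gb in bad_nodes:
            print("compute node %s has NULL resources: free_ram_mb=%s free_disk_gb=%s"
                  % (hostname, free_ram_mb, free_disk_gb))
    else:
        print("all compute_nodes rows have resource data")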

Tags: scheduler
lingchen (lingchen1314)
description: updated
lingchen (lingchen1314)
summary: - server can not launch while there's a nova compute node registed failed
+ server can not launch while there's a new nova compute node registed
+ failed
tags: added: scheduler
Revision history for this message
Zhenzan Zhou (zhenzan-zhou) wrote :

Do you have logs showing what happened while the new compute node was being registered?

Changed in nova:
assignee: nobody → Zhenzan Zhou (zhenzan-zhou)
Revision history for this message
Zhenzan Zhou (zhenzan-zhou) wrote :

Two issues here:
1. How did that compute_nodes db entry fall into that state?
2. The host_manager should filter out compute nodes without valid resource data (an illustrative sketch follows below).

I think we may need to open another bug for item 1.
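
A hedged sketch of the kind of guard item 2 suggests; this is not the patch under review, just an illustration of skipping compute nodes whose resource columns are still NULL, loosely modeled on the get_all_host_states loop seen in the traceback:

    # Illustrative only, not the proposed fix. `compute_nodes` stands in for the
    # ComputeNode records the host manager iterates over.
    def iter_usable_compute_nodes(compute_nodes):
        """Yield only compute nodes that have reported their resources."""
        for compute in compute_nodes:
            if compute.get('free_ram_mb') is None or compute.get('free_disk_gb') is None:
                # A freshly registered node may not have run its first resource
                # audit yet; skip it instead of crashing the scheduler.
                print("ignoring compute node %s: no resource data yet"
                      % compute.get('hypervisor_hostname'))
                continue
            yield compute

    # Example with one healthy node and one half-registered node.
    nodes = [
        {'hypervisor_hostname': 'node-1', 'free_ram_mb': 2048, 'free_disk_gb': 40},
        {'hypervisor_hostname': 'NailSBCJslot3rackN9', 'free_ram_mb': None, 'free_disk_gb': None},
    ]
    print([n['hypervisor_hostname'] for n in iter_usable_compute_nodes(nodes)])
    # ['node-1']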

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/261145

Changed in nova:
status: New → In Progress
Changed in nova:
assignee: Zhenzan Zhou (zhenzan-zhou) → Eric Brown (ericwb)
aginwala (aginwala-f)
Changed in nova:
importance: Undecided → Low
status: In Progress → Confirmed
milestone: none → ongoing
Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: Eric Brown (ericwb) → Tracy Jones (tjones-i)
Tracy Jones (tjones-i)
Changed in nova:
assignee: Tracy Jones (tjones-i) → Eric Brown (ericwb)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/261145
Reason: This patch is quite old, so I am abandoning it to keep the review queue manageable. Feel free to restore the change if you're still interested in working on it.

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm seeing something similar for the Ironic host manager here:

http://logs.openstack.org/20/407220/2/check/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial-nv/0779f94/logs/screen-n-sch.txt.gz#_2017-01-04_19_05_16_398

2017-01-04 19:05:16.398 20709 DEBUG oslo_concurrency.lockutils [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Lock "(u'ubuntu-xenial-2-node-rax-ord-6478210', u'21da4933-a128-45f6-a765-7e6bc071e0f3')" released by "nova.scheduler.host_manager._locked_update" :: held 0.002s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282
2017-01-04 19:05:16.398 20709 DEBUG oslo_concurrency.lockutils [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Lock "(u'ubuntu-xenial-2-node-rax-ord-6478210-359401', u'00d62acd-6f3b-4cb8-9668-12517c84b3b9')" acquired by "nova.scheduler.host_manager._locked_update" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:270
2017-01-04 19:05:16.398 20709 DEBUG nova.scheduler.host_manager [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Update host state from compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='',created_at=2017-01-04T19:02:54Z,current_workload=None,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=10,free_disk_gb=None,free_ram_mb=None,host='ubuntu-xenial-2-node-rax-ord-6478210-359401',host_ip=10.210.37.31,hypervisor_hostname='00d62acd-6f3b-4cb8-9668-12517c84b3b9',hypervisor_type='ironic',hypervisor_version=1,id=8,local_gb=10,local_gb_used=0,memory_mb=384,memory_mb_used=0,metrics=None,numa_topology=None,pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=None,service_id=None,stats={cpu_arch='x86_64'},supported_hv_specs=[HVSpec],updated_at=2017-01-04T19:04:51Z,uuid=521cf775-5a16-4111-bf26-c30fb6725716,vcpus=1,vcpus_used=0) _locked_update /opt/stack/new/nova/nova/scheduler/host_manager.py:168
2017-01-04 19:05:16.400 20709 DEBUG oslo_concurrency.lockutils [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Lock "(u'ubuntu-xenial-2-node-rax-ord-6478210-359401', u'00d62acd-6f3b-4cb8-9668-12517c84b3b9')" released by "nova.scheduler.host_manager._locked_update" :: held 0.002s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Exception during message handling
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2017-01-04 19:0...


Revision history for this message
Matt Riedemann (mriedem) wrote :

Right before the scheduler failure in comment 5 I see this in the n-cpu logs:

http://logs.openstack.org/20/407220/2/check/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial-nv/0779f94/logs/screen-n-cpu.txt.gz#_2017-01-04_19_03_55_573

2017-01-04 19:03:55.573 28458 INFO nova.compute.manager [req-a0b07a5e-2c5a-4550-a0b5-6d1fe2b1d9e0 - -] Deleting orphan compute node 1 hypervisor host is 00d62acd-6f3b-4cb8-9668-12517c84b3b9, nodes are set([u'74396195-3bd0-4bc3-9f51-2f9d9ff32616', u'dd8dcc23-04f4-4718-a079-e1b77d70bad9', u'36827858-740c-4f1d-a0e9-17339f441338', u'86939305-5678-4796-a250-76295faf1890', u'0f813337-25e3-4c12-a2b0-cf2222f3cc7f', u'21da4933-a128-45f6-a765-7e6bc071e0f3'])

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in nova:
status: In Progress → Confirmed
assignee: Eric Brown (ericwb) → nobody
importance: Low → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

For the non-Ironic case that this bug was originally reported against, it might be a duplicate of bug 1610679 which was fixed in Ocata and backported to Newton:

https://review.openstack.org/#/q/Ia68298a3f01d89bbf302ac734389f7282176c553,n,z
