new hypervisor should appear with available resources set to 0

Bug #1567434 reported by Vasyl Saienko
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

If Nova tries to check resources on a newly added ironic hypervisor during instance spawning, before that hypervisor's resources have been updated, the instance build fails with the following error:

http://paste.openstack.org/show/493321/
http://paste.openstack.org/show/493322/

The following operation fails, since free_disk_gb is None:

self.free_disk_mb = compute.free_disk_gb * 1024

It was reproduced on Liberty nova code.
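For illustration, a minimal, self-contained sketch of the failure mode (the classes here are stand-ins, not Nova's actual ComputeNode/HostState objects): a freshly created ironic compute node record still has free_disk_gb = None, so the scheduler's multiplication raises a TypeError.

    # Minimal sketch of the failure mode; the classes are stand-ins,
    # not Nova's actual ComputeNode / HostState objects.
    class ComputeNodeStub(object):
        # A brand-new ironic node whose resources have not been reported yet.
        free_disk_gb = None

    class HostStateStub(object):
        def update_from_compute_node(self, compute):
            # The failing operation from the report:
            # None * 1024 -> "unsupported operand type(s) for *: 'NoneType' and 'int'"
            self.free_disk_mb = compute.free_disk_gb * 1024

    try:
        HostStateStub().update_from_compute_node(ComputeNodeStub())
    except TypeError as exc:
        print("scheduler fails with: %s" % exc)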

Tags: ironic
Vasyl Saienko (vsaienko)
description: updated
tags: added: ironic
Changed in nova:
assignee: nobody → Sarafraj Singh (sarafraj-singh)
Changed in nova:
status: New → Incomplete
status: Incomplete → New
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

So, the nova-compute ResourceTracker updates this field and persists it in the DB (so that the scheduler can get it) in three cases:
 - when the ResourceTracker (there is one per node) is initiated
 - every time a new instance is claimed for that node
 - every 60 seconds

What is strange is that the scheduler gets the list of compute nodes when a request comes in, but that list is created by each ResourceTracker, which would mean the RT has not yet run update_available_resources.
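The first case is exactly where a brand-new ironic node can race with the scheduler: the node record is created before the driver has reported any inventory. A rough sketch of what the bug title asks for, i.e. persisting the new record with zeroed resources instead of NULLs (the helper name, field set, and db_api call are assumptions for illustration, not the actual ResourceTracker code):

    # Illustrative only; the helper and db_api call are assumptions,
    # not Nova's actual ResourceTracker initialization.
    def _init_compute_node_record(db_api, context, host, nodename):
        values = {
            'host': host,
            'hypervisor_hostname': nodename,
            'vcpus': 0, 'vcpus_used': 0,
            'memory_mb': 0, 'memory_mb_used': 0, 'free_ram_mb': 0,
            'local_gb': 0, 'local_gb_used': 0,
            'free_disk_gb': 0,  # zero, never NULL/None
        }
        return db_api.compute_node_create(context, values)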

Could you please provide us with both the nova-compute and nova-scheduler logs for the same time window, so we can verify whether free_disk_gb is set?

Changed in nova:
status: New → Incomplete
Revision history for this message
Vasyl Saienko (vsaienko) wrote :

The logs are huge and can't be attached here. I have uploaded them to Dropbox: https://www.dropbox.com/sh/3k1j0ewurizayc4/AADEzGQPpqpP4v83EdOijDwBa?dl=0

Changed in nova:
status: Incomplete → New
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Couldn't you just truncate the n-cpu and n-sch logs to the time (within a 5-minute window) when you saw the stack trace?

I don't need the full logs, just the audit and debug traces from the time the issue appears.

TBH, I don't really see how the value could possibly be set to None; the code is pretty defensive about that.

Changed in nova:
status: New → Incomplete
summary: - new hypervisor should apper with available resources set to 0
+ new hypervisor should appear with available resources set to 0
Changed in nova:
assignee: Sarafraj Singh (sarafraj-singh) → nobody
Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

From a duplicate bug https://bugs.launchpad.net/nova/+bug/1572472

There is a timeout issue that I see when running the Cisco Ironic Third party CI:

    Traceback (most recent call last):
      File "tempest/test.py", line 113, in wrapper
        return f(self, *func_args, **func_kwargs)
      File "/opt/stack/ironic/ironic_tempest_plugin/tests/scenario/test_baremetal_basic_ops.py", line 113, in test_baremetal_server_ops
        self.boot_instance()
      File "/opt/stack/ironic/ironic_tempest_plugin/tests/scenario/baremetal_manager.py", line 150, in boot_instance
        self.wait_node(self.instance['id'])
      File "/opt/stack/ironic/ironic_tempest_plugin/tests/scenario/baremetal_manager.py", line 116, in wait_node
        raise lib_exc.TimeoutException(msg)
    tempest.lib.exceptions.TimeoutException: Request timed out
    Details: Timed out waiting to get Ironic node by instance id 0a252cd1-a020-40da-911b-2becd1820306

On further investigation into the issue, I see this error in the n-sch.log, which I expect is what leads to the node not being available in nova, so the instance never gets assigned to the Ironic node:

2016-04-20 08:41:02.953 ERROR oslo_messaging.rpc.dispatcher [req-e5ae7418-e311-491e-99b3-86aeaa0b4e3c tempest-BaremetalBasicOps-171486249 tempest-BaremetalBasicOps-1004295680] Exception during message handling: unsupported operand type(s) for *: 'NoneType' and 'int'
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher incoming.message))
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 150, in inner
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher return func(*args, **kwargs)
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/scheduler/manager.py", line 104, in select_destinations
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher dests = self.driver.select_destinations(ctxt, spec_obj)
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/scheduler/filter_scheduler.py", line 53, in select_destinations
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher selected_hosts = self._schedule(context, spec_obj)
2016-04-20 08:41:02.953 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/scheduler/filter_scheduler.py", line 104, in _schedule
2016-...


Revision history for this message
Sam Betts (sambetts) wrote :

n-cpu.log:

2016-04-20 08:41:02.913 INFO nova.compute.resource_tracker [req-fd249d93-8a71-41cb-881e-734c1edc4911 None None] Final resource view: name=e5657c2c-f776-4c19-b537-4957376409d9 phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-04-20 08:41:02.918 DEBUG oslo_messaging._drivers.amqpdriver [req-fd249d93-8a71-41cb-881e-734c1edc4911 None None] CALL msg_id: 14b7af32ef66444da0f5e5118e5f741d exchange 'nova' topic 'conductor' from (pid=9434) _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:454
2016-04-20 08:41:03.063 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL msg_id: 5d487f2909f24005ae31cc6632bad181 exchange 'nova' topic 'conductor' from (pid=9434) _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:454
2016-04-20 08:41:03.071 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 14b7af32ef66444da0f5e5118e5f741d from (pid=9434) __call__ /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:302
2016-04-20 08:41:03.072 INFO nova.compute.resource_tracker [req-fd249d93-8a71-41cb-881e-734c1edc4911 None None] Compute_service record updated for ironic-ucsm-ironic-ucsm-provider-33459:e5657c2c-f776-4c19-b537-4957376409d9

n-sch.log:

2016-04-20 08:41:02.953 ERROR oslo_messaging.rpc.dispatcher [req-e5ae7418-e311-491e-99b3-86aeaa0b4e3c tempest-BaremetalBasicOps-171486249 tempest-BaremetalBasicOps-1004295680] Exception during message handling: unsupported operand type(s) for *: 'NoneType' and 'int'

Based on comparing the n-sch and n-cpu logs, I can see that the nova scheduler starts dealing with this node in between the resource tracker log entries for this ComputeNode.
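One way to make that window harmless would be a defensive guard on the scheduler side, treating a not-yet-reported figure as zero; a sketch of the idea (not necessarily what the proposed review below does):

    # Sketch of a defensive guard in the scheduler's host-state update;
    # not necessarily the approach taken in the proposed review.
    free_gb = compute.free_disk_gb
    self.free_disk_mb = (free_gb if free_gb is not None else 0) * 1024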

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/308348

Changed in nova:
assignee: nobody → Sam Betts (sambetts)
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/308348
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.

Revision history for this message
Vasyl Saienko (vsaienko) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/413631

Changed in nova:
assignee: Sam Betts (sambetts) → Vasyl Saienko (vsaienko)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Vasyl Saienko (<email address hidden>) on branch: master
Review: https://review.openstack.org/413631

Revision history for this message
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Incomplete
assignee: Vasyl Saienko (vsaienko) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired