Ironic: TypeError: unsupported operand type(s) for *: 'NoneType' and 'int' - during select_destinations()

Bug #1654102 reported by Matt Riedemann
This bug affects 1 person
Affects                     Status         Importance  Assigned to       Milestone
OpenStack Compute (nova)    Fix Released   Medium      Matt Riedemann
  Newton                    Fix Committed  Medium      Roman Podoliaka

Bug Description

This looks like a duplicate of bug 1610679 but that's for the non-Ironic scheduler host manager, this is for the Ironic host manager.

Seen here:

http://logs.openstack.org/20/407220/2/check/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial-nv/0779f94/logs/screen-n-sch.txt.gz#_2017-01-04_19_05_16_398

2017-01-04 19:05:16.398 20709 DEBUG oslo_concurrency.lockutils [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Lock "(u'ubuntu-xenial-2-node-rax-ord-6478210', u'21da4933-a128-45f6-a765-7e6bc071e0f3')" released by "nova.scheduler.host_manager._locked_update" :: held 0.002s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282
2017-01-04 19:05:16.398 20709 DEBUG oslo_concurrency.lockutils [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Lock "(u'ubuntu-xenial-2-node-rax-ord-6478210-359401', u'00d62acd-6f3b-4cb8-9668-12517c84b3b9')" acquired by "nova.scheduler.host_manager._locked_update" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:270
2017-01-04 19:05:16.398 20709 DEBUG nova.scheduler.host_manager [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Update host state from compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='',created_at=2017-01-04T19:02:54Z,current_workload=None,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=10,free_disk_gb=None,free_ram_mb=None,host='ubuntu-xenial-2-node-rax-ord-6478210-359401',host_ip=10.210.37.31,hypervisor_hostname='00d62acd-6f3b-4cb8-9668-12517c84b3b9',hypervisor_type='ironic',hypervisor_version=1,id=8,local_gb=10,local_gb_used=0,memory_mb=384,memory_mb_used=0,metrics=None,numa_topology=None,pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=None,service_id=None,stats={cpu_arch='x86_64'},supported_hv_specs=[HVSpec],updated_at=2017-01-04T19:04:51Z,uuid=521cf775-5a16-4111-bf26-c30fb6725716,vcpus=1,vcpus_used=0) _locked_update /opt/stack/new/nova/nova/scheduler/host_manager.py:168
2017-01-04 19:05:16.400 20709 DEBUG oslo_concurrency.lockutils [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Lock "(u'ubuntu-xenial-2-node-rax-ord-6478210-359401', u'00d62acd-6f3b-4cb8-9668-12517c84b3b9')" released by "nova.scheduler.host_manager._locked_update" :: held 0.002s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server [req-6ce318a2-53e5-4b80-84a5-f896ab48d627 tempest-BaremetalMultitenancy-1820943769 tempest-BaremetalMultitenancy-1820943769] Exception during message handling
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 218, in inner
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server return func(*args, **kwargs)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/manager.py", line 84, in select_destinations
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server dests = self.driver.select_destinations(ctxt, spec_obj)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 51, in select_destinations
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server selected_hosts = self._schedule(context, spec_obj)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 96, in _schedule
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server hosts = self._get_all_host_states(elevated)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 134, in _get_all_host_states
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server return self.host_manager.get_all_host_states(context)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 610, in get_all_host_states
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server self._get_instance_info(context, compute))
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 180, in update
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server return _locked_update(self, compute, service, aggregates, inst_dict)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 169, in _locked_update
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server self._update_from_compute_node(compute)
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/ironic_host_manager.py", line 46, in _update_from_compute_node
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server self.free_disk_mb = compute.free_disk_gb * 1024
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
2017-01-04 19:05:16.400 20709 ERROR oslo_messaging.rpc.server
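
For reference, the failing statement is the free_disk_mb calculation in IronicNodeState._update_from_compute_node: the freshly created compute node record still has free_disk_gb=None when the scheduler reads it. A minimal sketch of the failure mode and the kind of guard that avoids it (a simplified illustration using a fake object, not the actual nova code):

    # Simplified illustration only, not the actual nova code.
    class FakeComputeNode(object):
        """Stand-in for the ComputeNode record the scheduler sees mid-race."""
        free_disk_gb = None  # not yet populated by the resource tracker
        hypervisor_hostname = '00d62acd-6f3b-4cb8-9668-12517c84b3b9'

    compute = FakeComputeNode()

    try:
        free_disk_mb = compute.free_disk_gb * 1024
    except TypeError as e:
        # TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
        print("host state update blows up: %s" % e)

    # A defensive guard simply skips the update until the field is populated:
    if compute.free_disk_gb is not None:
        free_disk_mb = compute.free_disk_gb * 1024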

Right before the scheduler failure I see this in the n-cpu logs:

http://logs.openstack.org/20/407220/2/check/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial-nv/0779f94/logs/screen-n-cpu.txt.gz#_2017-01-04_19_03_55_573

2017-01-04 19:03:55.573 28458 INFO nova.compute.manager [req-a0b07a5e-2c5a-4550-a0b5-6d1fe2b1d9e0 - -] Deleting orphan compute node 1 hypervisor host is 00d62acd-6f3b-4cb8-9668-12517c84b3b9, nodes are set([u'74396195-3bd0-4bc3-9f51-2f9d9ff32616', u'dd8dcc23-04f4-4718-a079-e1b77d70bad9', u'36827858-740c-4f1d-a0e9-17339f441338', u'86939305-5678-4796-a250-76295faf1890', u'0f813337-25e3-4c12-a2b0-cf2222f3cc7f', u'21da4933-a128-45f6-a765-7e6bc071e0f3'])

I'm not really sure what's going on with the compute node record, though, whether it's getting deleted or something else, because once the resource tracker creates the compute node record it should set free_disk_gb shortly thereafter and then update the record in the database. There is definitely a window of time between when the compute node record is first created and when it's first updated, but that doesn't appear to be the case here, since the updated_at field is set:

updated_at=2017-01-04T19:04:51Z

But the failure is about 25 seconds after that.

This also seems to be happening quite a bit since January 2nd 2017:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22_update_from_compute_node%5C%22%20AND%20message%3A%5C%22TypeError%3A%20unsupported%20operand%20type(s)%20for%20*%3A%20'NoneType'%20and%20'int'%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22&from=7d

Matt Riedemann (mriedem)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
description: updated
Revision history for this message
Matt Riedemann (mriedem) wrote :

Actually, looking at logstash, the failures are mostly in the non-Ironic jobs which use keystone v3. Those are failing because we create the compute_nodes record in the database from the resource tracker in the nova-compute service, but then fail to update the compute node record, which is what sets the free_disk_gb value:

http://logs.openstack.org/65/416765/1/check/gate-tempest-dsvm-neutron-identity-v3-only-full-ubuntu-xenial-nv/faf1363/logs/screen-n-cpu.txt.gz#_2017-01-04_23_15_06_674

And it looks like that's happening because of some keystone v3 auth issue when talking to the placement service.

As for the January 2nd start date, ~January 3rd is when we made the placement-api run in all jobs in devstack:

https://review.openstack.org/#/c/412537/

Which would include the keystone v3 API job.

Why we're hitting the auth issue with keystone v3 I don't know, but it happens when we're trying to talk to the placement API for the compute node's resource provider, which should be handled by a safe_connect decorator...

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is the auth failure on the keystone side:

http://logs.openstack.org/65/416765/1/check/gate-tempest-dsvm-neutron-identity-v3-only-full-ubuntu-xenial-nv/faf1363/logs/apache/keystone.txt.gz#_2017-01-04_23_15_06_670

2017-01-04 23:15:06.670 3310 WARNING keystone.auth.plugins.core [req-38ce6df3-1dca-4b66-a680-d2691bb4955f - - - - -] Could not find user: placement
2017-01-04 23:15:06.670 3310 WARNING keystone.common.wsgi [req-38ce6df3-1dca-4b66-a680-d2691bb4955f - - - - -] Authorization failed. The request you have made requires authentication. from 10.18.76.58

We don't handle Unauthorized in safe_connect in the nova scheduler report client; we probably should.
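
Roughly what that could look like (an illustrative sketch using keystoneauth1 exception classes, not the real safe_connect in nova's scheduler report client):

    # Illustrative sketch only, not the actual nova decorator.
    import functools

    from keystoneauth1 import exceptions as ks_exc
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def safe_connect(f):
        """Swallow 'placement is unreachable or unusable' errors.

        Catching Unauthorized here would cover the failure in this bug,
        where the 'placement' keystone user is missing and every request
        to the placement API comes back 401.
        """
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            try:
                return f(self, *args, **kwargs)
            except ks_exc.EndpointNotFound:
                LOG.warning('Placement API endpoint not found.')
            except ks_exc.ConnectFailure:
                LOG.warning('Placement API service is not responding.')
            except ks_exc.Unauthorized:
                LOG.warning('Placement API authentication failed; '
                            'check the placement service user.')
        return wrapper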

Also, it looks like this devstack patch will fix the placement stuff for keystone v3:

https://review.openstack.org/#/c/416599/

Revision history for this message
Matt Riedemann (mriedem) wrote :

I see that we check for Unauthorized responses in the nova-status upgrade check command, but not in safe_connect in the scheduler report client, so we need to handle that.

Revision history for this message
Matt Riedemann (mriedem) wrote :

I've created bug 1654107 to handle the Unauthorized exception separately from the Ironic host manager update race.

Changed in nova:
importance: Medium → Low
Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
importance: Low → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/416822

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Shivanand Tendulker (stendulker) wrote :

This issue is seen in Ironic HPE CI jobs
https://15.184.0.52/28/372428/18/check/gate-tempest-ironic-ilo-driver-iscsi_ilo/3d53e55/

See relevant portions of n-sch.log, n-cpu.log and key.log below.

n-sch.log
----------
2017-01-03 23:35:05.933 25579 DEBUG oslo_concurrency.lockutils [req-7ac1f5ec-68a1-47e1-b4a6-0e850839da16 tempest-BaremetalBasicOps-1350158805 tempest-BaremetalBasicOps-1350158805] Lock "(u'jenkins-slave1', u'c25e2761-3b47-4dee-9737-8bb99c469a4b')" acquired by "nova.scheduler.host_manager._locked_update" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:270
2017-01-03 23:35:05.935 25579 DEBUG nova.scheduler.host_manager [req-7ac1f5ec-68a1-47e1-b4a6-0e850839da16 tempest-BaremetalBasicOps-1350158805 tempest-BaremetalBasicOps-1350158805] Update host state from compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='',created_at=2017-01-03T23:32:43Z,current_workload=None,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=99,free_disk_gb=None,free_ram_mb=None,host='jenkins-slave1',host_ip=10.13.120.197,hypervisor_hostname='c25e2761-3b47-4dee-9737-8bb99c469a4b',hypervisor_type='ironic',hypervisor_version=1,id=1,local_gb=99,local_gb_used=0,memory_mb=16384,memory_mb_used=0,metrics=None,numa_topology=None,pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=None,service_id=None,stats={boot_mode='bios',cpu_arch='x86_64'},supported_hv_specs=[HVSpec],updated_at=2017-01-03T23:34:44Z,uuid=93e176d6-dca2-46c4-bd41-b41570e61e2c,vcpus=2,vcpus_used=0) _locked_update /opt/stack/nova/nova/scheduler/host_manager.py:168
2017-01-03 23:35:05.938 25579 DEBUG oslo_concurrency.lockutils [req-7ac1f5ec-68a1-47e1-b4a6-0e850839da16 tempest-BaremetalBasicOps-1350158805 tempest-BaremetalBasicOps-1350158805] Lock "(u'jenkins-slave1', u'c25e2761-3b47-4dee-9737-8bb99c469a4b')" released by "nova.scheduler.host_manager._locked_update" :: held 0.005s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server [req-7ac1f5ec-68a1-47e1-b4a6-0e850839da16 tempest-BaremetalBasicOps-1350158805 tempest-BaremetalBasicOps-1350158805] Exception during message handling
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
2017-01-03 23:35:05.938 25579 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2017-01-03 23:35:05.938 25579 ...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/416822
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2f1245a56c238646a056a20caa543f2a254b83d7
Submitter: Jenkins
Branch: master

commit 2f1245a56c238646a056a20caa543f2a254b83d7
Author: Matt Riedemann <email address hidden>
Date: Wed Jan 4 22:06:23 2017 -0500

    Fix TypeError in _update_from_compute_node race

    A compute node record is created in the database from the
    _init_compute_node method in the resource tracker in the
    nova-compute service. It is created without the updated_at
    or free_disk_gb fields being set.

    Shortly after it's created, update_resource_stats() is called
    on the scheduler report client which calls compute_node.save()
    which calls the update_compute_node() method in the DB API which
    *always* updates the updated_at field even if nothing else
    was changed.

    So at this point, we have a compute node with updated_at set
    but not free_disk_gb. The free_disk_gb field gets set by the
    _update_usage_from_instances() method in the resource tracker
    and then that value is later saved off when the compute node
    is updated in the database on a second call to the
    update_resource_stats() method in the scheduler report client.
    At that point the free_disk_gb field is set on the compute
    node record.

    There is a race in between the compute node create and initial
    update but before free_disk_gb is set where the scheduler host
    manager can be attempting to update the HostState object from
    the same compute node when selecting host destinations during
    a server build. If that picks up the compute node before its
    free_disk_gb field is set it will result in a TypeError when
    trying to multiply None * 1024.

    Change 36a0ba9c8141b445f2c6bfc093fde4cc98d229b2 was an earlier
    attempt at fixing this bug and shortened the race window but
    didn't handle the problem where updated_at is set but free_disk_gb
    is not yet.

    This change builds on that by simply checking for the thing
    the scheduler host manager actually cares about, which is the
    free_disk_gb field.

    Closes-Bug: #1654102
    Closes-Bug: #1610679

    Change-Id: I37b75dabb3ea7ec2d5678550d9aff30b1a0c29e6

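In other words, rather than inferring readiness from updated_at, the host manager just checks the field it is about to consume. Roughly the shape of that guard, using a hypothetical helper (an illustrative sketch of the approach described in the commit message, not the literal patch):

    # Illustrative sketch of the guard described in the commit message,
    # not the literal nova patch.
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def host_state_ready(compute):
        """Return True if this compute node record can be scheduled to.

        The resource tracker creates the record first and only populates
        free_disk_gb on a later update, so check the field the scheduler
        actually multiplies rather than relying on updated_at.
        """
        if compute.free_disk_gb is None:
            LOG.warning('Ignoring compute node %s: free_disk_gb is not set '
                        'yet; the resource tracker has not finished its '
                        'first update.', compute.hypervisor_hostname)
            return False
        return True
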
Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b3

This issue was fixed in the openstack/nova 15.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/427080

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/427080
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e30b75097840019c38e0619e70924ddc9f9487a0
Submitter: Jenkins
Branch: stable/newton

commit e30b75097840019c38e0619e70924ddc9f9487a0
Author: Matt Riedemann <email address hidden>
Date: Wed Jan 4 22:06:23 2017 -0500

    Fix TypeError in _update_from_compute_node race

    A compute node record is created in the database from the
    _init_compute_node method in the resource tracker in the
    nova-compute service. It is created without the updated_at
    or free_disk_gb fields being set.

    Shortly after it's created, update_resource_stats() is called
    on the scheduler report client which calls compute_node.save()
    which calls the update_compute_node() method in the DB API which
    *always* updates the updated_at field even if nothing else
    was changed.

    So at this point, we have a compute node with updated_at set
    but not free_disk_gb. The free_disk_gb field gets set by the
    _update_usage_from_instances() method in the resource tracker
    and then that value is later saved off when the compute node
    is updated in the database on a second call to the
    update_resource_stats() method in the scheduler report client.
    At that point the free_disk_gb field is set on the compute
    node record.

    There is a race in between the compute node create and initial
    update but before free_disk_gb is set where the scheduler host
    manager can be attempting to update the HostState object from
    the same compute node when selecting host destinations during
    a server build. If that picks up the compute node before its
    free_disk_gb field is set it will result in a TypeError when
    trying to multiply None * 1024.

    Change 36a0ba9c8141b445f2c6bfc093fde4cc98d229b2 was an earlier
    attempt at fixing this bug and shortened the race window but
    didn't handle the problem where updated_at is set but free_disk_gb
    is not yet.

    This change builds on that by simply checking for the thing
    the scheduler host manager actually cares about, which is the
    free_disk_gb field.

    Closes-Bug: #1654102
    Closes-Bug: #1610679

    (cherry-picked from 2f1245a56c238646a056a20caa543f2a254b83d7)
    Change-Id: I37b75dabb3ea7ec2d5678550d9aff30b1a0c29e6

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.4

This issue was fixed in the openstack/nova 14.0.4 release.
