Booting n-cpu from a clean database causes PCI passthrough to stop working

Bug #1536509 reported by Yongli He
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Yongli He

Bug Description

When nova is started for the first time, or when n-cpu is restarted after the PCI devices have been removed from the nova database, scheduling a VM that requests PCI devices fails. If n-cpu is then restarted again, everything works.

This bug currently blocks the third-party PCI test.

Yongli He (yongli-he)
description: updated
Yongli He (yongli-he) wrote :

The cause of this problem is that the reported PCI devices have a "status" of "none" when they are saved to pci_devices:

2016-01-21 16:49:15.830 DEBUG oslo_concurrency.lockutils [req-a2b7a457-c473-4bff-92b1-ef15b3d63b4a None None] Lock "compute_resources" released by "nova.compute.resource_tracker._update_available_resource" :: held 0.159s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282
2016-01-21 16:49:15.830 ERROR nova.compute.manager [req-a2b7a457-c473-4bff-92b1-ef15b3d63b4a None None] Error updating resources for node shci-pci-1.
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager Traceback (most recent call last):
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 6300, in update_available_resource
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager rt.update_available_resource(context)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 492, in update_available_resource
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager self._update_available_resource(context, resources)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager return f(*args, **kwargs)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 554, in _update_available_resource
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager self._update(context)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 664, in _update
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager self.pci_tracker.save(context)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/opt/stack/nova/nova/pci/manager.py", line 87, in save
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager dev.save()
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 205, in wrapper
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager ctxt, self, fn.__name__, args, kwargs)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/opt/stack/nova/nova/conductor/rpcapi.py", line 246, in object_action
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager objmethod=objmethod, args=args, kwargs=kwargs)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager retry=self.retry)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager timeout=timeout, retry=retry)
2016-01-21 16:49:15.830 796 ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 464, in send
2016-01-21 16:49:15.830 796 ERROR nova.com...
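A minimal sketch of how a reported device can end up with an empty "status", as described in the comment above. It assumes the hypervisor report for each device is a plain dict with no "status" key: copying that dict as-is leaves the status as None, while defaulting it gives a newly discovered device a usable state. `build_device_naive`, `build_device` and the "available" default are hypothetical illustrations, not nova's `PciDevice` code, and the PCI address is made up.

```python
# Hypothetical sketch; this is not nova's PciDevice implementation.
from typing import Any, Dict, Optional

# Assumption: a newly discovered device should start out allocatable.
DEFAULT_STATUS = "available"


def build_device_naive(dev: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Copy the hypervisor report as-is; a missing status becomes None."""
    return {"address": dev["address"], "status": dev.get("status")}


def build_device(dev: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Fill in a default status when the hypervisor report omits one."""
    return {"address": dev["address"],
            "status": dev.get("status") or DEFAULT_STATUS}


# Illustrative report for one device; no "status" key is present.
reported = {"address": "0000:81:00.1", "vendor_id": "8086"}
print(build_device_naive(reported)["status"])  # None
print(build_device(reported)["status"])        # available
```

An explicit status in the report is kept unchanged by `build_device`; only a missing or empty value is replaced by the default.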


OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/270695

Changed in nova:
assignee: nobody → Eli Qiao (taget-9)
status: New → In Progress
Eli Qiao (taget-9)
Changed in nova:
assignee: Eli Qiao (taget-9) → nobody
status: In Progress → New
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/271925

Changed in nova:
assignee: nobody → Yongli He (yongli-he)
status: New → In Progress
Yongli He (yongli-he) wrote :

Root cause:
When the pci_devices table is empty, the PCI device objects are detached from the DB because their _context is None. This causes PCI device allocation to fail; see _handle_device_dependents in pci/stats.py.
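One reading that is consistent with this root cause and with the report that a second n-cpu restart fixes things: when pci_devices is empty, every device is built fresh from the hypervisor report without a context, while on the next start the same devices are loaded back from the database together with a context. The sketch below models only that difference; `Device`, `load_devices`, `db_rows` and the context string are hypothetical stand-ins, not nova's API.

```python
# Hypothetical model of the n-cpu startup path; not nova code.
from typing import Dict, List, Optional


class Device:
    def __init__(self, address: str, context: Optional[str]):
        self.address = address
        self._context = context  # None: the object is detached from the DB

    @property
    def attached(self) -> bool:
        return self._context is not None


def load_devices(context: str, db_rows: List[Dict[str, str]],
                 hv_report: List[Dict[str, str]]) -> List[Device]:
    """Build the tracker's device list at startup."""
    known = {row["address"] for row in db_rows}
    # Rows already in the DB come back carrying the caller's context.
    devices = [Device(row["address"], context) for row in db_rows]
    # Newly reported devices are created without one (the bug).
    devices += [Device(dev["address"], None)
                for dev in hv_report if dev["address"] not in known]
    return devices


report = [{"address": "0000:81:00.1"}]

# First start: the pci_devices table is empty, so the device is detached.
first = load_devices("ctx", db_rows=[], hv_report=report)
print([d.attached for d in first])   # [False]

# Second start: the device was saved, so it is loaded attached.
second = load_devices("ctx", db_rows=report, hv_report=report)
print([d.attached for d in second])  # [True]
```

Passing the context when creating new devices as well would remove the difference between the two starts; whether the merged change did exactly that is not stated in this report.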

Yongli He (yongli-he) wrote :

Debugging information about the root cause:

2016-01-25 13:56:59.277 DEBUG nova.compute.utils [req-c3dcb81e-18d7-45f2-a5cc-7254f037b088 tempest-ServersWithSpecificFlavorTestJSON-1773823374 tempest-ServersWithSpecificFlavorTestJSON-573174746] ***********fault_payload is None 'NoneType' object has no attribute 'to_dict' notify_about_instance_usage /opt/stack/nova/nova/compute/utils.py:285
Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 2007, in _build_and_run_instance
    with rt.instance_claim(context, instance, limits):
  File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
    return f(*args, **kwargs)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 181, in instance_claim
    overhead=overhead, limits=limits)
  File "/opt/stack/nova/nova/compute/claims.py", line 94, in __init__
    self._claim_test(resources, limits)
  File "/opt/stack/nova/nova/compute/claims.py", line 152, in _claim_test
    self._test_pci()]
  File "/opt/stack/nova/nova/compute/claims.py", line 194, in _test_pci
    self.instance)
  File "/opt/stack/nova/nova/pci/manager.py", line 211, in claim_instance
    devs = self._claim_instance(context, instance)
  File "/opt/stack/nova/nova/pci/manager.py", line 184, in _claim_instance
    instance_cells)
  File "/opt/stack/nova/nova/pci/stats.py", line 175, in consume_requests
    self._handle_device_dependents(pci_dev)
  File "/opt/stack/nova/nova/pci/stats.py", line 202, in _handle_device_dependents
    pci_dev.parent_addr)
  File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 172, in wrapper
    args, kwargs)
  File "/opt/stack/nova/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
    args=args, kwargs=kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 146, in call
    msg_ctxt = self.serializer.serialize_context(ctxt)
  File "/opt/stack/nova/nova/rpc.py", line 139, in serialize_context
    return context.to_dict()
AttributeError: 'NoneType' object has no attribute 'to_dict'
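The last frames of this traceback can be reproduced in isolation. This is a hedged sketch: `RequestContext`, `serialize_context`, `object_class_action` and the method name "get_parent" are simplified hypothetical stand-ins named after the frames in the trace (nova/rpc.py and the conductor RPC API), not their real implementations. It shows that a remote call made on behalf of an object whose `_context` is None fails while the context is being serialized, before any message is sent.

```python
# Simplified stand-ins named after the traceback frames; not nova code.
from typing import Any, Dict, Optional


class RequestContext:
    def __init__(self, request_id: str):
        self.request_id = request_id

    def to_dict(self) -> Dict[str, Any]:
        return {"request_id": self.request_id}


def serialize_context(ctxt: Optional[RequestContext]) -> Dict[str, Any]:
    # Mirrors "return context.to_dict()" in the trace: there is no None check.
    return ctxt.to_dict()


def object_class_action(ctxt: Optional[RequestContext], method: str,
                        *args: Any) -> Dict[str, Any]:
    # The context is serialized first; with None this raises
    # AttributeError before any message would be sent.
    msg_ctxt = serialize_context(ctxt)
    return {"context": msg_ctxt, "method": method, "args": list(args)}


try:
    object_class_action(None, "get_parent", "0000:81:00.0")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'to_dict'

ok = object_class_action(RequestContext("req-1"), "get_parent",
                         "0000:81:00.0")
print(ok["context"])  # {'request_id': 'req-1'}
```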

lvmxh (shaohef) wrote :

This bug should be Critical. It blocks the PCI and SR-IOV CI, and it also blocks nova users who rely on PCI device passthrough.

Yongli He (yongli-he) wrote :

This has been fixed by another patch, so it is no longer a blocking bug. The PCI CI is back online.

Yongli He (yongli-he)
Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Yongli He (<email address hidden>) on branch: master
Review: https://review.openstack.org/271925
Reason: merged this one: https://review.openstack.org/#/c/269764/
