Table nova.pci_devices loses device status on every service restart, and PciDeviceList.get_by_compute_node is passed a wrong parameter

Bug #1333498 reported by Young
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Yongli He

Bug Description

I'm trying to use SR-IOV in OpenStack Havana.

After a PCI device (a virtual function in my case) is allocated to a VM, the status of the corresponding record in table 'nova.pci_devices' is updated to 'allocated'.
However, when I restart the OpenStack services, the devices' records are reset to 'available', even though the PCI devices are still allocated to the VM.

I looked into the code and found the problem below.

The __init__ method of PciDevTracker in pci/pci_manager.py accepts a node_id, and its behavior depends on whether one is given:
        If a node_id is passed in, it will fetch pci devices information
        from database, otherwise, it will create an empty devices list

However, the code that instantiates PciDevTracker (in compute/resource_tracker.py) never passes node_id. So it never fetches the PCI device information from the database, and the status is reset to 'available' every time the services are restarted.
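The effect described above can be reproduced with a small, self-contained model. This is only a sketch: TrackerSketch, FAKE_DB and discover_host_devices are hypothetical stand-ins, not nova's classes, and the way the real tracker merges database records with freshly discovered devices is an assumption made for illustration.

```python
# Hypothetical stand-in for the rows in nova.pci_devices, keyed by compute node id.
FAKE_DB = {1: [{"address": "0000:04:10.0", "status": "allocated"}]}


def discover_host_devices():
    """Pretend host scan: a freshly discovered device always looks 'available'."""
    return [{"address": "0000:04:10.0", "status": "available"}]


class TrackerSketch:
    """Simplified model of a tracker whose constructor takes an optional node_id."""

    def __init__(self, node_id=None):
        self.node_id = node_id
        if node_id is not None:
            # Load the previously stored state, including 'allocated' devices.
            self.devices = [dict(d) for d in FAKE_DB.get(node_id, [])]
        else:
            # No node_id: start from an empty list, so stored state is ignored.
            self.devices = []

    def sync_from_host(self):
        """Add devices found on the host that the tracker does not know yet."""
        known = {d["address"] for d in self.devices}
        for dev in discover_host_devices():
            if dev["address"] not in known:
                self.devices.append(dict(dev))


def statuses(tracker):
    """Return {address: status} after syncing the tracker with the host."""
    tracker.sync_from_host()
    return {d["address"]: d["status"] for d in tracker.devices}
```

In this model, statuses(TrackerSketch()) reports the device as 'available' because the stored record was never loaded, while statuses(TrackerSketch(node_id=1)) keeps it as 'allocated'.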

=================

I then tried passing a node id to see what would happen, and got this error:
     self.pci_tracker = pci_manager.PciDevTracker(node_id=1)
   File "/usr/lib/python2.6/site-packages/nova/pci/pci_manager.py", line 67, in __init__
     context, node_id)
   File "/usr/lib/python2.6/site-packages/nova/objects/base.py", line 106, in wrapper
     args, kwargs)
   File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 492, in object_class_action
     objver=objver, args=args, kwargs=kwargs)
   File "/usr/lib/python2.6/site-packages/nova/rpcclient.py", line 85, in call
     return self._invoke(self.proxy.call, ctxt, method, **kwargs)
   File "/usr/lib/python2.6/site-packages/nova/rpcclient.py", line 63, in _invoke
     return cast_or_call(ctxt, msg, **self.kwargs)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/proxy.py", line 126, in call
     result = rpc.call(context, real_topic, msg, timeout)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/__init__.py", line 139, in call
     return _get_impl().call(CONF, context, topic, msg, timeout)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 783, in call
     rpc_amqp.get_connection_pool(conf, Connection))
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 572, in call
     rv = multicall(conf, context, topic, msg, timeout, connection_pool)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 558, in multicall
     pack_context(msg, context)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 308, in pack_context
     for (key, value) in context.to_dict().iteritems()])
 AttributeError: 'module' object has no attribute 'to_dict'

The code passes the module 'context' to pci_device_obj.PciDeviceList.get_by_compute_node. But to_dict is a method of the RequestContext class defined in that module, not of the module itself. It seems that a RequestContext instance should be passed instead of the context module.
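The module-versus-instance mix-up can be shown in isolation. The sketch below does not use nova: the 'context' module is built with types.ModuleType, and RequestContext and pack_context are simplified, hypothetical stand-ins that only mimic the to_dict() call seen in the traceback.

```python
import types


class RequestContext:
    """Minimal stand-in: only the to_dict() method matters here."""

    def __init__(self, user_id="admin"):
        self.user_id = user_id

    def to_dict(self):
        return {"user_id": self.user_id}


# Stand-in for a module named 'context' that defines RequestContext.
context = types.ModuleType("context")
context.RequestContext = RequestContext


def pack_context(msg, ctxt):
    """Copy the context fields into the message, as the RPC layer does."""
    msg.update(("_context_%s" % key, value)
               for key, value in ctxt.to_dict().items())
    return msg


def try_pack(ctxt):
    """Return the packed message, or the error text if ctxt has no to_dict()."""
    try:
        return pack_context({}, ctxt)
    except AttributeError as exc:
        return "AttributeError: %s" % exc
```

Calling try_pack(context) hits the AttributeError because a module object has no to_dict attribute, whereas try_pack(context.RequestContext()) succeeds. In nova itself, context.get_admin_context() returns a RequestContext instance; whether that is the appropriate context for this call site is not established by this report.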

Tags: compute pci
Tracy Jones (tjones-i)
tags: added: compute
Anne Gentle (annegentle) wrote :

Thanks for your bug report. There is already a patch proposed for removing cn_id in PCITracker here: https://review.openstack.org/102298

Sorry that doesn't triage your bug any further but I believe it will address the issue.

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → jiang, yunhong (yunhong-jiang)
Changed in nova:
status: Confirmed → In Progress
Sean Dague (sdague) wrote :

The upstream patch has been stalled in a merge conflict for a week; not sure that it's really in progress any more.

Changed in nova:
status: In Progress → Confirmed
assignee: jiang, yunhong (yunhong-jiang) → nobody
importance: Medium → Low
tags: added: pci-passthrough
tags: added: icehouse-backport-potential
Yongli He (yongli-he)
Changed in nova:
assignee: nobody → Yongli He (yongli-he)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/103759
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/102298
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

tags: added: pci
removed: pci-passthrough
Alan Pevec (apevec)
tags: removed: icehouse-backport-potential
Augustina Ragwitz (auggy) wrote :

A previous comment indicated this bug had been addressed by https://review.openstack.org/#/c/148904/ so marking this as Fix Released.

Changed in nova:
status: Confirmed → Fix Released