Comment 2 for bug 1961587

Balazs Gibizer (balazs-gibizer) wrote:

I dug around in the code and found plenty of coupling between the ComputeManager (which should be host OS independent) and the sysfs code in the nova.pci module.

The main coupling points are nova.pci.whitelist.Whitelist() and nova.pci.devspec.PciDeviceSpec().

Whitelist() parses the [pci]passthrough_whitelist config option. It creates a PciDeviceSpec object for each whitelist item. To validate remote_managed configs, it reaches out to sysfs.
The Whitelist object is instantiated in multiple places outside of the virt driver, which triggers the coupling:

1) ComputeManager.init_host: The intention here is to catch config errors early and kill the compute service. It was introduced in [1] to fix a bug where an invalid config stopped the resource tracker but did not kill the compute service. It is not clear why that bug existed, as ComputeManager.pre_start_hook calls the resource tracker's update_available_resources, which initializes the PCI tracker, which in turn creates a Whitelist object.

Solution proposal: Remove the Whitelist parsing from init_host; the parsing done via the resource tracker should be enough.
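A minimal sketch of the idea behind this proposal, not nova code: if the only place that parses the whitelist is the tracker update that runs before the service starts, an invalid config still stops the service as long as the error is allowed to propagate. All names here (InvalidWhitelist, parse_whitelist, FakeResourceTracker, start_service) are hypothetical, and the parsing is simplified to one JSON object per entry.

```python
import json


class InvalidWhitelist(Exception):
    """Hypothetical error raised for a malformed whitelist entry."""


def parse_whitelist(raw_entries):
    """Parse each raw config entry; simplified to one JSON object per entry."""
    specs = []
    for raw in raw_entries:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise InvalidWhitelist(f"not valid JSON: {raw!r}") from exc
        if not isinstance(entry, dict):
            raise InvalidWhitelist(f"expected a JSON object: {raw!r}")
        specs.append(entry)
    return specs


class FakeResourceTracker:
    """Stand-in for the resource tracker; it is the single parsing point."""

    def __init__(self, raw_entries):
        self.raw_entries = raw_entries
        self.specs = None

    def update_available_resources(self):
        # The whitelist is parsed here, not in a separate init_host step.
        self.specs = parse_whitelist(self.raw_entries)


def start_service(tracker):
    """Stand-in for the pre-start step; returns a process exit code."""
    try:
        tracker.update_available_resources()
    except InvalidWhitelist:
        # A config error is fatal: do not keep running with a dead tracker.
        return 1
    return 0
```

The point of the sketch is only that a second, earlier parse in init_host adds nothing if the pre-start path already fails hard on a bad config.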

2) nova.network.neutron.API creates a Whitelist instance. This is obviously wrong, as this API code is used not just by the compute service but also by nova-api, where no whitelist config exists. It uses the Whitelist to get the PciDeviceSpec objects that match a certain PciDevice object.

Solution proposal: The information that is read from the PciDeviceSpec should be part of the PciDevice. PciDevice.extra_info is an unversioned dict that can store it. The PciDevice object is created via the virt driver, so that would move the sysfs coupling behind the virt driver interface.
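A rough sketch of what this could look like, using stand-in classes rather than the real nova objects. FakeDeviceSpec, FakePciDevice, annotate_devices and physical_network_for are hypothetical names, and the keys stored in extra_info are only examples:

```python
from dataclasses import dataclass, field


@dataclass
class FakePciDevice:
    """Stand-in for PciDevice; only extra_info matters for this sketch."""
    address: str
    vendor_id: str
    product_id: str
    extra_info: dict = field(default_factory=dict)


@dataclass
class FakeDeviceSpec:
    """Stand-in for a parsed whitelist entry with its tags."""
    vendor_id: str
    product_id: str
    tags: dict = field(default_factory=dict)

    def matches(self, dev: FakePciDevice) -> bool:
        return (dev.vendor_id, dev.product_id) == (self.vendor_id, self.product_id)


def annotate_devices(devices, specs):
    """Driver-side step: copy spec-derived data into extra_info once."""
    for dev in devices:
        for spec in specs:
            if spec.matches(dev):
                dev.extra_info.update(spec.tags)
                break
    return devices


def physical_network_for(dev):
    """API-side step: reads only the device, no whitelist and no sysfs."""
    return dev.extra_info.get("physical_network")
```

With this split, code that runs outside the compute host only ever sees the device record and never needs the whitelist config.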

3) PciDevTracker uses both Whitelist and PciDeviceSpec to track device usage. The PciDevTracker is used by the ResourceTracker, which is instantiated by the ComputeManager.

Solution proposal: No easy solution exists; this is probably a necessary coupling. However, we should make sure that there is a clear interface (similar to the virt driver interface) that abstracts out the sysfs dependency, so that it can be reimplemented on Windows easily.
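One possible shape for such an interface, as a sketch only. PciHostInfo, SysfsPciHostInfo, StaticPciHostInfo and split_by_function_type are hypothetical names, not existing nova code. The Linux backend relies on a VF exposing a "physfn" link under its sysfs device directory:

```python
import abc
import os


class PciHostInfo(abc.ABC):
    """Hypothetical interface for host-specific PCI lookups."""

    @abc.abstractmethod
    def is_virtual_function(self, address: str) -> bool:
        """Return True if the device at `address` is an SR-IOV VF."""


class SysfsPciHostInfo(PciHostInfo):
    """Linux backend; the root is configurable so it can be tested."""

    def __init__(self, root="/sys/bus/pci/devices"):
        self.root = root

    def is_virtual_function(self, address):
        # On Linux a VF has a "physfn" link pointing at its parent PF.
        return os.path.exists(os.path.join(self.root, address, "physfn"))


class StaticPciHostInfo(PciHostInfo):
    """Backend with no sysfs access, e.g. for another OS or for tests."""

    def __init__(self, vf_addresses):
        self._vfs = set(vf_addresses)

    def is_virtual_function(self, address):
        return address in self._vfs


def split_by_function_type(host_info, addresses):
    """Tracker-side code: depends only on the interface, not on sysfs."""
    vfs = [a for a in addresses if host_info.is_virtual_function(a)]
    others = [a for a in addresses if a not in vfs]
    return others, vfs
```

The tracker-side function takes the backend as an argument, so the same tracking logic would work with a Windows implementation of the interface.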

+1) Besides the Whitelist coupling point, nova.network.neutron.API also calls the nova.pci.utils module directly [2][3][4].

Solution proposal: This coupling might disappear when #2) is resolved as the PciDevice object will have enough information without calling out to sysfs.

Some of these were discussed extensively over IRC [5].

[1] https://review.opendev.org/c/openstack/nova/+/342301
[2] https://github.com/openstack/nova/blob/0c31561792e0e13a9f8267e71fa484ab79957f04/nova/network/neutron.py#L1572
[3] https://github.com/openstack/nova/blob/0c31561792e0e13a9f8267e71fa484ab79957f04/nova/network/neutron.py#L1584
[4] https://github.com/openstack/nova/blob/0c31561792e0e13a9f8267e71fa484ab79957f04/nova/network/neutron.py#L1684
[5] https://meetings.opendev.org/irclogs/%23openstack-nova/latest.log.html#t2022-02-21T14:55:08