Comment 4 for bug 1804262

Matt Riedemann (mriedem) wrote:

Ah, I see from that "get_storage_users" code that each compute service writes a dict, keyed by hostname and recording the last time the check ran for that host, to the location given by the "instances_path" config option. Then _run_image_cache_manager_pass runs over the instances on all of those nodes.
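For illustration, the registration described above amounts to a read-modify-write of a small JSON file shared by every host. This is a minimal sketch, not Nova's code: the file name `compute_nodes`, the 24-hour freshness window, and the function name `record_check` are assumptions chosen for the example.

```python
import json
import os
import tempfile
import time
from typing import Dict, List, Optional

# Hypothetical file name and freshness window, for illustration only.
REGISTRY_NAME = "compute_nodes"
MAX_AGE_SECONDS = 24 * 60 * 60


def record_check(instances_path: str, hostname: str,
                 now: Optional[float] = None) -> List[str]:
    """Record that `hostname` just ran its check; return hosts seen recently."""
    now = time.time() if now is None else now
    path = os.path.join(instances_path, REGISTRY_NAME)
    registry: Dict[str, float] = {}
    if os.path.exists(path):
        with open(path) as f:
            registry = json.load(f)
    # Dict keyed by hostname -> last time the check ran on that host.
    registry[hostname] = now
    with open(path, "w") as f:
        json.dump(registry, f)
    # Every host still "fresh" contributes its instances to the cache pass.
    return sorted(h for h, t in registry.items() if now - t < MAX_AGE_SECONDS)


if __name__ == "__main__":
    # A throwaway directory stands in for the shared instances_path.
    with tempfile.TemporaryDirectory() as instances_path:
        print(record_check(instances_path, "host1", now=1000.0))  # ['host1']
        print(record_check(instances_path, "host2", now=2000.0))  # ['host1', 'host2']
```

Because several hosts share the same path, a real implementation would also need a lock around the read-modify-write; the sketch omits that. The key point is that the returned host list, and therefore the instance list, grows with every node sharing the storage.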

From the MessagingTimeout traceback, it looks like this is what's killing you, since it pulls the BDMs for 705 instances from the database:

https://github.com/openstack/nova/blob/56811efa3583dfa3c03f9e43a9802bbe21e45bbd/nova/virt/imagecache.py#L53

That should probably be chunked somehow, maybe by processing the instances in groups of 50 or so?
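A rough sketch of the chunking idea, assuming the bulk lookup at the linked line can be called on a subset of instance UUIDs. `fetch_bdms_for_uuids` and `get_bdms_in_batches` are hypothetical names; the former stands in for whatever bulk BDM query Nova actually makes, and 50 is just the batch size floated above.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def get_bdms_in_batches(
    instance_uuids: List[str],
    fetch_bdms_for_uuids: Callable[[List[str]], Dict[str, list]],  # hypothetical
    batch_size: int = 50,
) -> Dict[str, list]:
    """Fetch BDMs one batch at a time instead of for every instance at once."""
    bdms_by_uuid: Dict[str, list] = {}
    for batch in chunked(instance_uuids, batch_size):
        # Each call is a smaller DB/RPC round trip, so no single request
        # has to carry the BDMs for all instances before the RPC timeout.
        bdms_by_uuid.update(fetch_bdms_for_uuids(batch))
    return bdms_by_uuid
```

With 705 instances and a batch size of 50 this makes 15 calls, the last covering 5 instances. The total work is the same; what changes is the size of each response.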

Alternatively, just get the BDMs per instance here:

https://github.com/openstack/nova/blob/56811efa3583dfa3c03f9e43a9802bbe21e45bbd/nova/virt/imagecache.py#L80
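The per-instance alternative could look roughly like the generator below. `fetch_bdms_for_instance` is a hypothetical per-instance lookup, and the BDM records are simplified to plain dicts; the idea is only that each instance's BDMs are fetched while that instance is being processed.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

BDM = Dict[str, str]  # simplified stand-in for a BDM record


def iter_instance_bdms(
    instance_uuids: Iterable[str],
    fetch_bdms_for_instance: Callable[[str], List[BDM]],  # hypothetical
) -> Iterator[Tuple[str, List[BDM]]]:
    """Yield (uuid, bdms) one instance at a time.

    Each call returns only that instance's BDMs, so no single request has
    to carry the BDMs for every instance on every node.
    """
    for uuid in instance_uuids:
        yield uuid, fetch_bdms_for_instance(uuid)
```

The trade-off is one small round trip per instance (705 here) instead of one huge one, so batching is a middle ground between the two approaches.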

Also note that the code only cares about swap BDMs, which are identified by this filter:

https://github.com/openstack/nova/blob/56811efa3583dfa3c03f9e43a9802bbe21e45bbd/nova/block_device.py#L439

So that BDM query could also be optimized to only return swap BDMs.
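To illustrate pushing the swap filter into the query itself, here is a sketch against a heavily simplified, hypothetical `block_device_mapping` table in SQLite. It assumes the linked filter treats a BDM as swap when `source_type` is `blank`, `destination_type` is `local` and `guest_format` is `swap`; check that against the linked code before relying on it. This is not Nova's DB API, which goes through its own object layer.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical, heavily simplified stand-in for the BDM table.
SCHEMA = """
CREATE TABLE block_device_mapping (
    id INTEGER PRIMARY KEY,
    instance_uuid TEXT NOT NULL,
    source_type TEXT,
    destination_type TEXT,
    guest_format TEXT
)
"""


def swap_bdms_for(conn: sqlite3.Connection,
                  instance_uuids: Sequence[str]) -> List[Tuple[str, int]]:
    """Return (instance_uuid, id) rows for swap BDMs only."""
    if not instance_uuids:
        return []
    placeholders = ",".join("?" for _ in instance_uuids)
    # The swap criteria live in the WHERE clause, so non-swap BDMs
    # never leave the database.
    sql = (
        "SELECT instance_uuid, id FROM block_device_mapping "
        "WHERE source_type = 'blank' AND destination_type = 'local' "
        "AND guest_format = 'swap' "
        f"AND instance_uuid IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(instance_uuids)).fetchall()
```

Combined with batching, the `IN (...)` list also stays bounded, which matters for databases that cap the number of bound parameters.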