libvirt calls aren't reliably using tpool.Proxy

Bug #1840912 reported by Matthew Booth
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Matthew Booth

Bug Description

A customer is hitting an issue with symptoms identical to bug 1045152 (from 2012). Specifically, we are frequently seeing the compute host being marked down. From log correlation, we can see that when this occurs the relevant compute is always in the middle of executing LibvirtDriver._get_disk_over_committed_size_total(). The reason for this appears to be a long-running libvirt call which is not using tpool.Proxy, and therefore blocks all other greenthreads during execution. We do not yet know why the libvirt call is slow, but we have identified the reason it is not using tpool.Proxy.

Because Nova runs under eventlet, we proxy libvirt calls at the point where we create the libvirt connection in libvirt.Host._connect:

        return tpool.proxy_call(
            (libvirt.virDomain, libvirt.virConnect),
            libvirt.openAuth, uri, auth, flags)

This means: run libvirt.openAuth(uri, auth, flags) in a native thread. If the returned object is a libvirt.virDomain or libvirt.virConnect, wrap the returned object in a tpool.Proxy with the same autowrap rules.
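To make those semantics concrete, here is a minimal pure-Python model of the behaviour described above. It is not eventlet's code: MiniProxy, mini_proxy_call, FakeConnect, FakeDomain and fake_open_auth are all hypothetical stand-ins, and a ThreadPoolExecutor stands in for eventlet's native thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for eventlet's native thread pool. Unlike eventlet, .result()
# below blocks the calling thread rather than yielding to other greenthreads.
_pool = ThreadPoolExecutor(max_workers=1)


class MiniProxy:
    """Toy stand-in for tpool.Proxy: forwards method calls to a worker thread."""

    def __init__(self, obj, autowrap=()):
        self._obj = obj
        self._autowrap = autowrap

    def __getattr__(self, name):
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr

        def call(*args, **kwargs):
            return mini_proxy_call(self._autowrap, attr, *args, **kwargs)
        return call


def mini_proxy_call(autowrap, f, *args, **kwargs):
    """Run f in a worker thread; wrap the result if its type is in autowrap."""
    rv = _pool.submit(f, *args, **kwargs).result()
    if isinstance(rv, autowrap):
        return MiniProxy(rv, autowrap)
    return rv


class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain."""

    def name(self):
        return "instance-0001"


class FakeConnect:
    """Hypothetical stand-in for libvirt.virConnect."""

    def lookupByName(self, name):
        return FakeDomain()


def fake_open_auth(uri):
    """Hypothetical stand-in for libvirt.openAuth."""
    return FakeConnect()


conn = mini_proxy_call((FakeDomain, FakeConnect), fake_open_auth, "qemu:///system")
dom = conn.lookupByName("instance-0001")
print(type(conn).__name__, type(dom).__name__)
```

Both the connection and the domain it returns come back wrapped, because each return value passes the isinstance check against the autowrap tuple.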

There are two problems with this. Firstly, the autowrap list is incomplete. At the very least we need to add libvirt.virNodeDevice, libvirt.virSecret, and libvirt.virNWFilter to this list, as we use all of these objects in Nova. Currently none of our interactions with these objects are using the tpool proxy.
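One way to avoid maintaining this list by hand would be to derive it from the module itself. The sketch below is only an illustration under assumptions: fake_libvirt is a hypothetical stand-in module, proxy_classes is a hypothetical helper, and skipping exception classes is a design choice of this sketch, not something stated in this report.

```python
import inspect
import types

# Hypothetical stand-in for the real libvirt module.
fake_libvirt = types.ModuleType("fake_libvirt")
for _name in ("virConnect", "virDomain", "virNodeDevice",
              "virSecret", "virNWFilter"):
    setattr(fake_libvirt, _name, type(_name, (), {}))
# An exception class, which there is no need to proxy.
fake_libvirt.libvirtError = type("libvirtError", (Exception,), {})


def proxy_classes(module):
    """Return every non-exception class defined on module as a tuple,
    suitable for use as an autowrap argument."""
    return tuple(
        cls for _, cls in inspect.getmembers(module, inspect.isclass)
        if not issubclass(cls, Exception)
    )


autowrap = proxy_classes(fake_libvirt)
print(sorted(cls.__name__ for cls in autowrap))
```

With a tuple built this way, any class the module exposes is covered without having to remember to add it.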

Secondly, and this is the specific root cause of this bug, the autowrap check in tpool doesn't understand lists:

https://github.com/eventlet/eventlet/blob/ca8dd0748a1985a409e9a9a517690f46e05cae99/eventlet/tpool.py#L149

In LibvirtDriver._get_disk_over_committed_size_total() we get a list of running libvirt domains with libvirt.Host.list_instance_domains, which calls virConnect.listAllDomains(). listAllDomains() returns a *list* of virDomain, which the above code in tpool doesn't match. Consequently, none of the subsequent virDomain calls use the tpool proxy, which starves all other greenthreads.
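The effect can be reproduced without libvirt or eventlet. In the sketch below, FakeDomain and Wrapped are hypothetical stand-ins for libvirt.virDomain and tpool.Proxy, autowrap_like_tpool mimics the single isinstance check described above, and autowrap_with_lists shows one possible workaround (wrapping each list element), which is an assumption of this sketch rather than a description of the proposed fix.

```python
class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain."""


class Wrapped:
    """Hypothetical marker standing in for tpool.Proxy."""

    def __init__(self, obj):
        self.obj = obj


AUTOWRAP = (FakeDomain,)


def autowrap_like_tpool(rv):
    # A single isinstance() check on the return value: a list of
    # FakeDomain is not itself a FakeDomain, so it passes through as-is.
    return Wrapped(rv) if isinstance(rv, AUTOWRAP) else rv


def autowrap_with_lists(rv):
    # Possible workaround: also wrap matching elements of a returned list.
    if isinstance(rv, list):
        return [autowrap_like_tpool(item) for item in rv]
    return autowrap_like_tpool(rv)


domains = [FakeDomain(), FakeDomain()]  # shape of a listAllDomains() result
plain = autowrap_like_tpool(domains)
fixed = autowrap_with_lists(domains)

print(any(isinstance(d, Wrapped) for d in plain))  # False: nothing is proxied
print(all(isinstance(d, Wrapped) for d in fixed))  # True: every domain is proxied
```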

Matt Riedemann (mriedem)
tags: added: libvirt
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/677736

Changed in nova:
assignee: nobody → Matthew Booth (mbooth-9)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/683922

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/683927

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/683930

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/rocky)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/683927

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/queens)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/683930

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/stein)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/683922
