Failover Cluster WMI provider issue

Bug #1798069 reported by Lucian Petrut
This bug affects 1 person
Affects          Status         Importance   Assigned to      Milestone
compute-hyperv   Fix Released   Undecided    Lucian Petrut
os-win           Fix Released   Undecided    Lucian Petrut

Bug Description

The Failover Cluster WMI provider is quite unreliable, especially in recent Windows Server 2016 versions.

After failovers, some valid WMI queries fail with "Invalid Property" errors, which looks like a bug in the WMI provider itself. Event listeners are affected as well, so we're missing VM failover events.

We should switch to the underlying C library, avoiding the WMI provider as much as possible. Going down a layer should improve both the reliability and the performance of cluster-related operations.
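
To illustrate what "going down a layer" can look like, here is a minimal sketch that calls clusapi.dll directly through ctypes instead of going through WMI. It is not the os-win implementation; the helper names are made up for this example, and it only queries the node's cluster state with GetNodeClusterState.

```python
import ctypes
import sys

# Values of the NODE_CLUSTER_STATE enum from clusapi.h.
NODE_CLUSTER_STATES = {
    0: "not installed",
    1: "not configured",
    3: "not running",
    19: "running",
}


def describe_node_cluster_state(state):
    """Map a NODE_CLUSTER_STATE value to a readable label."""
    return NODE_CLUSTER_STATES.get(state, "unknown (%d)" % state)


def get_node_cluster_state(node_name=None):
    """Query the cluster service state without WMI.

    Hypothetical helper, for illustration only. Passing None as
    node_name queries the local node.
    """
    if sys.platform != "win32":
        raise OSError("clusapi.dll is only available on Windows")

    clusapi = ctypes.WinDLL("clusapi.dll")
    func = clusapi.GetNodeClusterState
    # DWORD GetNodeClusterState(LPCWSTR lpszNodeName, DWORD *pdwClusterState)
    func.argtypes = [ctypes.c_wchar_p, ctypes.POINTER(ctypes.c_uint32)]
    func.restype = ctypes.c_uint32

    state = ctypes.c_uint32()
    error_code = func(node_name, ctypes.byref(state))
    if error_code != 0:
        # A non-zero return value is a Win32 error code.
        raise ctypes.WinError(error_code)
    return state.value
```

On a cluster node, describe_node_cluster_state(get_node_cluster_state()) would report whether the cluster service is running. On other platforms the query raises OSError before touching the library.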

Changed in os-win:
assignee: nobody → Lucian Petrut (petrutlucian94)
Changed in compute-hyperv:
assignee: nobody → Lucian Petrut (petrutlucian94)
Changed in os-win:
status: New → In Progress
Changed in compute-hyperv:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-win (master)

Reviewed: https://review.openstack.org/607611
Committed: https://git.openstack.org/cgit/openstack/os-win/commit/?id=6c6955f9f23f3fabf71d06f0a52d6d770aedab2c
Submitter: Zuul
Branch: master

commit 6c6955f9f23f3fabf71d06f0a52d6d770aedab2c
Author: Lucian Petrut <email address hidden>
Date: Fri Sep 28 16:57:19 2018 +0300

    Limit Failover Cluster WMI provider usage

    The Failover Cluster WMI provider is quite unreliable, especially
    in recent Windows Server 2016 versions.

    After failovers, some valid WMI queries fail with "Invalid Property"
    errors, which really looks like a WMI provider bug. Event listeners
    are affected as well, for which reason we're missing VM failover events.

    This change refactors "clusterutils", switching to the underlying
    C library.

    The new failover event listener cannot provide the source of the
    migrated instances, so we're keeping the old one as well for backwards
    compatibility, for now.

    The only other places in which we're still using WMI are the methods
    that add or destroy VM cluster roles.

    The reason is that we'd have to explicitly create the cluster
    resources and group, probably set some resource dependencies and
    properties. For now, it's easier/safer to just stick with the WMI
    method (which really shouldn't fail).

    Also, destroying VM cluster groups using clusapi's DestroyClusterGroup
    function behaves strangely. VMs get recreated asynchronously and put in
    suspended state, breaking everything. We're avoiding it for now.

    Closes-Bug: #1798069

    Change-Id: I63d1aa3a6f9fb12d08478eb41fe973b1582b540c

Changed in os-win:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-win 4.1.1

This issue was fixed in the openstack/os-win 4.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to compute-hyperv (master)

Reviewed: https://review.openstack.org/607861
Committed: https://git.openstack.org/cgit/openstack/compute-hyperv/commit/?id=feb4bae60fd86d107b1f6479bdf38c16d623398d
Submitter: Zuul
Branch: master

commit feb4bae60fd86d107b1f6479bdf38c16d623398d
Author: Lucian Petrut <email address hidden>
Date: Thu Oct 4 12:14:13 2018 +0300

    Switch to new instance failover listener

    We had a bunch of issues with the WMI based event listener, which
    appear to be bugs in the WMI provider.

    We've added a new listener that uses the underlying C library. The
    only issue is that this new listener cannot tell us the source of
    the migrated VMs, so we'll have to rely on the information from
    Nova.

    Closes-Bug: #1798069
    Depends-On: I63d1aa3a6f9fb12d08478eb41fe973b1582b540c

    Change-Id: I9e777351eca84274cafc3f04b6bc27c3f6b572ca
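
Since the new listener only reports where a VM ended up, the source host has to come from what is already on record. The sketch below shows that idea in isolation. It is not the compute-hyperv code: the function name and the recorded_hosts mapping are hypothetical stand-ins for the information kept by Nova.

```python
def resolve_failover_source(instance_name, new_host, recorded_hosts):
    """Return (source_host, new_host) for a failover event, or None.

    recorded_hosts is a hypothetical stand-in for what Nova has on
    record: a mapping of instance name to the host it last ran on.
    """
    source_host = recorded_hosts.get(instance_name)
    if source_host is None:
        # Not an instance we know about; nothing to reconcile.
        return None
    if source_host.lower() == new_host.lower():
        # Windows host names are case-insensitive; the VM did not move.
        return None
    return source_host, new_host
```

For example, resolve_failover_source("instance-1", "node-b", {"instance-1": "node-a"}) yields ("node-a", "node-b"), while an event for an unknown instance or for an unchanged host yields None.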

Changed in compute-hyperv:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/compute-hyperv 9.0.0.0rc1

This issue was fixed in the openstack/compute-hyperv 9.0.0.0rc1 release candidate.
