Incorrect status check during clustered vm migration

Bug #1628938 reported by Lucian Petrut
This bug affects 1 person
Affects  Status        Importance  Assigned to    Milestone
os-win   Fix Released  Undecided   Lucian Petrut

Bug Description

When migrating a clustered VM, we rely on the cluster resource group status in order to determine whether the VM was properly migrated.

After a migration is requested, the resource group immediately enters a 'pending' state. At the moment, the method polling the resource group state uses a list of valid transition states, incorrectly appending the desired state of the resource group to this list.

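This causes issues if the migration fails: the group typically ends up in the desired state on the source node, which still counts as a valid transition state, so the failure goes undetected and we keep polling indefinitely (unless a timeout is passed).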
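To make the failure mode concrete, here is a minimal Python sketch of a state-polling loop. It is not the os-win code: wait_for_group_state, the (state, owner node) status tuples, the node names and MigrationFailed are hypothetical. It assumes the desired post-migration state is 'Online' and that a failed migration leaves the group 'Online' on the source node. With the desired state in the transition list, that outcome looks like "still migrating", so only the timeout ends the loop (with timeout=None it would never end).

```python
import time

# Hypothetical state names, taken from the states mentioned in this report.
PENDING = "Pending"
ONLINE = "Online"


class MigrationFailed(Exception):
    """Raised when the group leaves the set of valid transition states."""


def wait_for_group_state(get_status, desired_state, target_node,
                         valid_transition_states, timeout=None,
                         interval=0.01):
    """Poll get_status() until the group reaches desired_state on target_node.

    get_status() is a hypothetical callable returning (state, owner_node).
    With timeout=None the loop only ends on success or a detected failure.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state, owner = get_status()
        if state == desired_state and owner == target_node:
            return
        if state not in valid_transition_states:
            raise MigrationFailed(f"group is {state} on {owner}")
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"group still {state} on {owner}")
        time.sleep(interval)


def failed_migration():
    """Fake status source: 'Pending' once, then 'Online' on the source node."""
    samples = iter([(PENDING, "node1")])
    return lambda: next(samples, (ONLINE, "node1"))


# Buggy list: the desired state was appended to the transition states.
# Fixed list: only 'Pending' counts as "migration still in progress".
for label, states in (("buggy", [PENDING, ONLINE]), ("fixed", [PENDING])):
    try:
        wait_for_group_state(failed_migration(), ONLINE, "node2", states,
                             timeout=0.05)
    except (MigrationFailed, TimeoutError) as exc:
        print(label, type(exc).__name__)
# buggy TimeoutError
# fixed MigrationFailed
```

Removing the desired state from the list, as the first fix does, turns the silent hang into an immediate, detectable failure.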

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-win (master)

Reviewed: https://review.openstack.org/379486
Committed: https://git.openstack.org/cgit/openstack/os-win/commit/?id=d4ad19dc0fa90466d2b0ac0b9daa0830fdb6c467
Submitter: Jenkins
Branch: master

commit d4ad19dc0fa90466d2b0ac0b9daa0830fdb6c467
Author: Lucian Petrut <email address hidden>
Date: Thu Sep 29 16:59:09 2016 +0300

    Fix clustered VM migration status polling

    When migrating a clustered VM, we rely on the cluster resource
    group status in order to determine whether the VM was properly
    migrated.

    After a migration is requested, the resource group immediately
    enters a 'pending' state. At the moment, the method polling the
    resource group state uses a list of valid transition states,
    incorrectly appending the desired state of the resource group
    to this list.

    This causes issues if the migration fails, as we're not going to
    detect this and we'll continue polling indefinitely (unless a
    timeout is passed).

    This change fixes this issue by removing the desired state from
    the valid transition states list.

    Change-Id: Id1bdd6ccc6a2a6abc99e86ca362e03eb5adb66a2
    Closes-Bug: #1628938

Changed in os-win:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-win (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/384896

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-win (stable/newton)

Reviewed: https://review.openstack.org/384896
Committed: https://git.openstack.org/cgit/openstack/os-win/commit/?id=031f349d495f7e5437c30d4501419b94ad6122c0
Submitter: Jenkins
Branch: stable/newton

commit 031f349d495f7e5437c30d4501419b94ad6122c0
Author: Lucian Petrut <email address hidden>
Date: Thu Sep 29 16:59:09 2016 +0300

    Fix clustered VM migration status polling

    When migrating a clustered VM, we rely on the cluster resource
    group status in order to determine whether the VM was properly
    migrated.

    After a migration is requested, the resource group immediately
    enters a 'pending' state. At the moment, the method polling the
    resource group state uses a list of valid transition states,
    incorrectly appending the desired state of the resource group
    to this list.

    This causes issues if the migration fails, as we're not going to
    detect this and we'll continue polling indefinitely (unless a
    timeout is passed).

    This change fixes this issue by removing the desired state from
    the valid transition states list.

    Change-Id: Id1bdd6ccc6a2a6abc99e86ca362e03eb5adb66a2
    Closes-Bug: #1628938
    (cherry picked from commit d4ad19dc0fa90466d2b0ac0b9daa0830fdb6c467)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-win 1.2.1

This issue was fixed in the openstack/os-win 1.2.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-win 1.3.0

This issue was fixed in the openstack/os-win 1.3.0 release.

Lucian Petrut (petrutlucian94) wrote :

Looks like we're still having issues with this. We incorrectly assumed that the group state would immediately become 'Pending' when a migration is requested, judging by the expected 'ErrorPending' return code of the move function. This is not the case, as the group can still be reported as 'Online' on the source node before the live migration has actually been performed.
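The ambiguity shows up when the post-fix decision rule (only 'Pending' is a valid transition state) is applied sample by sample. This is a hypothetical sketch: classify, the node names and both timelines are illustrative and not taken from os-win. The same sample, 'Online' on node1, appears at the start of a successful migration and at the end of a failed one, so a state-only check cannot tell them apart and aborts the successful migration at its first sample.

```python
PENDING, ONLINE = "Pending", "Online"


def classify(sample, desired_state, target_node, valid_transition_states):
    """Verdict a status poller would reach for one (state, owner) sample."""
    state, owner = sample
    if state == desired_state and owner == target_node:
        return "done"
    if state not in valid_transition_states:
        return "failed"
    return "waiting"


# Hypothetical sample sequences, with the group moving from node1 to node2.
timelines = {
    # The group is still reported 'Online' on node1 before the move starts.
    "successful": [(ONLINE, "node1"), (PENDING, "node1"), (ONLINE, "node2")],
    # The move starts, then the group falls back to 'Online' on node1.
    "failed": [(PENDING, "node1"), (ONLINE, "node1")],
}

for name, samples in timelines.items():
    print(name, [classify(s, ONLINE, "node2", [PENDING]) for s in samples])
# successful ['failed', 'waiting', 'done']
# failed ['waiting', 'failed']
```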

Changed in os-win:
status: Fix Released → In Progress
assignee: nobody → Lucian Petrut (petrutlucian94)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-win (master)

Fix proposed to branch: master
Review: https://review.openstack.org/441250

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-win (master)

Reviewed: https://review.openstack.org/441250
Committed: https://git.openstack.org/cgit/openstack/os-win/commit/?id=e09f672e228379dbeeebad4156891ddb75a06769
Submitter: Jenkins
Branch: master

commit e09f672e228379dbeeebad4156891ddb75a06769
Author: Lucian Petrut <email address hidden>
Date: Fri Mar 3 19:15:06 2017 +0200

    Fix cluster group migration status checks

    We incorrectly assumed that the group state would immediately become
    'Pending' when a migration is requested, judging by the expected
    'ErrorPending' return code of the move function.

    This is not the case, as the group can be reported as 'Online' while
    being on the source node, before a live migration is actually
    performed successfully. This is incorrectly treated as an exception
    at the moment.

    This change refactors the way in which we check pending migrations.
    We now rely on cluster group state change events for this.

    In case of migration timeouts, we're now attempting to cancel pending
    migrations.

    Also, some functionality previously available only through private
    methods is now publicly exposed (e.g. canceling migrations,
    detecting queued migrations).

    In the future, we're planning to implement a custom cluster resource
    DLL, which would give us finer control over this.

    Closes-Bug: #1628938

    Change-Id: I6c268355213718d4d66e0e77a89c558cde3213b6
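The event-driven approach described in this commit might look roughly like the Python sketch below. It is not the os-win implementation: the queue of (state, owner node) events stands in for a cluster group state change listener, and wait_for_migration, cancel, fake_listener, MigrationFailed and MigrationTimeout are hypothetical names. One extra assumption is made to detect failures: once 'Pending' has been observed, any event other than the desired state on the target node means the move was abandoned, while 'Online' on the source node before that point is ignored. On timeout the sketch calls cancel() before raising, mirroring the attempt to cancel pending migrations.

```python
import queue
import threading
import time

PENDING, ONLINE = "Pending", "Online"


class MigrationFailed(Exception):
    """The group left 'Pending' without reaching the target node."""


class MigrationTimeout(Exception):
    """No matching state change event arrived in time."""


def wait_for_migration(events, desired_state, target_node, timeout, cancel):
    """Consume (state, owner) events until desired_state on target_node.

    events: queue.Queue fed by a (hypothetical) state change listener.
    cancel: callable used to cancel the pending migration on timeout.
    """
    deadline = time.monotonic() + timeout
    seen_pending = False
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Do not leave a queued or pending migration behind.
            cancel()
            raise MigrationTimeout(f"no {desired_state} event on {target_node}")
        try:
            state, owner = events.get(timeout=remaining)
        except queue.Empty:
            continue  # The next iteration cancels and raises.
        if state == desired_state and owner == target_node:
            return
        if state == PENDING:
            seen_pending = True
        elif seen_pending:
            # Assumption: leaving 'Pending' for anything else means failure.
            raise MigrationFailed(f"group is {state} on {owner}")
        # Events before 'Pending' (e.g. 'Online' on the source) are ignored.


def fake_listener(events):
    """Stand-in for a listener pushing cluster group state change events."""
    for sample in [(ONLINE, "node1"), (PENDING, "node1"), (ONLINE, "node2")]:
        time.sleep(0.01)
        events.put(sample)


events = queue.Queue()
threading.Thread(target=fake_listener, args=(events,), daemon=True).start()
wait_for_migration(events, ONLINE, "node2", timeout=1.0,
                   cancel=lambda: print("cancelled"))
print("migrated")
```

Unlike the polling sketches, the initial 'Online' on node1 is not treated as a failure here, which is the case the commit message describes as incorrectly raising an exception.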

Changed in os-win:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-win 2.0.0

This issue was fixed in the openstack/os-win 2.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-win (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/473313

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-win (stable/ocata)

Change abandoned by Claudiu Belu (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/473313
