Transient mismatch between OS-EXT-STS:vm_state and OS-EXT-STS:power_state

Bug #1486475 reported by Jordan Pittier
This bug affects 1 person
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Confirmed      Low          Unassigned
tempest                    Fix Released   Undecided    Unassigned

Bug Description

Hi,

I got this weird Tempest run here: http://logs.openstack.org/35/206935/1/gate/gate-tempest-dsvm-neutron-full/ef5a1a9/console.html.gz#_2015-08-18_18_20_26_980 where the server has "OS-EXT-STS:vm_state": "active" and "OS-EXT-STS:power_state": 3.

Power_state 3 is "PAUSED" according to nova/compute/power_state.py. So there seems to be a mismatch with the vm_state being "active".
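
For reference, here is a minimal sketch of how the numeric code maps to a name. The constants are a local copy that I assume matches nova/compute/power_state.py at the time of this report; `describe` and `is_active_but_paused` are hypothetical helpers, not part of Nova or Tempest.

```python
# Local copy of the power-state codes, for illustration only.
# Assumed to match nova/compute/power_state.py; check the Nova tree to be sure.
NOSTATE = 0x00
RUNNING = 0x01
PAUSED = 0x03
SHUTDOWN = 0x04
CRASHED = 0x06
SUSPENDED = 0x07

POWER_STATE_NAMES = {
    NOSTATE: "NOSTATE",
    RUNNING: "RUNNING",
    PAUSED: "PAUSED",
    SHUTDOWN: "SHUTDOWN",
    CRASHED: "CRASHED",
    SUSPENDED: "SUSPENDED",
}


def describe(server):
    """Return (vm_state, power state name) for a server dict from the API."""
    code = server["OS-EXT-STS:power_state"]
    return server["OS-EXT-STS:vm_state"], POWER_STATE_NAMES.get(code, "UNKNOWN")


def is_active_but_paused(server):
    """Hypothetical check for the combination seen in the failed run."""
    return (server["OS-EXT-STS:vm_state"] == "active"
            and server["OS-EXT-STS:power_state"] == PAUSED)
```

With the values from the log, `describe` yields the pair ("active", "PAUSED"), which is the combination in question.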

The discrepancy lasts only a fraction of a second, but as Tempest is hitting Nova hard, Tempest caught this intermediate state and my build was marked as "failed".

Could someone confirm that right after the VM transitions from BUILD/spawning to ACTIVE/None there could be a window of inconsistency?

Tags: compute
Jordan Pittier (jordan-pittier) wrote :

Adding tempest because it prevented https://review.openstack.org/#/c/206935/ from merging smoothly. I am going to submit a patch to relax the check we do on the "OS-EXT-STS:power_state" and "updated" fields.

Jordan Pittier (jordan-pittier) wrote :

So Tempest expected OS-EXT-STS:power_state = 3 because that's what it got right after https://github.com/openstack/tempest/blob/23e642498e50f4527847fe1cdf5c46c05454d955/tempest/scenario/test_minimum_basic.py#L55 but then Nova updated (asynchronously?) the server to set the power_state to 1 (and to update the "updated" field). So when Tempest gets the server again (https://github.com/openstack/tempest/blob/23e642498e50f4527847fe1cdf5c46c05454d955/tempest/scenario/test_minimum_basic.py#L65) there's the mismatch.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/214562

tags: added: compute
description: updated
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master)

Reviewed: https://review.openstack.org/214562
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=5d367151817bd612117707e86f099a8f52e646f4
Submitter: Jenkins
Branch: master

commit 5d367151817bd612117707e86f099a8f52e646f4
Author: Jordan Pittier <email address hidden>
Date: Wed Aug 19 12:03:49 2015 +0200

    scenario/test_minimum_basic: relax a MatchesDictExceptForKeys check

    In the test_minimum_basic scenario, we check that a server that was
    created a moment ago is the same as the server we just got. To do
    that, each field of the expected and actual servers is compared, including
    the "updated" field. But "updated" could be changed asynchronously
    by Nova, without Tempest knowing, causing a MismatchError.

    This patch excludes the 'updated' and 'OS-EXT-STS:power_state' (that
    can be updated in background) fields from the comparison.

    Change-Id: I50d1319b690453923b470733e94f3a11fd1cd249
    Related-Bug: #1486475
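
The idea behind the relaxed check can be sketched with plain dicts. This is not the Tempest matcher itself (the commit uses MatchesDictExceptForKeys); `diff_except_for_keys`, `VOLATILE_KEYS` and the sample server values below are hypothetical and only illustrate the comparison.

```python
# Fields that Nova may change in the background between two GETs
# (the two fields named in the commit message).
VOLATILE_KEYS = ("updated", "OS-EXT-STS:power_state")

_MISSING = "<missing>"


def diff_except_for_keys(expected, actual, ignored=VOLATILE_KEYS):
    """Return {key: (expected, actual)} for each non-ignored key that differs."""
    keys = (set(expected) | set(actual)) - set(ignored)
    return {
        key: (expected.get(key, _MISSING), actual.get(key, _MISSING))
        for key in keys
        if expected.get(key, _MISSING) != actual.get(key, _MISSING)
    }


# Made-up sample data: the server as first seen, then as fetched again later.
created = {
    "id": "abc",
    "status": "ACTIVE",
    "updated": "2015-08-18T18:20:25Z",
    "OS-EXT-STS:power_state": 3,
}
fetched = dict(created, **{"updated": "2015-08-18T18:20:26Z",
                           "OS-EXT-STS:power_state": 1})

strict = diff_except_for_keys(created, fetched, ignored=())
relaxed = diff_except_for_keys(created, fetched)
```

Here `strict` reports both volatile fields as differing, while `relaxed` is empty, so a test using the relaxed comparison would pass.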

melanie witt (melwitt) wrote :

FWIW, in nova, I don't know if it's expected for an instance to go from power state 0 (None) to 3 (Paused) to 1 (Running) during provisioning. Power state information comes from the virt driver, and there's a periodic sync task in the compute manager that attempts to make vm_state and power_state match if there's a discrepancy. The periodic task runs by default every 10 minutes (and can be disabled). Note that there's no sync attempted when vm_state is "building" -- that case is ignored. When vm_state is "active", the "paused" power state is ignored because of [1].

So even if the sync periodic task fired while you were waiting, it wouldn't have done anything to the instance. So, I think the question is whether it's "normal" for the virt driver to report power state as "paused" during the domain bringup.

[1] https://bugs.launchpad.net/nova/+bug/1097806
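
The two rules described in this comment can be written down as a small decision function. This is a simplified sketch and not the compute manager's actual code: `sync_action` and its return values are hypothetical, and everything outside the two ignored cases is lumped together as "reconcile".

```python
PAUSED = 3  # power state code for "paused", as quoted in the bug description


def sync_action(vm_state, power_state):
    """Sketch of what the periodic sync would do for one instance.

    Covers only the two cases mentioned in the comment above; the real
    task handles many more combinations.
    """
    if vm_state == "building":
        # No sync is attempted while the instance is still building.
        return "ignore"
    if vm_state == "active" and power_state == PAUSED:
        # Active + paused is deliberately left alone (see [1]).
        return "ignore"
    return "reconcile"
```

Under these assumptions both states seen in this bug (building, then active with power state 3) fall into the "ignore" branch, which matches the point that the periodic task would not have touched the instance.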

Augustina Ragwitz (auggy) wrote :

Marking this as confirmed as the behavior is clearly happening, but this needs further triage to determine, per melwitt's comments above, whether this is expected behavior.

Changed in nova:
status: New → In Progress
status: In Progress → Confirmed
status: Confirmed → New
status: New → Confirmed
Changed in tempest:
status: New → Fix Committed
John Garbutt (johngarbutt) wrote :

So during VM create the libvirt driver always starts the instance paused, then switches it to active after the neutron event comes in. That is expected.

It sounds like the power state sync races the create somehow. We do have instance locks in there to try to prevent that, and the power state sync usually skips instances that are busy doing some task.
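
If the paused-then-running sequence is expected, a client could also wait for the power state to settle before taking its reference snapshot. A rough sketch, assuming a caller-supplied `fetch_server` callable that returns the server dict; `wait_until_settled` and its parameters are hypothetical and not an existing Tempest or Nova helper.

```python
import time

RUNNING = 1  # power state code for "running"


def wait_until_settled(fetch_server, timeout=5.0, interval=0.1,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the server is active and its power state is RUNNING.

    fetch_server: callable returning the current server dict (assumed).
    Raises TimeoutError if the server has not settled within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        server = fetch_server()
        if (server["OS-EXT-STS:vm_state"] == "active"
                and server["OS-EXT-STS:power_state"] == RUNNING):
            return server
        if clock() >= deadline:
            raise TimeoutError("server did not reach active/RUNNING in time")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; in normal use the defaults would apply.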

Changed in nova:
importance: Undecided → Low
Changed in tempest:
status: Fix Committed → Fix Released