resource metadata not reflecting actual state of resource

Bug #1208547 reported by Thomas Maddox
This bug affects 1 person
Affects: Ceilometer
Status: Fix Released
Importance: Medium
Assigned to: Thomas Maddox
Milestone: (none listed)

Bug Description

I ran into this while spinning up new instances. Even when an instance was well into the active state from the Horizon perspective, resource-show on that instance still showed 'scheduling' as the state, which is incorrect. The stale state even persisted after the same instance was deleted.

Related bugs:

1. Glance metadata mismatch: https://bugs.launchpad.net/ceilometer/+bug/1201701

2. Too low precision on timestamps: https://bugs.launchpad.net/ceilometer/+bug/1215676

Here's the Ceilometer client output for an instance where this was happening:
http://paste.openstack.org/show/42963/

The thinking is that the metadata is being overwritten because we don't check the timestamp before replacing the current state with an older message's metadata.

AMQP doesn't guarantee ordering, so if message 2 arrives at CM before message 1 because it took a faster route, message 1 will then overwrite the metadata from message 2 and we record an incorrect state.
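
To make the ordering problem concrete, here is a minimal sketch of a timestamp guard. It is not Ceilometer's code: the in-memory `_latest` store and the names `record_metadata` and `current_state` are hypothetical, chosen only to illustrate ignoring a late-arriving older message.

```python
from datetime import datetime, timezone

# Hypothetical in-memory store: resource_id -> (timestamp, metadata).
_latest = {}


def record_metadata(resource_id, timestamp, metadata):
    """Store metadata only if it is not older than what is already held.

    Returns True if the stored state was updated, False if the message
    was stale and ignored.
    """
    current = _latest.get(resource_id)
    if current is not None and timestamp < current[0]:
        return False  # an older message arrived late; keep the newer state
    _latest[resource_id] = (timestamp, metadata)
    return True


def current_state(resource_id):
    """Return the 'state' field of the metadata currently held."""
    return _latest[resource_id][1]["state"]


t1 = datetime(2013, 8, 5, 12, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2013, 8, 5, 12, 0, 5, tzinfo=timezone.utc)

# Message 2 (newer) arrives first, then message 1 (older) arrives late.
record_metadata("instance-1", t2, {"state": "active"})
record_metadata("instance-1", t1, {"state": "scheduling"})
print(current_state("instance-1"))
```

Without the timestamp comparison, the second call would leave the resource in 'scheduling', which matches the symptom described above.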

description: updated
Changed in ceilometer:
assignee: nobody → Thomas Maddox (thomas-maddox)
Julien Danjou (jdanjou)
Changed in ceilometer:
status: New → Triaged
importance: Undecided → Medium
milestone: none → havana-3
Changed in ceilometer:
status: Triaged → In Progress
tags: added: grizzly-backport-potential
description: updated
Doug Hellmann (doug-hellmann) wrote :

Does nova emit notifications when the state of the VM changes?

Thomas Maddox (thomas-maddox) wrote :

Yes. Nova emits notifications when it does any update on a VM. So, creates, deletes, power offs, resizes, rebuilds, etc. all emit notifications.

Doug Hellmann (doug-hellmann) wrote :

We get the notifications for creating the VM (create.start and create.end). The question is whether we get notifications for the state of the VM changing while it is still being created.

Thomas Maddox (thomas-maddox) wrote :

Yes, but it depends on your Nova configuration:

'To enable notifications when the VM state changes, set configuration flag "notify_on_state_change" to "vm_state".' (https://wiki.openstack.org/wiki/SystemUsageData)

Thomas Maddox (thomas-maddox) wrote :

Check out the entry for 'compute.instance.update'.

Doug Hellmann (doug-hellmann) wrote :

Ah, OK, so maybe this is just an issue with our documentation not explaining all of the settings the deployer may want to configure.

Thomas Maddox (thomas-maddox) wrote :

I'm not sure I follow. Which setting(s) would you suggest deployers have?

The problem seems to be a byproduct of how MySQL handles retrieving non-aggregated values from an aggregated query. SQLite was returning the last value when I used an ORDER BY before the grouping; MySQL was returning the first. It sounds like PostgreSQL would reject it as an invalid query, though I haven't tested that; it was the first thing some teammates said when I asked for help. This still seems like a bug to me. =\
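
As an illustration of the portability issue, here is a minimal sketch of selecting the latest row per resource without depending on which row an engine returns for non-aggregated columns. It is not the query from the eventual fix: the `meter` table, its columns, and the sample rows are assumptions made up for this example, and it runs against an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter (resource_id TEXT, timestamp TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO meter VALUES (?, ?, ?)",
    [
        ("instance-1", "2013-08-05T12:00:00", "scheduling"),
        ("instance-1", "2013-08-05T12:00:05", "active"),
        ("instance-2", "2013-08-05T12:01:00", "building"),
    ],
)

# Pick the newest row per resource explicitly instead of relying on which
# row the engine happens to return for non-aggregated columns in a GROUP BY.
LATEST_STATE_SQL = """
SELECT m.resource_id, m.state
FROM meter AS m
JOIN (
    SELECT resource_id, MAX(timestamp) AS max_ts
    FROM meter
    GROUP BY resource_id
) AS latest
  ON latest.resource_id = m.resource_id
 AND latest.max_ts = m.timestamp
ORDER BY m.resource_id
"""

latest_states = dict(conn.execute(LATEST_STATE_SQL).fetchall())
print(latest_states)
```

Because the subquery only selects the grouping column and an aggregate, the statement does not depend on engine-specific handling of bare columns. Two samples with an identical timestamp for the same resource would still produce duplicate rows, which is where timestamp precision starts to matter.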

Thomas Maddox (thomas-maddox) wrote :

It was failing in Mongo because it was using $first on a query that was just ordering by project_id and user_id and not timestamp, AFAICT.
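
A minimal sketch of the idea, assuming hypothetical field names (`resource_id`, `timestamp`, `resource_metadata`) rather than the driver's real schema: `$first` inside `$group` only returns the newest value if the preceding `$sort` orders by timestamp, newest first.

```python
def latest_metadata_pipeline():
    """Build an aggregation pipeline whose $first sees the newest sample.

    Field names (resource_id, timestamp, resource_metadata) are assumed
    for illustration only.
    """
    return [
        # Newest sample first, so $first below picks the latest metadata.
        {"$sort": {"timestamp": -1}},
        {"$group": {
            "_id": "$resource_id",
            "metadata": {"$first": "$resource_metadata"},
            "last_timestamp": {"$first": "$timestamp"},
        }},
    ]


pipeline = latest_metadata_pipeline()
print(pipeline)
```

With pymongo, a list like this could be passed to `collection.aggregate(pipeline)`. Sorting only by project_id and user_id, as described above, leaves the order within each group unrelated to time, so `$first` can return stale metadata.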

Doug Hellmann (doug-hellmann) wrote :

Based on your comment about notify_on_state_change, I thought the data wasn't there at all because nova wasn't configured to send it. Does our documentation include instructions for setting notify_on_state_change?

If that's on and the queries still return the wrong values, then we need to fix the code as well (obviously).

Thomas Maddox (thomas-maddox) wrote :

Thanks for the clarification, I gotcha now. =]

Anyone using devstack should have this already set for them, if they go with the default (this is what I'm using): https://github.com/openstack-dev/devstack/blob/master/lib/nova#L502-L507. I haven't tried without this configuration, though I probably will just to be thorough.

I wasn't able to find it in the CM documentation, so it'd be good to add a bit about it.

Julien Danjou (jdanjou) wrote :
Thomas Maddox (thomas-maddox) wrote :

Aha! Sorry, I was looking in the wrong spot. Thanks, Julien!

Thomas Maddox (thomas-maddox) wrote :

HBase is currently overwriting historical metadata, so with that driver we can't get the latest resource state within a time period without addressing the schema issue there: https://bugs.launchpad.net/ceilometer/+bug/1217412

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/44277

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/45008

Changed in ceilometer:
assignee: Thomas Maddox (thomas-maddox) → Tong Li (litong01)
Changed in ceilometer:
assignee: Tong Li (litong01) → Thomas Maddox (thomas-maddox)
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: havana-3 → havana-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/44277
Committed: http://github.com/openstack/ceilometer/commit/64f17d6552cde85e87ba394815fd27ceeebdb103
Submitter: Jenkins
Branch: master

commit 64f17d6552cde85e87ba394815fd27ceeebdb103
Author: Thomas Maddox <email address hidden>
Date: Tue Aug 20 18:10:17 2013 +0000

    Fix to return latest resource metadata

    Addresses the latest resource metadata not being returned in
    the MongoDB, SQLAlchemy, DB2, and HBase drivers. A schema
    change was required for HBase, because it was overwriting
    historical metadata.

    Closes-Bug: #1208547
    Related-Bug: #1201701
    Implements: blueprint hbase-meter-table-enhancement
    Change-Id: Ib09e21cbc7bbd45a6ecc321403e9947df837e14b

Changed in ceilometer:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in ceilometer:
status: Fix Committed → Fix Released
Alan Pevec (apevec) wrote :

Not sure why, but the Grizzly backport was done in a separate bug, 1229395.

tags: removed: grizzly-backport-potential
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: havana-rc1 → 2013.2
Thomas Maddox (thomas-maddox) wrote :

@Alan: Sorry, yeah, it was a misunderstanding, I think. I'm relatively new to this, so I asked someone (I can't remember who) how I should track the stable bug, and they told me to open a new bug against Grizzly. I probably didn't do a very good job of explaining what was up, and I think I tagged this one too late.
